-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF/PERF: move np.errstate out of core array_ops up to higher level #40396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/PERF: move np.errstate out of core array_ops up to higher level #40396
Conversation
pandas/core/ops/__init__.py
Outdated
@@ -433,7 +433,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None): | |||
|
|||
if isinstance(other, ABCDataFrame): | |||
# Another DataFrame | |||
new_data = self._combine_frame(other, na_op, fill_value) | |||
with np.errstate(all="ignore"): | |||
new_data = self._combine_frame(other, na_op, fill_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt we get this one for free bc it goes through _dispatch_frame_op which you've got setting np.errstaet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, indeed _combine_frame
eventually always goes through _dispatch_frame
as well (I got here because it's a place that uses get_array_op
)
@@ -265,8 +268,7 @@ def comparison_op(left: ArrayLike, right: Any, op) -> ArrayLike: | |||
with warnings.catch_warnings(): | |||
# suppress warnings from numpy about element-wise comparison | |||
warnings.simplefilter("ignore", DeprecationWarning) | |||
with np.errstate(all="ignore"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the catch_warnings here be done at a higher level too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but that might need to be done more targeted for the specific type of op (since we don't do anything similar in arithmetic_op), and thus this will be more complicated. So would like to start with np.errstate
.
This is unfortunate, since it is both a) uglier to do this at a higher level and b) a really nice perf improvement |
Is it that much uglier? I removed it in 5 places, and I added it in 4 places in |
@@ -376,7 +376,8 @@ def _cmp_method(self, other, op): | |||
other = other._ndarray | |||
|
|||
pd_op = ops.get_array_op(op) | |||
result = pd_op(self._ndarray, other) | |||
with np.errstate(all="ignore"): | |||
result = pd_op(self._ndarray, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we probably don't need to add this one here, since when going through a DataFrame/Series op, this will also already be catched on that level (and be don't consistently catch it in other EA ops). But this keeps current behaviour (and this not used in practice anyway)
@@ -6556,7 +6556,8 @@ def _dispatch_frame_op(self, right, func, axis: Optional[int] = None): | |||
right = lib.item_from_zerodim(right) | |||
if not is_list_like(right): | |||
# i.e. scalar, faster than checking np.ndim(right) == 0 | |||
bm = self._mgr.apply(array_op, right=right) | |||
with np.errstate(all="ignore"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could just do the np.errstate once at the top of the method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is possible (as mentioned in #40396 (comment)). Personally, I think I prefer putting it just around the mgr operate call / array_op calls, but in the end it doesn't matter much (it's not that the other checks here could potentially gives such warning that would be incorrectly suppressed). So either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im fine with this, we can clean it up later if needbe
@@ -5087,7 +5088,8 @@ def _arith_method(self, other, op): | |||
|
|||
lvalues = self._values | |||
rvalues = extract_array(other, extract_numpy=True) | |||
result = ops.arithmetic_op(lvalues, rvalues, op) | |||
with np.errstate(all="ignore"): | |||
result = ops.arithmetic_op(lvalues, rvalues, op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we not need this for _logical_method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we don't suppress any warnings in ops.logical_op
, so I left the current behaviour as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable
LGTM, merging this afternoon unless someone chimes in to stop me |
thanks @jorisvandenbossche |
Currently, we suppress numpy warnings with
np.errstate(all="ignore")
inside theevaluate
function incore.computation.expressions
, which is is only used incore.ops.array_ops._na_arithmetic_op
, which in itself is only used incore.ops.array_ops
arithmetic_op()
andcomparison_op()
(where we actually callednp.errstate
again, duplicatively).So, in summary, we suppress the warnings at the level of the "array op". For the ArrayManager, we call this array op many times for each column, and repeatedly calling
np.errstate(all="ignore")
gives a big overhead. Luckily, it is easy to suppress the warnings once at a higher level, at the DataFrame/Series level, where those array ops are called.That's what this PR is doing: removing
np.errstate(all="ignore")
in the actual array ops, and adding it in all places where we currently call the array ops.With the benchmark case of an arithmetic op with two dataframes, this gives a considerable improvement:
For BM it doesn't matter for this case, since that's a single block and
np.errstate
would be called only once anyway. But for certain cases ofdf+s
ops where we potentially also work column-wise for BM, it should benefit there as well.