-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: prepare (upcast) scalar before dispatching to arithmetic array ops #40479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: prepare (upcast) scalar before dispatching to arithmetic array ops #40479
Conversation
How costly is this? I get that its a valid optimization, but it breaks the abstraction of the self-contained op. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
In the same
We already do other preparation steps before the actual array_op as well (extract_array, extracting zerodim, checking length, wrapping datetimelike, etc). So I don't think the abstraction holds right now, and there is certainly an opportunity to organize this better. For example, we can ensure to have a clear set of steps (factored out as consistent helper functions) that need to be taken in the different code paths where |
pandas/core/ops/array_ops.py
Outdated
@@ -422,7 +422,7 @@ def get_array_op(op): | |||
raise NotImplementedError(op_name) | |||
|
|||
|
|||
def _maybe_upcast_for_op(obj, shape: Shape): | |||
def prepare_scalar_for_op(obj, shape: Shape): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think to be pandas-idiomatic might need to keep a "maybe"?
Fair enough. Generally LGTM. Should this wait until after the ensure_wrapped_if_datetimelike PR so that can be folded into this helper? |
pandas/core/ops/__init__.py
Outdated
@@ -428,6 +429,7 @@ def f(self, other, axis=default_axis, level=None, fill_value=None): | |||
|
|||
axis = self._get_axis_number(axis) if axis is not None else 1 | |||
|
|||
other = maybe_prepare_scalar_for_op(other, (self.shape[axis],)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why self.shape[axis] instead of self.shape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because self
is a dataframe, and we want a 1D shape (rows or columns depending on the axis
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing shape only matters when we have a scalar, in which case broadcasting to self.shape is simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, changed to self.shape
looks fine. this helps with perf? |
Yes, see #40479 (comment) |
thanks @jorisvandenbossche |
This moves the potential upcasting of python/numpy scalars to pandas scalars out of the actual array op (
arithmetic_op
), but moves it up to the Series / DataFrame level before calling the array op. This avoids calling it repeatedly on the same scalar for each column.