
REF: prepare (upcast) scalar before dispatching to arithmetic array ops #40479

Merged

Conversation

jorisvandenbossche
Member

This moves the potential upcasting of python/numpy scalars to pandas scalars out of the actual array op (arithmetic_op) and up to the Series / DataFrame level, before the array op is called. This avoids repeatedly calling it on the same scalar for each column.
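As a rough illustration of the idea (a minimal sketch: frame_op_sketch is hypothetical, while maybe_prepare_scalar_for_op and arithmetic_op are internal pandas helpers):

```python
import operator

import pandas as pd
from pandas.core.ops.array_ops import arithmetic_op, maybe_prepare_scalar_for_op


def frame_op_sketch(df: pd.DataFrame, other) -> pd.DataFrame:
    # prepare (upcast) the scalar once, at the DataFrame level ...
    other = maybe_prepare_scalar_for_op(other, (len(df),))
    # ... instead of re-upcasting it inside the array op for every column
    results = {
        col: arithmetic_op(df[col]._values, other, operator.add)
        for col in df.columns
    }
    return pd.DataFrame(results, index=df.index)
```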

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Performance Memory or execution speed performance Numeric Operations Arithmetic, Comparison, and Logical operations labels Mar 17, 2021
@jbrockmendel
Member

How costly is this? I get that it's a valid optimization, but it breaks the abstraction of the self-contained op.

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@jorisvandenbossche
Member Author

jorisvandenbossche commented Apr 22, 2021

> How costly is this?

In the same FrameWithFrameWide benchmark (as mentioned in #39820 (comment)), calls to _maybe_upcast_for_op currently take around 8% of the total time on master. I expect that share to grow (relatively) as other optimizations get merged (like #39820). For reference, the actual add operation only takes around 18% in this benchmark; everything else is overhead that can be partly optimized away.
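For reference, a rough way to time this scenario outside of asv (shapes assumed; not the exact FrameWithFrameWide setup):

```python
import timeit

import numpy as np
import pandas as pd

# a wide frame + frame op: the per-column dispatch overhead (including
# the repeated scalar-preparation call this PR hoists out) dominates
# over the actual add
arr = np.random.randn(100, 10_000)
df1 = pd.DataFrame(arr)
df2 = pd.DataFrame(arr.copy())

n = 10
total = timeit.timeit(lambda: df1 + df2, number=n)
print(f"{total / n * 1000:.1f} ms per df1 + df2")
```

Profiling that expression (e.g. with cProfile) shows how the time splits between the actual add and the per-column overhead.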

> I get that it's a valid optimization, but it breaks the abstraction of the self-contained op.

We already do other preparation steps before the actual array_op (extract_array, extracting zero-dim arrays, checking lengths, wrapping datetimelike values, etc.). So I don't think the abstraction holds right now, and there is certainly an opportunity to organize this better. For example, we could define a clear set of steps (factored out as consistent helper functions) that is applied in each of the code paths where array_op gets called (e.g. both the Series op and the DataFrame op); see the sketch below.
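A sketch of what such a factored-out sequence could look like (prepare_operands_sketch is hypothetical; extract_array and maybe_prepare_scalar_for_op are the existing helpers mentioned above):

```python
import numpy as np

from pandas.core.construction import extract_array
from pandas.core.ops.array_ops import maybe_prepare_scalar_for_op


def prepare_operands_sketch(right, shape):
    # hypothetical consolidation of the preparation steps that the
    # Series and DataFrame code paths each perform before an array_op
    right = extract_array(right, extract_numpy=True)  # unwrap Series/Index
    if isinstance(right, np.ndarray) and right.ndim == 0:
        right = right[()]  # extract the scalar from a zero-dim array
    # (a length/shape compatibility check would also live here)
    right = maybe_prepare_scalar_for_op(right, shape)  # upcast scalars once
    return right
```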

```diff
@@ -422,7 +422,7 @@ def get_array_op(op):
     raise NotImplementedError(op_name)


-def _maybe_upcast_for_op(obj, shape: Shape):
+def prepare_scalar_for_op(obj, shape: Shape):
```
Member

I think, to be pandas-idiomatic, this might need to keep a "maybe"?
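For context, a simplified sketch of the kind of upcasting this helper performs (cases abbreviated and approximate; not the full pandas implementation):

```python
import datetime

import numpy as np
import pandas as pd


def maybe_prepare_scalar_for_op_sketch(obj, shape):
    # turn python/numpy scalars into the richer pandas scalars so the
    # array ops themselves don't need to special-case them
    if type(obj) is datetime.timedelta:
        return pd.Timedelta(obj)
    if isinstance(obj, np.timedelta64):
        if np.isnat(obj):
            # NaT cannot become a Timedelta; broadcast it to the op's shape
            return np.broadcast_to(obj, shape)
        return pd.Timedelta(obj)
    if isinstance(obj, np.datetime64) and np.isnat(obj):
        return np.broadcast_to(obj, shape)
    return obj  # anything else passes through unchanged
```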

@jbrockmendel
Member

> We already do other preparation steps before the actual array_op

Fair enough.

Generally LGTM. Should this wait until after the ensure_wrapped_if_datetimelike PR, so that it can be folded into this helper?

```diff
@@ -428,6 +429,7 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):

         axis = self._get_axis_number(axis) if axis is not None else 1

+        other = maybe_prepare_scalar_for_op(other, (self.shape[axis],))
```
Member

why self.shape[axis] instead of self.shape?

Member Author

Because self is a DataFrame, and we want a 1D shape (rows or columns, depending on the axis).

Member

passing shape only matters when we have a scalar, in which case broadcasting to self.shape is simpler

Member Author

OK, changed to self.shape
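Concretely, the shape only matters for scalars that get broadcast, e.g. NaT (hypothetical values; both variants broadcast fine against the individual columns):

```python
import numpy as np

nat = np.timedelta64("NaT")
shape_2d = (3, 4)  # self.shape of a hypothetical 3x4 DataFrame

# the 1D variant broadcast only along the chosen axis ...
print(np.broadcast_to(nat, (shape_2d[0],)).shape)  # (3,)
# ... while passing self.shape keeps the full 2D shape, which is simpler
print(np.broadcast_to(nat, shape_2d).shape)        # (3, 4)
```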

@jreback jreback added this to the 1.3 milestone Apr 26, 2021
@jreback
Contributor

jreback commented Apr 26, 2021

looks fine. this helps with perf?

@jorisvandenbossche
Member Author

> this helps with perf?

Yes, see #40479 (comment)

@jreback jreback merged commit ce1843f into pandas-dev:master Apr 28, 2021
@jreback
Contributor

jreback commented Apr 28, 2021

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the ops-refactor-prepare-scalar branch April 28, 2021 14:50
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021