Skip to content

PERF: Try fast/slow paths only once in DataFrameGroupby.transform #42195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 8, 2021

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Jun 23, 2021

Still need to run ASVs to measure performance, making sure CI passes first.

Edit: ASV results for 7b86fdb

       before           after         ratio
     [9f6a91ae]       [7b86fdb9]
     <groupby_transform_path~1>       <groupby_transform_path>
-     23.8±0.07ms         21.4±2ms     0.90  groupby.Nth.time_series_nth_all('float32')
-      11.8±0.1ms       10.4±0.6ms     0.88  groupby.Nth.time_frame_nth('object')
-      27.1±0.1ms       23.3±0.1ms     0.86  groupby.Nth.time_series_nth_all('object')
-      19.4±0.2ms       16.5±0.2ms     0.85  groupby.Nth.time_groupby_nth_all('float64')
-      21.7±0.2ms       18.4±0.3ms     0.85  groupby.Nth.time_frame_nth_any('float32')
-      24.1±0.1ms       20.2±0.2ms     0.84  groupby.Nth.time_series_nth_any('float64')
-      24.0±0.2ms      20.1±0.05ms     0.84  groupby.Nth.time_series_nth_all('float64')
-      56.4±0.4ms       20.1±0.4ms     0.36  groupby.TransformEngine.time_dataframe_cython(False)
-      56.5±0.4ms       20.1±0.4ms     0.36  groupby.TransformEngine.time_dataframe_cython(True)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@rhshadrach rhshadrach added Groupby Performance Memory or execution speed performance Apply Apply, Aggregate, Transform, Map labels Jun 23, 2021
except ValueError as err:
msg = "transform must return a scalar value for each group"
raise ValueError(msg) from err
if path is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing this at each step in the loop, can we just do this outside the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - updated PR to show what this would look like. Minor cleanups can be done to avoid the nested if statements in the newly added process_result, but it's a bit ugly. We're trading if path is None for a function call as I don't see another way to avoid code duplication.

columns=group.columns,
index=group.index,
)
applied.append(r)
else:
applied.append(res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can process_result return res and do this append outside the func

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One path uses continue, so the most immediate ways I see are to (a) use a sentinel value to indicate "do not add" or (b) move the non-adding branch outside of process_result (duplicating code).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, the path using continue can almost be moved ahead of any computation - it doesn't really depend on the result except for whether the result is a Series or not. I'm going to see if I can do a precursor cleanup of this method. Marking this as a draft for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. if the answer to my question is "no" thats OK too.

@rhshadrach rhshadrach marked this pull request as draft June 23, 2021 23:14
msg = "transform must return a scalar value for each group"
raise ValueError(msg) from err

def process_result(applied, group, res):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be changed to return the object to be appended to applied? rather than mutating

@jreback
Copy link
Contributor

jreback commented Jul 4, 2021

@rhshadrach can you rebase

@rhshadrach rhshadrach marked this pull request as ready for review July 7, 2021 13:56
@rhshadrach
Copy link
Member Author

@jreback @jbrockmendel Updated and ready for review.

I also tried only using the slow path (instead of choosing between fast/slow); this resulted in a 0.7 ratio for the benchmark groupby.TransformEngine.time_dataframe_cython(False) as opposed to a 0.3 in this PR. While I'd prefer to find a way to not have to choose between different paths based on the first group, the performance seems worth it in my opinion.

@jbrockmendel
Copy link
Member

While I'd prefer to find a way to not have to choose between different paths based on the first group

IIUC you mean do some introspection on the function (or require users to specify somehow) to determine this instead of a try/except? That would be nice.

this resulted in a 0.7 ratio for the benchmark

What about for the groupby benchmarks more generally? The slow path has been optimized a bunch this year (xref #40263)

@jreback jreback added this to the 1.4 milestone Jul 7, 2021
@jreback
Copy link
Contributor

jreback commented Jul 7, 2021

I think its fine to determine how to call it based on the first group. we need some metric.

msg = "transform must return a scalar value for each group"
raise ValueError(msg) from err

def process_result(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be a module level function?

Copy link
Member Author

@rhshadrach rhshadrach Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - can either make it a module-level by passing self.obj or make this into a class method. Do you see a reason to prefer one or the other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think prefer a module level

@rhshadrach
Copy link
Member Author

rhshadrach commented Jul 7, 2021

@jbrockmendel

IIUC you mean do some introspection on the function (or require users to specify somehow) to determine this instead of a try/except? That would be nice.

Introspection seems fragile to me. I didn't consider user specification but I like the idea as long as it isn't required. Another option would be to restrict what UDFs are acceptable (e.g. must return Series, DataFrame, or NumPy array, and maybe some other things). For this latter option, we could then handle everything acceptable via the fastpath and remove the slowpath entirely. But perhaps the flexibility of current handling makes this an unwelcome trade-off.

What about for the groupby benchmarks more generally? The slow path has been optimized a bunch this year (xref #40263)

The OP has all the groupby ASVs. I don't expect any difference in performance with the updates to this PR but we re-run tonight to check. I misread. Will run groupby ASVs using only slowpath tonight.

index=group.index,
)
assert isinstance(r, DataFrame)
return r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call this e.g. res_frame? i.e. not 1 letter

@jreback
Copy link
Contributor

jreback commented Jul 8, 2021

comment by @jbrockmendel ping on green

@rhshadrach
Copy link
Member Author

@jbrockmendel - This is slowpath only (L1385, first line of _choose_path is return slow_path, slow_path(group)):

       before           after         ratio
     [9ffc924e]       [462500f7]
     <whatsnew_contributing~2>       <groupby_transform_path>
+      17.9±0.6ms         25.4±1ms     1.42  groupby.Cumulative.time_frame_transform('int64', 'cummin')
+     21.3±0.07ms       26.5±0.5ms     1.24  groupby.Nth.time_frame_nth_any('datetime')
+      25.9±0.7ms       31.4±0.4ms     1.21  groupby.Nth.time_frame_nth_any('object')
+     11.9±0.09ms       14.0±0.3ms     1.18  groupby.Nth.time_frame_nth('object')
+      23.5±0.1ms         26.7±3ms     1.14  groupby.Nth.time_series_nth_any('datetime')
-      71.1±0.5ms       57.4±0.5ms     0.81  groupby.TransformEngine.time_dataframe_cython(True)
-      71.0±0.5ms       56.9±0.2ms     0.80  groupby.TransformEngine.time_dataframe_cython(False)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@rhshadrach
Copy link
Member Author

@jreback - green

@jreback jreback merged commit 57bb165 into pandas-dev:master Jul 8, 2021
@jreback
Copy link
Contributor

jreback commented Jul 8, 2021

thanks @rhshadrach very nice!

@rhshadrach rhshadrach deleted the groupby_transform_path branch July 8, 2021 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Groupby Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Calling slowpath for every group in transform
3 participants