Skip to content

PREF: Fix regression from #58984 #59025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 17, 2024
Merged

PREF: Fix regression from #58984 #59025

merged 3 commits into from
Jun 17, 2024

Conversation

luke396
Copy link
Contributor

@luke396 luke396 commented Jun 17, 2024

@@ -398,11 +398,12 @@ def group_cumsum(
for i in range(N):
lab = labels[i]

if lab < 0:
if uses_mask and lab < 0:
Copy link
Contributor Author

@luke396 luke396 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the solution for the bug fix mentioned in #58984 is the only one available, I believe some regression may be unavoidable.

On my local machine, this change seems to yield some performance gains, but perhaps they are merely an illusion.

@luke396
Copy link
Contributor Author

luke396 commented Jun 17, 2024

| Change   | Before [dd87dd3e] <main>   | After [e5a23fb1] <gh59022>   |   Ratio | Benchmark (Parameter)                                       
                               |
|----------|----------------------------|------------------------------|---------|--------------------------------------------------------------------------------------------|
| +        | 24.0±0.7ms                 | 28.8±0.7ms                   |    1.2  | groupby.GroupByCythonAgg.time_frame_agg('float64', 'first')                                |
| +        | 73.8±1ms                   | 85.3±1ms                     |    1.16 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int64', 'min')                            |
| +        | 1.53±0.07ms                | 1.68±0.05ms                  |    1.1  | groupby.GroupByMethods.time_dtype_as_group('int16', 'value_counts', 'direct', 1, 'cython') |
| -        | 11.4±0.4ms                 | 10.3±0.2ms                   |    0.91 | groupby.Cumulative.time_frame_transform('int64', 'cummax', False)                          |
| -        | 1.45±0.04ms                | 1.31±0.05ms                  |    0.9  | groupby.GroupByMethods.time_dtype_as_field('int', 'quantile', 'direct', 1, 'cython')       |
| -        | 67.9±3μs                   | 60.9±1μs                     |    0.9  | groupby.GroupByMethods.time_dtype_as_group('int', 'cumsum', 'direct', 1, 'cython')         |
| -        | 26.1±0.3ms                 | 23.1±0.2ms                   |    0.89 | groupby.Cumulative.time_frame_transform('Float64', 'cumsum', False)                        |
| -        | 74.0±2μs                   | 65.9±1μs                     |    0.89 | groupby.GroupByMethods.time_dtype_as_field('uint', 'sem', 'direct', 1, 'cython')           |
| -        | 2.34±0.1ms                 | 2.07±0.01ms                  |    0.89 | groupby.GroupByMethods.time_dtype_as_group('float', 'quantile', 'direct', 1, 'cython')     |
| -        | 660±40μs                   | 575±10μs                     |    0.87 | groupby.GroupByMethods.time_dtype_as_field('int16', 'pct_change', 'direct', 1, 'cython')   |
| -        | 23.1±0.1ms                 | 20.0±0.4ms                   |    0.86 | groupby.Cumulative.time_frame_transform('Int64', 'cumsum', False)                          |
| -        | 90.3±10ms                  | 75.6±1ms                     |    0.84 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int64', 'sum')                            |
| -        | 742±40μs                   | 590±20μs                     |    0.8  | groupby.GroupByMethods.time_dtype_as_field('uint', 'pct_change', 'direct', 1, 'cython')    |
| -        | 104±30μs                   | 81.4±1μs                     |    0.78 | groupby.GroupByMethods.time_dtype_as_field('uint', 'median', 'direct', 1, 'cython')        |
| -        | 1.30±0.5ms                 | 548±30μs                     |    0.42 | groupby.GroupByMethods.time_dtype_as_field('uint', 'min', 'direct', 1, 'numba')            |

@luke396 luke396 marked this pull request as ready for review June 17, 2024 07:51
@luke396 luke396 requested a review from WillAyd as a code owner June 17, 2024 07:51
@mroeschke mroeschke added this to the 3.0 milestone Jun 17, 2024
@mroeschke mroeschke added Performance Memory or execution speed performance Groupby labels Jun 17, 2024
@mroeschke mroeschke merged commit 5d451fe into pandas-dev:main Jun 17, 2024
54 of 55 checks passed
@mroeschke
Copy link
Member

Thanks @luke396

clin1234 pushed a commit to clin1234/pandas that referenced this pull request Jun 18, 2024
* check uses_mask before

* backup gh

* backup gh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential performance regression with "BUG: Fix issue with negative labels in group_cumsum (#58984)"
2 participants