Skip to content

REF: split out grouped masked cummin/max algo #41853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 9, 2021

Conversation

mzeitlin11
Copy link
Member

No user-facing change here, but I think makes logic clearer and future changes easier to understand

Benchmarks look unaffected:

before           after         ratio
[3f67dc33]       [51f8f9c8]
<master>         <masked_cummin_ref>
17.0±2ms         16.3±2ms     0.96  groupby.CumminMax.time_frame_transform('Float64', 'cummax')
17.7±2ms         18.4±2ms     1.04  groupby.CumminMax.time_frame_transform('Float64', 'cummin')
20.8±3ms       16.9±0.3ms    ~0.81  groupby.CumminMax.time_frame_transform('Int64', 'cummax')
20.7±3ms         18.2±3ms    ~0.88  groupby.CumminMax.time_frame_transform('Int64', 'cummin')
15.7±2ms       13.0±0.3ms    ~0.83  groupby.CumminMax.time_frame_transform('float64', 'cummax')
15.5±0.7ms       13.6±0.7ms    ~0.88  groupby.CumminMax.time_frame_transform('float64', 'cummin')
29.8±0.7ms       28.0±0.4ms     0.94  groupby.CumminMax.time_frame_transform('int64', 'cummax')
29.9±1ms         30.0±2ms     1.00  groupby.CumminMax.time_frame_transform('int64', 'cummin')
19.2±2ms         16.9±3ms    ~0.88  groupby.CumminMax.time_frame_transform_many_nulls('Float64', 'cummax')
22.6±2ms         15.6±2ms    ~0.69  groupby.CumminMax.time_frame_transform_many_nulls('Float64', 'cummin')
15.7±3ms         19.3±3ms    ~1.23  groupby.CumminMax.time_frame_transform_many_nulls('Int64', 'cummax')
19.5±2ms         14.7±1ms    ~0.75  groupby.CumminMax.time_frame_transform_many_nulls('Int64', 'cummin')
14.5±2ms       11.4±0.8ms    ~0.79  groupby.CumminMax.time_frame_transform_many_nulls('float64', 'cummax')
13.2±1ms       12.4±0.2ms     0.94  groupby.CumminMax.time_frame_transform_many_nulls('float64', 'cummin')
14.2±1ms       13.1±0.1ms     0.92  groupby.CumminMax.time_frame_transform_many_nulls('int64', 'cummax')
14.1±0.5ms       13.4±0.4ms     0.95  groupby.CumminMax.time_frame_transform_many_nulls('int64', 'cummin')

@mzeitlin11 mzeitlin11 added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby Refactor Internal refactoring of code labels Jun 7, 2021
@jreback
Copy link
Contributor

jreback commented Jun 9, 2021

seems fine. IIRC we have these 2 kernels because the no mask one is >> faster?

@jreback jreback added this to the 1.3 milestone Jun 9, 2021
@mzeitlin11
Copy link
Member Author

seems fine. IIRC we have these 2 kernels because the no mask one is >> faster?

In an original implementation, I tried forcing a mask (so only use the masked kernel), but there was a slight slowdown for float data because of the isna needed to actually get the mask to pass into the algo.

But all other types where isna is fast (eg nullables, or normal ints) forcing a mask was just as fast. If a small perf hit (I want to say <20%) on float data is acceptable, I can dig up that code and see what the perf hit actually looks like.

Regardless of that choice, I think the split here into 2 clear kernels is helpful because it will make it much simpler if we ever do get rid of one.

@jreback jreback merged commit 384f414 into pandas-dev:master Jun 9, 2021
@jreback
Copy link
Contributor

jreback commented Jun 9, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the masked_cummin_ref branch June 9, 2021 18:26
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants