Skip to content

REF: deduplicate group min/max #40584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 23, 2021

Conversation

mzeitlin11
Copy link
Member

ASVs look unaffected:

before           after         ratio
     [05a0e98a]       [16d06e22]
     <master>         <ref/deduplicate_grp_min_max>
         33.0±3ms         33.3±1ms     1.01  gil.ParallelGroupbyMethods.time_loop(2, 'max')
       32.2±0.4ms         34.5±3ms     1.07  gil.ParallelGroupbyMethods.time_loop(2, 'min')
         64.1±1ms       65.6±0.5ms     1.02  gil.ParallelGroupbyMethods.time_loop(4, 'max')
         65.3±3ms         65.5±1ms     1.00  gil.ParallelGroupbyMethods.time_loop(4, 'min')
         20.2±2ms       20.2±0.8ms     1.00  gil.ParallelGroupbyMethods.time_parallel(2, 'max')
         21.3±1ms       20.1±0.8ms     0.95  gil.ParallelGroupbyMethods.time_parallel(2, 'min')
       25.1±0.7ms       25.3±0.5ms     1.01  gil.ParallelGroupbyMethods.time_parallel(4, 'max')
       25.0±0.4ms         26.5±2ms     1.06  gil.ParallelGroupbyMethods.time_parallel(4, 'min')
         125±50ms          104±1ms    ~0.83  groupby.GroupByCythonAgg.time_frame_agg('float64', 'max')
          105±2ms          107±2ms     1.02  groupby.GroupByCythonAgg.time_frame_agg('float64', 'min')

group_min_or_max[lab, j] = val
else:
if val < group_min_or_max[lab, j]:
group_min_or_max[lab, j] = val
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to be able to do something like if PyObject_RichCompareBool(val, ..., cmp_method): ... to simplify the conditional logic, but couldn't find a way to do something like this in Cython without the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a perf impact here? its checking compute_max at each step in a nested loop

Copy link
Member Author

@mzeitlin11 mzeitlin11 Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any perf impact based on ASVs above. For a loop invariant like compute_max, I'd guess this check is optimized away by the compiler. (It could also just be negligible compared to the cost of the indexing operations)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. worth it to de-duplicate a bunch of code.

@mzeitlin11 mzeitlin11 added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby Refactor Internal refactoring of code labels Mar 23, 2021
@jreback jreback added this to the 1.3 milestone Mar 23, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jbrockmendel if any comments

@jbrockmendel
Copy link
Member

LGTM

@jreback jreback merged commit 6ed2fe8 into pandas-dev:master Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the ref/deduplicate_grp_min_max branch March 23, 2021 20:31
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants