
PERF: Discrepancy in groupby methods #19165


Closed
4 of 7 tasks
mroeschke opened this issue Jan 10, 2018 · 2 comments
Labels: Groupby, Master Tracker (high-level tracker for similar issues), Performance (memory or execution speed performance)

Comments

mroeschke (Member) commented Jan 10, 2018

xref #8426 and comment

issue filter for groupby & perf

Some groupby methods (notably `describe`, `mad`, and `pct_change`) are far less performant than others: in the benchmarks below they take whole seconds, while most methods finish in a few milliseconds. Many of the slower methods are generated from the `_common_apply_whitelist` in `pandas/core/groupby.py`, so it may be worthwhile to revisit that implementation.
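As a hedged illustration of what a revisited implementation could look like (not pandas' actual code), mean absolute deviation and within-group percent change can both be written with vectorized groupby primitives (`transform`, `shift`, `mean`) instead of a Python call per group. The helper names `group_mad` and `group_pct_change` and the synthetic data are assumptions for illustration, and `group_pct_change` deliberately skips the forward-fill of missing values that `pct_change` has historically applied by default.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustrative only): an integer grouping key and a float value column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 100, size=10_000),
    "value": rng.standard_normal(10_000),
})


def group_mad(values, keys):
    """Hypothetical helper: mean absolute deviation per group, fully vectorized."""
    # transform("mean") broadcasts each group's mean back onto that group's rows,
    # so the deviations are computed in one pass over the whole column.
    deviations = (values - values.groupby(keys).transform("mean")).abs()
    return deviations.groupby(keys).mean()


def group_pct_change(values, keys, periods=1):
    """Hypothetical helper: within-group percent change, without filling NaNs."""
    # shift() runs within each group, so the first `periods` rows of every
    # group have no predecessor and end up as NaN in the result.
    previous = values.groupby(keys).shift(periods)
    return values / previous - 1


mad_per_key = group_mad(df["value"], df["key"])
pct_per_row = group_pct_change(df["value"], df["key"])
```

For example, `group_mad(df["value"], df["key"])` should agree with `df.groupby("key")["value"].apply(lambda s: (s - s.mean()).abs().mean())` up to floating-point rounding, while avoiding one Python-level call per group. `GroupBy.mad` itself was deprecated in pandas 1.5 and removed in 2.0, which is why the sketch spells the computation out.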

asv dev -b ^groupby.GroupByMethods
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[100.00%] ··· Running groupby.GroupByMethods.time_method                     ok
[100.00%] ···· 
               ======= ============== ========
                dtype      method       time  
               ------- -------------- --------
                 int        all        256ms  
                 int        any        255ms  
                 int       count       925μs  
                 int      cumcount     1.15ms 
                 int       cummax      1.13ms 
                 int       cummin      1.15ms 
                 int      cumprod      1.58ms 
                 int       cumsum      1.16ms 
                 int      describe     3.25s  
                 int       first       1.12ms 
                 int        head       1.37ms 
                 int        last       1.12ms 
                 int        mad        1.42s  
                 int        max        1.12ms 
                 int        min        1.16ms 
                 int       median      1.53ms 
                 int        mean       1.43ms 
                 int      nunique      1.40ms 
                 int     pct_change    1.56s  
                 int        prod       1.53ms 
                 int        rank       380ms  
                 int        sem        414ms  
                 int       shift       974μs  
                 int        size       858μs  
                 int        skew       414ms  
                 int        std        1.46ms 
                 int        sum        1.50ms 
                 int        tail       1.45ms 
                 int       unique      289ms  
                 int    value_counts   2.35ms 
                 int        var        1.34ms 
                float       all        402ms  
                float       any        406ms  
                float      count       1.18ms 
                float     cumcount     1.33ms 
                float      cummax      1.40ms 
                float      cummin      1.40ms 
                float     cumprod      1.75ms 
                float      cumsum      1.40ms 
                float     describe     5.02s  
                float      first       1.37ms 
                float       head       1.58ms 
                float       last       1.36ms 
                float       mad        2.01s  
                float       max        1.38ms 
                float       min        1.37ms 
                float      median      1.80ms 
                float       mean       1.79ms 
                float     nunique      1.60ms 
                float    pct_change    2.17s  
                float       prod       1.75ms 
                float       rank       623ms  
                float       sem        416ms  
                float      shift       1.18ms 
                float       size       1.09ms 
                float       skew       646ms  
                float       std        1.51ms 
                float       sum        1.77ms 
                float       tail       1.63ms 
                float      unique      457ms  
                float   value_counts   2.63ms 
                float       var        1.43ms 
               ======= ============== ========
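For readers unfamiliar with asv, the sketch below shows the general shape of a parameterized benchmark that yields a dtype-by-method grid like the one above: asv combines `params` (labelled by `param_names`), calls `setup` for each combination, and times every method whose name starts with `time_`. The class name `GroupByMethodsSketch`, the data sizes, and the method subset are assumptions; this is a hypothetical stand-in, not the actual `groupby.GroupByMethods` benchmark from pandas.

```python
import numpy as np
import pandas as pd


class GroupByMethodsSketch:
    """Hypothetical asv-style benchmark timing one groupby method per parameter pair."""

    param_names = ["dtype", "method"]
    params = [
        ["int", "float"],
        # A small subset of the methods in the table above, for illustration.
        ["describe", "pct_change", "rank", "mean", "sum"],
    ]

    def setup(self, dtype, method):
        ngroups = 1000
        size = ngroups * 2
        rng = np.random.default_rng(42)
        if dtype == "int":
            values = rng.integers(0, 100, size=size)
        else:
            values = rng.standard_normal(size)
        df = pd.DataFrame({"key": rng.integers(0, ngroups, size=size), "values": values})
        # Bind the groupby method once so time_method measures only the call itself.
        self.as_group_method = getattr(df.groupby("key")["values"], method)

    def time_method(self, dtype, method):
        self.as_group_method()
```

Outside of asv, the grid can be exercised directly by calling `setup(dtype, method)` followed by `time_method(dtype, method)` for each combination; asv adds repeated runs and timing statistics on top of that.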
jreback added the Groupby, Performance, and Difficulty Intermediate labels Jan 10, 2018
jreback added this to the Next Major Release milestone Jan 10, 2018
jreback added the Master Tracker label Jan 10, 2018
WillAyd (Member) commented Feb 12, 2018

You should be able to check `rank` off this list now that that change has been closed. I'm going to take a look at `fillna` next.

mroeschke (Member, Author) commented
These are fairly old benchmarks. Closing this in favor of tracking regressions caught by our ASV box.
