ASV: add benchmarks for groupby cython aggregations #39846

Merged 1 commit on Feb 16, 2021

Conversation

jorisvandenbossche
Member

I noticed that we currently don't really have a groupby benchmark that specifically targets the cython aggregations. We do of course have benchmarks that call those, but e.g. GroupByMethods is set up in such a way that it's mostly benchmarking the factorization and post-processing (also useful of course), while e.g. for sum only 7% of the time is spent in the actual groupby_add algorithm.

This PR therefore adds a benchmark targeted more specifically at the cython aggregations (where 30-50% of the time is spent in the actual aggregation function, so we will more easily catch regressions/improvements there).

For now I only added "float64". I could also add "float32", but I'm not sure how useful that is (since they all use fused types, it's the same implementation for both dtypes).
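
For readers without the diff at hand, here is a minimal sketch of what such an asv benchmark class could look like. The class and method names (`GroupByCythonAgg`, `time_frame_agg`) and the parameter lists are taken from the benchmark output in this thread; the data sizes, the column names and the random generator are assumptions for illustration, not necessarily what the PR uses.

```python
import numpy as np
import pandas as pd


class GroupByCythonAgg:
    # asv parametrization: one timing per (dtype, method) combination.
    param_names = ["dtype", "method"]
    params = [
        ["float64"],
        ["sum", "prod", "min", "max", "mean", "median", "var", "first", "last"],
    ]

    def setup(self, dtype, method):
        # Assumed sizes: many rows and few groups with a simple integer key,
        # so factorization stays cheap relative to the aggregation itself.
        n_rows, n_cols, n_groups = 100_000, 10, 100
        rng = np.random.default_rng(0)
        values = rng.standard_normal((n_rows, n_cols)).astype(dtype)
        df = pd.DataFrame(values, columns=list("abcdefghij"))
        df["key"] = rng.integers(0, n_groups, size=n_rows)
        self.df = df

    def time_frame_agg(self, dtype, method):
        # The statement asv would time for each parameter combination.
        self.df.groupby("key").agg(method)
```

Keeping the key a plain integer column and the frame fairly large is what shifts the time towards the aggregation function rather than the surrounding bookkeeping.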

@jorisvandenbossche jorisvandenbossche added Groupby Benchmark Performance (ASV) benchmarks labels Feb 16, 2021
@jorisvandenbossche
Member Author

And the output of a run of those benchmarks (to give an idea of timings):

$ asv dev -b GroupByCythonAgg
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing
[ 50.00%] ··· groupby.GroupByCythonAgg.time_frame_agg                                                                                         ok
[ 50.00%] ··· ========= ========== ========== ========== ========== ========== ========= ========== ========== ==========
              --                                                      method                                             
              --------- -------------------------------------------------------------------------------------------------
                dtype      sum        prod       min        max        mean      median     var       first       last   
              ========= ========== ========== ========== ========== ========== ========= ========== ========== ==========
               float64   73.6±0ms   66.8±0ms   63.8±0ms   62.5±0ms   61.1±0ms   301±0ms   78.0±0ms   90.2±0ms   80.1±0ms 
              ========= ========== ========== ========== ========== ========== ========= ========== ========== ==========

Only median is quite a bit slower (300ms), so I can make the dataframe smaller just for this case if desired.
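
To check the median/sum gap locally without going through asv, a rough timing sketch could look like the following. The helper `time_agg` and the frame size are hypothetical, and absolute numbers will differ from the asv run above.

```python
import time

import numpy as np
import pandas as pd


def time_agg(df, method, repeat=3):
    """Return the best wall-clock time (seconds) of df.groupby("key").agg(method)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        df.groupby("key").agg(method)
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
n_rows = 200_000  # assumed size, not the one used in the PR
df = pd.DataFrame(rng.standard_normal((n_rows, 10)), columns=list("abcdefghij"))
df["key"] = rng.integers(0, 100, size=n_rows)

timings = {method: time_agg(df, method) for method in ["sum", "median"]}
for method, seconds in timings.items():
    print(f"{method:>6}: {seconds * 1e3:.1f} ms")
```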

@WillAyd
Member

WillAyd commented Feb 16, 2021

Do you see any way to refactor GroupByMethods to make it more useful? The benchmarks in this file take a very long time to run as is (in real time), so I'm wondering if we can streamline them.

@jorisvandenbossche
Member Author

The issues with fitting it into GroupByMethods: it also benchmarks different dtypes of the key column, which I am not interested in here. It also benchmarks more methods than just the cython aggregations. And it has a different trade-off between number of groups and size of groups, and it is generally useful to have benchmarks that vary a bit in this respect.

To be clear, I agree that we should be careful about adding benchmarks as they already take a long time. That's also the reason that for now I didn't add float32: I don't think it would add much value in addition to float64, as there is already a specific benchmark for float32 for a single case.
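
To make the "number of groups / size of groups" trade-off concrete, here is a small sketch that builds two frames with the same number of rows but very different group structure. The helper `make_frame` and all sizes are made up for illustration and do not correspond to either benchmark class.

```python
import numpy as np
import pandas as pd


def make_frame(n_rows, n_groups, seed=0):
    """Build a float64 frame with one integer key column of up to n_groups groups."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"x": rng.standard_normal(n_rows)})
    df["key"] = rng.integers(0, n_groups, size=n_rows)
    return df


n_rows = 100_000
few_large = make_frame(n_rows, n_groups=10)  # roughly 10,000 rows per group
many_small = make_frame(n_rows, n_groups=10_000)  # roughly 10 rows per group

for name, df in [("few_large", few_large), ("many_small", many_small)]:
    sizes = df.groupby("key").size()
    print(name, "groups:", len(sizes), "mean group size:", sizes.mean())
```

With few large groups the per-row work in the aggregation tends to dominate, while with many small groups the per-group overhead and the construction of the result weigh more heavily, which is why it helps to have benchmarks on both sides.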

@jorisvandenbossche
Member Author

What might be more useful to trim down the time to run the groupby benchmarks is to look a bit more into GroupByMethods itself (regardless of this PR), because I notice some of its parameter combinations are taking much longer than others.

@jreback jreback added this to the 1.3 milestone Feb 16, 2021
@jorisvandenbossche
Member Author

To be more specific, this is the quick asv dev run (so timings not super reliable, but a good general idea) of that class:

[ 50.00%] ··· ========== ============== ========== ================
              --                                application        
              ------------------------- ---------------------------
                dtype        method       direct    transformation 
              ========== ============== ========== ================
                 int          all        1.22±0ms      1.22±0ms    
                 int          any        1.17±0ms      1.48±0ms    
                 int         bfill       1.27±0ms      1.34±0ms    
                 int         count       1.05±0ms      990±0μs     
                 int        cumcount     2.41±0ms      1.50±0ms    
                 int         cummax      1.62±0ms      1.23±0ms    
                 int         cummin      1.11±0ms      1.09±0ms    
                 int        cumprod      1.24±0ms      1.49±0ms    
                 int         cumsum      1.13±0ms      2.10±0ms    
                 int        describe     1.47±0s       1.50±0s     
                 int         ffill       1.23±0ms      1.19±0ms    
                 int         first       1.36±0ms      1.59±0ms    
                 int          head       1.10±0ms      1.28±0ms    
                 int          last       1.40±0ms      1.80±0ms    
                 int          mad        453±0ms       433±0ms     
                 int          max        1.82±0ms      1.62±0ms    
                 int          min        1.84±0ms      2.39±0ms    
                 int         median      1.98±0ms      2.17±0ms    
                 int          mean       2.20±0ms      2.77±0ms    
                 int        nunique      2.61±0ms      2.64±0ms    
                 int       pct_change    4.84±0ms      3.59±0ms    
                 int          prod       1.74±0ms      1.94±0ms    
                 int        quantile     1.73±0ms      3.59±0ms    
                 int          rank       1.78±0ms      1.65±0ms    
                 int          sem        6.12±0ms      2.09±0ms    
                 int         shift       1.15±0ms      2.45±0ms    
                 int          size       1.68±0ms      1.30±0ms    
                 int          skew       120±0ms       112±0ms     
                 int          std        1.22±0ms      2.34±0ms    
                 int          sum        2.07±0ms      2.17±0ms    
                 int          tail       1.23±0ms      1.30±0ms    
                 int         unique      74.8±0ms      70.7±0ms    
                 int      value_counts   1.93±0ms      4.07±0ms    
                 int          var        1.61±0ms      1.36±0ms    
                float         all        1.44±0ms      1.64±0ms    
                float         any        1.18±0ms      1.29±0ms    
                float        bfill       1.48±0ms      1.63±0ms    
                float        count       1.23±0ms      1.19±0ms    
                float       cumcount     3.46±0ms      1.23±0ms    
                float        cummax      1.62±0ms      2.50±0ms    
                float        cummin      1.39±0ms      2.28±0ms    
                float       cumprod      3.61±0ms      1.87±0ms    
                float        cumsum      2.43±0ms      1.44±0ms    
                float       describe     2.30±0s       2.77±0s     
                float        ffill       1.54±0ms      1.56±0ms    
                float        first       1.62±0ms      1.80±0ms    
                float         head       1.44±0ms      1.46±0ms    
                float         last       1.59±0ms      2.07±0ms    
                float         mad        965±0ms       893±0ms     
                float         max        1.73±0ms      3.33±0ms    
                float         min        1.41±0ms      2.58±0ms    
                float        median      3.02±0ms      1.71±0ms    
                float         mean       3.47±0ms      4.26±0ms    
                float       nunique      1.45±0ms      4.04±0ms    
                float      pct_change    2.89±0ms      2.74±0ms    
                float         prod       2.08±0ms      4.08±0ms    
                float       quantile     2.53±0ms      1.83±0ms    
                float         rank       2.95±0ms      4.36±0ms    
                float         sem        2.87±0ms      3.34±0ms    
                float        shift       1.91±0ms      1.09±0ms    
                float         size       1.33±0ms      1.99±0ms    
                float         skew       222±0ms       211±0ms     
                float         std        1.42±0ms      1.67±0ms    
                float         sum        1.70±0ms      2.15±0ms    
                float         tail       1.41±0ms      2.64±0ms    
                float        unique      105±0ms       103±0ms     
                float     value_counts   1.91±0ms      2.45±0ms    
                float         var        2.69±0ms      1.48±0ms    
                object        all        2.01±0ms      1.34±0ms    
                object        any        1.89±0ms      884±0μs     
                object       bfill       1.36±0ms      2.23±0ms    
                object       count       1.63±0ms      842±0μs     
                object      cumcount     1.41±0ms      1.35±0ms    
                object       cummax        n/a           n/a       
                object       cummin        n/a           n/a       
                object      cumprod        n/a           n/a       
                object       cumsum        n/a           n/a       
                object      describe       n/a           n/a       
                object       ffill       1.48±0ms      1.03±0ms    
                object       first       4.86±0ms      1.56±0ms    
                object        head       2.53±0ms      1.40±0ms    
                object        last       3.28±0ms      1.83±0ms    
                object        mad          n/a           n/a       
                object        max          n/a           n/a       
                object        min          n/a           n/a       
                object       median        n/a           n/a       
                object        mean         n/a           n/a       
                object      nunique      1.08±0ms      1.49±0ms    
                object     pct_change      n/a           n/a       
                object        prod         n/a           n/a       
                object      quantile       n/a           n/a       
                object        rank       3.51±0ms      1.69±0ms    
                object        sem          n/a           n/a       
                object       shift       970±0μs       2.50±0ms    
                object        size       2.39±0ms      1.12±0ms    
                object        skew         n/a           n/a       
                object        std          n/a           n/a       
                object        sum          n/a           n/a       
                object        tail       2.66±0ms      1.29±0ms    
                object       unique      1.71±0ms      3.25±0ms    
                object    value_counts   1.65±0ms      1.91±0ms    
                object        var          n/a           n/a       
               datetime       all        2.28±0ms      1.10±0ms    
               datetime       any        1.24±0ms      1.40±0ms    
               datetime      bfill       1.16±0ms      1.98±0ms    
               datetime      count       1.30±0ms      1.43±0ms    
               datetime     cumcount     1.27±0ms      2.19±0ms    
               datetime      cummax        n/a           n/a       
               datetime      cummin      1.38±0ms      2.66±0ms    
               datetime     cumprod        n/a           n/a       
               datetime      cumsum        n/a           n/a       
               datetime     describe       n/a           n/a       
               datetime      ffill       979±0μs       1.45±0ms    
               datetime      first       2.38±0ms      1.35±0ms    
               datetime       head       2.54±0ms      1.16±0ms    
               datetime       last       1.46±0ms      1.57±0ms    
               datetime       mad          n/a           n/a       
               datetime       max        1.71±0ms      2.12±0ms    
               datetime       min        1.58±0ms      1.52±0ms    
               datetime      median        n/a           n/a       
               datetime       mean         n/a           n/a       
               datetime     nunique      1.48±0ms      1.65±0ms    
               datetime    pct_change      n/a           n/a       
               datetime       prod         n/a           n/a       
               datetime     quantile     1.41±0ms      1.39±0ms    
               datetime       rank       1.51±0ms      1.31±0ms    
               datetime       sem          n/a           n/a       
               datetime      shift       887±0μs       901±0μs     
               datetime       size       1.06±0ms      1.01±0ms    
               datetime       skew         n/a           n/a       
               datetime       std          n/a           n/a       
               datetime       sum          n/a           n/a       
               datetime       tail       1.52±0ms      2.11±0ms    
               datetime      unique      121±0ms       105±0ms     
               datetime   value_counts   1.91±0ms      3.78±0ms    
               datetime       var          n/a           n/a       
              ========== ============== ========== ================

So most are small benchmarks, but e.g. the describe ones take more than a second, and the mad ones take 500ms to 1 second.
Reducing the time for those two cases will probably already help a lot to get the overall benchmark run time down.
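
As a rough illustration of how much those few cases dominate, the sketch below takes a handful of the "int" / "direct" timings from the table above and flags the ones over a cut-off. The 100 ms threshold is an arbitrary assumption, and the subset of rows is hand-picked, so the resulting share only describes this subset.

```python
# Timings in milliseconds copied from the "int" / "direct" rows of the table.
timings_ms = {
    "sum": 2.07,
    "mean": 2.20,
    "unique": 74.8,
    "skew": 120.0,
    "mad": 453.0,
    "describe": 1470.0,
}

THRESHOLD_MS = 100.0  # hypothetical cut-off for calling a benchmark "slow"

slow = {method: t for method, t in timings_ms.items() if t > THRESHOLD_MS}
total = sum(timings_ms.values())
share = sum(slow.values()) / total

for method, t in sorted(slow.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<10} {t:8.1f} ms")
print(f"slow benchmarks account for {share:.0%} of this subset")
```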

@jreback
Contributor

jreback commented Feb 16, 2021

> So most are small benchmarks, but e.g. the describe ones take more than a second. And also the mad ones take 500ms to 1 second.
> Reducing the time for those two cases will probably already help a lot to get overall benchmark run time down

+1

@jreback jreback merged commit f937909 into pandas-dev:master Feb 16, 2021
@jreback
Contributor

jreback commented Feb 16, 2021

thanks @jorisvandenbossche (happy to have a followup for fixing those identified slow benchmarks). maybe even create an issue to generally reduce time on longish benchmarks (as a good first issue).

@jorisvandenbossche jorisvandenbossche deleted the bench-groupby branch February 17, 2021 10:16
Labels
Benchmark Performance (ASV) benchmarks Groupby

3 participants