
CLN: ASV stat_ops #19049

Merged
merged 4 commits into from
Jan 6, 2018

mroeschke
Member

Some old pd.rolling_* methods were being benchmarked in stat_ops.py. I moved them to rolling.py (or should they just be removed?). Otherwise, this is the usual cleanup:

$ asv dev -b ^stat_ops
· Discovering benchmarks
· Running 7 total benchmarks (1 commits * 1 environments * 7 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[ 14.29%] ··· Running stat_ops.Correlation.time_corr                         ok
[ 14.29%] ···· 
               ========== ========
                 method           
               ---------- --------
                spearman   118ms  
                kendall    693ms  
                pearson    5.80ms 
               ========== ========

[ 28.57%] ··· Running stat_ops.FrameMultiIndexOps.time_op                    ok
[ 28.57%] ···· 
               ======== ======== ======== ========
               --                   op            
               -------- --------------------------
                level     mean     sum     median 
               ======== ======== ======== ========
                  0      8.40ms   8.51ms   21.8ms 
                  1      8.57ms   8.52ms   22.6ms 
                [0, 1]   17.3ms   17.0ms   31.9ms 
               ======== ======== ======== ========

[ 42.86%] ··· Running stat_ops.FrameOps.time_op                              ok
[ 42.86%] ···· 
               ======== ================ ======= ======== ========
               --                                       axis      
               --------------------------------- -----------------
                  op     use_bottleneck   dtype     0        1    
               ======== ================ ======= ======== ========
                 mean         True        float   1.16ms   2.07ms 
                 mean         True         int    1.27ms   2.10ms 
                 mean        False        float   11.8ms   12.2ms 
                 mean        False         int    10.2ms   11.0ms 
                 sum          True        float   11.6ms   11.6ms 
                 sum          True         int    7.41ms   8.42ms 
                 sum         False        float   11.7ms   11.6ms 
                 sum         False         int    7.41ms   8.19ms 
                median        True        float   6.86ms   6.05ms 
                median        True         int    4.53ms   5.67ms 
                median       False        float   23.4ms   7.45s  
                median       False         int    24.9ms   7.44s  
                 std          True        float   1.95ms   4.51ms 
                 std          True         int    3.42ms   6.06ms 
                 std         False        float   23.2ms   26.6ms 
                 std         False         int    24.5ms   25.8ms 
               ======== ================ ======= ======== ========

[ 57.14%] ··· Running stat_ops.Rank.time_average_old                         ok
[ 57.14%] ···· 
               ============= ======= =======
               --                  pct      
               ------------- ---------------
                constructor    True   False 
               ============= ======= =======
                 DataFrame    435ms   432ms 
                   Series     432ms   435ms 
               ============= ======= =======

[ 71.43%] ··· Running stat_ops.Rank.time_rank                                ok
[ 71.43%] ···· 
               ============= ======== ========
               --                   pct       
               ------------- -----------------
                constructor    True    False  
               ============= ======== ========
                 DataFrame    18.6ms   18.4ms 
                   Series     18.9ms   18.2ms 
               ============= ======== ========

[ 85.71%] ··· Running stat_ops.SeriesMultiIndexOps.time_op                   ok
[ 85.71%] ···· 
               ======== ======== ======== ========
               --                   op            
               -------- --------------------------
                level     mean     sum     median 
               ======== ======== ======== ========
                  0      21.2ms   20.3ms   23.6ms 
                  1      21.0ms   21.0ms   25.0ms 
                [0, 1]   15.2ms   15.6ms   18.9ms 
               ======== ======== ======== ========

[100.00%] ··· Running stat_ops.SeriesOps.time_op                             ok
[100.00%] ···· 
               ======== ================ ======== ========
               --                              dtype      
               ------------------------- -----------------
                  op     use_bottleneck   float     int   
               ======== ================ ======== ========
                 mean         True        421μs    388μs  
                 mean        False        2.01ms   2.20ms 
                 sum          True        2.02ms   2.09ms 
                 sum         False        2.01ms   2.04ms 
                median        True        2.23ms   1.30ms 
                median       False        6.25ms   6.73ms 
                 std          True        603μs    951μs  
                 std         False        3.26ms   3.67ms 
               ======== ================ ======== ========
$ asv dev -b ^rolling.DepreciatedRolling
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[100.00%] ··· Running rolling.DepreciatedRolling.time_method                 ok
[100.00%] ···· 
               ================ ========
                    method              
               ---------------- --------
                rolling_median   88.8ms 
                 rolling_mean    11.0ms 
                 rolling_min     12.4ms 
                 rolling_max     12.1ms 
                 rolling_var     12.9ms 
                 rolling_skew    16.2ms 
                 rolling_kurt    16.0ms 
                 rolling_std     14.2ms 
               ================ ========

[100.00%] ····· 
                
                For parameters: 'rolling_median'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_median is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_mean'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_mean is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_min'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_min is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_max'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_max is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_var'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_var is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_skew'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_skew is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_kurt'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_kurt is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
                
                For parameters: 'rolling_std'
                /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/rolling.py:56: FutureWarning: pd.rolling_std is deprecated for ndarrays and will be removed in a future version
                  getattr(pd, method)(self.arr, self.win)
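
asv runs a parametrized benchmark once for every combination in the Cartesian product of its `params` lists, which is why the `FrameOps.time_op` table in the first run has 16 rows × 2 axis columns. A minimal sketch of that expansion, assuming the grid read off that table (it does not import asv, and the lists are copied from the output rather than from the PR's source file):

```python
import itertools

# Parameter grid of FrameOps.time_op as shown in the first run's table.
param_names = ['op', 'use_bottleneck', 'dtype', 'axis']
params = [['mean', 'sum', 'median', 'std'],
          [True, False],
          ['float', 'int'],
          [0, 1]]


def expand(params):
    """Return every combination of parameter values (a Cartesian product)."""
    return list(itertools.product(*params))


combos = expand(params)
print(len(combos))  # 32, i.e. 4 * 2 * 2 * 2 timings
print(dict(zip(param_names, combos[0])))
```

Each extra parameter multiplies the number of timings, so adding `use_bottleneck` doubles the run time of the whole class.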

pep8speaks commented Jan 3, 2018

Hello @mroeschke! Thanks for updating the PR.

Line 21:9: E722 do not use bare 'except'
Line 59:9: E722 do not use bare 'except'
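
E722 flags a bare `except:`, which also catches `KeyboardInterrupt` and `SystemExit`; the usual fix is to name the exception you expect. The flagged lines 21 and 59 are not shown here, so this is only a sketch assuming they guard an import fallback; `import_first` is a hypothetical helper:

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that imports successfully.

    Hypothetical helper: uses a narrow ``except ImportError`` in place of
    the bare ``except:`` that E722 flags.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:  # only swallow failed imports, nothing else
            continue
    raise ImportError('none of %r could be imported' % (module_names,))


# Falls back to the standard-library json module.
mod = import_first('no_such_module_for_this_example', 'json')
print(mod.__name__)  # json
```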

Comment last updated on January 06, 2018 at 05:37 Hours UTC

codecov bot commented Jan 3, 2018

Codecov Report

Merging #19049 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #19049      +/-   ##
==========================================
+ Coverage   91.51%   91.53%   +0.01%     
==========================================
  Files         148      148              
  Lines       48807    48688     -119     
==========================================
- Hits        44667    44566     -101     
+ Misses       4140     4122      -18
Flag Coverage Δ
#multiple 89.9% <ø> (+0.01%) ⬆️
#single 41.63% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 92.19% <0%> (-0.43%) ⬇️
pandas/core/indexes/timedeltas.py 90.92% <0%> (-0.17%) ⬇️
pandas/util/testing.py 84.41% <0%> (-0.04%) ⬇️
pandas/core/ops.py 91.89% <0%> (-0.02%) ⬇️
pandas/core/panel.py 96.83% <0%> (-0.01%) ⬇️
pandas/core/frame.py 97.62% <0%> (-0.01%) ⬇️
pandas/core/strings.py 98.46% <0%> (-0.01%) ⬇️
pandas/tseries/offsets.py 96.97% <0%> (ø) ⬆️
pandas/core/groupby.py 92.14% <0%> (ø) ⬆️
pandas/core/generic.py 95.9% <0%> (ø) ⬆️
... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3198b9d...d2593a3.

class DepreciatedRolling(object):

sample_time = 0.2
params = ['rolling_median', 'rolling_mean', 'rolling_min', 'rolling_max',
Contributor

I would probably remove these; the newer rolling benchmarks already go back pretty far.
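
For context, each removed benchmark timed a deprecated top-level call such as `pd.rolling_mean(arr, win)` on an ndarray (see the FutureWarnings above). A sketch of the equivalent computation through the `.rolling()` method, assuming a 1-D NumPy input; `modern_rolling` is a hypothetical helper, and it returns a Series rather than an ndarray:

```python
import numpy as np
import pandas as pd

# Methods the removed DepreciatedRolling benchmark parametrized over.
methods = ['rolling_median', 'rolling_mean', 'rolling_min', 'rolling_max',
           'rolling_var', 'rolling_skew', 'rolling_kurt', 'rolling_std']


def modern_rolling(arr, win, method):
    """Compute pd.<method>(arr, win) via Series.rolling (hypothetical helper)."""
    name = method[len('rolling_'):]  # e.g. 'rolling_mean' -> 'mean'
    return getattr(pd.Series(arr).rolling(win), name)()


arr = np.arange(10, dtype=float)
result = modern_rolling(arr, 3, 'rolling_mean')
print(result.tolist())
```

The first `win - 1` entries are NaN because the window is not yet full.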

param_names = ['op', 'use_bottleneck', 'dtype', 'axis']
params = [['mean', 'sum', 'median'],
params = [['mean', 'sum', 'median', 'std'],
Contributor

you could make use_bottleneck a param here

Member Author

use_bottleneck is a param here, over [True, False]

Contributor

jreback left a comment

see comments

jreback added the Benchmark and Reshaping labels Jan 3, 2018
mroeschke force-pushed the asv_clean_stats_ops branch from c9ec227 to 5fd9420 on January 4, 2018 04:17
mroeschke
Member Author

Removed the older rolling benchmarks.


class stats_rank2d_axis1_average(object):
goal_time = 0.2
def setup(self, op, use_bottleneck, dtype):
Contributor

you could put the bottleneck tests in a separate function / base class I think
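
One way to read this suggestion, sketched under assumptions: a hypothetical `BottleneckMixin` base class that toggles bottleneck through the `compute.use_bottleneck` option and restores it in `teardown`, plus a `Bottleneck` class shaped like the one in the next run (constructor × use_bottleneck, `time_mean`). The data size and the option-based toggle are assumptions, not the PR's code:

```python
import numpy as np
import pandas as pd


class BottleneckMixin(object):
    """Hypothetical base class: toggles bottleneck around each benchmark run."""

    def set_bottleneck(self, use_bottleneck):
        pd.set_option('compute.use_bottleneck', use_bottleneck)

    def teardown(self, *args):
        # asv passes the same parameters to teardown as to setup
        pd.reset_option('compute.use_bottleneck')


class Bottleneck(BottleneckMixin):

    param_names = ['constructor', 'use_bottleneck']
    params = [['DataFrame', 'Series'], [True, False]]

    def setup(self, constructor, use_bottleneck):
        self.set_bottleneck(use_bottleneck)
        values = np.random.randn(10 ** 5)  # hypothetical data size
        self.data = getattr(pd, constructor)(values)

    def time_mean(self, constructor, use_bottleneck):
        self.data.mean()
```

Resetting the option in `teardown` keeps one parameter combination from leaking its setting into the next benchmark.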

mroeschke
Member Author

Created a new class for the bottleneck benchmarks:

[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[ 12.50%] ··· Running stat_ops.Bottleneck.time_mean                          ok
[ 12.50%] ···· 
               ================ =========== ========
               --                   constructor     
               ---------------- --------------------
                use_bottleneck   DataFrame   Series 
               ================ =========== ========
                     True          2.43ms    2.37ms 
                    False          15.7ms    15.4ms 
               ================ =========== ========

[ 25.00%] ··· Running stat_ops.Correlation.time_corr                         ok
[ 25.00%] ···· 
               ========== ========
                 method           
               ---------- --------
                spearman   119ms  
                kendall    648ms  
                pearson    5.80ms 
               ========== ========

[ 37.50%] ··· Running stat_ops.FrameMultiIndexOps.time_op                    ok
[ 37.50%] ···· 
               ======== ======== ======== ========
               --                   op            
               -------- --------------------------
                level     mean     sum     median 
               ======== ======== ======== ========
                  0      8.42ms   8.51ms   22.2ms 
                  1      8.55ms   8.53ms   22.6ms 
                [0, 1]   17.2ms   17.1ms   31.9ms 
               ======== ======== ======== ========

[ 50.00%] ··· Running stat_ops.FrameOps.time_op                              ok
[ 50.00%] ···· 
               ======== =========== =========== ========= =========
               --                       dtype / axis               
               -------- -------------------------------------------
                  op     float / 0   float / 1   int / 0   int / 1 
               ======== =========== =========== ========= =========
                 mean      1.20ms      2.07ms     1.27ms    2.08ms 
                 sum       11.4ms      11.6ms     7.46ms    8.27ms 
                median     6.92ms      6.10ms     4.41ms    5.58ms 
                 std       1.95ms      4.49ms     3.42ms    6.10ms 
               ======== =========== =========== ========= =========

[ 62.50%] ··· Running stat_ops.Rank.time_average_old                         ok
[ 62.50%] ···· 
               ============= ======= =======
               --                  pct      
               ------------- ---------------
                constructor    True   False 
               ============= ======= =======
                 DataFrame    435ms   436ms 
                   Series     434ms   433ms 
               ============= ======= =======

[ 75.00%] ··· Running stat_ops.Rank.time_rank                                ok
[ 75.00%] ···· 
               ============= ======== ========
               --                   pct       
               ------------- -----------------
                constructor    True    False  
               ============= ======== ========
                 DataFrame    18.9ms   18.5ms 
                   Series     19.2ms   18.7ms 
               ============= ======== ========

[ 87.50%] ··· Running stat_ops.SeriesMultiIndexOps.time_op                   ok
[ 87.50%] ···· 
               ======== ======== ======== ========
               --                   op            
               -------- --------------------------
                level     mean     sum     median 
               ======== ======== ======== ========
                  0      20.7ms   20.7ms   24.9ms 
                  1      21.7ms   20.8ms   24.9ms 
                [0, 1]   15.5ms   15.5ms   19.1ms 
               ======== ======== ======== ========

[100.00%] ··· Running stat_ops.SeriesOps.time_op                             ok
[100.00%] ···· 
               ======== ======== ========
               --             dtype      
               -------- -----------------
                  op     float     int   
               ======== ======== ========
                 mean    415μs    432μs  
                 sum     2.03ms   2.11ms 
                median   2.19ms   1.33ms 
                 std     616μs    980μs  
               ======== ======== ========

jreback added this to the 0.23.0 milestone Jan 5, 2018
goal_time = 0.2
param_names = ['op', 'dtype']
Contributor

What I meant about bottleneck was to make it a parameter of the ops benchmarks, e.g. here

goal_time = 0.2
param_names = ['op', 'dtype', 'axis']
params = [['mean', 'sum', 'median', 'std'],
Contributor

and use bottleneck here. Also, expand these to all of the stat ops (min, max, var, kurt, etc.)

jreback removed this from the 0.23.0 milestone Jan 5, 2018
mroeschke
Member Author

mroeschke commented Jan 6, 2018

Defined a list of ops at the top of the file that each benchmark parametrizes over. FrameOps and SeriesOps now also parametrize over use_bottleneck in [True, False].
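
A sketch of the layout described above: one module-level `ops` list shared by `FrameOps` and `SeriesOps`, each also parametrized over `use_bottleneck`. The exact op list, the data size `N`, and toggling bottleneck through the `compute.use_bottleneck` option are assumptions for illustration, not the merged code:

```python
import numpy as np
import pandas as pd

# Hypothetical shared list; the PR's actual list may differ.
ops = ['mean', 'sum', 'median', 'std', 'var', 'sem', 'skew', 'kurt',
       'prod', 'min', 'max']
N = 10000  # hypothetical data size


class FrameOps(object):

    param_names = ['op', 'use_bottleneck', 'dtype', 'axis']
    params = [ops, [True, False], ['float', 'int'], [0, 1]]

    def setup(self, op, use_bottleneck, dtype, axis):
        pd.set_option('compute.use_bottleneck', use_bottleneck)
        df = pd.DataFrame(np.random.randn(N, 4)).astype(dtype)
        self.df_func = getattr(df, op)  # bound reduction, e.g. df.mean

    def time_op(self, op, use_bottleneck, dtype, axis):
        self.df_func(axis=axis)

    def teardown(self, *args):
        pd.reset_option('compute.use_bottleneck')


class SeriesOps(object):

    param_names = ['op', 'use_bottleneck', 'dtype']
    params = [ops, [True, False], ['float', 'int']]

    def setup(self, op, use_bottleneck, dtype):
        pd.set_option('compute.use_bottleneck', use_bottleneck)
        s = pd.Series(np.random.randn(N)).astype(dtype)
        self.s_func = getattr(s, op)

    def time_op(self, op, use_bottleneck, dtype):
        self.s_func()

    def teardown(self, *args):
        pd.reset_option('compute.use_bottleneck')
```

Keeping `ops` in one place means adding a reduction extends every benchmark class at once.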

jreback added this to the 0.23.0 milestone Jan 6, 2018
jreback merged commit d539bdd into pandas-dev:master Jan 6, 2018
mroeschke deleted the asv_clean_stats_ops branch January 7, 2018 01:22
Labels: Benchmark, Reshaping

3 participants