removing kendall tests #29401

Merged
merged 8 commits into from
Dec 5, 2019

@deepandas11 deepandas11 commented Nov 4, 2019

closes #29270

The following is the output after removing "kendall":

· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Building cd59acf5 <fix-kendall-issues> for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt................................................
·· Installing cd59acf5 <fix-kendall-issues> into conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 14 total benchmarks (2 commits * 1 environments * 7 benchmarks)
[  0.00%] · For pandas commit 165d5ee4 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  7.14%] ··· Running (stat_ops.Correlation.time_corr--)....
[ 21.43%] ··· Running (stat_ops.Correlation.time_corrwith_cols--)..
[ 25.00%] · For pandas commit cd59acf5 <fix-kendall-issues> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 32.14%] ··· Running (stat_ops.Correlation.time_corr--)....
[ 46.43%] ··· Running (stat_ops.Correlation.time_corrwith_cols--)..
[ 50.00%] · For pandas commit cd59acf5 <fix-kendall-issues> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 53.57%] ··· stat_ops.Correlation.peakmem_corr_wide                                                                                                                                                      ok
[ 53.57%] ··· ========== ====== =======
              --         use_bottleneck
              ---------- --------------
                method    True   False 
              ========== ====== =======
               spearman   108M    108M 
               pearson    105M    105M 
              ========== ====== =======

[ 57.14%] ··· stat_ops.Correlation.time_corr                                                                                                                                                              ok
[ 57.14%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman    8.74±1ms     8.71±2ms  
               pearson    2.88±0.6ms   2.84±0.4ms 
              ========== ============ ============

[ 60.71%] ··· stat_ops.Correlation.time_corr_series                                                                                                                                                       ok
[ 60.71%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman   1.34±0.2ms   1.35±0.3ms 
               pearson     317±60μs     319±70μs  
              ========== ============ ============

[ 64.29%] ··· stat_ops.Correlation.time_corr_wide                                                                                                                                                         ok
[ 64.29%] ··· ========== ========== ==========
              --             use_bottleneck   
              ---------- ---------------------
                method      True      False   
              ========== ========== ==========
               spearman   286±40ms   273±10ms 
               pearson    176±10ms   178±10ms 
              ========== ========== ==========

[ 67.86%] ··· stat_ops.Correlation.time_corr_wide_nans                                                                                                                                                    ok
[ 67.86%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman   3.23±0.02s   3.20±0.01s 
               pearson     196±7ms      192±10ms  
              ========== ============ ============

[ 71.43%] ··· stat_ops.Correlation.time_corrwith_cols                                                                                                                                                     ok
[ 71.43%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman    24.2±1ms     25.5±1ms  
               pearson    36.1±0.3ms   36.2±0.4ms 
              ========== ============ ============

[ 75.00%] ··· stat_ops.Correlation.time_corrwith_rows                                                                                                                                                     ok
[ 75.00%] ··· ========== ========= ==========
              --            use_bottleneck   
              ---------- --------------------
                method      True     False   
              ========== ========= ==========
               spearman   549±4ms   551±3ms  
               pearson    884±7ms   890±20ms 
              ========== ========= ==========

[ 75.00%] · For pandas commit 165d5ee4 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 78.57%] ··· stat_ops.Correlation.peakmem_corr_wide                                                                                                                                                      ok
[ 78.57%] ··· ========== ====== =======
              --         use_bottleneck
              ---------- --------------
                method    True   False 
              ========== ====== =======
               spearman   108M    108M 
               pearson    106M    106M 
              ========== ====== =======

[ 82.14%] ··· stat_ops.Correlation.time_corr                                                                                                                                                              ok
[ 82.14%] ··· ========== ============ =============
              --               use_bottleneck      
              ---------- --------------------------
                method       True         False    
              ========== ============ =============
               spearman   10.2±0.2ms    10.1±0.2ms 
               pearson    3.51±0.2ms   3.54±0.09ms 
              ========== ============ =============

[ 85.71%] ··· stat_ops.Correlation.time_corr_series                                                                                                                                                       ok
[ 85.71%] ··· ========== ============= =============
              --                use_bottleneck      
              ---------- ---------------------------
                method        True         False    
              ========== ============= =============
               spearman   1.52±0.03ms   1.26±0.01ms 
               pearson      316±10μs     389±0.6μs  
              ========== ============= =============

[ 89.29%] ··· stat_ops.Correlation.time_corr_wide                                                                                                                                                         ok
[ 89.29%] ··· ========== ========== ==========
              --             use_bottleneck   
              ---------- ---------------------
                method      True      False   
              ========== ========== ==========
               spearman   359±20ms   288±10ms 
               pearson    186±6ms    184±7ms  
              ========== ========== ==========

[ 92.86%] ··· stat_ops.Correlation.time_corr_wide_nans                                                                                                                                                    ok
[ 92.86%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman   3.29±0.01s   3.26±0.03s 
               pearson     195±6ms      203±5ms   
              ========== ============ ============

[ 96.43%] ··· stat_ops.Correlation.time_corrwith_cols                                                                                                                                                     ok
[ 96.43%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman    24.4±2ms    24.8±0.9ms 
               pearson    35.8±0.5ms   36.1±0.5ms 
              ========== ============ ============

[100.00%] ··· stat_ops.Correlation.time_corrwith_rows                                                                                                                                                     ok
[100.00%] ··· ========== ========== ==========
              --             use_bottleneck   
              ---------- ---------------------
                method      True      False   
              ========== ========== ==========
               spearman   551±6ms    555±7ms  
               pearson    894±10ms   899±10ms 
              ========== ========== ==========

       before           after         ratio
     [165d5ee4]       [cd59acf5]
     <master>         <fix-kendall-issues>
-     3.54±0.09ms       2.84±0.4ms     0.80  stat_ops.Correlation.time_corr('pearson', False)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@WillAyd
Member

WillAyd commented Nov 4, 2019

Thanks for the PR. So we don't necessarily want to remove these as much as see if moving them somewhere else in the module saves a lot of time running the benchmarks. Do you know how long it took before / after to run this module?

@deepandas11
Author

I did run the tests before making the changes, if that's what you mean by before / after.
The output is as follows:

· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Building 165d5ee4 <fix-kendall-issues> for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...............................................
·· Installing 165d5ee4 <fix-kendall-issues> into conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 14 total benchmarks (2 commits * 1 environments * 7 benchmarks)
[  0.00%] · For pandas commit 165d5ee4 <fix-kendall-issues> (round 1/2):
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  7.14%] ··· Running (stat_ops.Correlation.time_corr--)...
[ 17.86%] ··· Running (stat_ops.Correlation.time_corr_wide_nans--).
[ 21.43%] ··· Running (stat_ops.Correlation.time_corrwith_cols--)..
[ 25.00%] · For pandas commit 165d5ee4 <fix-kendall-issues> (round 1/2):
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 32.14%] ··· Running (stat_ops.Correlation.time_corr--)...
[ 42.86%] ··· Running (stat_ops.Correlation.time_corr_wide_nans--).
[ 46.43%] ··· Running (stat_ops.Correlation.time_corrwith_cols--)..
[ 50.00%] · For pandas commit 165d5ee4 <fix-kendall-issues> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 53.57%] ··· stat_ops.Correlation.peakmem_corr_wide                                               ok
[ 53.57%] ··· ========== ====== =======
              --         use_bottleneck
              ---------- --------------
                method    True   False 
              ========== ====== =======
               spearman   108M    107M 
               kendall    129M    129M 
               pearson    105M    105M 
              ========== ====== =======

[ 57.14%] ··· stat_ops.Correlation.time_corr                                                       ok
[ 57.14%] ··· ========== ============ =============
              --               use_bottleneck      
              ---------- --------------------------
                method       True         False    
              ========== ============ =============
               spearman   7.36±0.3ms     7.99±2ms  
               kendall     187±3ms       188±2ms   
               pearson    2.26±0.2ms   2.30±0.09ms 
              ========== ============ =============

[ 60.71%] ··· stat_ops.Correlation.time_corr_series                                                ok
[ 60.71%] ··· ========== ============= ==========
              --              use_bottleneck     
              ---------- ------------------------
                method        True       False   
              ========== ============= ==========
               spearman   1.05±0.08ms   932±80μs 
               kendall      608±20μs    671±50μs 
               pearson      226±20μs    220±9μs  
              ========== ============= ==========

[ 64.29%] ··· stat_ops.Correlation.time_corr_wide                                                  ok
[ 64.29%] ··· ========== ============ ===========
              --              use_bottleneck     
              ---------- ------------------------
                method       True        False   
              ========== ============ ===========
               spearman    247±7ms      254±10ms 
               kendall    8.47±0.06s   8.85±0.2s 
               pearson     155±4ms      190±30ms 
              ========== ============ ===========

[ 67.86%] ··· stat_ops.Correlation.time_corr_wide_nans                                             ok
[ 67.86%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman   3.31±0.04s   3.28±0.04s 
               kendall    9.11±0.2s    9.85±0.5s  
               pearson     246±30ms     212±20ms  
              ========== ============ ============

[ 71.43%] ··· stat_ops.Correlation.time_corrwith_cols                                              ok
[ 71.43%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman    29.4±4ms    30.3±0.4ms 
               kendall     17.6±1ms    16.4±0.5ms 
               pearson    35.6±0.8ms    36.1±1ms  
              ========== ============ ============

[ 75.00%] ··· stat_ops.Correlation.time_corrwith_rows                                              ok
[ 75.00%] ··· ========== ========== ==========
              --             use_bottleneck   
              ---------- ---------------------
                method      True      False   
              ========== ========== ==========
               spearman   553±20ms   558±20ms 
               kendall    972±10ms   957±10ms 
               pearson    896±90ms   901±20ms 
              ========== ========== ==========

[ 75.00%] · For pandas commit 165d5ee4 <fix-kendall-issues> (round 2/2):
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 78.57%] ··· stat_ops.Correlation.peakmem_corr_wide                                               ok
[ 78.57%] ··· ========== ====== =======
              --         use_bottleneck
              ---------- --------------
                method    True   False 
              ========== ====== =======
               spearman   108M    108M 
               kendall    129M    129M 
               pearson    105M    105M 
              ========== ====== =======

[ 82.14%] ··· stat_ops.Correlation.time_corr                                                       ok
[ 82.14%] ··· ========== ============= ============
              --               use_bottleneck      
              ---------- --------------------------
                method        True        False    
              ========== ============= ============
               spearman    7.28±0.2ms    7.23±1ms  
               kendall      185±3ms      188±2ms   
               pearson    2.22±0.09ms   2.19±0.1ms 
              ========== ============= ============

[ 85.71%] ··· stat_ops.Correlation.time_corr_series                                                ok
[ 85.71%] ··· ========== ============= ==========
              --              use_bottleneck     
              ---------- ------------------------
                method        True       False   
              ========== ============= ==========
               spearman   1.05±0.09ms   912±50μs 
               kendall      607±30μs    651±50μs 
               pearson      226±20μs    228±9μs  
              ========== ============= ==========

[ 89.29%] ··· stat_ops.Correlation.time_corr_wide                                                  ok
[ 89.29%] ··· ========== ============ ===========
              --              use_bottleneck     
              ---------- ------------------------
                method       True        False   
              ========== ============ ===========
               spearman    252±8ms      248±10ms 
               kendall    8.56±0.06s   8.61±0.2s 
               pearson     156±4ms      199±20ms 
              ========== ============ ===========

[ 92.86%] ··· stat_ops.Correlation.time_corr_wide_nans                                                                                                                                                    ok
[ 92.86%] ··· ========== ============ ============
              --               use_bottleneck     
              ---------- -------------------------
                method       True        False    
              ========== ============ ============
               spearman   3.32±0.04s   3.26±0.04s 
               kendall    8.72±0.3s    8.89±0.6s  
               pearson     213±30ms     210±10ms  
              ========== ============ ============

[ 96.43%] ··· stat_ops.Correlation.time_corrwith_cols                                                                                                                                                     ok
[ 96.43%] ··· ========== ============ ==========
              --              use_bottleneck    
              ---------- -----------------------
                method       True       False   
              ========== ============ ==========
               spearman    28.0±3ms    30.6±5ms 
               kendall     19.0±1ms    16.9±1ms 
               pearson    35.6±0.6ms   37.7±3ms 
              ========== ============ ==========

[100.00%] ··· stat_ops.Correlation.time_corrwith_rows                                                                                                                                                     ok
[100.00%] ··· ========== ========== ==========
              --             use_bottleneck   
              ---------- ---------------------
                method      True      False   
              ========== ========== ==========
               spearman   556±4ms    558±10ms 
               kendall    971±10ms   946±10ms 
               pearson    894±40ms   890±10ms 
              ========== ========== ==========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@WillAyd
Member

WillAyd commented Nov 4, 2019

No, not the benchmark timings themselves, as those shouldn't change. Just how long it took to run from start to finish.
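
For anyone reproducing this, one way to capture the start-to-finish wall-clock time is to wrap the benchmark command in a small timer. This is only a sketch: `time_command` is a hypothetical helper, and the command shown is a stand-in that should be replaced with whatever asv invocation produced the logs above.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd)
    return completed.returncode, time.perf_counter() - start


# Stand-in command so the sketch runs anywhere; swap in the real
# benchmark invocation to measure the whole run.
code, seconds = time_command([sys.executable, "-c", "pass"])
minutes, rest = divmod(seconds, 60)
print(f"exit={code} elapsed={int(minutes)}m{rest:.0f}s")
```

Running it once on master and once on the branch gives the two total runtimes to compare.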

@deepandas11
Author

Ah! I am afraid I didn't time them.

@deepandas11
Author

Before: 10m24s
After: 4m19s
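
To put the two totals side by side, the arithmetic can be sketched as below. `to_seconds` is a hypothetical helper, and the only inputs are the two durations quoted above.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


before = to_seconds(10, 24)  # 624 seconds
after = to_seconds(4, 19)    # 259 seconds
saved = before - after       # 365 seconds
ratio = after / before       # fraction of the original runtime

print(f"saved {saved}s ({saved // 60}m{saved % 60:02d}s), after/before = {ratio:.2f}")
```

So the run with "kendall" removed takes a little over 40% of the original wall-clock time.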

Contributor

@jreback jreback left a comment

I don’t think we want to remove this entirely.

Just reduce the sizes so these take a reasonable total time.

Also, I don’t think we need the bottleneck flags anymore.

@alimcmaster1 alimcmaster1 added the Benchmark Performance (ASV) benchmarks label Nov 4, 2019
@deepandas11
Author

deepandas11 commented Nov 5, 2019

By halving the dimensions and removing the bottleneck flags, one could get a runtime of about 5m14s.

@@ -102,7 +102,7 @@ def time_average_old(self, constructor, pct):
 class Correlation:

     params = [["spearman", "kendall", "pearson"], [True, False]]
-    param_names = ["method", "use_bottleneck"]
+    param_names = ["method"]

     def setup(self, method, use_bottleneck):
Member

Can you remove the parametrization of use_bottleneck here and in other methods? Also can remove the import / USE_BOTTLENECK assignment in setup
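
A minimal sketch of what this request amounts to, assuming asv's convention that each entry in `params` is passed positionally to `setup` and to every benchmark method. The frame shapes below are placeholder assumptions, not the values used in this PR, and only two of the benchmark methods are shown.

```python
import numpy as np
import pandas as pd


class Correlation:
    # One parameter axis only: the correlation method.
    params = [["spearman", "kendall", "pearson"]]
    param_names = ["method"]

    def setup(self, method):
        # Shapes are illustrative assumptions, not taken from the PR.
        self.df = pd.DataFrame(np.random.randn(500, 15))
        self.df_wide = pd.DataFrame(np.random.randn(500, 100))

    def time_corr(self, method):
        self.df.corr(method=method)

    def time_corr_wide(self, method):
        self.df_wide.corr(method=method)
```

With the second axis gone, each benchmark runs once per method instead of once per (method, use_bottleneck) pair, which halves the number of parameter combinations.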

Author

Done!

@jreback jreback added this to the 1.0 milestone Nov 6, 2019
@jbrockmendel
Member

@deepandas11 this looks close to ready. can you see why the CI is failing and update?

@deepandas11
Author

@deepandas11 this looks close to ready. can you see why the CI is failing and update?

Sorry for not being able to resolve this sooner. Was a bit swamped.

@deepandas11 deepandas11 requested a review from WillAyd December 5, 2019 09:08
@jreback
Contributor

jreback commented Dec 5, 2019

hmm this looks ok, I think just merge master and good to go.

@WillAyd WillAyd merged commit 91e284b into pandas-dev:master Dec 5, 2019
@WillAyd
Member

WillAyd commented Dec 5, 2019

Thanks @deepandas11 !

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Labels
Benchmark Performance (ASV) benchmarks
Development

Successfully merging this pull request may close these issues.

Reduce Benchmark Times of stat_ops, particularly kendall correlation
5 participants