
CLN: ASV frame_ctor benchmark #18499


Merged
merged 3 commits into from
Nov 26, 2017


mroeschke
Member

@mroeschke mroeschke commented Nov 26, 2017

  • Added np.random.seed(1234) to the setup of classes that create random data (xref #8144, "BENCH: put in np.random.seed on vbenches")

  • Ran flake8 and replaced star imports (though from pandas.core.datetools import * might need to be kept for compatibility?)

  • Fixed time_frame_ctor_nested_dict_int64, which was using self.data instead of self.data2

  • Moved the class frame_get_numeric_data to frame_methods.py
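As a rough illustration of the seeding and self.data2 bullets, here is a hypothetical, scaled-down sketch of an ASV benchmark class in the style of frame_ctor.FromDicts. The sizes, the string labels, and the exact setup are assumptions, not the code from this PR; it only shows a seeded setup, explicit (non-star) imports, and the int64 benchmark reading self.data2.

```python
import numpy as np
import pandas as pd


class FromDicts:
    """Hypothetical sketch of an ASV benchmark class; sizes are illustrative."""

    def setup(self):
        # Fix the seed so every run builds identical random input data.
        np.random.seed(1234)
        n_rows, n_cols = 1000, 20
        index = pd.Index([f"row_{i}" for i in range(n_rows)])
        columns = pd.Index([f"col_{j}" for j in range(n_cols)])
        frame = pd.DataFrame(np.random.randn(n_rows, n_cols),
                             index=index, columns=columns)
        self.data = frame.to_dict()                       # {column: {row: float}}
        self.dict_list = frame.to_dict(orient="records")  # [{column: float}, ...]
        self.some_dict = next(iter(self.data.values()))   # {row: float}
        # Nested dict keyed by integers, meant for the *_int64 benchmark.
        self.data2 = {i: {j: float(j) for j in range(100)} for i in range(2000)}

    def time_frame_ctor_list_of_dict(self):
        pd.DataFrame(self.dict_list)

    def time_frame_ctor_nested_dict(self):
        pd.DataFrame(self.data)

    def time_frame_ctor_nested_dict_int64(self):
        # Must use the integer-keyed self.data2, not the string-keyed self.data.
        pd.DataFrame(self.data2)

    def time_series_ctor_from_dict(self):
        pd.Series(self.some_dict)
```

Because setup reseeds, two separate runs see exactly the same self.data, so timing differences reflect the code rather than the input.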

asv run -b ^frame_ctor
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 16.67%] ··· Running frame_ctor.FromDicts.time_frame_ctor_list_of_dict                                                        113±0.3ms
[ 33.33%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict                                                        92.7±0.3ms
[ 50.00%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict_int64                                                   243±0.5ms
[ 66.67%] ··· Running frame_ctor.FromDicts.time_series_ctor_from_dict                                                        6.17±0.01ms
[ 83.33%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                2/76 failed
[100.00%] ··· Running frame_ctor.FromSeries.time_frame_from_mi_series                                                          243±0.7μs
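The -b option takes a regular expression. Assuming asv applies it as a search over each dotted benchmark name (an assumption about its matching, not verified here), the anchored pattern ^frame_ctor selects exactly the six benchmarks listed above. The last two names below are hypothetical and only there for contrast.

```python
import re

# Benchmark names from the asv output, plus two hypothetical names for contrast.
names = [
    "frame_ctor.FromDicts.time_frame_ctor_list_of_dict",
    "frame_ctor.FromDicts.time_frame_ctor_nested_dict",
    "frame_ctor.FromDicts.time_frame_ctor_nested_dict_int64",
    "frame_ctor.FromDicts.time_series_ctor_from_dict",
    "frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor",
    "frame_ctor.FromSeries.time_frame_from_mi_series",
    "frame_methods.GetNumericData.time_frame_get_numeric_data",  # hypothetical
    "timeseries.Misc.time_frame_ctor",                              # hypothetical
]

# Anchored: only names that start with the module name frame_ctor.
selected = [name for name in names if re.search(r"^frame_ctor", name)]
# Unanchored: any name containing "frame_ctor" anywhere.
unanchored = [name for name in names if re.search(r"frame_ctor", name)]

print(len(selected))    # 6
print(len(unanchored))  # 7: the hypothetical timeseries name also matches
```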

Why do the reported FromDictwithTimestampOffsets failures not show up when running asv dev?

asv dev -b ^frame_ctor
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[ 16.67%] ··· Running frame_ctor.FromDicts.time_frame_ctor_list_of_dict                                                            153ms
[ 33.33%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict                                                             147ms
[ 50.00%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict_int64                                                       217ms
[ 66.67%] ··· Running frame_ctor.FromDicts.time_series_ctor_from_dict                                                             6.09ms
[ 83.33%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                         ok
[ 83.33%] ···· 
               ==================== ======== ========
               --                        n_steps     
               -------------------- -----------------
                      offset           1        2    
               ==================== ======== ========
                       Day           96.5ms   96.7ms 
                   BusinessDay       96.7ms   96.5ms 
                       BDay          99.8ms   96.8ms 
                CustomBusinessDay    98.8ms   97.5ms 
                       CDay          96.7ms   95.8ms 
                    CBMonthEnd       96.8ms   97.7ms 
                   CBMonthBegin      97.0ms   98.0ms 
                    MonthBegin       96.4ms   97.6ms 
                   BMonthBegin       98.1ms   96.4ms 
                     MonthEnd        95.8ms   96.6ms 
                    BMonthEnd        96.7ms   96.9ms 
                   SemiMonthEnd      96.2ms   97.3ms 
                  SemiMonthBegin     96.3ms   98.1ms 
                   BusinessHour      96.5ms   97.1ms 
                CustomBusinessHour   97.2ms   97.0ms 
                    YearBegin        33.4ms   17.5ms 
                    BYearBegin       32.3ms   17.6ms 
                     YearEnd         32.2ms   17.3ms 
                     BYearEnd        36.2ms   17.9ms 
                   QuarterBegin      98.0ms   64.9ms 
                  BQuarterBegin      99.5ms   70.6ms 
                    QuarterEnd       101ms    67.9ms 
                   BQuarterEnd       98.3ms   69.0ms 
                 LastWeekOfMonth     97.1ms   102ms  
                       Week          97.5ms   102ms  
                   WeekOfMonth       104ms    102ms  
                      Easter         35.4ms   18.0ms 
                       Hour          102ms    96.3ms 
                      Minute         95.8ms   100ms  
                      Second         95.8ms   97.2ms 
                      Milli          98.7ms   101ms  
                      Micro          94.7ms   93.5ms 
                       Nano          71.8ms   72.3ms 
                    DateOffset       95.9ms   96.1ms 
                     FY5253_1        36.9ms   18.1ms 
                     FY5253_2        35.8ms   18.2ms 
                 FY5253Quarter_1     98.7ms   66.2ms 
                 FY5253Quarter_2     96.3ms   65.4ms 
               ==================== ======== ========

[100.00%] ··· Running frame_ctor.FromSeries.time_frame_from_mi_series                                                              282μs
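The "2/76 failed" in the asv run output counts parameter combinations, not benchmarks: the table has 38 offset rows and 2 n_steps columns, and 38 × 2 = 76. A minimal sketch of how such a grid is declared with asv's params/param_names class attributes follows; the offsets are written as plain strings here, whereas the real benchmark presumably passes offset objects, and the method bodies are placeholders.

```python
from itertools import product

# Offset labels copied from the results table (38 rows).
OFFSET_NAMES = [
    "Day", "BusinessDay", "BDay", "CustomBusinessDay", "CDay",
    "CBMonthEnd", "CBMonthBegin", "MonthBegin", "BMonthBegin",
    "MonthEnd", "BMonthEnd", "SemiMonthEnd", "SemiMonthBegin",
    "BusinessHour", "CustomBusinessHour",
    "YearBegin", "BYearBegin", "YearEnd", "BYearEnd",
    "QuarterBegin", "BQuarterBegin", "QuarterEnd", "BQuarterEnd",
    "LastWeekOfMonth", "Week", "WeekOfMonth", "Easter",
    "Hour", "Minute", "Second", "Milli", "Micro", "Nano",
    "DateOffset", "FY5253_1", "FY5253_2",
    "FY5253Quarter_1", "FY5253Quarter_2",
]


class FromDictwithTimestampOffsets:
    # asv runs each benchmark method once per element of the Cartesian
    # product of `params`; `param_names` labels the axes of the result table.
    params = [OFFSET_NAMES, [1, 2]]
    param_names = ["offset", "n_steps"]

    def setup(self, offset, n_steps):
        pass  # placeholder: the real setup builds the Timestamp-keyed dict

    def time_frame_ctor(self, offset, n_steps):
        pass  # placeholder: the real method constructs the DataFrame


combos = list(product(*FromDictwithTimestampOffsets.params))
print(len(combos))  # 76
```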

@pep8speaks

pep8speaks commented Nov 26, 2017

Hello @mroeschke! Thanks for updating the PR.

Line 4:1: E265 block comment should start with '# '
Line 7:1: E302 expected 2 blank lines, found 1

Comment last updated on November 26, 2017 at 03:26 Hours UTC
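For reference, a layout that passes both pycodestyle checks flagged above (E265: a block comment starts with "# "; E302: two blank lines before a top-level def or class). The FromSeries class is a hypothetical sketch of a MultiIndex-Series benchmark with illustrative sizes, not the file's actual contents.

```python
# Benchmarks for constructing a DataFrame from a Series (E265: "# " then text)
import numpy as np
import pandas as pd


# E302: two blank lines separate this top-level class from the imports
class FromSeries:
    """Hypothetical sketch; index sizes are illustrative."""

    def setup(self):
        np.random.seed(1234)
        mi = pd.MultiIndex.from_product([range(100), range(100)])
        self.s = pd.Series(np.random.randn(len(mi)), index=mi)

    def time_frame_from_mi_series(self):
        pd.DataFrame(self.s)
```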

@mroeschke
Member Author

mroeschke commented Nov 26, 2017

Also, I am skeptical that FromDictwithTimestampOffsets is benchmarking anything useful. Ultimately it benchmarks constructing a DataFrame from a dict with Timestamp keys (generated with an offset) and float values. Given that it's the longest benchmark to run (xref #16803), it'd be great if this benchmark could be cut down.

Constructing a DataFrame from this dict of Timestamps (with offsets) and floats takes roughly constant time regardless of the offset at n=100:

asv dev -b ^frame_ctor.FromDictwithTimestampOffsets
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[100.00%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                         ok
[100.00%] ···· 
               ==================== ======== ========
               --                        n_steps     
               -------------------- -----------------
                      offset           1        2    
               ==================== ======== ========
                       Day           11.6ms   11.8ms 
                   BusinessDay       11.6ms   11.6ms 
                       BDay          11.6ms   11.6ms 
                CustomBusinessDay    11.6ms   11.6ms 
                       CDay          11.6ms   11.7ms 
                    CBMonthEnd       11.6ms   11.7ms 
                   CBMonthBegin      11.7ms   11.5ms 
                    MonthBegin       11.5ms   11.7ms 
                   BMonthBegin       11.5ms   11.5ms 
                     MonthEnd        11.8ms   11.5ms 
                    BMonthEnd        11.5ms   11.7ms 
                   SemiMonthEnd      11.6ms   11.5ms 
                  SemiMonthBegin     11.6ms   11.6ms 
                   BusinessHour      11.8ms   11.7ms 
                CustomBusinessHour   11.6ms   11.6ms 
                    YearBegin        11.4ms   11.8ms 
                    BYearBegin       11.5ms   11.6ms 
                     YearEnd         11.5ms   11.5ms 
                     BYearEnd        11.8ms   11.6ms 
                   QuarterBegin      11.5ms   11.5ms 
                  BQuarterBegin      11.5ms   11.6ms 
                    QuarterEnd       11.6ms   11.7ms 
                   BQuarterEnd       11.6ms   11.9ms 
                 LastWeekOfMonth     11.6ms   11.8ms 
                       Week          11.6ms   11.7ms 
                   WeekOfMonth       11.6ms   11.5ms 
                      Easter         11.6ms   11.6ms 
                       Hour          11.5ms   11.4ms 
                      Minute         11.6ms   11.7ms 
                      Second         11.7ms   11.5ms 
                      Milli          11.6ms   11.6ms 
                      Micro          11.4ms   11.4ms 
                       Nano          9.33ms   9.23ms 
                    DateOffset       11.7ms   11.8ms 
                     FY5253_1        11.6ms   11.5ms 
                     FY5253_2        11.8ms   11.7ms 
                 FY5253Quarter_1     11.7ms   11.7ms 
                 FY5253Quarter_2     11.7ms   12.2ms 
               ==================== ======== ========
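To make the point concrete, the work being timed is roughly the following: build a dict whose keys are Timestamps spaced by some offset and whose values are floats, then pass it to DataFrame. This is a hypothetical reconstruction; the helper name make_timestamp_dict, the 1900-01-01 start, the column count, and the use of date_range are assumptions.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import Day, YearBegin


def make_timestamp_dict(offset, n_steps, n=100, n_cols=10):
    """Hypothetical helper: {column: {Timestamp: float}} with n timestamps."""
    np.random.seed(1234)
    idx = pd.date_range("1900-01-01", periods=n, freq=offset * n_steps)
    values = np.random.randn(n, n_cols)
    return {col: dict(zip(idx, values[:, col])) for col in range(n_cols)}


# With n fixed, the constructor receives the same amount of data whatever
# the offset, which fits the flat timings at n=100.
for offset in (Day(), YearBegin()):
    df = pd.DataFrame(make_timestamp_dict(offset, n_steps=2))
    print(type(offset).__name__, df.shape)  # shape is (100, 10) for both
```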

These timings differ from the first asv dev run because the benchmark itself contains logic that changes n depending on the offset, to avoid out-of-bounds timestamps.
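A sketch of the kind of capping logic described: count how many timestamps, starting from some date and stepping by offset * n_steps, fit under pandas' upper bound (Timestamp.max), up to a requested n. The helper name capped_period_count, the 1900-01-01 start, and n=1000 are assumptions; this is not the benchmark's actual code. Coarse offsets such as YearBegin hit the bound long before n, which plausibly explains why the yearly offsets ran faster in the first table and roughly halved at n_steps=2.

```python
import pandas as pd
from pandas.tseries.offsets import Day, YearBegin


def capped_period_count(offset, n_steps, n=1000, start="1900-01-01"):
    """Hypothetical helper: number of timestamps (at most n) that stay
    within Timestamp.max when stepping by offset * n_steps from start."""
    step = offset * n_steps
    ts = pd.Timestamp(start)
    count = 1  # the start timestamp itself
    while count < n:
        try:
            nxt = ts + step
        except (OverflowError, ValueError):  # OutOfBoundsDatetime is a ValueError
            break
        if nxt > pd.Timestamp.max:
            break
        ts = nxt
        count += 1
    return count


print(capped_period_count(Day(), 1))        # 1000: never near the bound
print(capped_period_count(YearBegin(), 1))  # capped well below 1000
print(capped_period_count(YearBegin(), 2))  # roughly half of the line above
```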

@codecov

codecov bot commented Nov 26, 2017

Codecov Report

Merging #18499 into master will increase coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #18499      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%     
==========================================
  Files         163      163              
  Lines       49781    49781              
==========================================
+ Hits        45451    45463      +12     
+ Misses       4330     4318      -12

Flag         Coverage Δ
#multiple    89.12% <ø> (+0.02%) ⬆️
#single      40.72% <ø> (ø) ⬆️

Impacted Files                   Coverage Δ
pandas/plotting/_converter.py    65.25% <0%> (+1.81%) ⬆️
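The headline numbers can be checked from the diff: Misses = Lines - Hits, and coverage = Hits / Lines. The reported percentages appear to be truncated to two decimals rather than rounded (91.326% is shown as 91.32%); that is an inference from these figures, not something the report states.

```python
# Figures from the coverage diff above (master vs #18499).
lines = 49781
hits_master, hits_pr = 45451, 45463

# Misses are simply the uncovered lines.
misses_master = lines - hits_master
misses_pr = lines - hits_pr

# Coverage in hundredths of a percent, truncated via integer division.
cov_master = 10000 * hits_master // lines
cov_pr = 10000 * hits_pr // lines

print(misses_master, misses_pr)        # 4330 4318
print(cov_master / 100, cov_pr / 100)  # 91.3 91.32
print((cov_pr - cov_master) / 100)     # 0.02
```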


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f41e6...3567c85.

@jreback jreback added the Benchmark Performance (ASV) benchmarks label Nov 26, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 26, 2017
@jreback jreback merged commit f1aac43 into pandas-dev:master Nov 26, 2017
@jreback
Contributor

jreback commented Nov 26, 2017

Thanks, can you open an issue about the freq benchmarks?

Labels
Benchmark Performance (ASV) benchmarks

3 participants