CLN: ASV frame_ctor benchmark #18499

mroeschke · 2017-11-26T03:16:41Z

Added np.random.seed(1234) in the setup methods where random data is created (xref BENCH: put in np.random.seed on vbenches #8144)
Ran flake8 and replaced star imports (but from pandas.core.datetools import * might need to be kept for compat?)
Fixed time_frame_ctor_nested_dict_int64, which was using self.data instead of self.data2
Moved the class frame_get_numeric_data to frame_methods.py
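A rough sketch of the shape these changes give a FromDicts-style benchmark, assuming ASV's convention of a setup method plus time_* methods. The sizes, labels and the contents of data2 are illustrative assumptions, not the values in the actual benchmark file.

```python
import numpy as np
from pandas import DataFrame  # explicit import instead of a star import


class FromDicts(object):

    def setup(self):
        # Seed so every run builds identical random data (xref #8144)
        np.random.seed(1234)
        N, K = 1000, 10  # illustrative sizes (assumption)
        index = ['row_%d' % i for i in range(N)]
        columns = ['col_%d' % i for i in range(K)]
        frame = DataFrame(np.random.randn(N, K), index=index, columns=columns)
        # Nested {column: {row_label: value}} dict with string keys
        self.data = frame.to_dict()
        # List of {column: value} records, one per row
        self.dict_list = frame.to_dict(orient='records')
        # Nested dict with integer keys, used by the int64 variant (assumed contents)
        self.data2 = {i: {j: float(j) for j in range(100)} for i in range(200)}

    def time_frame_ctor_list_of_dict(self):
        DataFrame(self.dict_list)

    def time_frame_ctor_nested_dict(self):
        DataFrame(self.data)

    def time_frame_ctor_nested_dict_int64(self):
        # Must use the integer-keyed data2; using self.data here would just
        # repeat time_frame_ctor_nested_dict.
        DataFrame(self.data2)
```

With the seed in setup, two setup calls produce the same dicts, so timings are comparable across runs.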

asv run -b ^frame_ctor
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 16.67%] ··· Running frame_ctor.FromDicts.time_frame_ctor_list_of_dict                                                        113±0.3ms
[ 33.33%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict                                                        92.7±0.3ms
[ 50.00%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict_int64                                                   243±0.5ms
[ 66.67%] ··· Running frame_ctor.FromDicts.time_series_ctor_from_dict                                                        6.17±0.01ms
[ 83.33%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                2/76 failed
[100.00%] ··· Running frame_ctor.FromSeries.time_frame_from_mi_series                                                          243±0.7μs

Oddly, the offsets benchmark failures reported by asv run do not show up when running asv dev:

asv dev -b ^frame_ctor
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[ 16.67%] ··· Running frame_ctor.FromDicts.time_frame_ctor_list_of_dict                                                            153ms
[ 33.33%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict                                                             147ms
[ 50.00%] ··· Running frame_ctor.FromDicts.time_frame_ctor_nested_dict_int64                                                       217ms
[ 66.67%] ··· Running frame_ctor.FromDicts.time_series_ctor_from_dict                                                             6.09ms
[ 83.33%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                         ok
[ 83.33%] ···· 
               ==================== ======== ========
               --                        n_steps     
               -------------------- -----------------
                      offset           1        2    
               ==================== ======== ========
                       Day           96.5ms   96.7ms 
                   BusinessDay       96.7ms   96.5ms 
                       BDay          99.8ms   96.8ms 
                CustomBusinessDay    98.8ms   97.5ms 
                       CDay          96.7ms   95.8ms 
                    CBMonthEnd       96.8ms   97.7ms 
                   CBMonthBegin      97.0ms   98.0ms 
                    MonthBegin       96.4ms   97.6ms 
                   BMonthBegin       98.1ms   96.4ms 
                     MonthEnd        95.8ms   96.6ms 
                    BMonthEnd        96.7ms   96.9ms 
                   SemiMonthEnd      96.2ms   97.3ms 
                  SemiMonthBegin     96.3ms   98.1ms 
                   BusinessHour      96.5ms   97.1ms 
                CustomBusinessHour   97.2ms   97.0ms 
                    YearBegin        33.4ms   17.5ms 
                    BYearBegin       32.3ms   17.6ms 
                     YearEnd         32.2ms   17.3ms 
                     BYearEnd        36.2ms   17.9ms 
                   QuarterBegin      98.0ms   64.9ms 
                  BQuarterBegin      99.5ms   70.6ms 
                    QuarterEnd       101ms    67.9ms 
                   BQuarterEnd       98.3ms   69.0ms 
                 LastWeekOfMonth     97.1ms   102ms  
                       Week          97.5ms   102ms  
                   WeekOfMonth       104ms    102ms  
                      Easter         35.4ms   18.0ms 
                       Hour          102ms    96.3ms 
                      Minute         95.8ms   100ms  
                      Second         95.8ms   97.2ms 
                      Milli          98.7ms   101ms  
                      Micro          94.7ms   93.5ms 
                       Nano          71.8ms   72.3ms 
                    DateOffset       95.9ms   96.1ms 
                     FY5253_1        36.9ms   18.1ms 
                     FY5253_2        35.8ms   18.2ms 
                 FY5253Quarter_1     98.7ms   66.2ms 
                 FY5253Quarter_2     96.3ms   65.4ms 
               ==================== ======== ========

[100.00%] ··· Running frame_ctor.FromSeries.time_frame_from_mi_series                                                              282μs

pep8speaks · 2017-11-26T03:16:43Z

Hello @mroeschke! Thanks for updating the PR.

In the file asv_bench/benchmarks/frame_methods.py, following are the PEP8 issues :

Line 4:1: E265 block comment should start with '# '
Line 7:1: E302 expected 2 blank lines, found 1

Comment last updated on November 26, 2017 at 03:26 Hours UTC

mroeschke · 2017-11-26T03:24:48Z

Also, I am skeptical that FromDictwithTimestampOffset is benchmarking anything useful. Ultimately it's benchmarking a dict of Timestamp keys (generated with an offset) and float values. Given that it's the longest benchmark to run (xref #16803), it would be great if this benchmark could be cut down.

Constructing a DataFrame from this dict of Timestamps (with offsets) and floats takes roughly the same time regardless of the offset when n=100:
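To make the point concrete, here is a minimal sketch of the kind of input being timed, assuming 100 timestamps starting at 1900-01-01 and 10 columns of random floats. make_timestamp_dict is a hypothetical helper, not the benchmark's own code.

```python
import numpy as np
import pandas as pd
from pandas.tseries import offsets


def make_timestamp_dict(offset, n=100, n_cols=10):
    # Hypothetical helper: n timestamps spaced by `offset`, n_cols columns of
    # random floats, returned as a nested {column: {Timestamp: float}} dict.
    np.random.seed(1234)
    idx = pd.date_range(pd.Timestamp('1900-01-01'), periods=n, freq=offset)
    frame = pd.DataFrame(np.random.randn(n, n_cols), index=idx)
    return frame.to_dict()


for off in [offsets.Day(), offsets.MonthEnd(), offsets.YearBegin()]:
    d = make_timestamp_dict(off)
    # The constructor only receives Timestamp keys and float values, so the
    # offset only matters through which dates end up as keys.
    df = pd.DataFrame(d)
    print(type(off).__name__, df.shape)
```

Every offset yields the same 100 x 10 frame of floats, which is consistent with the flat timings below.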

asv dev -b ^frame_ctor.FromDictwithTimestampOffsets
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[100.00%] ··· Running frame_ctor.FromDictwithTimestampOffsets.time_frame_ctor                                                         ok
[100.00%] ···· 
               ==================== ======== ========
               --                        n_steps     
               -------------------- -----------------
                      offset           1        2    
               ==================== ======== ========
                       Day           11.6ms   11.8ms 
                   BusinessDay       11.6ms   11.6ms 
                       BDay          11.6ms   11.6ms 
                CustomBusinessDay    11.6ms   11.6ms 
                       CDay          11.6ms   11.7ms 
                    CBMonthEnd       11.6ms   11.7ms 
                   CBMonthBegin      11.7ms   11.5ms 
                    MonthBegin       11.5ms   11.7ms 
                   BMonthBegin       11.5ms   11.5ms 
                     MonthEnd        11.8ms   11.5ms 
                    BMonthEnd        11.5ms   11.7ms 
                   SemiMonthEnd      11.6ms   11.5ms 
                  SemiMonthBegin     11.6ms   11.6ms 
                   BusinessHour      11.8ms   11.7ms 
                CustomBusinessHour   11.6ms   11.6ms 
                    YearBegin        11.4ms   11.8ms 
                    BYearBegin       11.5ms   11.6ms 
                     YearEnd         11.5ms   11.5ms 
                     BYearEnd        11.8ms   11.6ms 
                   QuarterBegin      11.5ms   11.5ms 
                  BQuarterBegin      11.5ms   11.6ms 
                    QuarterEnd       11.6ms   11.7ms 
                   BQuarterEnd       11.6ms   11.9ms 
                 LastWeekOfMonth     11.6ms   11.8ms 
                       Week          11.6ms   11.7ms 
                   WeekOfMonth       11.6ms   11.5ms 
                      Easter         11.6ms   11.6ms 
                       Hour          11.5ms   11.4ms 
                      Minute         11.6ms   11.7ms 
                      Second         11.7ms   11.5ms 
                      Milli          11.6ms   11.6ms 
                      Micro          11.4ms   11.4ms 
                       Nano          9.33ms   9.23ms 
                    DateOffset       11.7ms   11.8ms 
                     FY5253_1        11.6ms   11.5ms 
                     FY5253_2        11.8ms   11.7ms 
                 FY5253Quarter_1     11.7ms   11.7ms 
                 FY5253Quarter_2     11.7ms   12.2ms 
               ==================== ======== ========

The timings in the earlier full run vary by offset because the benchmark contains logic that changes n depending on the offset, to avoid out-of-bounds timestamps.

codecov · 2017-11-26T11:43:37Z

Codecov Report

Merging #18499 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18499      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%     
==========================================
  Files         163      163              
  Lines       49781    49781              
==========================================
+ Hits        45451    45463      +12     
+ Misses       4330     4318      -12

Flag	Coverage Δ
#multiple	`89.12% <ø> (+0.02%)`	⬆️
#single	`40.72% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/plotting/_converter.py	`65.25% <0%> (+1.81%)`	⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f41e6...3567c85.

jreback · 2017-11-26T15:02:08Z

Thanks. Can you open an issue about the freq benchmarks?

mroeschke added 2 commits November 25, 2017 18:40

c8baf30 CLN: ASV frame_ctor benchmarks
8bad471 extra cleans
3567c85 flake 8

jreback added the Benchmark Performance (ASV) benchmarks label Nov 26, 2017

jreback added this to the 0.22.0 milestone Nov 26, 2017

jreback merged commit f1aac43 into pandas-dev:master Nov 26, 2017

mroeschke deleted the asv_clean_frame_ctor branch November 27, 2017 03:21

mroeschke mentioned this pull request Nov 27, 2017

BENCH: FromDictwithTimestampOffset benchmark is not useful #18511

Closed
