CLN: ASV io_bench, parser_vb #18815

mroeschke · 2017-12-18T02:49:16Z

There are a lot of io benchmarks scattered among hdfstore_bench.py, io_bench.py, parser_vb.py, and packer.py. I think it would be a lot cleaner if each io method had its own file (like io_sql.py, which already exists).

This PR creates io_csv.py and io_json.py, which consolidate the existing csv and json benchmarks. Benchmarks were flake8'd, param'd, and simplified where possible:

$ asv dev -b ^io_json
· Discovering benchmarks
· Running 15 total benchmarks (1 commits * 1 environments * 15 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  6.67%] ··· Running io_json.ReadJSON.time_read_json                                                              ok
[  6.67%] ···· 
               ========= ======= ==========
               --              index       
               --------- ------------------
                 orient    int    datetime 
               ========= ======= ==========
                 split    294ms    272ms   
                 index    7.61s    7.65s   
                records   614ms    613ms   
               ========= ======= ==========

[ 13.33%] ··· Running io_json.ReadJSONLines.peakmem_read_json_lines                                                ok
[ 13.33%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      192M 
                datetime   192M 
               ========== ======

[ 20.00%] ··· Running io_json.ReadJSONLines.peakmem_read_json_lines_concat                                         ok
[ 20.00%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      164M 
                datetime   164M 
               ========== ======

[ 26.67%] ··· Running io_json.ReadJSONLines.time_read_json_lines                                                   ok
[ 26.67%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      755ms 
                datetime   720ms 
               ========== =======

[ 33.33%] ··· Running io_json.ReadJSONLines.time_read_json_lines_concat                                            ok
[ 33.33%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      749ms 
                datetime   752ms 
               ========== =======

[ 40.00%] ··· Running io_json.ToJSON.time_delta_int_tstamp                                                         ok
[ 40.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    238ms 
                columns   234ms 
                 index    397ms 
               ========= =======

[ 46.67%] ··· Running io_json.ToJSON.time_delta_int_tstamp_lines                                                   ok
[ 46.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    593ms 
                columns   566ms 
                 index    540ms 
               ========= =======

[ 53.33%] ··· Running io_json.ToJSON.time_float_int                                                                ok
[ 53.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    237ms 
                columns   214ms 
                 index    371ms 
               ========= =======

[ 60.00%] ··· Running io_json.ToJSON.time_float_int_lines                                                          ok
[ 60.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    569ms 
                columns   570ms 
                 index    570ms 
               ========= =======

[ 66.67%] ··· Running io_json.ToJSON.time_float_int_str                                                            ok
[ 66.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    238ms 
                columns   216ms 
                 index    384ms 
               ========= =======

[ 73.33%] ··· Running io_json.ToJSON.time_float_int_str_lines                                                      ok
[ 73.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    598ms 
                columns   614ms 
                 index    617ms 
               ========= =======

[ 80.00%] ··· Running io_json.ToJSON.time_floats_with_dt_index                                                     ok
[ 80.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    182ms 
                columns   209ms 
                 index    208ms 
               ========= =======

[ 86.67%] ··· Running io_json.ToJSON.time_floats_with_dt_index_lines                                               ok
[ 86.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    442ms 
                columns   444ms 
                 index    449ms 
               ========= =======

[ 93.33%] ··· Running io_json.ToJSON.time_floats_with_int_idex_lines                                               ok
[ 93.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    442ms 
                columns   442ms 
                 index    441ms 
               ========= =======

[100.00%] ··· Running io_json.ToJSON.time_floats_with_int_index                                                    ok
[100.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    159ms 
                columns   171ms 
                 index    182ms 
               ========= =======
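
For readers unfamiliar with asv's conventions, the orient/index grids above come from class-level `params` and `param_names`. The following is a minimal sketch of that pattern, not the code from this PR: the class name `ReadJSONSketch`, the row count, the parameter values, and the use of the standard-library `json` module in place of `pandas.read_json` are all assumptions, chosen so the example runs without pandas.

```python
import json
import os
import tempfile


class ReadJSONSketch:
    """Hypothetical asv-style benchmark; not taken from the PR."""

    # asv runs every time_* method once per combination of these values.
    params = (["split", "records"], ["int", "str"])
    param_names = ["orient", "index"]

    def setup(self, orient, index):
        # Build a small payload in the requested layout and write it to disk.
        n = 1000  # assumed size, chosen only to keep the sketch fast
        idx = list(range(n)) if index == "int" else [str(i) for i in range(n)]
        if orient == "split":
            payload = {"index": idx, "columns": ["a"],
                       "data": [[i] for i in range(n)]}
        else:  # "records"
            payload = [{"index": k, "a": i} for i, k in enumerate(idx)]
        fd, self.fname = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)

    def time_read_json(self, orient, index):
        # asv times this method; setup and teardown are excluded.
        with open(self.fname) as fh:
            json.load(fh)

    def teardown(self, orient, index):
        # Remove the temporary file created in setup.
        os.remove(self.fname)
```

Because asv discovers benchmarks by naming convention (`time_*`, `peakmem_*`), the class needs no asv import; `setup` and `teardown` receive the same parameter values as the timed method.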

$ asv dev -b ^io_csv
· Discovering benchmarks
· Running 16 total benchmarks (1 commits * 1 environments * 16 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  6.25%] ··· Running io_csv.ReadCSVCategorical.time_convert_direct                                            99.1ms
[ 12.50%] ··· Running io_csv.ReadCSVCategorical.time_convert_post                                               139ms
[ 18.75%] ··· Running io_csv.ReadCSVComment.time_comment                                                       60.7ms
[ 25.00%] ··· Running io_csv.ReadCSVDInferDatetimeFormat.time_read_csv                                             ok
[ 25.00%] ···· 
               ======================= ======== ========= ========
               --                                 format          
               ----------------------- ---------------------------
                infer_datetime_format   custom   iso8601    ymd   
               ======================= ======== ========= ========
                         True           23.0ms    4.65ms   4.77ms 
                        False           619ms     3.49ms   3.18ms 
               ======================= ======== ========= ========

[ 31.25%] ··· Running io_csv.ReadCSVFloatPrecision.time_read_csv                                                   ok
[ 31.25%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     3.89ms     3.73ms        5.12ms        4.17ms     4.10ms        4.16ms     
                 ;     3.87ms     3.71ms        5.23ms        4.11ms     4.18ms        4.23ms     
               ===== ========== ========== ================ ========== ========== ================

[ 37.50%] ··· Running io_csv.ReadCSVFloatPrecision.time_read_csv_python_engine                                     ok
[ 37.50%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     7.84ms     7.76ms        7.80ms        6.14ms     6.24ms        6.25ms     
                 ;     7.76ms     7.80ms        8.41ms        6.10ms     6.25ms        6.14ms     
               ===== ========== ========== ================ ========== ========== ================

[ 43.75%] ··· Running io_csv.ReadCSVParseDates.time_baseline                                                   2.85ms
[ 50.00%] ··· Running io_csv.ReadCSVParseDates.time_multiple_date                                              2.84ms
[ 56.25%] ··· Running io_csv.ReadCSVSkipRows.time_skipprows                                                        ok
[ 56.25%] ···· 
               ========== ========
                skiprows          
               ---------- --------
                  None     44.8ms 
                 10000     30.9ms 
               ========== ========

[ 62.50%] ··· Running io_csv.ReadCSVThousands.time_thousands                                                       ok
[ 62.50%] ···· 
               ===== ======== ========
               --        thousands    
               ----- -----------------
                sep    None      ,    
               ===== ======== ========
                 ,    38.5ms   37.0ms 
                 |    37.8ms   39.6ms 
               ===== ======== ========

[ 68.75%] ··· Running io_csv.ReadUint64Integers.time_read_uint64                                               8.99ms
[ 75.00%] ··· Running io_csv.ReadUint64Integers.time_read_uint64_na_values                                     13.2ms
[ 81.25%] ··· Running io_csv.ReadUint64Integers.time_read_uint64_neg_values                                    12.9ms
[ 87.50%] ··· Running io_csv.S3.time_read_csv_10_rows                                                              ok
[ 87.50%] ···· 
               ============= ======== =======
               --                 engine     
               ------------- ----------------
                compression   python     c   
               ============= ======== =======
                    None      6.41s    6.13s 
                    gzip      6.20s    6.60s 
                    bz2       35.1s     n/a  
               ============= ======== =======

[ 93.75%] ··· Running io_csv.ToCSV.time_frame                                                                      ok
[ 93.75%] ···· 
               ======= ========
                 kind          
               ------- --------
                 wide   84.4ms 
                 long   164ms  
                mixed   37.4ms 
               ======= ========
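
The 2 x 6 grid reported for ReadCSVFloatPrecision reflects asv running a benchmark once per element of the Cartesian product of its parameter lists. A small sketch of that expansion follows; the parameter values are read off the table headers above, while the variable names are illustrative and not taken from the PR.

```python
from itertools import product

# Values read off the ReadCSVFloatPrecision table headers above.
sep = [",", ";"]
decimal = [".", "_"]
float_precision = [None, "high", "round_trip"]

# One timing is produced per combination of the parameter lists.
combos = list(product(sep, decimal, float_precision))

for s, d, fp in combos:
    print(f"sep={s!r} decimal={d!r} float_precision={fp!r}")

print(len(combos))  # 2 * 2 * 3 = 12 timings per benchmark method
```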

codecov · 2017-12-18T03:53:26Z

Codecov Report

Merging #18815 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18815      +/-   ##
==========================================
- Coverage   91.64%   91.62%   -0.02%     
==========================================
  Files         154      154              
  Lines       51408    51408              
==========================================
- Hits        47113    47104       -9     
- Misses       4295     4304       +9

Flag	Coverage Δ
#multiple	`89.49% <ø> (ø)`	⬆️
#single	`40.83% <ø> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.68% <0%> (-0.11%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6a7cc9...e8eda6a. Read the comment docs.

jreback · 2017-12-18T12:22:34Z

+1 on this. Does asv support nested benchmarks, e.g. can we create a benchmarks/io/json..... and so on and have it work?

mroeschke · 2017-12-19T03:54:28Z

Yup. Created a new io folder with csv.py and json.py

$ asv dev -b ^io[^_]
· Discovering benchmarks
· Running 31 total benchmarks (1 commits * 1 environments * 31 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  3.23%] ··· Running io.csv.ReadCSVCategorical.time_convert_direct                                          98.1ms
[  6.45%] ··· Running io.csv.ReadCSVCategorical.time_convert_post                                             143ms
[  9.68%] ··· Running io.csv.ReadCSVComment.time_comment                                                     55.6ms
[ 12.90%] ··· Running io.csv.ReadCSVDInferDatetimeFormat.time_read_csv                                           ok
[ 12.90%] ···· 
               ======================= ======== ========= ========
               --                                 format          
               ----------------------- ---------------------------
                infer_datetime_format   custom   iso8601    ymd   
               ======================= ======== ========= ========
                         True           22.9ms    4.62ms   4.83ms 
                        False           615ms     3.46ms   3.22ms 
               ======================= ======== ========= ========

[ 16.13%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv                                                 ok
[ 16.13%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     3.97ms     4.01ms        5.18ms        4.20ms     4.17ms        4.14ms     
                 ;     4.08ms     3.75ms        5.27ms        4.12ms     4.09ms        4.24ms     
               ===== ========== ========== ================ ========== ========== ================

[ 19.35%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine                                   ok
[ 19.35%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     7.93ms     7.76ms        7.70ms        6.14ms     6.21ms        6.21ms     
                 ;     7.69ms     7.80ms        7.72ms        6.16ms     6.36ms        6.23ms     
               ===== ========== ========== ================ ========== ========== ================

[ 22.58%] ··· Running io.csv.ReadCSVParseDates.time_baseline                                                 2.91ms
[ 25.81%] ··· Running io.csv.ReadCSVParseDates.time_multiple_date                                            2.91ms
[ 29.03%] ··· Running io.csv.ReadCSVSkipRows.time_skipprows                                                      ok
[ 29.03%] ···· 
               ========== ========
                skiprows          
               ---------- --------
                  None     46.1ms 
                 10000     30.7ms 
               ========== ========

[ 32.26%] ··· Running io.csv.ReadCSVThousands.time_thousands                                                     ok
[ 32.26%] ···· 
               ===== ======== ========
               --        thousands    
               ----- -----------------
                sep    None      ,    
               ===== ======== ========
                 ,    38.3ms   35.9ms 
                 |    39.1ms   38.6ms 
               ===== ======== ========

[ 35.48%] ··· Running io.csv.ReadUint64Integers.time_read_uint64                                             8.76ms
[ 38.71%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_na_values                                   13.1ms
[ 41.94%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_neg_values                                  13.1ms
[ 45.16%] ··· Running io.csv.S3.time_read_csv_10_rows                                                            ok
[ 45.16%] ···· 
               ============= ======== =======
               --                 engine     
               ------------- ----------------
                compression   python     c   
               ============= ======== =======
                    None      6.15s    6.53s 
                    gzip      6.78s    6.15s 
                    bz2       34.0s     n/a  
               ============= ======== =======

[ 48.39%] ··· Running io.csv.ToCSV.time_frame                                                                    ok
[ 48.39%] ···· 
               ======= ========
                 kind          
               ------- --------
                 wide   107ms  
                 long   190ms  
                mixed   41.4ms 
               ======= ========

[ 51.61%] ··· Running io.csv.ToCSVDatetime.time_frame_date_formatting                                        21.9ms
[ 54.84%] ··· Running io.json.ReadJSON.time_read_json                                                            ok
[ 54.84%] ···· 
               ========= ======= ==========
               --              index       
               --------- ------------------
                 orient    int    datetime 
               ========= ======= ==========
                 split    254ms    270ms   
                 index    7.43s    7.55s   
                records   610ms    618ms   
               ========= ======= ==========

[ 58.06%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines                                              ok
[ 58.06%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      192M 
                datetime   192M 
               ========== ======

[ 61.29%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines_concat                                       ok
[ 61.29%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      164M 
                datetime   164M 
               ========== ======

[ 64.52%] ··· Running io.json.ReadJSONLines.time_read_json_lines                                                 ok
[ 64.52%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      715ms 
                datetime   711ms 
               ========== =======

[ 67.74%] ··· Running io.json.ReadJSONLines.time_read_json_lines_concat                                          ok
[ 67.74%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      748ms 
                datetime   748ms 
               ========== =======

[ 70.97%] ··· Running io.json.ToJSON.time_delta_int_tstamp                                                       ok
[ 70.97%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    236ms 
                columns   231ms 
                 index    381ms 
               ========= =======

[ 74.19%] ··· Running io.json.ToJSON.time_delta_int_tstamp_lines                                                 ok
[ 74.19%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    541ms 
                columns   541ms 
                 index    539ms 
               ========= =======

[ 77.42%] ··· Running io.json.ToJSON.time_float_int                                                              ok
[ 77.42%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    220ms 
                columns   213ms 
                 index    370ms 
               ========= =======

[ 80.65%] ··· Running io.json.ToJSON.time_float_int_lines                                                        ok
[ 80.65%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    577ms 
                columns   569ms 
                 index    567ms 
               ========= =======

[ 83.87%] ··· Running io.json.ToJSON.time_float_int_str                                                          ok
[ 83.87%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    237ms 
                columns   214ms 
                 index    387ms 
               ========= =======

[ 87.10%] ··· Running io.json.ToJSON.time_float_int_str_lines                                                    ok
[ 87.10%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    595ms 
                columns   598ms 
                 index    600ms 
               ========= =======

[ 90.32%] ··· Running io.json.ToJSON.time_floats_with_dt_index                                                   ok
[ 90.32%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    181ms 
                columns   206ms 
                 index    208ms 
               ========= =======

[ 93.55%] ··· Running io.json.ToJSON.time_floats_with_dt_index_lines                                             ok
[ 93.55%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    446ms 
                columns   447ms 
                 index    444ms 
               ========= =======

[ 96.77%] ··· Running io.json.ToJSON.time_floats_with_int_idex_lines                                             ok
[ 96.77%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    439ms 
                columns   439ms 
                 index    440ms 
               ========= =======

[100.00%] ··· Running io.json.ToJSON.time_floats_with_int_index                                                  ok
[100.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    160ms 
                columns   173ms 
                 index    183ms 
               ========= =======
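
The `-b` argument is a regular expression over benchmark names, which is why `^io[^_]` picks up the nested `io.csv` and `io.json` modules while skipping flat modules such as `io_bench`. A quick check with Python's `re` module, assuming asv applies the pattern with search semantics; the first two names come from the run above, and the `SomeBench.time_x` suffixes are placeholders rather than real benchmark names:

```python
import re

pattern = re.compile(r"^io[^_]")

names = [
    "io.csv.ToCSV.time_frame",          # from the run above
    "io.json.ReadJSON.time_read_json",  # from the run above
    "io_bench.SomeBench.time_x",        # placeholder in a legacy flat module
    "io_sql.SomeBench.time_x",          # placeholder in a legacy flat module
]

# Keep only the names the pattern would select.
selected = [n for n in names if pattern.search(n)]
print(selected)
```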

jreback · 2017-12-19T11:34:25Z

thanks @mroeschke

ideally would consolidate the rest of the io benchmarks in io/

jreback added the Benchmark Performance (ASV) benchmarks label Dec 18, 2017

mroeschke added 5 commits December 18, 2017 19:06

CLN: ASV io bench

10f1112

Migrate benchmarks for parser_vb

6fef2c7

Undo removed file

bf4e257

Add additional BasIO

4eed4c0

Create io folder for benchmarks

e8eda6a

mroeschke force-pushed the asv_clean_io_bench branch from 7b6a72c to e8eda6a Compare December 19, 2017 03:52

jreback added this to the 0.22.0 milestone Dec 19, 2017

jreback merged commit b2a02bd into pandas-dev:master Dec 19, 2017

mroeschke deleted the asv_clean_io_bench branch December 19, 2017 17:36

mroeschke mentioned this pull request Dec 22, 2017

CLN: ASV io benchmarks #18906

Merged
