CLN: ASV io_bench, parser_vb #18815

Merged (5 commits) on Dec 19, 2017

Conversation

mroeschke
Member

There are a lot of io benchmarks scattered among hdfstore_bench.py, io_bench.py, parser_vb.py, and packer.py. I think it would be a lot cleaner if each io method had its own file (like io_sql.py, which already exists).

This PR creates io_csv.py and io_json.py, which consolidate the existing csv and json benchmarks. Benchmarks were flake8'd, param'd, and simplified where possible:

$ asv dev -b ^io_json
· Discovering benchmarks
· Running 15 total benchmarks (1 commits * 1 environments * 15 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  6.67%] ··· Running io_json.ReadJSON.time_read_json                                                              ok
[  6.67%] ···· 
               ========= ======= ==========
               --              index       
               --------- ------------------
                 orient    int    datetime 
               ========= ======= ==========
                 split    294ms    272ms   
                 index    7.61s    7.65s   
                records   614ms    613ms   
               ========= ======= ==========

[ 13.33%] ··· Running io_json.ReadJSONLines.peakmem_read_json_lines                                                ok
[ 13.33%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      192M 
                datetime   192M 
               ========== ======

[ 20.00%] ··· Running io_json.ReadJSONLines.peakmem_read_json_lines_concat                                         ok
[ 20.00%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      164M 
                datetime   164M 
               ========== ======

[ 26.67%] ··· Running io_json.ReadJSONLines.time_read_json_lines                                                   ok
[ 26.67%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      755ms 
                datetime   720ms 
               ========== =======

[ 33.33%] ··· Running io_json.ReadJSONLines.time_read_json_lines_concat                                            ok
[ 33.33%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      749ms 
                datetime   752ms 
               ========== =======

[ 40.00%] ··· Running io_json.ToJSON.time_delta_int_tstamp                                                         ok
[ 40.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    238ms 
                columns   234ms 
                 index    397ms 
               ========= =======

[ 46.67%] ··· Running io_json.ToJSON.time_delta_int_tstamp_lines                                                   ok
[ 46.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    593ms 
                columns   566ms 
                 index    540ms 
               ========= =======

[ 53.33%] ··· Running io_json.ToJSON.time_float_int                                                                ok
[ 53.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    237ms 
                columns   214ms 
                 index    371ms 
               ========= =======

[ 60.00%] ··· Running io_json.ToJSON.time_float_int_lines                                                          ok
[ 60.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    569ms 
                columns   570ms 
                 index    570ms 
               ========= =======

[ 66.67%] ··· Running io_json.ToJSON.time_float_int_str                                                            ok
[ 66.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    238ms 
                columns   216ms 
                 index    384ms 
               ========= =======

[ 73.33%] ··· Running io_json.ToJSON.time_float_int_str_lines                                                      ok
[ 73.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    598ms 
                columns   614ms 
                 index    617ms 
               ========= =======

[ 80.00%] ··· Running io_json.ToJSON.time_floats_with_dt_index                                                     ok
[ 80.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    182ms 
                columns   209ms 
                 index    208ms 
               ========= =======

[ 86.67%] ··· Running io_json.ToJSON.time_floats_with_dt_index_lines                                               ok
[ 86.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    442ms 
                columns   444ms 
                 index    449ms 
               ========= =======

[ 93.33%] ··· Running io_json.ToJSON.time_floats_with_int_idex_lines                                               ok
[ 93.33%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    442ms 
                columns   442ms 
                 index    441ms 
               ========= =======

[100.00%] ··· Running io_json.ToJSON.time_floats_with_int_index                                                    ok
[100.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    159ms 
                columns   171ms 
                 index    182ms 
               ========= =======
$ asv dev -b ^io_csv
· Discovering benchmarks
· Running 16 total benchmarks (1 commits * 1 environments * 16 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  6.25%] ··· Running io_csv.ReadCSVCategorical.time_convert_direct                                            99.1ms
[ 12.50%] ··· Running io_csv.ReadCSVCategorical.time_convert_post                                               139ms
[ 18.75%] ··· Running io_csv.ReadCSVComment.time_comment                                                       60.7ms
[ 25.00%] ··· Running io_csv.ReadCSVDInferDatetimeFormat.time_read_csv                                             ok
[ 25.00%] ···· 
               ======================= ======== ========= ========
               --                                 format          
               ----------------------- ---------------------------
                infer_datetime_format   custom   iso8601    ymd   
               ======================= ======== ========= ========
                         True           23.0ms    4.65ms   4.77ms 
                        False           619ms     3.49ms   3.18ms 
               ======================= ======== ========= ========

[ 31.25%] ··· Running io_csv.ReadCSVFloatPrecision.time_read_csv                                                   ok
[ 31.25%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     3.89ms     3.73ms        5.12ms        4.17ms     4.10ms        4.16ms     
                 ;     3.87ms     3.71ms        5.23ms        4.11ms     4.18ms        4.23ms     
               ===== ========== ========== ================ ========== ========== ================

[ 37.50%] ··· Running io_csv.ReadCSVFloatPrecision.time_read_csv_python_engine                                     ok
[ 37.50%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     7.84ms     7.76ms        7.80ms        6.14ms     6.24ms        6.25ms     
                 ;     7.76ms     7.80ms        8.41ms        6.10ms     6.25ms        6.14ms     
               ===== ========== ========== ================ ========== ========== ================

[ 43.75%] ··· Running io_csv.ReadCSVParseDates.time_baseline                                                   2.85ms
[ 50.00%] ··· Running io_csv.ReadCSVParseDates.time_multiple_date                                              2.84ms
[ 56.25%] ··· Running io_csv.ReadCSVSkipRows.time_skipprows                                                        ok
[ 56.25%] ···· 
               ========== ========
                skiprows          
               ---------- --------
                  None     44.8ms 
                 10000     30.9ms 
               ========== ========

[ 62.50%] ··· Running io_csv.ReadCSVThousands.time_thousands                                                       ok
[ 62.50%] ···· 
               ===== ======== ========
               --        thousands    
               ----- -----------------
                sep    None      ,    
               ===== ======== ========
                 ,    38.5ms   37.0ms 
                 |    37.8ms   39.6ms 
               ===== ======== ========

[ 68.75%] ··· Running io_csv.ReadUint64Integers.time_read_uint64                                               8.99ms
[ 75.00%] ··· Running io_csv.ReadUint64Integers.time_read_uint64_na_values                                     13.2ms
[ 81.25%] ··· Running io_csv.ReadUint64Integers.time_read_uint64_neg_values                                    12.9ms
[ 87.50%] ··· Running io_csv.S3.time_read_csv_10_rows                                                              ok
[ 87.50%] ···· 
               ============= ======== =======
               --                 engine     
               ------------- ----------------
                compression   python     c   
               ============= ======== =======
                    None      6.41s    6.13s 
                    gzip      6.20s    6.60s 
                    bz2       35.1s     n/a  
               ============= ======== =======

[ 93.75%] ··· Running io_csv.ToCSV.time_frame                                                                      ok
[ 93.75%] ···· 
               ======= ========
                 kind          
               ------- --------
                 wide   84.4ms 
                 long   164ms  
                mixed   37.4ms 
               ======= ========


codecov bot commented Dec 18, 2017

Codecov Report

Merging #18815 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18815      +/-   ##
==========================================
- Coverage   91.64%   91.62%   -0.02%     
==========================================
  Files         154      154              
  Lines       51408    51408              
==========================================
- Hits        47113    47104       -9     
- Misses       4295     4304       +9
Flag         Coverage Δ
#multiple    89.49% <ø> (ø) ⬆️
#single      40.83% <ø> (-0.12%) ⬇️

Impacted Files          Coverage Δ
pandas/io/gbq.py        25% <0%> (-58.34%) ⬇️
pandas/core/frame.py    97.68% <0%> (-0.11%) ⬇️
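
The report header says coverage drops by 0.01% while the diff shows -0.02%. A short calculation from the Hits and Lines figures in the diff shows how both can come from the same numbers. This is only arithmetic on the reported values; the helper names `pct` and `truncate2` are hypothetical, and nothing here is a claim about how Codecov itself formats its output.

```python
import math

# Figures copied from the coverage diff above.
lines = 51408
hits_master, hits_pr = 47113, 47104


def pct(hits, total):
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / total


def truncate2(x):
    """Drop everything past two decimals without rounding."""
    return math.floor(x * 100) / 100


master = pct(hits_master, lines)
pr = pct(hits_pr, lines)
drop = master - pr

print(f"master {master:.4f}%  PR {pr:.4f}%  drop {drop:.4f}%")
print(truncate2(master), truncate2(pr), truncate2(drop), round(drop, 2))
```

The drop is about 0.0175%, which reads as 0.01% when truncated and 0.02% when rounded; the displayed 91.64% and 91.62% likewise match the truncated values.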

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b6a7cc9...e8eda6a.

@jreback
Contributor

jreback commented Dec 18, 2017

+1 on this. Does asv support nested benchmarks, e.g. can we create a benchmarks/io/json..... and so on and have it work?

jreback added the Benchmark Performance (ASV) benchmarks label Dec 18, 2017
@mroeschke
Member Author

Yup. Created a new io folder with csv.py and json.py

$ asv dev -b ^io[^_]
· Discovering benchmarks
· Running 31 total benchmarks (1 commits * 1 environments * 31 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  3.23%] ··· Running io.csv.ReadCSVCategorical.time_convert_direct                                          98.1ms
[  6.45%] ··· Running io.csv.ReadCSVCategorical.time_convert_post                                             143ms
[  9.68%] ··· Running io.csv.ReadCSVComment.time_comment                                                     55.6ms
[ 12.90%] ··· Running io.csv.ReadCSVDInferDatetimeFormat.time_read_csv                                           ok
[ 12.90%] ···· 
               ======================= ======== ========= ========
               --                                 format          
               ----------------------- ---------------------------
                infer_datetime_format   custom   iso8601    ymd   
               ======================= ======== ========= ========
                         True           22.9ms    4.62ms   4.83ms 
                        False           615ms     3.46ms   3.22ms 
               ======================= ======== ========= ========

[ 16.13%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv                                                 ok
[ 16.13%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     3.97ms     4.01ms        5.18ms        4.20ms     4.17ms        4.14ms     
                 ;     4.08ms     3.75ms        5.27ms        4.12ms     4.09ms        4.24ms     
               ===== ========== ========== ================ ========== ========== ================

[ 19.35%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine                                   ok
[ 19.35%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     7.93ms     7.76ms        7.70ms        6.14ms     6.21ms        6.21ms     
                 ;     7.69ms     7.80ms        7.72ms        6.16ms     6.36ms        6.23ms     
               ===== ========== ========== ================ ========== ========== ================

[ 22.58%] ··· Running io.csv.ReadCSVParseDates.time_baseline                                                 2.91ms
[ 25.81%] ··· Running io.csv.ReadCSVParseDates.time_multiple_date                                            2.91ms
[ 29.03%] ··· Running io.csv.ReadCSVSkipRows.time_skipprows                                                      ok
[ 29.03%] ···· 
               ========== ========
                skiprows          
               ---------- --------
                  None     46.1ms 
                 10000     30.7ms 
               ========== ========

[ 32.26%] ··· Running io.csv.ReadCSVThousands.time_thousands                                                     ok
[ 32.26%] ···· 
               ===== ======== ========
               --        thousands    
               ----- -----------------
                sep    None      ,    
               ===== ======== ========
                 ,    38.3ms   35.9ms 
                 |    39.1ms   38.6ms 
               ===== ======== ========

[ 35.48%] ··· Running io.csv.ReadUint64Integers.time_read_uint64                                             8.76ms
[ 38.71%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_na_values                                   13.1ms
[ 41.94%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_neg_values                                  13.1ms
[ 45.16%] ··· Running io.csv.S3.time_read_csv_10_rows                                                            ok
[ 45.16%] ···· 
               ============= ======== =======
               --                 engine     
               ------------- ----------------
                compression   python     c   
               ============= ======== =======
                    None      6.15s    6.53s 
                    gzip      6.78s    6.15s 
                    bz2       34.0s     n/a  
               ============= ======== =======

[ 48.39%] ··· Running io.csv.ToCSV.time_frame                                                                    ok
[ 48.39%] ···· 
               ======= ========
                 kind          
               ------- --------
                 wide   107ms  
                 long   190ms  
                mixed   41.4ms 
               ======= ========

[ 51.61%] ··· Running io.csv.ToCSVDatetime.time_frame_date_formatting                                        21.9ms
[ 54.84%] ··· Running io.json.ReadJSON.time_read_json                                                            ok
[ 54.84%] ···· 
               ========= ======= ==========
               --              index       
               --------- ------------------
                 orient    int    datetime 
               ========= ======= ==========
                 split    254ms    270ms   
                 index    7.43s    7.55s   
                records   610ms    618ms   
               ========= ======= ==========

[ 58.06%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines                                              ok
[ 58.06%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      192M 
                datetime   192M 
               ========== ======

[ 61.29%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines_concat                                       ok
[ 61.29%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      164M 
                datetime   164M 
               ========== ======

[ 64.52%] ··· Running io.json.ReadJSONLines.time_read_json_lines                                                 ok
[ 64.52%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      715ms 
                datetime   711ms 
               ========== =======

[ 67.74%] ··· Running io.json.ReadJSONLines.time_read_json_lines_concat                                          ok
[ 67.74%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      748ms 
                datetime   748ms 
               ========== =======

[ 70.97%] ··· Running io.json.ToJSON.time_delta_int_tstamp                                                       ok
[ 70.97%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    236ms 
                columns   231ms 
                 index    381ms 
               ========= =======

[ 74.19%] ··· Running io.json.ToJSON.time_delta_int_tstamp_lines                                                 ok
[ 74.19%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    541ms 
                columns   541ms 
                 index    539ms 
               ========= =======

[ 77.42%] ··· Running io.json.ToJSON.time_float_int                                                              ok
[ 77.42%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    220ms 
                columns   213ms 
                 index    370ms 
               ========= =======

[ 80.65%] ··· Running io.json.ToJSON.time_float_int_lines                                                        ok
[ 80.65%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    577ms 
                columns   569ms 
                 index    567ms 
               ========= =======

[ 83.87%] ··· Running io.json.ToJSON.time_float_int_str                                                          ok
[ 83.87%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    237ms 
                columns   214ms 
                 index    387ms 
               ========= =======

[ 87.10%] ··· Running io.json.ToJSON.time_float_int_str_lines                                                    ok
[ 87.10%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    595ms 
                columns   598ms 
                 index    600ms 
               ========= =======

[ 90.32%] ··· Running io.json.ToJSON.time_floats_with_dt_index                                                   ok
[ 90.32%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    181ms 
                columns   206ms 
                 index    208ms 
               ========= =======

[ 93.55%] ··· Running io.json.ToJSON.time_floats_with_dt_index_lines                                             ok
[ 93.55%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    446ms 
                columns   447ms 
                 index    444ms 
               ========= =======

[ 96.77%] ··· Running io.json.ToJSON.time_floats_with_int_idex_lines                                             ok
[ 96.77%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    439ms 
                columns   439ms 
                 index    440ms 
               ========= =======

[100.00%] ··· Running io.json.ToJSON.time_floats_with_int_index                                                  ok
[100.00%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    160ms 
                columns   173ms 
                 index    183ms 
               ========= =======
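
Moving the files into an `io` folder changes the benchmark names from `io_csv.…` to `io.csv.…`, which is why the run above selects them with `^io[^_]` instead of `^io_csv` or `^io_json`. The sketch below illustrates that selection, assuming the `-b` pattern is applied as a regular-expression search against the full benchmark name. The helper `select` and the two `SomeBenchmark` names are hypothetical; the other names are taken from the runs in this thread.

```python
import re

names = [
    "io.csv.ReadCSVCategorical.time_convert_direct",
    "io.json.ReadJSON.time_read_json",
    "io_json.ReadJSON.time_read_json",
    "io_bench.SomeBenchmark.time_something",  # hypothetical name
    "io_sql.SomeBenchmark.time_something",    # hypothetical name
]


def select(pattern, candidates):
    """Return the candidates that the pattern matches anywhere in the name."""
    return [name for name in candidates if re.search(pattern, name)]


# "io" followed by anything except "_": only the nested io.* modules.
print(select(r"^io[^_]", names))

# The flat, pre-move module name.
print(select(r"^io_json", names))
```

Because `[^_]` accepts the `.` separator but rejects `_`, the nested modules are picked up while flat modules such as io_bench.py and io_sql.py are left out.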

jreback added this to the 0.22.0 milestone Dec 19, 2017
jreback merged commit b2a02bd into pandas-dev:master Dec 19, 2017
@jreback
Contributor

jreback commented Dec 19, 2017

thanks @mroeschke

ideally would consolidate the rest of the io benchmarks in io/

mroeschke deleted the asv_clean_io_bench branch December 19, 2017 17:36
Labels
Benchmark Performance (ASV) benchmarks