Skip to content

CLN: writers.pyx cdef cleanup #23880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 25, 2018
Merged

Conversation

mroeschke
Copy link
Member

  • Changed some int to Py_ssize_t
  • Remove/add some unused/uncdef'd variables

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2018

What factors are you using to determine int vs Py_ssize_t?

ndarray[uint8_t] narr
unsigned char v, comma, left_bracket, right_brack, newline
ndarray[uint8_t, ndim=1] narr
unsigned char v, newline, comma, left_bracket, right_bracket, quote
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only nitpick while we're here: can you change v to val just to avoid 1-character names

@codecov
Copy link

codecov bot commented Nov 24, 2018

Codecov Report

Merging #23880 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23880   +/-   ##
=======================================
  Coverage   92.29%   92.29%           
=======================================
  Files         161      161           
  Lines       51498    51498           
=======================================
  Hits        47530    47530           
  Misses       3968     3968
Flag Coverage Δ
#multiple 90.69% <ø> (ø) ⬆️
#single 42.43% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/pivot.py 96.55% <0%> (ø) ⬆️
pandas/io/json/normalize.py 96.87% <0%> (ø) ⬆️
pandas/io/packers.py 88.08% <0%> (ø) ⬆️
pandas/core/util/hashing.py 98.4% <0%> (ø) ⬆️
pandas/core/algorithms.py 95.11% <0%> (ø) ⬆️
pandas/util/_decorators.py 91.34% <0%> (ø) ⬆️
pandas/tseries/offsets.py 96.98% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.2% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 95.49% <0%> (ø) ⬆️
pandas/plotting/_core.py 83.63% <0%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94ce05d...b4beaed. Read the comment docs.

@gfyoung
Copy link
Member

gfyoung commented Nov 24, 2018

What factors are you using to determine int vs Py_ssize_t?

To add onto @WillAyd point, asv results (or some kind of benchmark) would be good to see here, to make sure the change doesn't somehow impact performance.

@jbrockmendel
Copy link
Member

@WillAyd py_ssize_t is unsigned, is the suggested usage for length-like variables.

@WillAyd
Copy link
Member

WillAyd commented Nov 24, 2018

@jbrockmendel it's size_t that is unsigned and Py_ssize_t is signed, no?

https://www.python.org/dev/peps/pep-0353/

IIUC helps ensure cross-platform indexing can go beyond 2**31 that compilers may use for an int even on 64 bit platforms. That makes sense to me though I'd just be curious then if there's a reason we would choose this over say a uint64_t

@mroeschke
Copy link
Member Author

The train of thought was, as @jbrockmendel mentioned, to use Py_ssize_t as the preferred type for indexing and length like variables. https://stackoverflow.com/a/20987501

The pep mentiones Py_ssize_t being signed but typedef'd for ssize_t where available.

No performance regressions when running the asvs:

$ asv continuous -f 1.1 upstream/master HEAD -b ^io
· Running 132 total benchmarks (2 commits * 1 environments * 66 benchmarks)
[  0.00%] · For pandas commit hash 216631e3:
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  0.00%] ··· Running benchmarks........................................
[  0.76%] ··· io.json.ReadJSONLines.peakmem_read_json_lines             178M;...
[  1.52%] ··· ...ReadJSONLines.peakmem_read_json_lines_concat           142M;...
[  1.52%] ··· Running benchmarks............................
[  2.27%] ··· io.csv.ReadCSVCategorical.time_convert_direct             40.1±1ms
[  3.03%] ··· io.csv.ReadCSVCategorical.time_convert_post             61.2±0.3ms
[  3.79%] ··· io.csv.ReadCSVComment.time_comment                      28.9±0.4ms
[  4.55%] ··· ...sv.ReadCSVDInferDatetimeFormat.time_read_csv    5.40±0.06ms;...
[  5.30%] ··· io.csv.ReadCSVFloatPrecision.time_read_csv         2.23±0.05ms;...
[  6.06%] ··· ...SVFloatPrecision.time_read_csv_python_engine    3.75±0.04ms;...
[  6.82%] ··· io.csv.ReadCSVParseDates.time_baseline                    1.70±0ms
[  7.58%] ··· io.csv.ReadCSVParseDates.time_multiple_date            2.02±0.02ms
[  8.33%] ··· io.csv.ReadCSVSkipRows.time_skipprows              21.7±0.06ms;...
[  9.09%] ··· io.csv.ReadCSVThousands.time_thousands             19.3±0.06ms;...
[  9.85%] ··· io.csv.ReadUint64Integers.time_read_uint64             3.29±0.02ms
[ 10.61%] ··· ...eadUint64Integers.time_read_uint64_na_values        5.75±0.02ms
[ 11.36%] ··· ...adUint64Integers.time_read_uint64_neg_values        5.71±0.02ms
[ 12.12%] ··· io.csv.ToCSV.time_frame                              111±0.4ms;...
[ 12.88%] ··· io.csv.ToCSVDatetime.time_frame_date_formatting         11.3±0.1ms
[ 13.64%] ··· io.excel.Excel.time_read_excel                        88.7±1ms;...
[ 14.39%] ··· io.excel.Excel.time_write_excel                        386±3ms;...
[ 15.15%] ··· io.hdf.HDF.time_read_hdf                             139±0.3ms;...
[ 15.91%] ··· io.hdf.HDF.time_write_hdf                              153±1ms;...
[ 16.67%] ··· io.hdf.HDFStoreDataFrame.time_query_store_table        9.89±0.03ms
[ 17.42%] ··· ...DFStoreDataFrame.time_query_store_table_wide        18.9±0.03ms
[ 18.18%] ··· io.hdf.HDFStoreDataFrame.time_read_store                29.9±0.3ms
[ 18.94%] ··· io.hdf.HDFStoreDataFrame.time_read_store_mixed          38.7±0.5ms
[ 19.70%] ··· io.hdf.HDFStoreDataFrame.time_read_store_table         5.88±0.02ms
[ 20.45%] ··· ...DFStoreDataFrame.time_read_store_table_mixed         76.5±0.8ms
[ 21.21%] ··· ...HDFStoreDataFrame.time_read_store_table_wide         23.7±0.2ms
[ 21.97%] ··· io.hdf.HDFStoreDataFrame.time_store_info               8.77±0.04ms
[ 22.73%] ··· io.hdf.HDFStoreDataFrame.time_store_repr               6.32±0.02μs
[ 23.48%] ··· io.hdf.HDFStoreDataFrame.time_store_str                 6.08±0.1μs
[ 24.24%] ··· io.hdf.HDFStoreDataFrame.time_write_store               27.3±0.1ms
[ 25.00%] ··· io.hdf.HDFStoreDataFrame.time_write_store_mixed         43.0±0.3ms
[ 25.76%] ··· io.hdf.HDFStoreDataFrame.time_write_store_table         58.6±0.6ms
[ 26.52%] ··· ....HDFStoreDataFrame.time_write_store_table_dc            318±1ms
[ 27.27%] ··· ...FStoreDataFrame.time_write_store_table_mixed         81.5±0.4ms
[ 28.03%] ··· ...DFStoreDataFrame.time_write_store_table_wide         75.2±0.2ms
[ 28.79%] ··· ...df.HDFStorePanel.time_read_store_table_panel         51.5±0.8ms
[ 29.55%] ··· ...f.HDFStorePanel.time_write_store_table_panel         64.7±0.6ms
[ 30.30%] ··· io.json.ReadJSON.time_read_json                     82.4±0.3ms;...
[ 31.06%] ··· io.json.ReadJSONLines.time_read_json_lines             407±3ms;...
[ 31.82%] ··· ...on.ReadJSONLines.time_read_json_lines_concat        416±2ms;...
[ 32.58%] ··· io.json.ToJSON.time_delta_int_tstamp                80.9±0.9ms;...
[ 33.33%] ··· io.json.ToJSON.time_delta_int_tstamp_lines             253±5ms;...
[ 34.09%] ··· io.json.ToJSON.time_float_int                       75.9±0.6ms;...
[ 34.85%] ··· io.json.ToJSON.time_float_int_lines                  254±0.9ms;...
[ 35.61%] ··· io.json.ToJSON.time_float_int_str                     79.6±1ms;...
[ 36.36%] ··· io.json.ToJSON.time_float_int_str_lines                264±1ms;...
[ 37.12%] ··· io.json.ToJSON.time_floats_with_dt_index              66.7±1ms;...
[ 37.88%] ··· io.json.ToJSON.time_floats_with_dt_index_lines       198±0.7ms;...
[ 38.64%] ··· io.json.ToJSON.time_floats_with_int_idex_lines         202±2ms;...
[ 39.39%] ··· io.json.ToJSON.time_floats_with_int_index           61.8±0.4ms;...
[ 40.15%] ··· io.msgpack.MSGPack.time_read_msgpack                    31.1±0.3ms
[ 40.91%] ··· io.msgpack.MSGPack.time_write_msgpack                   21.3±0.4ms
[ 41.67%] ··· io.pickle.Pickle.time_read_pickle                         15.4±2ms
[ 42.42%] ··· io.pickle.Pickle.time_write_pickle                      15.9±0.4ms
[ 43.18%] ··· io.sas.SAS.time_read_msgpack                        73.5±0.6ms;...
[ 43.94%] ··· io.sql.ReadSQLTable.time_read_sql_table_all             37.1±0.6ms
[ 44.70%] ··· ...ReadSQLTable.time_read_sql_table_parse_dates         13.3±0.3ms
[ 45.45%] ··· ...eadSQLTableDtypes.time_read_sql_table_column     10.2±0.2ms;...
[ 46.21%] ··· io.sql.SQL.time_read_sql_query                      29.3±0.6ms;...
[ 46.97%] ··· io.sql.SQL.time_to_sql_dataframe                      95.9±1ms;...
[ 47.73%] ··· ...eSQLDtypes.time_read_sql_query_select_column    8.07±0.04ms;...
[ 48.48%] ··· ....WriteSQLDtypes.time_to_sql_dataframe_column    36.3±0.08ms;...
[ 49.24%] ··· io.stata.Stata.time_read_stata                       157±0.7ms;...
[ 50.00%] ··· io.stata.Stata.time_write_stata                        387±2ms;...
[ 50.00%] · For pandas commit hash 94ce05d1:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running benchmarks........................................
[ 50.76%] ··· io.json.ReadJSONLines.peakmem_read_json_lines             179M;...
[ 51.52%] ··· ...ReadJSONLines.peakmem_read_json_lines_concat           142M;...
[ 51.52%] ··· Running benchmarks............................
[ 52.27%] ··· io.csv.ReadCSVCategorical.time_convert_direct             39.9±2ms
[ 53.03%] ··· io.csv.ReadCSVCategorical.time_convert_post             61.4±0.6ms
[ 53.79%] ··· io.csv.ReadCSVComment.time_comment                      29.6±0.1ms
[ 54.55%] ··· ...sv.ReadCSVDInferDatetimeFormat.time_read_csv    5.36±0.08ms;...
[ 55.30%] ··· io.csv.ReadCSVFloatPrecision.time_read_csv         2.26±0.07ms;...
[ 56.06%] ··· ...SVFloatPrecision.time_read_csv_python_engine    3.75±0.04ms;...
[ 56.82%] ··· io.csv.ReadCSVParseDates.time_baseline                 1.71±0.02ms
[ 57.58%] ··· io.csv.ReadCSVParseDates.time_multiple_date            2.01±0.01ms
[ 58.33%] ··· io.csv.ReadCSVSkipRows.time_skipprows               21.7±0.2ms;...
[ 59.09%] ··· io.csv.ReadCSVThousands.time_thousands              19.0±0.3ms;...
[ 59.85%] ··· io.csv.ReadUint64Integers.time_read_uint64             3.29±0.04ms
[ 60.61%] ··· ...eadUint64Integers.time_read_uint64_na_values        5.74±0.01ms
[ 61.36%] ··· ...adUint64Integers.time_read_uint64_neg_values        5.64±0.01ms
[ 62.12%] ··· io.csv.ToCSV.time_frame                              112±0.6ms;...
[ 62.88%] ··· io.csv.ToCSVDatetime.time_frame_date_formatting         11.5±0.1ms
[ 63.64%] ··· io.excel.Excel.time_read_excel                      88.4±0.6ms;...
[ 64.39%] ··· io.excel.Excel.time_write_excel                        386±3ms;...
[ 65.15%] ··· io.hdf.HDF.time_read_hdf                               140±1ms;...
[ 65.91%] ··· io.hdf.HDF.time_write_hdf                            152±0.5ms;...
[ 66.67%] ··· io.hdf.HDFStoreDataFrame.time_query_store_table        9.18±0.04ms
[ 67.42%] ··· ...DFStoreDataFrame.time_query_store_table_wide        19.0±0.03ms
[ 68.18%] ··· io.hdf.HDFStoreDataFrame.time_read_store                30.1±0.2ms
[ 68.94%] ··· io.hdf.HDFStoreDataFrame.time_read_store_mixed          38.8±0.5ms
[ 69.70%] ··· io.hdf.HDFStoreDataFrame.time_read_store_table         6.12±0.02ms
[ 70.45%] ··· ...DFStoreDataFrame.time_read_store_table_mixed         76.0±0.7ms
[ 71.21%] ··· ...HDFStoreDataFrame.time_read_store_table_wide         22.4±0.2ms
[ 71.97%] ··· io.hdf.HDFStoreDataFrame.time_store_info               8.72±0.02ms
[ 72.73%] ··· io.hdf.HDFStoreDataFrame.time_store_repr               6.24±0.08μs
[ 73.48%] ··· io.hdf.HDFStoreDataFrame.time_store_str                5.99±0.05μs
[ 74.24%] ··· io.hdf.HDFStoreDataFrame.time_write_store              27.2±0.08ms
[ 75.00%] ··· io.hdf.HDFStoreDataFrame.time_write_store_mixed         42.9±0.2ms
[ 75.76%] ··· io.hdf.HDFStoreDataFrame.time_write_store_table         58.7±0.3ms
[ 76.52%] ··· ....HDFStoreDataFrame.time_write_store_table_dc            319±2ms
[ 77.27%] ··· ...FStoreDataFrame.time_write_store_table_mixed         82.4±0.3ms
[ 78.03%] ··· ...DFStoreDataFrame.time_write_store_table_wide           76.4±1ms
[ 78.79%] ··· ...df.HDFStorePanel.time_read_store_table_panel         51.7±0.4ms
[ 79.55%] ··· ...f.HDFStorePanel.time_write_store_table_panel         64.3±0.5ms
[ 80.30%] ··· io.json.ReadJSON.time_read_json                     82.2±0.6ms;...
[ 81.06%] ··· io.json.ReadJSONLines.time_read_json_lines             407±1ms;...
[ 81.82%] ··· ...on.ReadJSONLines.time_read_json_lines_concat        416±2ms;...
[ 82.58%] ··· io.json.ToJSON.time_delta_int_tstamp                82.8±0.5ms;...
[ 83.33%] ··· io.json.ToJSON.time_delta_int_tstamp_lines             256±2ms;...
[ 84.09%] ··· io.json.ToJSON.time_float_int                         76.4±1ms;...
[ 84.85%] ··· io.json.ToJSON.time_float_int_lines                    253±2ms;...
[ 85.61%] ··· io.json.ToJSON.time_float_int_str                   79.4±0.8ms;...
[ 86.36%] ··· io.json.ToJSON.time_float_int_str_lines              263±0.7ms;...
[ 87.12%] ··· io.json.ToJSON.time_floats_with_dt_index            66.7±0.3ms;...
[ 87.88%] ··· io.json.ToJSON.time_floats_with_dt_index_lines         202±3ms;...
[ 88.64%] ··· io.json.ToJSON.time_floats_with_int_idex_lines         199±1ms;...
[ 89.39%] ··· io.json.ToJSON.time_floats_with_int_index           60.4±0.3ms;...
[ 90.15%] ··· io.msgpack.MSGPack.time_read_msgpack                    31.0±0.3ms
[ 90.91%] ··· io.msgpack.MSGPack.time_write_msgpack                   21.8±0.2ms
[ 91.67%] ··· io.pickle.Pickle.time_read_pickle                         15.2±2ms
[ 92.42%] ··· io.pickle.Pickle.time_write_pickle                      15.8±0.1ms
[ 93.18%] ··· io.sas.SAS.time_read_msgpack                        73.0±0.5ms;...
[ 93.94%] ··· io.sql.ReadSQLTable.time_read_sql_table_all             36.6±0.1ms
[ 94.70%] ··· ...ReadSQLTable.time_read_sql_table_parse_dates        13.0±0.02ms
[ 95.45%] ··· ...eadSQLTableDtypes.time_read_sql_table_column   10.00±0.02ms;...
[ 96.21%] ··· io.sql.SQL.time_read_sql_query                     28.6±0.05ms;...
[ 96.97%] ··· io.sql.SQL.time_to_sql_dataframe                    94.9±0.6ms;...
[ 97.73%] ··· ...eSQLDtypes.time_read_sql_query_select_column    8.02±0.02ms;...
[ 98.48%] ··· ....WriteSQLDtypes.time_to_sql_dataframe_column     36.7±0.6ms;...
[ 99.24%] ··· io.stata.Stata.time_read_stata                         157±1ms;...
[100.00%] ··· io.stata.Stata.time_write_stata                        386±2ms;...

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK by me. Curious if this cleanup can be applied elsewhere in the code base

@@ -23,7 +23,7 @@ ctypedef fused pandas_string:
@cython.boundscheck(False)
@cython.wraparound(False)
def write_csv_rows(list data, ndarray data_index,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we stand to gain anything by being more explicit about the type of the ndarrays here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find a type for these ndarrays that passes the test suite. Should be okay as is.

@mroeschke mroeschke added this to the 0.24.0 milestone Nov 24, 2018
@jreback jreback merged commit a37b3e2 into pandas-dev:master Nov 25, 2018
@jreback
Copy link
Contributor

jreback commented Nov 25, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the writers_cleanup branch November 25, 2018 17:51
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants