Skip to content

Implement helper method to get char* buffer from Python objects #25895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 29, 2019

Conversation

vnlitvinov
Copy link
Contributor

@vnlitvinov vnlitvinov commented Mar 27, 2019

  • closes N/A
  • tests added - test_parsers_iso8601_leading_space
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is a follow-up PR for #25754 which adds a utility method for getting internal char * buffer from unicode or bytes Python object and switch at least some of usages to it.

Its advantages over what is built-in in Cython is saving at least one extra memory allocation (Cython internally calls Python C API that creates a new char * copy which is not needed for most unicode objects which are internally stored in utf8 encoding), and it also obtains the length in one call.

cc @jbrockmendel

@vnlitvinov
Copy link
Contributor Author

Note for reviewers: errors in CI seem to be caused by #25875

@codecov
Copy link

codecov bot commented Mar 27, 2019

Codecov Report

Merging #25895 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25895   +/-   ##
=======================================
  Coverage   91.47%   91.47%           
=======================================
  Files         175      175           
  Lines       52863    52863           
=======================================
  Hits        48357    48357           
  Misses       4506     4506
Flag Coverage Δ
#multiple 90.04% <ø> (ø) ⬆️
#single 41.8% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac318d2...4d1916c. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 27, 2019

Codecov Report

Merging #25895 into master will increase coverage by 0.23%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25895      +/-   ##
==========================================
+ Coverage   91.53%   91.77%   +0.23%     
==========================================
  Files         175      175              
  Lines       52808    52606     -202     
==========================================
- Hits        48338    48277      -61     
+ Misses       4470     4329     -141
Flag Coverage Δ
#multiple 90.32% <ø> (+0.22%) ⬆️
#single 41.9% <ø> (+0.07%) ⬆️
Impacted Files Coverage Δ
pandas/compat/pickle_compat.py 69.51% <0%> (-6.1%) ⬇️
pandas/plotting/_compat.py 83.33% <0%> (-3.34%) ⬇️
pandas/compat/numpy/__init__.py 93.1% <0%> (-0.23%) ⬇️
pandas/io/sas/sas_xport.py 90.09% <0%> (-0.05%) ⬇️
pandas/io/packers.py 88.08% <0%> (-0.04%) ⬇️
pandas/core/groupby/generic.py 87.03% <0%> (-0.02%) ⬇️
pandas/io/formats/csvs.py 98.2% <0%> (-0.02%) ⬇️
pandas/core/arrays/datetimes.py 97.79% <0%> (-0.01%) ⬇️
pandas/io/formats/format.py 97.99% <0%> (ø) ⬆️
pandas/core/arrays/sparse.py 92.17% <0%> (ø) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 882961d...f5f8e29. Read the comment docs.

@gfyoung gfyoung added the Internals Related to non-user accessible pandas implementation label Mar 27, 2019
@vnlitvinov
Copy link
Contributor Author

Fun fact - it speeds up iso8601 parsing noticeably:

asv continuous -f 1.05 882961df 1db613e6 -e -b timeseries.ToDatetimeISO8601 -a sample_time=1 -a warmup_time=1

before after ratio test name
[882961d] [1db613e]
master get_string_data_pr
2.78±0.02ms 2.44±0.02ms 0.87 timeseries.ToDatetimeISO8601.time_iso8601_nosep
2.85±0.03ms 2.47±0.03ms 0.87 timeseries.ToDatetimeISO8601.time_iso8601_format
2.81±0.01ms 2.43±0.01ms 0.87 timeseries.ToDatetimeISO8601.time_iso8601_format_no_sep
2.84±0.03ms 2.44±0.03ms 0.86 timeseries.ToDatetimeISO8601.time_iso8601

@jreback jreback added this to the 0.25.0 milestone Mar 28, 2019
@jreback
Copy link
Contributor

jreback commented Mar 28, 2019

lgtm. @jbrockmendel ?

@jreback
Copy link
Contributor

jreback commented Mar 29, 2019

lgtm. @jbrockmendel merge when ready.

@jbrockmendel jbrockmendel merged commit 1d4c89f into pandas-dev:master Mar 29, 2019
@jbrockmendel
Copy link
Member

Thanks @vnlitvin

@vnlitvinov vnlitvinov deleted the get_string_data_pr branch April 1, 2019 14:31
anmyachev pushed a commit to anmyachev/pandas that referenced this pull request Apr 18, 2019
…as-dev#25895)

* removed extra layer; using get_string_data now

* fix problem with const char* value, that return PyUnicode_AsUTF8AndSize in Python3.7 case; added docstring to get_string_data func

* fix code style

* replaced get_c_string to get_string_data, added 'note' paragraph in get_string_data docstring

* Re-instate raising TypeError when trying to get string data of non-string object

* test case for overflow in parse_iso_8601_datetime

* change get_string_data signature to more pythonic

* Added test for parsing leading spaces

* Rework get_string_data to cleaner get_c_string_buf_and_size

* Fix Python 3.7 compilation

* added comment for test; changed name variable: s -> py_string
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants