
ENH Enable bzip2 streaming for Python 3 #11072


Merged (1 commit) on Sep 13, 2015

Conversation

stephen-hoover
Contributor

This is the one modification related to issue #11070 which affects non-S3 interactions with read_csv. The Python 3 standard library has an improved capability for handling bz2 compression, so a simple change will let read_csv stream bz2-compressed files.
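To illustrate what "streaming" a bz2-compressed CSV means here, the following is a minimal sketch using only the Python 3 standard library. It is not the code from this PR and not how `read_csv` is implemented; the helper name `head_bz2_csv` and the demo file are hypothetical. It reads only as much compressed data as is needed to produce the requested rows.

```python
import bz2
import csv
import itertools
import os
import tempfile


def head_bz2_csv(path, n_rows):
    """Return the first n_rows rows of a bz2-compressed CSV file.

    Hypothetical helper for illustration: rows are pulled lazily, so the
    decompressor only consumes as much input as those rows require.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as text:
        return list(itertools.islice(csv.reader(text), n_rows))


# Build a small compressed CSV to read back (demo data, not from the PR).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv.bz2")
    with bz2.open(demo_path, mode="wt", encoding="utf-8", newline="") as out:
        csv.writer(out).writerows([["a", "b"], [1, 2], [3, 4], [5, 6]])
    first_rows = head_bz2_csv(demo_path, 2)
```

Note that `csv.reader` returns every field as a string, so numeric values come back as text.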

@jreback
Contributor

jreback commented Sep 12, 2015

tests!

@stephen-hoover
Contributor Author

I added a test for reading from an open file with the C parser. It fails on the master branch and passes here. How's that?

@jreback
Contributor

jreback commented Sep 12, 2015

do you have exactly the same deps

@stephen-hoover
Contributor Author

Yes, exactly the same dependencies. This PR works because the standard library bz2 module was upgraded to accept file pointers in 3.3.
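The standard-library behavior described above can be sketched as follows. This is an illustrative example only, not the PR's code: the function name `read_bz2_in_chunks` is hypothetical, and an in-memory `io.BytesIO` stands in for any open binary stream such as a local file handle or a network response body.

```python
import bz2
import io


def read_bz2_in_chunks(fileobj, chunk_size=1024):
    """Yield decompressed byte chunks from an open binary file-like object.

    Hypothetical helper for illustration. Passing a file object (rather
    than a filename) to bz2.BZ2File requires Python 3.3 or later.
    """
    with bz2.BZ2File(fileobj, mode="rb") as decompressed:
        while True:
            chunk = decompressed.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Demo data: a small CSV payload compressed in memory.
payload = b"x,y\n" + b"1,2\n" * 1000
handle = io.BytesIO(bz2.compress(payload))
chunks = list(read_bz2_in_chunks(handle, chunk_size=256))
restored = b"".join(chunks)
```

Closing the `BZ2File` wrapper does not close the file object it was given, so the caller stays responsible for the underlying handle.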

@jreback added the IO Data and IO CSV labels Sep 12, 2015
@jreback
Contributor

jreback commented Sep 12, 2015

ok, this looks good. pls add a note in whatsnew for 0.17.0 (just released the rc1 yesterday, but this is ok). reference both the original issue and this PR number I think.

squash & ping when green.

@jreback jreback added this to the 0.17.0 milestone Sep 12, 2015
@stephen-hoover
Contributor Author

Note added. It doesn't look like anything else references a PR; should I leave that reference in?

@@ -465,6 +465,8 @@ Other enhancements

- Improved error message when concatenating an empty iterable of dataframes (:issue:`9157`)

- ``pd.read_csv`` can now read bz2-compressed files incrementally, and the C parser can read bz2-compressed files from AWS S3 (:issue:`11070`, :pr:`11072`).
Contributor

just reference it like an issue, :issue:`11072`; we don't distinguish

Python 2's bz2 module can only open a bz2 file by filename, whereas Python 3.3+ can also wrap an already-open file object. That lets Python 3 decompress a bz2 stream one piece at a time.
@stephen-hoover
Contributor Author

@jreback , tests are green!

jreback added a commit that referenced this pull request Sep 13, 2015
ENH Enable bzip2 streaming for Python 3
@jreback jreback merged commit e8d4243 into pandas-dev:master Sep 13, 2015
@jreback
Contributor

jreback commented Sep 13, 2015

thanks!

@stephen-hoover stephen-hoover deleted the stream-bzip2-files branch September 14, 2015 20:12