ENH: Add option in read_csv to infer compression type from filename #9770
I think it would even be fine to change the default here. We are not that strict about backwards compatibility in pandas -- any users who relied on the previous behavior were basically relying on a bug.
I agree with @shoyer here, let's just have it infer these filename endings as compression (move the release note to the API section).
fe09884 to 48fd726
I've totally borked this branch with an accidental force push. I'll fix it tonight.
fe09884 to 7fe1c69
Should be fixed.
compression : {'gzip', 'bz2', 'infer', None}, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer', then use gzip or
    bz2 if filepath_or_buffer is a string ending in '.gz' or '.bz2',
    respectively, and no decompression otherwise.
Can you add here what None does?
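For readers following the docstring above, here is a minimal sketch of the inference rule it describes, including the None case the review asks about. The helper name `infer_compression` is hypothetical and this is not the code from the PR; it only restates the documented behavior in plain Python.

```python
def infer_compression(filepath_or_buffer, compression="infer"):
    """Hypothetical helper restating the docstring; not the pandas implementation."""
    allowed = ("gzip", "bz2", "infer", None)
    if compression not in allowed:
        raise ValueError("Unrecognized compression type: %r" % (compression,))

    # An explicit choice is passed through unchanged.
    # None means "do not decompress", whatever the filename looks like.
    if compression != "infer":
        return compression

    # 'infer' only looks at string paths; buffers and other objects
    # fall through to "no decompression".
    if isinstance(filepath_or_buffer, str):
        if filepath_or_buffer.endswith(".gz"):
            return "gzip"
        if filepath_or_buffer.endswith(".bz2"):
            return "bz2"
    return None
```

Under this sketch, `infer_compression("data.csv.gz")` gives `'gzip'`, while `infer_compression("data.csv.gz", compression=None)` gives `None`, so None is the way to opt out of inference.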
A couple of minor comments. Please rebase and ping when green.
7fe1c69 to 6cb41c6
Docs are fixed, rebased/squashed, and tests are green.
Looks great to me!
@evanpw thanks!
Thank you!
Ideally, I would love for this to be the default, but that wouldn't be backwards-compatible in the case where the filename ends in '.gz' or '.bz2' and you want to treat it as uncompressed. That seems like it would be very rare, though.
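The rare case described above can be reproduced with the standard library alone. The sketch below uses a hypothetical `open_text` helper (not part of pandas) that applies the same suffix rule, writes an uncompressed file whose name ends in '.gz', and shows that inference fails on it while an explicit `compression=None` reads it normally.

```python
import bz2
import gzip
import os
import tempfile


def open_text(path, compression="infer"):
    """Hypothetical opener that mirrors the suffix rule discussed in this thread."""
    if compression == "infer":
        if path.endswith(".gz"):
            compression = "gzip"
        elif path.endswith(".bz2"):
            compression = "bz2"
        else:
            compression = None
    if compression == "gzip":
        return gzip.open(path, "rt")
    if compression == "bz2":
        return bz2.open(path, "rt")
    return open(path, "r")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Plain text saved under a misleading '.gz' name.
        path = os.path.join(tmp, "plain.csv.gz")
        with open(path, "w") as f:
            f.write("a,b\n1,2\n")

        # Inference picks gzip, and reading a non-gzip file raises OSError.
        try:
            with open_text(path) as f:
                f.read()
            inferred_ok = True
        except OSError:
            inferred_ok = False

        # Opting out of inference reads the file as-is.
        with open_text(path, compression=None) as f:
            explicit = f.read()
    return inferred_ok, explicit
```

Calling `demo()` shows the trade-off of making 'infer' the default: the misnamed file only loads when the caller passes `compression=None` explicitly.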