
ENH: Add option in read_csv to infer compression type from filename #9770


Merged
1 commit merged Apr 18, 2015


evanpw
Contributor

@evanpw evanpw commented Apr 1, 2015

Ideally, I would love for this to be the default, but that wouldn't be backwards-compatible in the case where the filename ends in '.gz' or '.bz2' and you want to treat it as uncompressed. That seems like it would be very rare, though.
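To make the edge case concrete, here is a minimal sketch of the suffix rule under discussion. It assumes that inference only inspects string paths ending in '.gz' or '.bz2'. `infer_compression` is a hypothetical helper name chosen for illustration, not necessarily what pandas uses internally.

```python
def infer_compression(filepath_or_buffer):
    """Hypothetical helper: map a filename suffix to a compression name.

    Returns 'gzip', 'bz2', or None (meaning: read without decompression).
    """
    # Only string paths are inspected; file-like objects are never inferred.
    if not isinstance(filepath_or_buffer, str):
        return None
    if filepath_or_buffer.endswith('.gz'):
        return 'gzip'
    if filepath_or_buffer.endswith('.bz2'):
        return 'bz2'
    return None


print(infer_compression('data.csv.gz'))   # gzip
print(infer_compression('data.csv.bz2'))  # bz2
print(infer_compression('data.csv'))      # None
```

Because the rule looks only at the name, an uncompressed file that happens to be called `data.csv.gz` is still classified as 'gzip'. That is the rare case where a caller would have to pass the compression setting explicitly.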

@shoyer
Member

shoyer commented Apr 1, 2015

I think it would be fine even to change the default here. We are not that strict about backwards compatibility in pandas -- any users who relied on the previous behavior were basically relying on a bug.

@jreback
Contributor

jreback commented Apr 2, 2015

I agree with @shoyer here, let's just have it infer these filename endings as compression (move the release note to the API section).

@jreback jreback added API Design IO CSV read_csv, to_csv labels Apr 2, 2015
@jreback jreback added this to the 0.16.1 milestone Apr 2, 2015
@evanpw evanpw force-pushed the infer_compression branch from fe09884 to 48fd726 April 9, 2015 15:51
@evanpw
Contributor Author

evanpw commented Apr 9, 2015

I've totally borked this branch with an accidental force push. I'll fix it tonight.

@evanpw evanpw force-pushed the infer_compression branch from fe09884 to 7fe1c69 April 10, 2015 03:18
@evanpw
Contributor Author

evanpw commented Apr 10, 2015

Should be fixed.

compression : {'gzip', 'bz2', 'infer', None}, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer', then use gzip or
    bz2 if filepath_or_buffer is a string ending in '.gz' or '.bz2',
    respectively, and no decompression otherwise.
Member


Can you add here what None does?
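The docstring above lists four accepted values. The following minimal sketch shows how an opener could honor them using only the standard library. It assumes that None means "read the file as-is" and that any other unknown value is an error. `open_csv_source` is a hypothetical name for illustration, not pandas' actual implementation.

```python
import bz2
import gzip

# Hypothetical mapping from compression names to stdlib openers.
_OPENERS = {'gzip': gzip.open, 'bz2': bz2.open}


def open_csv_source(path, compression='infer', encoding='utf-8'):
    """Hypothetical opener returning a text-mode file handle for a CSV path."""
    if compression == 'infer':
        # Resolve 'infer' from the filename suffix only.
        if path.endswith('.gz'):
            compression = 'gzip'
        elif path.endswith('.bz2'):
            compression = 'bz2'
        else:
            compression = None
    if compression is None:
        # None: no decompression, even if the name ends in '.gz' or '.bz2'.
        return open(path, 'r', encoding=encoding)
    if compression not in _OPENERS:
        raise ValueError(f"Unrecognized compression type: {compression!r}")
    # gzip.open and bz2.open both accept mode='rt' plus an encoding.
    return _OPENERS[compression](path, 'rt', encoding=encoding)
```

Passing `compression=None` explicitly is the escape hatch for a misnamed, uncompressed `.gz` file.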

@jreback
Contributor

jreback commented Apr 17, 2015

couple of minor comments, pls rebase, and ping when green

@evanpw evanpw force-pushed the infer_compression branch from 7fe1c69 to 6cb41c6 April 17, 2015 14:03
@evanpw
Contributor Author

evanpw commented Apr 17, 2015

Docs are fixed, rebased/squashed, and tests are green.

@jreback
Contributor

jreback commented Apr 17, 2015

lgtm

@shoyer @jorisvandenbossche

@shoyer
Member

shoyer commented Apr 17, 2015

Looks great to me!

jreback added a commit that referenced this pull request Apr 18, 2015
ENH: Add option in read_csv to infer compression type from filename
@jreback jreback merged commit 529cd3d into pandas-dev:master Apr 18, 2015
@jreback
Contributor

jreback commented Apr 18, 2015

@evanpw thanks!

@evanpw
Contributor Author

evanpw commented Apr 18, 2015

Thank you!

@evanpw evanpw deleted the infer_compression branch April 18, 2015 03:41
Labels
API Design IO CSV read_csv, to_csv

4 participants