added a compression argument to to_csv to be sent to _get_handle #2636

Closed
wants to merge 1 commit into from

@yoavram yoavram commented Jan 4, 2013

Writing `csv` files with `gzip` compression is very useful: it reduces the disk space taken by large data files, and *R* can often read the compressed file faster than the same `csv` file without compression.
*R* reads compressed `csv` files without any additional input from the user, and reading them is already supported in *Pandas*.
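
As a rough illustration of the idea in the PR title (a `compression` argument passed down to the function that opens the file), here is a minimal standard-library sketch. It is not the pandas implementation: `open_text_handle`, `write_csv`, and `read_csv_rows` are hypothetical names made up for this example, and only `None` and `"gzip"` are handled.

```python
import csv
import gzip
import os
import tempfile


def open_text_handle(path, mode, compression=None):
    """Hypothetical helper: open a text-mode handle, optionally through gzip."""
    if compression is None:
        return open(path, mode, newline="", encoding="utf-8")
    if compression == "gzip":
        # gzip.open defaults to binary mode, so request text mode explicitly.
        return gzip.open(path, mode + "t", newline="", encoding="utf-8")
    raise ValueError("unsupported compression: %r" % (compression,))


def write_csv(path, rows, compression=None):
    """Write rows as CSV, forwarding `compression` to the handle opener."""
    with open_text_handle(path, "w", compression) as handle:
        csv.writer(handle).writerows(rows)


def read_csv_rows(path, compression=None):
    """Read CSV rows back as lists of strings."""
    with open_text_handle(path, "r", compression) as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    rows = [["a", "b"], ["1", "2"], ["3", "4"]]
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data.csv.gz")
        write_csv(path, rows, compression="gzip")
        assert read_csv_rows(path, compression="gzip") == rows
```

The writer itself does not need to know about compression; only the function that opens the handle does.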
@wesm
Member

wesm commented Jan 19, 2013

Needs a test case

@yoavram
Author

yoavram commented Jan 20, 2013

Can you suggest where a test case should be written?

@ghost

ghost commented Jan 20, 2013

pandas/tests/test_frame.py has all the test_to_csv_* tests.

@ghost

ghost commented Feb 4, 2013

I intended to write a roundtrip test, but from_csv doesn't support compression,
even though the support is hidden further down the call stack.
We should probably keep things consistent and update from_csv the same way.

@yoavram
Author

yoavram commented Feb 4, 2013

OK. This is on my task list, but unfortunately not high enough...
Hopefully I will get to it soon.

@ghost

ghost commented Jul 29, 2013

MIA.

@ghost ghost closed this Jul 29, 2013
@yoavram
Author

yoavram commented Oct 2, 2015

@y-p I'd like to reopen this and complete the PR.
I notice that read_csv quietly reads gzipped csv files, so if I understand correctly, I need to add a test for to_csv that does a roundtrip with read_csv, in pandas/tests/test_frame.py.
Let me know if this is right and whether this is still relevant.
Also, since so much time has passed, should I rebase, or fork again and apply my minor changes, or would you handle this during merging?
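
A roundtrip test of the kind described here might look roughly like the sketch below. It assumes a pandas version in which `to_csv` accepts a `compression` argument (current releases do; the pandas discussed in this thread did not yet). The names `roundtrip_gzip_csv` and `test_to_csv_gzip_roundtrip` are made up for illustration and are not taken from the pandas test suite.

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal


def roundtrip_gzip_csv(df):
    """Write df to a gzip-compressed CSV file and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "frame.csv.gz")
        # index=False keeps the default integer index out of the file,
        # so the frame read back has the same default index.
        df.to_csv(path, index=False, compression="gzip")
        return pd.read_csv(path, compression="gzip")


def test_to_csv_gzip_roundtrip():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
    result = roundtrip_gzip_csv(df)
    assert_frame_equal(result, df)
```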

@jreback
Contributor

jreback commented Oct 2, 2015

you should rebase on master and open a new PR
quite a lot has changed since this issue came up

contributing docs are here

and this issue is #7615

jreback modified the milestones: 0.17.1, Someday (Oct 12, 2015)
jreback added the Enhancement and IO CSV (read_csv, to_csv) labels (Oct 12, 2015)
This pull request was closed.