ENH: added compression kw to to_csv GH7615 #11219

Merged: 1 commit, Oct 12, 2015

@yoavram commented Oct 2, 2015

This closes #7615 and represents work that started in #2636.

@yoavram (Author) commented Oct 2, 2015

I need help from pandas people (@jreback ?)
I get this single error on Travis-CI:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_frame.py", line 7364, in test_to_csv_compression
    df.to_csv(filename, compression="gzip")
  File "/home/travis/build/pydata/pandas/pandas/core/frame.py", line 1294, in to_csv
    formatter.save()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1460, in save
    self._save()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1546, in _save
    self._save_header()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1542, in _save_header
    writer.writerow(encoded_labels)
io.UnsupportedOperation: not writable

I assume "not writable" means that the file I'm trying to write to, which I got using ensure_clean(), is not writable. I don't understand why; I copied the test code from test_to_csv_float_format.

@jreback (Contributor) commented Oct 2, 2015

and does it work locally for you?

@yoavram (Author) commented Oct 2, 2015

I couldn't build pandas (python setup.py develop) on my Windows PC or on a DigitalOcean Ubuntu droplet. Does pandas have a Docker image you can recommend?
I'll try again, but please have a look at the Travis build: some builds passed. Does that make sense?

@jreback (Contributor) commented Oct 2, 2015

pls read the contributing docs here

Creating the proper build env is pretty easy using conda (local tools also work on Linux, but on Windows we recommend conda, as it's dead simple). Docker is much too heavyweight for most things IMHO.

pls build and debug locally first.

@yoavram (Author) commented Oct 3, 2015

I had problems building pandas (python setup.py develop), but I eventually managed to build it locally; see below.

Test problem and solution
The test error occurred because the compressed file was opened read-only. pandas.core.common._get_handle() takes a mode argument, but for compressed files it ignored it and hard-coded rb instead. I changed it so that the mode argument is used when creating compressed file handles.
Anyway, thanks for helping me out with this!!
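A minimal, standard-library sketch of the failure and the fix. It assumes that on Python 3 the compressed branch of `_get_handle` wraps the `GzipFile` in an `io.TextIOWrapper`; the helper names (`get_gzip_text_handle`, `write_rows`, `demo`) are hypothetical, not pandas APIs. A `GzipFile` opened with `'rb'` reports itself as not writable, so the text wrapper should raise `io.UnsupportedOperation` ("not writable") on the first write, which matches the Travis traceback; passing the caller's mode through lets the write succeed.

```python
import csv
import gzip
import io
import os
import tempfile


def get_gzip_text_handle(path, mode, force_read_mode=False):
    # Hypothetical stand-in for the gzip branch of _get_handle on Python 3:
    # a GzipFile wrapped in a TextIOWrapper so csv.writer can write str rows.
    # force_read_mode=True reproduces the bug (mode overridden with 'rb').
    gz_mode = "rb" if force_read_mode else mode
    return io.TextIOWrapper(gzip.GzipFile(path, gz_mode), encoding="utf-8", newline="")


def write_rows(path, rows, force_read_mode):
    handle = get_gzip_text_handle(path, "w", force_read_mode=force_read_mode)
    try:
        csv.writer(handle).writerows(rows)
    finally:
        handle.close()


def demo():
    # mkstemp creates an empty file; this assumes the ensure_clean() path in the
    # test also already existed, so opening it with 'rb' succeeds and the error
    # only appears on the first write.
    fd, path = tempfile.mkstemp(suffix=".csv.gz")
    os.close(fd)
    rows = [["a", "b"], ["1", "2"]]
    error = None
    try:
        try:
            write_rows(path, rows, force_read_mode=True)  # buggy: 'rb' ignores mode
        except io.UnsupportedOperation as exc:
            error = str(exc)
        write_rows(path, rows, force_read_mode=False)  # fixed: caller's 'w' is used
        with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
            recovered = list(csv.reader(f))
    finally:
        os.remove(path)
    return error, recovered


if __name__ == "__main__":
    print(demo())
```

The design point is simply that the handle factory should honour the mode its caller asks for rather than assume every compressed file is being read.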

Build problem and solution
On my Windows laptop, I had other versions of gcc.exe installed. Running which gcc should return C:\Anaconda\envs\pandas_dev\MinGW\bin\gcc.exe (the exact path depends on where Anaconda is installed).
On my system I had to manually add C:\Anaconda\envs\pandas_dev\MinGW\bin to the beginning of PATH:

set PATH=C:\Anaconda\envs\pandas_dev\MinGW\bin;%PATH%

If you have already tried python setup.py develop once, you probably need to run python setup.py clean before trying again.
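The same PATH fix can be expressed in Python, for example when scripting the build. This is a sketch only: `prepend_to_path`, `first_gcc` and `clean_then_develop` are hypothetical helpers, the MinGW directory is the example path from the comment above and will differ per install, and the `setup.py clean` / `setup.py develop` commands reflect the build workflow described in this thread.

```python
import os
import shutil
import subprocess
import sys


def prepend_to_path(directory, path_value, sep=os.pathsep):
    # Same effect as `set PATH=<directory>;%PATH%`: directory is searched first.
    return directory + sep + path_value if path_value else directory


def first_gcc(path_value):
    # The gcc executable that would be picked up first with this PATH, or None.
    return shutil.which("gcc", path=path_value)


def clean_then_develop(source_dir, path_value):
    # Per the note above, run `setup.py clean` before retrying `setup.py develop`
    # (presumably to discard artifacts left over from the failed attempt).
    env = dict(os.environ, PATH=path_value)
    for command in ("clean", "develop"):
        subprocess.run([sys.executable, "setup.py", command],
                       cwd=source_dir, env=env, check=True)


if __name__ == "__main__":
    # Example location only; adjust to your Anaconda install and env name.
    mingw_bin = r"C:\Anaconda\envs\pandas_dev\MinGW\bin"
    new_path = prepend_to_path(mingw_bin, os.environ.get("PATH", ""))
    print("gcc that would be used:", first_gcc(new_path))
```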

jreback added the Enhancement and IO CSV (read_csv, to_csv) labels Oct 3, 2015
jreback added this to the 0.17.1 milestone Oct 5, 2015
@@ -1431,7 +1432,7 @@ def save(self):
             close = False
         else:
             f = com._get_handle(self.path_or_buf, self.mode,
-                                encoding=self.encoding)
+                                encoding=self.encoding, compression=self.compression)
Contributor review comment:

add this on the next line

@jreback (Contributor) commented Oct 5, 2015

pls add a release note in 0.17.1 (enhancements section), and squash.

@yoavram (Author) commented Oct 5, 2015

Sure thing. I've never 'squashed' before - any comments/instructions beyond the contributing.html doc?

@jreback (Contributor) commented Oct 5, 2015

nope, that says it all

yoavram force-pushed the issue7615 branch 2 times, most recently from 0c78d96 to 9bd5d24 on October 8, 2015 13:08
@yoavram (Author) commented Oct 8, 2015

Fixed following code review (d3f63d8).
I also added test code that explicitly checks that the file is compressed, rather than relying only on a read_csv round trip, in case to_csv and read_csv both "cheat" in the same way (for example, by both ignoring the compression flag).
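One way to check compression independently of read_csv is to inspect the raw bytes: gzip files start with the two-byte magic number `1f 8b`, and the stdlib `gzip` module can decompress the file for a direct text comparison. The sketch below is standard-library only; `check_gzip_file` and `demo` are hypothetical names, and in a pandas test the file would instead come from `df.to_csv(filename, compression="gzip")`.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def check_gzip_file(path, expected_text, encoding="utf-8"):
    # Raise AssertionError unless `path` is really gzip-compressed AND
    # decompresses to exactly `expected_text`. Uses no CSV reader at all,
    # so a writer and reader that both ignore compression cannot fool it.
    with open(path, "rb") as raw:
        header = raw.read(2)
    if header != GZIP_MAGIC:
        raise AssertionError("%s does not start with the gzip magic number" % path)
    with gzip.open(path, "rb") as f:
        text = f.read().decode(encoding)
    if text != expected_text:
        raise AssertionError("decompressed contents differ from the expected CSV text")
    return True


def demo():
    csv_text = "a,b\n1,2\n"
    fd, path = tempfile.mkstemp(suffix=".csv.gz")
    os.close(fd)
    try:
        with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
            f.write(csv_text)
        compressed_ok = check_gzip_file(path, csv_text)
        # Same text written WITHOUT compression must be rejected.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(csv_text)
        try:
            check_gzip_file(path, csv_text)
            plain_rejected = False
        except AssertionError:
            plain_rejected = True
    finally:
        os.remove(path)
    return compressed_ok, plain_rejected


if __name__ == "__main__":
    print(demo())
```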

yoavram force-pushed the issue7615 branch 2 times, most recently from 7193924 to 19e97fb on October 9, 2015 11:54
@yoavram (Author) commented Oct 9, 2015

OK this was fun, I learned a lot. Thanks.

@jreback (Contributor) commented Oct 9, 2015

looks good. pls add a whatsnew note in enhancements for 0.17.1

@yoavram (Author) commented Oct 9, 2015

Right. I did it but it got lost when I squashed. I'll do it again and squash properly.

@@ -2847,11 +2847,11 @@ def _get_handle(path, mode, encoding=None, compression=None):

     if compression == 'gzip':
         import gzip
-        f = gzip.GzipFile(path, 'rb')
+        f = gzip.GzipFile(path, mode)
Contributor review comment:

this defaults to rb or wb as appropriate, but should this be passed by the caller?

Author reply:

As far as I can tell, if we don't pass mode, it will default to rb, and then to_csv will fail.

@@ -7328,6 +7328,52 @@ def test_to_csv_path_is_none(self):
         recons = pd.read_csv(StringIO(csv_str), index_col=0)
         assert_frame_equal(self.frame, recons)

+    def test_to_csv_compression_gzip(self):
+        ## GH7615
Contributor review comment:

Can you add a test to ensure that a ValueError is raised when passing an invalid string as compression?
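A sketch of the requested check, written with `unittest` and the standard library. `open_compressed_text` is a hypothetical helper modelled on the gzip branch shown in the `_get_handle` diff; the error message and the set of accepted names (`None`, `'gzip'`, and `'bz2'` as an assumed second codec) are illustrative assumptions, not the pandas implementation. The invalid value is rejected before any file is opened, so that test needs no temporary file.

```python
import bz2
import gzip
import io
import os
import tempfile
import unittest


def open_compressed_text(path, mode, compression=None, encoding="utf-8"):
    # Hypothetical helper: open `path` for text I/O, optionally through gzip or bz2.
    # The caller's mode is passed through (the bug fixed in this PR was forcing 'rb').
    if compression is None:
        return open(path, mode, encoding=encoding, newline="")
    if compression == "gzip":
        binary = gzip.GzipFile(path, mode)
    elif compression == "bz2":
        binary = bz2.BZ2File(path, mode)
    else:
        # Reject unknown names up front instead of silently writing plain text.
        raise ValueError("Unrecognized compression type: %s" % compression)
    return io.TextIOWrapper(binary, encoding=encoding, newline="")


class CompressionKeywordTest(unittest.TestCase):
    def test_invalid_compression_raises_value_error(self):
        # The ValueError is raised before the path is touched, so no file is created.
        with self.assertRaises(ValueError):
            open_compressed_text("unused.csv", "w", compression="not-a-codec")

    def test_valid_compressions_round_trip(self):
        text = "a,b\r\n1,2\r\n"
        for compression in (None, "gzip", "bz2"):
            with self.subTest(compression=compression):
                fd, path = tempfile.mkstemp()
                os.close(fd)
                try:
                    with open_compressed_text(path, "w", compression) as f:
                        f.write(text)
                    with open_compressed_text(path, "r", compression) as f:
                        self.assertEqual(f.read(), text)
                finally:
                    os.remove(path)


if __name__ == "__main__":
    unittest.main()
```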

@yoavram (Author) commented Oct 12, 2015

Does anything else come to mind?

jreback added a commit that referenced this pull request Oct 12, 2015: ENH: added compression kw to to_csv GH7615
jreback merged commit c2aa6a2 into pandas-dev:master Oct 12, 2015
@jreback (Contributor) commented Oct 12, 2015

thank you sir!

@yoavram (Author) commented Oct 12, 2015

Thank you, I've enjoyed the process and learned much.

yoavram deleted the issue7615 branch April 23, 2016 07:37
Linked issue closed by this pull request: ENH: DataFrame.to_csv support for "compression='gzip'"
2 participants