ENH: to_csv compression accepts file-like object #21249

Merged
merged 6 commits into from
May 30, 2018

Conversation

minggli
Contributor

@minggli minggli commented May 29, 2018

Handle the previously unsupported case where a file-like object, rather than a path, is passed to to_csv together with compression. According to the documentation, the compression keyword is only applied when the first argument is a filename.

At the moment, when a file handle is passed, the output appears to be written uncompressed.

Tentative enhancement.
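
For readers who want compressed output through a handle regardless of how to_csv treats the compression keyword, one option is to wrap the handle yourself. The following is a minimal sketch, independent of this patch: `write_gzip_csv` is a hypothetical helper name, and the sample frame and file names are assumptions made for illustration.

```python
import gzip
import os
import tempfile

import pandas as pd


def write_gzip_csv(df, path):
    """Hypothetical helper: compress by wrapping the handle ourselves."""
    # gzip.open in text mode returns a file-like object that compresses
    # whatever text is written to it, so to_csv needs no compression argument.
    with gzip.open(path, "wt", newline="") as fh:
        df.to_csv(fh, index=False)


# Repetitive sample data (an assumption) so that compression is clearly visible.
df = pd.DataFrame({"X": [1, 2, 3] * 100, "Y": [4, 5, 6] * 100})

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.csv")
    gz_path = os.path.join(tmp, "wrapped.csv.gz")

    df.to_csv(plain_path, index=False)
    write_gzip_csv(df, gz_path)

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

    # Read the compressed file back to confirm the round trip.
    roundtrip = pd.read_csv(gz_path, compression="gzip")
```

Comparing `gz_size` with `plain_size` is the same kind of size check the new test in this PR relies on.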

@minggli minggli force-pushed the enhancement/fh_compression branch from bd8977d to 00712b5 May 29, 2018 17:29
codecov bot commented May 29, 2018

Codecov Report

Merging #21249 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21249      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49538    49540       +2     
==========================================
+ Hits        45499    45501       +2     
  Misses       4039     4039
Flag         Coverage Δ
#multiple    90.24% <100%> (ø) ⬆️
#single      41.87% <0%> (ø) ⬆️

Impacted Files                 Coverage Δ
pandas/core/frame.py           97.22% <ø> (ø) ⬆️
pandas/core/series.py          94.12% <ø> (ø) ⬆️
pandas/io/formats/csvs.py      98.13% <100%> (ø) ⬆️
pandas/util/_decorators.py     82.25% <0%> (ø) ⬆️
pandas/tseries/offsets.py      97% <0%> (ø) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c85ab08...8d0c45b.

Contributor

@jreback jreback left a comment

looks reasonable, can you add a whatsnew note

columns=['X', 'Y', 'Z']),
Series(100 * [0.123456, 0.234567, 0.567567], name='X')])
@pytest.mark.parametrize('method', ['to_csv'])
def test_compression_size_fh(obj, method, compression):
Contributor

can you make a new fixture alongside the existing one that just has the compression options (excluding None), e.g. maybe compression_only (and change the above test to use it as well)

Contributor Author

yeah sure.
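
A fixture along the lines requested here could look like the sketch below. The list of options is an assumption made for illustration and is not taken from the pandas test suite; only the names `compression` and `compression_only` come from the discussion.

```python
import pytest

# Assumed option list for illustration; the real suite may differ.
COMPRESSION_OPTIONS = [None, "gzip", "bz2", "zip", "xz"]
COMPRESSION_ONLY = [c for c in COMPRESSION_OPTIONS if c is not None]


@pytest.fixture(params=COMPRESSION_OPTIONS)
def compression(request):
    """Every option, including None (no compression)."""
    return request.param


@pytest.fixture(params=COMPRESSION_ONLY)
def compression_only(request):
    """Only the options that actually compress (None excluded)."""
    return request.param


def test_option_is_a_real_codec(compression_only):
    # Runs once per codec; the None case never reaches this test.
    assert compression_only is not None
```

Deriving the second list from the first keeps the two fixtures in step if an option is added later.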

    with tm.ensure_clean() as filename:
        with open(filename, 'w') as fh:
            getattr(obj, method)(fh, compression=compression)
        compressed = os.path.getsize(filename)
Contributor

can you assert that the fh is still open (inside the with block)?

Contributor Author

@minggli minggli May 30, 2018

🤔 not sure how to do that here. With compression, the original handle is written and closed first, and then a compressed handle is created under the same name and the entire contents of the original handle are written into it. So assert fh.closed would pass inside the with block at the moment.
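
The sequence described in this comment can be imitated with the standard library alone. This is a rough sketch of the described behaviour, not the pandas implementation: `write_then_recompress` is a hypothetical stand-in, and gzip is chosen arbitrarily as the codec.

```python
import gzip
import os
import tempfile


def write_then_recompress(fh, text):
    """Hypothetical stand-in for the write path described above."""
    path = fh.name
    fh.write(text)
    fh.close()  # the caller's handle is closed at this point
    # Re-open the same name with a compressing handle and write everything.
    with gzip.open(path, "wt") as gz:
        gz.write(text)


text = "X,Y,Z\n" + "0.123456,0.234567,0.567567\n" * 100

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    with open(path, "w") as fh:
        write_then_recompress(fh, text)
        # The handle is already closed while still inside the with block.
        closed_inside_with = fh.closed
    with gzip.open(path, "rt") as gz:
        recovered = gz.read()
    compressed_size = os.path.getsize(path)
```

Under this flow the file on disk ends up compressed, but an "is the handle still open" assertion placed inside the with block would fail, which is the difficulty raised in the reply.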

@jreback jreback added the Output-Formatting (__repr__ of pandas objects, to_string) and IO CSV (read_csv, to_csv) labels May 29, 2018
@jreback jreback added this to the 0.24.0 milestone May 30, 2018
@jreback jreback merged commit bc9241d into pandas-dev:master May 30, 2018
Contributor

jreback commented May 30, 2018

thanks @minggli nice patch!

note - we might be able to use this compression_only fixture in some of the parser compression tests (e.g. where we would skip on None)

@minggli minggli deleted the enhancement/fh_compression branch May 31, 2018 20:06
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Labels
IO CSV (read_csv, to_csv), Output-Formatting (__repr__ of pandas objects, to_string)
Development

Successfully merging this pull request may close these issues.

df.to_csv ignores compression when provided with a file handle
2 participants