Skip to content

BUG: Fix default encoding for CSVFormatter.save #17821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 11, 2017

Conversation

Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Oct 9, 2017

@Licht-T Licht-T force-pushed the fix-invalid-encoding-to_csv branch 3 times, most recently from 6e725bc to 8263456 Compare October 9, 2017 01:55
@codecov
Copy link

codecov bot commented Oct 9, 2017

Codecov Report

Merging #17821 into master will decrease coverage by 0.02%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17821      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.03%     
==========================================
  Files         163      163              
  Lines       49980    49985       +5     
==========================================
- Hits        45613    45607       -6     
- Misses       4367     4378      +11
Flag Coverage Δ
#multiple 89.04% <87.5%> (-0.01%) ⬇️
#single 40.24% <62.5%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 95.94% <87.5%> (-0.13%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f0ee53...8263456. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 9, 2017

Codecov Report

Merging #17821 into master will decrease coverage by 0.05%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17821      +/-   ##
==========================================
- Coverage   91.26%    91.2%   -0.06%     
==========================================
  Files         163      163              
  Lines       49980    50019      +39     
==========================================
+ Hits        45613    45621       +8     
- Misses       4367     4398      +31
Flag Coverage Δ
#multiple 89.01% <87.5%> (-0.04%) ⬇️
#single 40.24% <62.5%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 95.94% <87.5%> (-0.13%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_compat.py 62% <0%> (-6.19%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/nanops.py 96.67% <0%> (-0.99%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.08%) ⬇️
pandas/core/panel.py 96.94% <0%> (ø) ⬆️
pandas/core/sparse/series.py 95.26% <0%> (ø) ⬆️
pandas/tests/indexes/conftest.py
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f0ee53...c222ee8. Read the comment docs.

writer_kwargs['encoding'] = self.encoding
self.writer = UnicodeWriter(f, **writer_kwargs)
else:
if encoding is 'ascii':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String equality check instead of identity check (i.e. use ==).

df = DataFrame({'col': [u"AAAAA", u"ÄÄÄÄÄ", u"ßßßßß", u"聞聞聞聞聞"]})

with tm.ensure_clean('test.csv') as path:
if pd.compat.PY2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about why we're raising an error and what should be done to avoid this error (i.e. the encoding argument parameter should be correct).

@gfyoung gfyoung added Bug IO CSV read_csv, to_csv Unicode Unicode strings labels Oct 9, 2017
@Licht-T Licht-T force-pushed the fix-invalid-encoding-to_csv branch from 8263456 to 0cd7bf7 Compare October 9, 2017 03:49
@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 9, 2017

@gfyoung Thanks for your review. Fixed!

else:
encoding = 'utf-8'
else:
encoding = self.encoding
Copy link
Member

@jschendel jschendel Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is overkill, but you can do encoding = self.encoding or ('ascii' if compat.PY2 else 'utf-8') to condense this whole block of if-else logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, matter of taste here. I prefer the block-logic because its easier to read I find.

@jreback jreback added this to the 0.21.0 milestone Oct 10, 2017
@jreback
Copy link
Contributor

jreback commented Oct 10, 2017

@Licht-T this looks good, can you add a whatsnew note (bug fixes section is ok).

also pls look and see if this change may close any/all of (if it does just indicate and include that issue in the note as well): #13068, #10813, #9712, #10755

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@TomAugspurger
Copy link
Contributor

AFAICT, none of those issues were closed by this one.

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 11, 2017

Sorry, I also added the whatsnew note

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 11, 2017

@jreback @TomAugspurger This PR does not solve these issue, I think.

@TomAugspurger
Copy link
Contributor

The CI failure here was from the matplotlib 2.1 release. It'll be fixed when the merge commit is tested.

Thanks @Licht-T

@TomAugspurger TomAugspurger merged commit 9772c22 into pandas-dev:master Oct 11, 2017
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
* BUG: Fix default encoding for CSVFormatter.save

* TST: Add to_csv defualt encoding test

* DOC: Add comments on to_csv defualt encoding test

* DOC: added release note

* DOC: Add the fixing to_csv default encoding to whatsnew note

* Revert "DOC: Add the fixing to_csv default encoding to whatsnew note"

This reverts commit 039f2cf.
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
* BUG: Fix default encoding for CSVFormatter.save

* TST: Add to_csv defualt encoding test

* DOC: Add comments on to_csv defualt encoding test

* DOC: added release note

* DOC: Add the fixing to_csv default encoding to whatsnew note

* Revert "DOC: Add the fixing to_csv default encoding to whatsnew note"

This reverts commit 039f2cf.
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
* BUG: Fix default encoding for CSVFormatter.save

* TST: Add to_csv defualt encoding test

* DOC: Add comments on to_csv defualt encoding test

* DOC: added release note

* DOC: Add the fixing to_csv default encoding to whatsnew note

* Revert "DOC: Add the fixing to_csv default encoding to whatsnew note"

This reverts commit 039f2cf.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: On Python 3 to_csv() encoding defaults to ascii if the dataframe contains special characters.
5 participants