Fix df.to_csv() for string arrays when encoded in utf-8 #18013

rtkaleta · 2017-10-28T18:06:33Z

df.to_csv() now works correctly for string arrays with the ascii encoding, but it is still broken with utf-8.

jreback · 2017-10-31T00:59:43Z

pandas/tests/io/formats/test_to_csv.py

+        str_array = [{'names': ['foo', 'bar']}, {'names': ['baz', 'qux']}]
+        df = pd.DataFrame(str_array)
+        expected_ascii = '''\
+,names


If you split this into two test functions, you can xfail the non-working one (to at least get the tests passing).
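As a rough illustration of the split suggested here, the sketch below uses one test per encoding and marks the known-broken one with `pytest.mark.xfail`. The helper `format_cell` is a hypothetical stand-in written for this example, not pandas code, and the test names are assumptions as well; the real tests would call `df.to_csv()`.

```python
import pytest


def format_cell(values, encoding):
    """Hypothetical stand-in for the two code paths discussed in this PR."""
    if encoding == "ascii":
        # Working path: the list is rendered with its strings quoted.
        return str(values)
    # Broken path (stand-in): the quotes around each string are lost.
    return "[" + ", ".join(values) + "]"


def test_to_csv_string_array_ascii():
    # Expected to pass.
    assert format_cell(["foo", "bar"], "ascii") == "['foo', 'bar']"


@pytest.mark.xfail(reason="string arrays lose their quotes under utf-8")
def test_to_csv_string_array_utf8():
    # Known failure: reported as xfail instead of breaking the build.
    assert format_cell(["foo", "bar"], "utf-8") == "['foo', 'bar']"
```

Keeping the failing case as its own function means the passing case still guards against regressions, and the xfail marker documents the open bug until it is fixed.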

rtkaleta · 2017-11-05T14:35:11Z

@jreback The current behaviour seems to stem from pandas' own UnicodeWriter calling pprint_thing without quote_strings=True, so:

>>> from pandas.io.formats.printing import pprint_thing
>>> pprint_thing([u'foo', u'bar'])
u'[foo, bar]'

instead of the more intuitive (at least to me):

>>> pprint_thing([u'foo', u'bar'], quote_strings=True)
u"[u'foo', u'bar']"

A couple of questions come to mind:

- Can you recall why we have our own UnicodeWriter here instead of e.g. unicodecsv.writer?
- Would it be better to expose the quote_strings (or similar) parameter to the to_csv caller, or is the behaviour I expect intuitive enough that we should change things under the hood?
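To make the difference concrete without depending on pandas internals, here is a minimal sketch using only the standard library. `format_list` is a hypothetical stand-in that mimics the two renderings shown above (it is not `pprint_thing`), and `write_row` is an illustrative helper around `csv.writer`. It runs on Python 3, so the `u` prefixes from the session above do not appear.

```python
import csv
import io


def format_list(values, quote_strings=False):
    """Hypothetical stand-in for the two renderings; not pandas' pprint_thing."""
    parts = [
        repr(v) if quote_strings and isinstance(v, str) else str(v)
        for v in values
    ]
    return "[" + ", ".join(parts) + "]"


def write_row(cells):
    """Write one CSV row to an in-memory buffer and return the text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(cells)
    return buf.getvalue()


unquoted = format_list(["foo", "bar"])
quoted = format_list(["foo", "bar"], quote_strings=True)

print(write_row([0, unquoted]))  # the inner strings have lost their quotes
print(write_row([0, quoted]))    # the inner strings keep their quotes
```

In both cases the CSV writer wraps the cell in double quotes because it contains a comma; the only difference is whether the individual strings inside the list are still quoted, which is what a reader would need to recover the original list.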

codecov · 2017-11-05T15:13:12Z

Codecov Report

Merging #18013 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18013      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         163      163              
  Lines       50176    50124      -52     
==========================================
- Hits        45785    45734      -51     
+ Misses       4391     4390       -1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `89.05% <ø> (ø)` | ⬆️ |
| #single | `40.32% <ø> (-0.02%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | `25% <0%> (-58.34%)` | ⬇️ |
| pandas/tseries/frequencies.py | `96% <0%> (-0.11%)` | ⬇️ |
| pandas/core/frame.py | `97.75% <0%> (-0.1%)` | ⬇️ |
| pandas/io/excel.py | `80.39% <0%> (-0.01%)` | ⬇️ |
| pandas/io/stata.py | `93.7% <0%> (-0.01%)` | ⬇️ |
| pandas/tseries/offsets.py | `97.15% <0%> (-0.01%)` | ⬇️ |
| pandas/io/sas/sas_xport.py | `90.27% <0%> (ø)` | ⬆️ |
| pandas/core/reshape/merge.py | `94.26% <0%> (ø)` | ⬆️ |
| pandas/tslib.py | `100% <0%> (ø)` | ⬆️ |
| pandas/plotting/_core.py | `82.45% <0%> (ø)` | ⬆️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 39a6b8f...86a3a1f.

jreback · 2017-11-06T13:47:19Z

will have a look

jreback · 2017-11-07T18:10:18Z

thanks @rtkaleta

love for you to take a stab at fixing the xfailed unicode case!

to_csv now working for string arrays using ascii, still broken for utf-8

dedfb2a

rtkaleta mentioned this pull request Oct 28, 2017

to_csv with lists of strings and unicode encoding produces wrong output #10813

Closed

rtkaleta changed the title ~~to_csv now working for string arrays using ascii, still broken for utf-8~~ Fix df.to_csv() for string arrays when encoded in utf-8 Oct 28, 2017

Make the expected outcome easier to read

18a158f

gfyoung added the IO CSV (read_csv, to_csv) and Output-Formatting (__repr__ of pandas objects, to_string) labels Oct 30, 2017

jreback requested changes Oct 31, 2017

Address comment about xfailing the utf8 case

86a3a1f

jreback added this to the 0.22.0 milestone Nov 7, 2017

jreback approved these changes Nov 7, 2017

jreback merged commit a2d0eed into pandas-dev:master Nov 7, 2017

watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017

Fix df.to_csv() for string arrays when encoded in utf-8 (pandas-dev#1…

2e93d21

…8013)

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Fix df.to_csv() for string arrays when encoded in utf-8 (pandas-dev#1…

15af361

…8013)
