Fix df.to_csv() for string arrays when encoded in utf-8 #18013
str_array = [{'names': ['foo', 'bar']}, {'names': ['baz', 'qux']}]
df = pd.DataFrame(str_array)
expected_ascii = '''\
,names
So if you split this into two test functions, you can xfail the non-working one (to at least get things passing).
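A minimal sketch of what that split could look like, assuming pytest and pandas are available. The test names and the `make_frame` and `check_csv` helpers are hypothetical, not the ones used in the PR. The expected CSV body is truncated in the review snippet, so the sketch only checks the header line and the row count instead of guessing the full output.

```python
import pandas as pd
import pytest


def make_frame():
    # Same data as the test under review: one column holding lists of strings.
    str_array = [{'names': ['foo', 'bar']}, {'names': ['baz', 'qux']}]
    return pd.DataFrame(str_array)


def check_csv(encoding):
    # Hypothetical helper: render to CSV, then check only the header and row count.
    out = make_frame().to_csv(encoding=encoding)
    if isinstance(out, bytes):
        # Some pandas/Python combinations may return bytes rather than str.
        out = out.decode(encoding)
    lines = out.splitlines()
    assert lines[0] == ',names'
    assert len(lines) == 3  # header plus two data rows


def test_to_csv_string_array_ascii():
    # The case reported as working.
    check_csv('ascii')


@pytest.mark.xfail(reason="utf-8 case reported as broken in this PR", strict=False)
def test_to_csv_string_array_utf8():
    # The case reported as broken; xfail keeps the suite green until it is fixed.
    check_csv('utf-8')
```

With `strict=False`, the second test is reported as XPASS rather than as a failure on versions where the utf-8 case already works, so the marker can be removed once the fix lands.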
@jreback It seems the current behaviour stems from the fact that pandas' own pprint_thing does not quote strings by default, so it returns:

>>> from pandas.io.formats.printing import pprint_thing
>>> pprint_thing([u'foo', u'bar'])
u'[foo, bar]'

instead of the more intuitive (at least to me):

>>> pprint_thing([u'foo', u'bar'], quote_strings=True)
u"[u'foo', u'bar']"

A couple of questions come to mind:
Codecov Report
@@ Coverage Diff @@
## master #18013 +/- ##
==========================================
- Coverage 91.24% 91.24% -0.01%
==========================================
Files 163 163
Lines 50176 50124 -52
==========================================
- Hits 45785 45734 -51
+ Misses 4391 4390 -1
will have a look
thanks @rtkaleta, would love for you to take a stab at fixing the xfailed unicode case!
So it looks like df.to_csv() is now working correctly for string arrays when using the ascii encoding, but it is still broken when using utf-8.