always use UnicodeWriter for csv, default to utf-8 #2006

Closed
wants to merge 1 commit into from
@ghost ghost commented Oct 2, 2012

See the long commit message for the rationale.

Maybe closes #1966.

If the input is NOT pure ASCII and no encoding is specified,
the Python stdlib csv module will fail.
If the input IS pure ASCII, then using UnicodeWriter with utf-8 as
the encoding produces the same end result as a pure ASCII writer.
This change will "just work" in more cases. Also, presumably,
the internal representation of all text in pandas will eventually
be unicode, so this meshes with that plan too.

There might be a performance issue for large files (is the Python
csv module native?). If so, I think this is still the way to go, with
the stdlib csv module becoming the optional path.

A lot of issues have touched on csv and unicode;
see #206, #300, #680, #705, #1966, probably more.
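
The claim in the commit message (a utf-8 writer gives the same bytes as an ASCII writer for pure-ASCII input, and keeps working when the input is not pure ASCII) can be checked with a small sketch. This is not the UnicodeWriter from the patch: it is a Python 3 illustration, and `write_csv_bytes` is a hypothetical helper introduced only for this example.

```python
import csv
import io


def write_csv_bytes(rows, encoding="utf-8"):
    """Hypothetical helper: write rows with the stdlib csv writer,
    then encode the resulting text with the given encoding."""
    # newline="" keeps the csv module's own "\r\n" line endings untouched.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)


ascii_rows = [["a", "b"], ["1", "2"]]
unicode_rows = [["name"], ["caf\u00e9"]]

# Pure-ASCII input: the utf-8 and ascii encodings yield identical bytes.
same_for_ascii = (
    write_csv_bytes(ascii_rows, "utf-8") == write_csv_bytes(ascii_rows, "ascii")
)

# Non-ASCII input: the ascii encoding fails, utf-8 succeeds.
try:
    write_csv_bytes(unicode_rows, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

utf8_bytes = write_csv_bytes(unicode_rows, "utf-8")
```

Defaulting to utf-8 therefore changes nothing for ASCII-only data, which is the compatibility argument made above. The sketch says nothing about the performance question raised in the commit message.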

ghost commented Oct 2, 2012

Should be OK now. The patch dovetails with the series in #2005.
Is this OK, or should there be a keyword to force the native csv writer?


ghost commented Oct 11, 2012

Withdrawn. Bad idea.

@ghost ghost closed this Oct 11, 2012
This pull request was closed.