Unicode repr failure in DataFrame #795
Comments
I'm hitting this issue right now with the MovieLens 100k dataset, which uses the iso8859_2 encoding (as inferred by chardet). When I call df.to_string, I can pass "force_unicode=True". However, I'm not sure how to set force_unicode=True for every call to to_string, e.g. whenever repr() is called on df, which happens when printing the df to the shell. Gnarly issue. Character encodings in Python are never fun.
@hammer: Is this with the current development version of pandas? Can you post the traceback somewhere?
@hammer I'm able to reproduce the issue on the movielens data. If you pass
I'll see about doing this automatically with chardet, or finding some way to modify the repr code so it doesn't blow up with a UnicodeError.
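For anyone following along, here is a minimal sketch of the failure mode discussed above, using only the standard library. It is not the pandas repr code. The sample character is an assumption chosen for illustration and is not taken from the MovieLens file. It shows that a byte produced by the iso8859_2 codec cannot be decoded as ASCII, which is the kind of error a repr would hit if it assumed ASCII.

```python
# Illustrative only: one non-ASCII character (U+010C, "C with caron"),
# encoded the way an iso8859_2 file would store it.
raw = "\u010c".encode("iso8859_2")  # a single byte, 0xC8

# Decoding with the wrong codec raises UnicodeDecodeError.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the codec the data was written in round-trips cleanly.
decoded = raw.decode("iso8859_2")
```

The point is only that the codec has to be known (or guessed) before the bytes can be turned into text for display.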
ok @hammer I think I have this sorted out. If you don't specify the encoding it will not blow up anymore:
but if you do, it will render the Unicode correctly in the console.
Short of shipping chardet, I don't know if there's a way to automatically infer the encoding.
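One way to get the "don't blow up if no encoding is given" behavior without chardet is to try a short list of candidate encodings and fall back to a lossy decode. The sketch below is an assumption about how that could look, not what pandas actually does; `decode_with_fallback` and its parameters are hypothetical names.

```python
def decode_with_fallback(raw, encodings=("utf-8",), last_resort="ascii"):
    """Decode bytes by trying each candidate encoding in order.

    Hypothetical helper: if every candidate fails, decode with
    `last_resort` and replace undecodable bytes with U+FFFD instead
    of raising.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Lossy, but never raises UnicodeDecodeError.
    return raw.decode(last_resort, errors="replace")


# With the right codec in the candidate list, the text is recovered.
good = decode_with_fallback(b"\xc8", encodings=("utf-8", "iso8859_2"))

# With no usable candidate, the byte becomes a replacement character.
lossy = decode_with_fallback(b"\xc8", encodings=("ascii",))
```

This does not infer the encoding. A single-byte codec such as iso8859_2 accepts most byte sequences, so candidate order matters and a wrong guess can decode silently to the wrong characters.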
However, this broke the Python 3 tests. Leaving the issue open.
I'll look into it with Python 3.
I get one failure, when reading the newly added CSV file.