UnicodeEncodeError from DataFrame.to_records #11879
Comments
you are referring to a VERY old issue FYI. Pls show this should be:
If you can't be bothered to verify the code I posted, then just delete the issue. I don't give a damn.
@kynnjo I did repro right after you posted; that's why I marked it as a bug. We don't appreciate rude behavior. Please use respectful language.
just delete the issue and we're done
I actually find this a valid issue. Thank you for reporting. Don't you wish to see pandas improved and others helped?
This works on current HEAD:
Please consider closing.
This fails in py2.
column names in python 2. closes pandas-dev#11879 closes pandas-dev#13462
Pandas 0.20.0 introduced a bug fix [1] that changed the behaviour of 'DataFrame.to_records()': the dtype names of the resulting record array are now unicode if the data frame's column names are unicode. Before this fix the dtype names were always str, regardless of whether the column names were str or unicode. Unfortunately, NumPy's unpickling breaks if dtype names are unicode [2]. Since many of our data frame columns are unicode, loading arrays often fails. In Python 3 this is no longer a problem, so until we move to Python 3 we work around it with a simple monkey patch to pandas that essentially reverts the bug fix.

[1] pandas-dev/pandas#11879
[2] Small example to reproduce this error:

    import os
    import numpy as np
    import pandas as pd

    # Unicode column name, so the record dtype names are unicode too
    r = pd.DataFrame({u'A':[1,2,3]}).to_records()
    a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
    p = "t"
    # Remove any stale file left over from a previous run
    try:
        os.remove(p)
    except OSError:
        pass
    with open(p, 'wb') as f:
        np.save(f, a)
    # Loading is the step that fails on Python 2
    with open(p, 'rb') as f:
        np.load(f)
/reviewed-on https://lab.nexedi.com/nexedi/erp5/merge_requests/1738
/reviewed-by @jerome @klaus
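For comparison, here is a minimal sketch of the same round trip on Python 3, where the commit message says the problem no longer occurs. It assumes a reasonably recent pandas and NumPy and uses an in-memory buffer instead of a file on disk; the column names and values are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Illustrative data frame; on Python 3 every column name is a unicode str.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Convert to a NumPy record array; the dtype names mirror the column names.
records = df.to_records(index=False)
names = records.dtype.names

# Save and reload through an in-memory buffer rather than a temporary file.
buf = io.BytesIO()
np.save(buf, records)
buf.seek(0)
loaded = np.load(buf)
```

On Python 3 the field names survive the save and load unchanged, which is consistent with the workaround being needed only on Python 2.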
The `DataFrame.to_records` method fails with a `UnicodeEncodeError` for some unicode column names. (This issue is related to #680. The example below extends the example given in that issue.)
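The class of error can be illustrated without pandas. The sketch below assumes the failure comes from an implicit ASCII encoding of the column name, which is what Python 2 does when `str()` is applied to a unicode object; that root cause is an assumption and is not confirmed in this thread. The helper `try_encode` is hypothetical and exists only for this illustration.

```python
def try_encode(name):
    """Encode a column name as ASCII, returning the exception on failure.

    This mimics what an implicit str() conversion of a unicode object
    does on Python 2 (an assumption about the cause, see above).
    """
    try:
        return name.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


# A plain ASCII name encodes without trouble.
ok = try_encode("A")

# A name containing a non-ASCII character cannot be encoded as ASCII.
bad = try_encode("caf\u00e9")
```

This would also explain why only some unicode column names trigger the error: names made up entirely of ASCII characters encode cleanly, so the problem only shows up once a name contains a non-ASCII character.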