UnicodeEncodeError from DataFrame.to_records #11879


Closed
kynnjo opened this issue Dec 21, 2015 · 7 comments
Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Unicode (Unicode strings)
Milestone: 0.20.0

Comments

kynnjo commented Dec 21, 2015

The DataFrame.to_records method fails with a UnicodeEncodeError for unicode column names that contain non-ASCII characters.

(This issue is related to #680. The example below extends the example given in that issue.)

In [322]: df = pandas.DataFrame({u'c/\u03c3':[1,2,3]})

In [323]: df
Out[323]: 
   c/σ
0    1
1    2
2    3

In [324]: df.to_records()
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-324-6d3142e97d2d> in <module>()
----> 1 df.to_records()

/redacted/python2.7/site-packages/pandas/core/frame.pyc in to_records(self, index, convert_datetime64)
   1013             elif index_names[0] is None:
   1014                 index_names = ['index']
-> 1015             names = index_names + lmap(str, self.columns)
   1016         else:
   1017             arrays = [self[c].get_values() for c in self.columns]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 2: ordinal not in range(128)
jreback (Contributor) commented Dec 23, 2015

You are referring to a VERY old issue, FYI. Please show pd.show_versions(). This is a bug in any event, so pull requests are welcome.

I think this should be: lmap(compat.text_type, self.columns)
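
For context, a minimal self-contained illustration of why that change would help under Python 2 (a sketch, not the actual patch; text_type here stands in for pandas.compat.text_type):

import sys

# Stand-in for pandas.compat.text_type: unicode on Python 2, str on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821

index_names = ['index']
columns = [u'c/\u03c3']

# str(u'c/\u03c3') raises UnicodeEncodeError under Python 2 (ascii codec),
# while text_type() returns the column name unchanged.
names = index_names + [text_type(c) for c in columns]
print(names)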

jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Unicode (Unicode strings), and Difficulty Novice labels Dec 23, 2015
jreback added this to the Next Major Release milestone Dec 23, 2015
kynnjo (Author) commented Dec 28, 2015

If you can't be bothered to verify the code I posted, then just delete the issue. I don't give a damn.

jreback (Contributor) commented Dec 28, 2015

@kynnjo I did reproduce it right after you posted; that's why I marked it as a bug.
I asked nicely for you to post the diagnostics. I even included what I think the fix is.

We don't appreciate rude behavior. Please use respectful language.

kynnjo (Author) commented Dec 28, 2015

just delete the issue and we're done

jreback (Contributor) commented Dec 28, 2015

I actually find this a valid issue. Thank you for reporting. Don't you wish to see pandas improved and others helped?

gliptak (Contributor) commented May 28, 2016

This works on current HEAD:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({u'c/\u03c3':[1,2,3]})

In [3]: df
Out[3]: 
   c/σ
0    1
1    2
2    3

In [4]: df.to_records()
Out[4]: 
rec.array([(0, 1), (1, 2), (2, 3)], 
          dtype=[('index', '<i8'), ('c/σ', '<i8')])

Please consider closing.

jreback (Contributor) commented May 29, 2016

This still fails on Python 2:

In [1]: df = pandas.DataFrame({u'c/\u03c3':[1,2,3]})

In [2]: df.to_records()
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-6d3142e97d2d> in <module>()
----> 1 df.to_records()

/Users/jreback/pandas/pandas/core/frame.pyc in to_records(self, index, convert_datetime64)
   1063             elif index_names[0] is None:
   1064                 index_names = ['index']
-> 1065             names = lmap(str, index_names) + lmap(str, self.columns)
   1066         else:
   1067             arrays = [self[c].get_values() for c in self.columns]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 2: ordinal not in range(128)
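
One possible workaround for users stuck on Python 2 with an affected pandas version (a sketch, not the eventual fix) is to encode the column names to UTF-8 byte strings before converting:

import pandas as pd

df = pd.DataFrame({u'c/\u03c3': [1, 2, 3]})

# Rename the unicode columns to UTF-8 byte strings so that the internal
# lmap(str, self.columns) call never sees non-ASCII unicode text.
safe = df.rename(columns=lambda c: c.encode('utf-8'))
print(safe.to_records().dtype.names)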

jreback modified the milestones: 0.20.0, Next Major Release Dec 30, 2016
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
NexediGitlab pushed a commit to Nexedi/erp5 that referenced this issue Feb 23, 2023
Pandas 0.20.0 introduced a bug fix [1] which changed the behaviour of
'DataFrame.to_records()': the resulting record array's dtype names are now
unicode if the data frame's column names were unicode. Before this bug fix
the dtype names were always str, whether the column names were str or unicode.

Unfortunately NumPy unpickling breaks if dtype names are unicode [2]. Since
many of our data frame columns are unicode, loading arrays often fails.
On Python 3 this is no longer a problem, so until then we fix it by
introducing a simple monkey patch to pandas which essentially reverts the
mentioned bug fix.

[1] pandas-dev/pandas#11879
[2] Small example to reproduce this error:

import os

import numpy as np
import pandas as pd

# Build a record array whose dtype names come from a unicode column name.
r = pd.DataFrame({u'A': [1, 2, 3]}).to_records()
a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
p = "t"

# Remove any stale file from a previous run.
try:
    os.remove(p)
except OSError:
    pass

# Saving works, but loading fails on Python 2 when the dtype names are unicode.
with open(p, 'wb') as f:
    np.save(f, a)
with open(p, 'rb') as f:
    np.load(f)
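
A rough sketch of what such a monkey patch might look like (hypothetical, Python 2 only; it wraps DataFrame.to_records and re-encodes unicode dtype names back to byte strings, rather than reproducing the actual Nexedi patch):

import pandas as pd

_orig_to_records = pd.DataFrame.to_records

def _to_records_bytes_names(self, *args, **kwargs):
    # Call the original method, then rename any unicode dtype fields to
    # UTF-8 byte strings so that np.save/np.load round-trips on Python 2.
    rec = _orig_to_records(self, *args, **kwargs)
    rec.dtype.names = tuple(
        name.encode('utf-8') if not isinstance(name, bytes) else name
        for name in rec.dtype.names
    )
    return rec

pd.DataFrame.to_records = _to_records_bytes_names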