BUG: .filter with unicode labels when can't encode #13101

griai · 2016-05-06T07:41:01Z

Edit #10506 breaks if the DataFrame contains unicode column names with non-ASCII characters.

import pandas as pd
df = pd.DataFrame({u'a': [1, 2, 3], u'ä': [4, 5, 6]})
df.filter(regex=u'a')

raises the following error:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-10-9de5a19c260e> in <module>()
----> 1 df.filter(regex=u'a')

C:\Users\...\AppData\Local\Continuum\32bit\Anaconda\envs\test\lib\site-packages\pandas\core\generic.pyc in filter(self, items, like, regex, axis)
   2013             matcher = re.compile(regex)
   2014             return self.select(lambda x: matcher.search(str(x)) is not None,
-> 2015                                axis=axis_name)
   2016         else:
   2017             raise TypeError('Must pass either `items`, `like`, or `regex`')

C:\Users\...\AppData\Local\Continuum\32bit\Anaconda\envs\test\lib\site-packages\pandas\core\generic.pyc in select(self, crit, axis)
   1545         if len(axis_values) > 0:
   1546             new_axis = axis_values[
-> 1547                 np.asarray([bool(crit(label)) for label in axis_values])]
   1548         else:
   1549             new_axis = axis_values

C:\Users\...\AppData\Local\Continuum\32bit\Anaconda\envs\test\lib\site-packages\pandas\core\generic.pyc in <lambda>(x)
   2012         elif regex:
   2013             matcher = re.compile(regex)
-> 2014             return self.select(lambda x: matcher.search(str(x)) is not None,
   2015                                axis=axis_name)
   2016         else:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
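
The last frame points at the `str(x)` call inside the lambda. On Python 2, calling `str()` on a `unicode` label implicitly encodes it with the default ASCII codec, which is where the error comes from. A minimal sketch of that encode step follows; the helper name `ascii_encodable` is made up for illustration and is not part of pandas. It makes the encode explicit so it behaves the same on Python 2 and Python 3.

```python
def ascii_encodable(label):
    """Return True if a text label can be encoded with the ASCII codec.

    Hypothetical helper for illustration only. It mirrors the implicit
    encode that Python 2 performs when str() is called on a unicode object.
    """
    try:
        label.encode('ascii')
    except UnicodeEncodeError:
        # Raised for any code point outside range(128), e.g. u'\xe4'.
        return False
    return True


print(ascii_encodable(u'a'))     # plain ASCII label
print(ascii_encodable(u'\xe4'))  # the non-ASCII label from the report
```

Only the second label fails to encode, which matches the traceback: the frame needs just one non-ASCII column name to blow up, regardless of what the regex is.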

jreback · 2016-05-06T13:02:57Z

xref #10384

Yeah, str(x) will try to encode the label, so the easiest fix is probably either to catch the error (and pass the label through unchanged if it cannot be encoded), or to stringify only integers (but that leaves out things like float column labels).

So I think the former is OK. Want to do a PR?

It would need some tests for other column label types as well (e.g. the tests should loop through all of the index types).
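
A rough sketch of the first option (catch the error and pass the label through) might look like the following. It works on a plain list of labels rather than a DataFrame, and the names `label_matches` and `filter_labels` are hypothetical, not pandas functions; this is not necessarily what the eventual fix does.

```python
import re


def label_matches(matcher, label):
    """Return True if the compiled regex matches the label's text form."""
    try:
        text = str(label)
    except UnicodeEncodeError:
        # Python 2 only: str() on a non-ASCII unicode label attempts an
        # ASCII encode and fails. Pass the label through unchanged.
        # On Python 3, str() does not encode, so this branch is not reached.
        text = label
    return matcher.search(text) is not None


def filter_labels(labels, regex):
    """Keep the labels whose text form matches the regex (illustrative only)."""
    matcher = re.compile(regex)
    return [label for label in labels if label_matches(matcher, label)]


# Mixed label types: unicode text, an integer, and a float.
labels = [u'a', u'\xe4', 1, 2.5]
print(filter_labels(labels, u'a'))
```

Non-string labels such as integers and floats still go through `str()`, so they keep matching as before; only labels that cannot be encoded take the fallback path.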

griai · 2016-05-09T07:48:23Z

I don't have git set up at the moment, so unfortunately I cannot open a pull request.
I would support the pass-through solution for labels that cannot be encoded, since it is the easiest and a fairly general fix (although this fallback mechanism might seem a bit opaque).

griai changed the title ~~Edit #10506 breaks if the DataFrame contains unicode column names with non-ASCII characters.~~ BUG: Edit #10506 breaks if the DataFrame contains unicode column names with non-ASCII characters. May 6, 2016

jreback added the Bug, Unicode (Unicode strings), and Difficulty Novice labels May 6, 2016

jreback added this to the 0.18.2 milestone May 6, 2016

jreback changed the title ~~BUG: Edit #10506 breaks if the DataFrame contains unicode column names with non-ASCII characters.~~ BUG: .filter with unicode labels when can't encode May 6, 2016

fmarczin mentioned this issue May 18, 2016

json_normalize() can't deal with non-ascii characters in unicode keys #13213

Closed

jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 18, 2016

TomAugspurger added the good first issue label Oct 11, 2017

Licht-T mentioned this issue Nov 12, 2017

BUG: Fix filter method so that accepts byte and unicode column names #18238

Merged

jreback modified the milestones: Next Major Release, 0.21.1 Nov 12, 2017

jreback closed this as completed in #18238 Nov 22, 2017
