BUG: Series.map using categorical Series raises AttributeError #10464

sinhrks · 2015-06-28T11:03:24Z

Based on #9848, using .get_values should be avoided?

jreback · 2015-06-28T11:05:38Z

get_values is fine
in fact this is the point of it

sinhrks · 2015-06-28T11:13:35Z

Thanks. Another point I forgot to refer is below case results in object dtype. Should it be category?

a = Series([1, 2, 3, 4])
b = Series(["even", "odd", "even", "odd"], dtype="category")
a.map(b)
# 0     odd
# 1    even
# 2     odd
# 3     NaN
# dtype: object

jreback · 2015-06-28T11:48:30Z

pandas/core/series.py

@@ -2010,7 +2010,7 @@ def map_f(values, f):
                arg = self._constructor(arg, index=arg.keys())

            indexer = arg.index.get_indexer(values)
-            new_values = com.take_1d(arg.values, indexer)
+            new_values = com.take_1d(arg.get_values(), indexer)


so the way to do this is actually
arg.values.take_nd(indexer) which will return a Categorical.

so maybe make this mod in core.common.take_nd (which is aliases to take_1d). if its a categorical_dtype, then short-circuit as I show above. (it ultimately calls take_1d as well but with the codes)

sinhrks · 2015-06-28T13:21:44Z

Applied a fix to common.take_nd. I will add more tests later.

jreback · 2015-06-28T14:34:40Z

pandas/core/common.py

+
+    if is_categorical(arr):
+        result = take_nd(arr.get_values(), indexer, axis=axis, out=out,
+                         fill_value=fill_value, mask_info=mask_info,


more like return arr.values.take_1d(...)

@jreback Categorical doesn't have values, but get_values().

import pandas as pd c = pd.Categorical(['A', 'B', 'A', 'A']) c # [A, B, A, A] # Categories (2, object): [A, B] c.values # AttributeError: 'Categorical' object has no attribute 'values'

no, you send this from a higher level (e.g. arr is actually a Series/CategoricalIndex object when you start).

You must use the take_1d in Categorical, and NOT create one here, this is moving too specific knowledge to this function

something like:

if is_categorical_dtype(arr): arr = getattr(arr,'value',arr) return arr.take_1d(....) ...

@jreback I think I could understand. Changed to use take_nd, as Categorical doesn't have take_1d.

sinhrks · 2015-06-29T12:07:45Z

BTW, I've noticed Categorical has its own take_nd. I think we can use this, but it raises IndexError for non-existing key.

import pandas as pd
c = pd.Categorical(['A', 'B', 'A', 'A'])
c.take_nd([2, 1])
# [A, B]
# Categories (2, object): [A, B]

c.take_nd([2, 1, 4])
# IndexError: Out of bounds on buffer access (axis 0)

So another point of the original issue is the behavior when the mapped category doesn't contain all the mappings. Should raise or padding with np.nan (create a new category implicitly)?

# this should work because all values are mapped to existing categories
pd.Series([0, 1, 2, 3]).map(pd.Series(["even", "odd", "even", "odd"], dtype="category"))
# 0    even
# 1     odd
# 2    even
# 3     odd
# dtype: category 

pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"], dtype="category"))
# raise or padding with np.nan?

jreback · 2015-06-29T12:13:32Z

These should be the same as a post-convert to category
The nans are NOT part of the categorical, rather missing in the Series themselves.

In [5]: a = pd.Series([0, 1, 2, 3]).map(pd.Series(["even", "odd", "even", "odd"]))                   

In [6]: b = pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"]))

In [7]: a.astype('category')
Out[7]: 
0    even
1     odd
2    even
3     odd
dtype: category
Categories (2, object): [even, odd]

In [8]: b.astype('category')
Out[8]: 
0     odd
1    even
2     odd
3     NaN
dtype: category
Categories (2, object): [even, odd]

sinhrks · 2015-06-29T12:41:04Z

I see. Then, current impl works as expected. I'll consider if it can be better.

b = pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"], dtype='category'))
b
# 0     odd
# 1    even
# 2     odd
# 3     NaN
# dtype: category
# Categories (2, object): [even, odd]

jreback · 2015-07-01T13:28:57Z

pandas/core/common.py

+
+    if is_categorical(arr):
+        return arr.take_nd(indexer, fill_value=fill_value,
+                           allow_fill=allow_fill)


jreback · 2015-07-01T13:29:31Z

looks good. go ahead and merge when green.

sinhrks · 2015-07-01T13:46:03Z

@jreback Thanks for review. Merging.

BUG: Series.map using categorical Series raises AttributeError

sinhrks added Bug Compat pandas objects compatability with Numpy or Python functions Categorical Categorical Data Type labels Jun 28, 2015

sinhrks added this to the 0.17.0 milestone Jun 28, 2015

sinhrks mentioned this pull request Jun 28, 2015

BUG: MultiIndex.get_level_values created from Categorical raises AttributeError #10465

Closed

jreback reviewed Jun 28, 2015
View reviewed changes

sinhrks force-pushed the categorical_map branch 2 times, most recently from 130786d to 5566933 Compare June 28, 2015 13:19

sinhrks changed the title ~~BUG: Series.map using categorical Series raises AttributeError~~ (WIP) BUG: Series.map using categorical Series raises AttributeError Jun 28, 2015

jreback reviewed Jun 28, 2015
View reviewed changes

sinhrks force-pushed the categorical_map branch from 5566933 to dcbcc66 Compare June 30, 2015 13:12

BUG: Series.map using categorical Series raises AttributeError

762d680

sinhrks force-pushed the categorical_map branch from dcbcc66 to 762d680 Compare July 1, 2015 12:04

sinhrks changed the title ~~(WIP) BUG: Series.map using categorical Series raises AttributeError~~ BUG: Series.map using categorical Series raises AttributeError Jul 1, 2015

jreback reviewed Jul 1, 2015
View reviewed changes

pandas/core/common.py

if is_categorical(arr):

return arr.take_nd(indexer, fill_value=fill_value,

allow_fill=allow_fill)

Copy link

Contributor

jreback Jul 1, 2015

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly!

sinhrks added a commit that referenced this pull request Jul 1, 2015

Merge pull request #10464 from sinhrks/categorical_map

c962c0b

BUG: Series.map using categorical Series raises AttributeError

sinhrks merged commit c962c0b into pandas-dev:master Jul 1, 2015

sinhrks deleted the categorical_map branch July 1, 2015 13:47

jreback mentioned this pull request Aug 17, 2015

Categorical can not be used as key in merges #10832

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Series.map using categorical Series raises AttributeError #10464

BUG: Series.map using categorical Series raises AttributeError #10464

sinhrks commented Jun 28, 2015

jreback commented Jun 28, 2015

sinhrks commented Jun 28, 2015

jreback Jun 28, 2015

sinhrks commented Jun 28, 2015

jreback Jun 28, 2015

sinhrks Jun 29, 2015

jreback Jun 29, 2015

sinhrks Jul 1, 2015

sinhrks commented Jun 29, 2015

jreback commented Jun 29, 2015

sinhrks commented Jun 29, 2015

jreback Jul 1, 2015

jreback commented Jul 1, 2015

sinhrks commented Jul 1, 2015

BUG: Series.map using categorical Series raises AttributeError #10464

BUG: Series.map using categorical Series raises AttributeError #10464

Conversation

sinhrks commented Jun 28, 2015

jreback commented Jun 28, 2015

sinhrks commented Jun 28, 2015

jreback Jun 28, 2015

Choose a reason for hiding this comment

sinhrks commented Jun 28, 2015

jreback Jun 28, 2015

Choose a reason for hiding this comment

sinhrks Jun 29, 2015

Choose a reason for hiding this comment

jreback Jun 29, 2015

Choose a reason for hiding this comment

sinhrks Jul 1, 2015

Choose a reason for hiding this comment

sinhrks commented Jun 29, 2015

jreback commented Jun 29, 2015

sinhrks commented Jun 29, 2015

jreback Jul 1, 2015

Choose a reason for hiding this comment

jreback commented Jul 1, 2015

sinhrks commented Jul 1, 2015