Skip to content

BUG: Series.map using categorical Series raises AttributeError #10464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 1, 2015

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jun 28, 2015

Closes #10324. Closes #10460.

Based on #9848, using .get_values should be avoided?

@sinhrks sinhrks added Bug Compat pandas objects compatability with Numpy or Python functions Categorical Categorical Data Type labels Jun 28, 2015
@sinhrks sinhrks added this to the 0.17.0 milestone Jun 28, 2015
@jreback
Copy link
Contributor

jreback commented Jun 28, 2015

get_values is fine
in fact this is the point of it

@sinhrks
Copy link
Member Author

sinhrks commented Jun 28, 2015

Thanks. Another point I forgot to refer is below case results in object dtype. Should it be category?

a = Series([1, 2, 3, 4])
b = Series(["even", "odd", "even", "odd"], dtype="category")
a.map(b)
# 0     odd
# 1    even
# 2     odd
# 3     NaN
# dtype: object

@@ -2010,7 +2010,7 @@ def map_f(values, f):
arg = self._constructor(arg, index=arg.keys())

indexer = arg.index.get_indexer(values)
new_values = com.take_1d(arg.values, indexer)
new_values = com.take_1d(arg.get_values(), indexer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the way to do this is actually
arg.values.take_nd(indexer) which will return a Categorical.

so maybe make this mod in core.common.take_nd (which is aliases to take_1d). if its a categorical_dtype, then short-circuit as I show above. (it ultimately calls take_1d as well but with the codes)

@sinhrks sinhrks force-pushed the categorical_map branch 2 times, most recently from 130786d to 5566933 Compare June 28, 2015 13:19
@sinhrks sinhrks changed the title BUG: Series.map using categorical Series raises AttributeError (WIP) BUG: Series.map using categorical Series raises AttributeError Jun 28, 2015
@sinhrks
Copy link
Member Author

sinhrks commented Jun 28, 2015

Applied a fix to common.take_nd. I will add more tests later.


if is_categorical(arr):
result = take_nd(arr.get_values(), indexer, axis=axis, out=out,
fill_value=fill_value, mask_info=mask_info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more like return arr.values.take_1d(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Categorical doesn't have values, but get_values().

import pandas as pd
c = pd.Categorical(['A', 'B', 'A', 'A'])
c
# [A, B, A, A]
# Categories (2, object): [A, B]
c.values
# AttributeError: 'Categorical' object has no attribute 'values'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you send this from a higher level (e.g. arr is actually a Series/CategoricalIndex object when you start).

You must use the take_1d in Categorical, and NOT create one here, this is moving too specific knowledge to this function

something like:

if is_categorical_dtype(arr):
    arr = getattr(arr,'value',arr)
    return arr.take_1d(....)

...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I think I could understand. Changed to use take_nd, as Categorical doesn't have take_1d.

@sinhrks
Copy link
Member Author

sinhrks commented Jun 29, 2015

BTW, I've noticed Categorical has its own take_nd. I think we can use this, but it raises IndexError for non-existing key.

import pandas as pd
c = pd.Categorical(['A', 'B', 'A', 'A'])
c.take_nd([2, 1])
# [A, B]
# Categories (2, object): [A, B]

c.take_nd([2, 1, 4])
# IndexError: Out of bounds on buffer access (axis 0)

So another point of the original issue is the behavior when the mapped category doesn't contain all the mappings. Should raise or padding with np.nan (create a new category implicitly)?

# this should work because all values are mapped to existing categories
pd.Series([0, 1, 2, 3]).map(pd.Series(["even", "odd", "even", "odd"], dtype="category"))
# 0    even
# 1     odd
# 2    even
# 3     odd
# dtype: category 

pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"], dtype="category"))
# raise or padding with np.nan?

@jreback
Copy link
Contributor

jreback commented Jun 29, 2015

These should be the same as a post-convert to category
The nans are NOT part of the categorical, rather missing in the Series themselves.

In [5]: a = pd.Series([0, 1, 2, 3]).map(pd.Series(["even", "odd", "even", "odd"]))                   

In [6]: b = pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"]))

In [7]: a.astype('category')
Out[7]: 
0    even
1     odd
2    even
3     odd
dtype: category
Categories (2, object): [even, odd]

In [8]: b.astype('category')
Out[8]: 
0     odd
1    even
2     odd
3     NaN
dtype: category
Categories (2, object): [even, odd]

@sinhrks
Copy link
Member Author

sinhrks commented Jun 29, 2015

I see. Then, current impl works as expected. I'll consider if it can be better.

b = pd.Series([1, 2, 3, 4]).map(pd.Series(["even", "odd", "even", "odd"], dtype='category'))
b
# 0     odd
# 1    even
# 2     odd
# 3     NaN
# dtype: category
# Categories (2, object): [even, odd]

@sinhrks sinhrks force-pushed the categorical_map branch from dcbcc66 to 762d680 Compare July 1, 2015 12:04
@sinhrks sinhrks changed the title (WIP) BUG: Series.map using categorical Series raises AttributeError BUG: Series.map using categorical Series raises AttributeError Jul 1, 2015

if is_categorical(arr):
return arr.take_nd(indexer, fill_value=fill_value,
allow_fill=allow_fill)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly!

@jreback
Copy link
Contributor

jreback commented Jul 1, 2015

looks good. go ahead and merge when green.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 1, 2015

@jreback Thanks for review. Merging.

sinhrks added a commit that referenced this pull request Jul 1, 2015
BUG: Series.map using categorical Series raises AttributeError
@sinhrks sinhrks merged commit c962c0b into pandas-dev:master Jul 1, 2015
@sinhrks sinhrks deleted the categorical_map branch July 1, 2015 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants