
Series.replace fails on categoricals with list #31720

Closed
bnaul opened this issue Feb 5, 2020 · 3 comments · Fixed by #31734
Labels
Categorical (Categorical Data Type) · Regression (Functionality that used to work in a prior pandas version)
Milestone
1.0.2

Comments

@bnaul
Contributor

bnaul commented Feb 5, 2020

Looks like this was introduced in #27026. I'm not sure what was incorrect in the old version, but I do know that this at least didn't error:

import pandas as pd

cat = pd.CategoricalIndex(['a', 'b']).to_series()
cat.replace('a', 'A')    # works
cat.replace(['a'], 'A')  # raises TypeError in pandas 1.0.0:
~/model/.venv/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in replace(self, to_replace, value, inplace)
   2440         inplace = validate_bool_kwarg(inplace, "inplace")
   2441         cat = self if inplace else self.copy()
-> 2442         if to_replace in cat.categories:
   2443             if isna(value):
   2444                 cat.remove_categories(to_replace, inplace=True)

~/model/.venv/lib/python3.7/site-packages/pandas/core/indexes/base.py in __contains__(self, key)
   3898     @Appender(_index_shared_docs["contains"] % _index_doc_kwargs)
   3899     def __contains__(self, key) -> bool:
-> 3900         hash(key)
   3901         try:
   3902             return key in self._engine

TypeError: unhashable type: 'list'

It seems like most of the methods used in this implementation already handle list-like inputs, so it should be a pretty easy fix?

Output of pd.show_versions()

pandas : 1.0.0
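The failing line is the membership test `to_replace in cat.categories`. Judging from the traceback, `Index.__contains__` calls `hash(key)` before doing any lookup, so an unhashable list-like raises immediately, whereas `Index.isin` is designed for list-like input. A minimal illustration, assuming a current pandas release still hashes the key first as 1.0.0 does:

```python
import pandas as pd

categories = pd.Index(["a", "b"])

# A hashable scalar is looked up normally.
scalar_hit = "a" in categories

# A list is unhashable, so Index.__contains__ fails at hash(key)
# before any lookup happens.
try:
    ["a"] in categories
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# Index.isin, by contrast, accepts list-like input and returns a boolean mask.
mask = categories.isin(["a"])

print(scalar_hit)     # True
print(list_error)     # unhashable type: 'list'
print(mask.tolist())  # [True, False]
```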
@bnaul
Contributor Author

bnaul commented Feb 5, 2020

cc @JustinZhengBC just in case you have an easy fix off the top of your head 🙂

@jorisvandenbossche added the Categorical and Regression labels Feb 5, 2020
@jorisvandenbossche added this to the 1.0.2 milestone Feb 5, 2020
@dsaxton
Member

dsaxton commented Feb 5, 2020

It's also interesting that tuples of values get silently ignored instead of raising an error:

In [11]: s
Out[11]:
0    0
1    1
2    2
3    3
4    4
dtype: category
Categories (5, int64): [0, 1, 2, 3, 4]

In [12]: s.replace((0,), 999)
Out[12]:
0    0
1    1
2    2
3    3
4    4
dtype: category
Categories (5, int64): [0, 1, 2, 3, 4]

In [13]: s.astype(int).replace((0,), 999)
Out[13]:
0    999
1      1
2      2
3      3
4      4
dtype: int64
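A plausible explanation for the silent no-op, consistent with the traceback above: a tuple is hashable, so `to_replace in cat.categories` does not raise; it simply returns `False` because no category equals the tuple `(0,)`, and the categorical comes back unchanged. The index below stands in for `s.cat.categories`, which is an assumption based on the `Categories (5, int64)` output:

```python
import pandas as pd

# Assumed stand-in for s.cat.categories (int64 values 0 to 4).
categories = pd.Index([0, 1, 2, 3, 4])

# The tuple is hashable, so no TypeError; it just isn't a category.
tuple_hit = (0,) in categories

# The scalar inside the tuple is a category.
scalar_hit = 0 in categories

print(tuple_hit)   # False
print(scalar_hit)  # True
```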

@JustinZhengBC
Contributor

At a glance, it looks like we can just check whether to_replace and value are list-like and make recursive calls accordingly. Tuples are probably doing nothing because the code checks whether the tuple itself is a category, rather than the values within it. I'll see if I can get a fix in tonight.
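A minimal sketch of the recursive approach described above, written as a standalone function rather than a patch to pandas internals. `replace_categorical` is a hypothetical name; it uses only public `Categorical` methods (`rename_categories`, `remove_categories`, `copy`) plus `pandas.api.types.is_list_like`, and the actual fix in #31734 may differ. Because `is_list_like` treats tuples as list-like, the same check would also cover the tuple case:

```python
import pandas as pd
from pandas.api.types import is_list_like


def replace_categorical(cat: pd.Categorical, to_replace, value) -> pd.Categorical:
    """Hypothetical helper (not pandas API): replace category values in `cat`.

    `to_replace` may be a scalar or a list-like (list, tuple, ...); `value` is
    either a scalar applied to every item or a list-like of the same length.
    """
    if is_list_like(to_replace):
        to_replace = list(to_replace)
        # Broadcast a scalar `value` so each old value gets a replacement.
        values = list(value) if is_list_like(value) else [value] * len(to_replace)
        if len(values) != len(to_replace):
            raise ValueError("to_replace and value must have the same length")
        # Recurse on each (old, new) pair, threading the result through.
        for old, new in zip(to_replace, values):
            cat = replace_categorical(cat, old, new)
        return cat

    # Scalar case: nothing to do if the value is not a category.
    if to_replace not in cat.categories:
        return cat
    # Replacing with NA: drop the category, which turns its entries into NaN.
    if pd.isna(value):
        return cat.remove_categories([to_replace])
    # Target is already a category: move the entries over, then drop the old one.
    if value in cat.categories:
        cat = cat.copy()
        cat[cat == to_replace] = value
        return cat.remove_categories([to_replace])
    # Otherwise rename the old category to the new value.
    return cat.rename_categories({to_replace: value})


print(list(replace_categorical(pd.Categorical(["a", "b"]), ["a"], "A")))  # ['A', 'b']
print(list(replace_categorical(pd.Categorical([0, 1, 2, 3, 4]), (0,), 999)))  # [999, 1, 2, 3, 4]
```

Returning a new object instead of mutating with `inplace=True` keeps the sketch independent of the `inplace` arguments that newer pandas versions have removed from these methods.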


4 participants