Categorical.replace() unexpectedly returns non-categorical #18250

toobaz · 2017-11-12T21:35:51Z

Code Sample, a copy-pastable example if possible

In [2]: pd.Series([1, 2, 3, 2], dtype='category').replace(2, 1) # good
Out[2]: 
0    1
1    1
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

In [3]: pd.Series([1, 2, 3, 2], dtype='category').replace(2, 15) # bad
Out[3]: 
0     1
1    15
2     3
3    15
dtype: int64

Problem description

I think replace() called with unknown category should either raise an error, or just replace the element in the list of categories.

The current behavior is even more annoying if you do

In [4]: pd.Series([1, 2, np.nan, 2], dtype='category').replace(np.nan, 15)
Out[4]: 
0     1.0
1     2.0
2    15.0
3     2.0
dtype: float64

because it is precisely the operation of removing NaNs from integers that turns them into floats! Notice, for comparison, that pd.Series([1, 2, np.nan, 2], dtype='category').fillna(15) raises a ValueError.

This is related to (the discussion about) #18185.
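For readers who need the categorical dtype to survive today, here is a minimal sketch of one possible workaround. It is not a fix for replace() itself: it assumes the goal is to swap a category label (or fill NaN with a new label) and uses the existing `.cat.rename_categories` and `.cat.add_categories` accessors. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Case 1: swap an existing category for a value that is not yet a category.
# Renaming the category keeps the dtype categorical instead of coercing.
s = pd.Series([1, 2, 3, 2], dtype="category")
renamed = s.cat.rename_categories({2: 15})
print(renamed.dtype)            # category
print(renamed.tolist())         # [1, 15, 3, 15]

# Case 2: fill NaN with a value that is not yet a category.
# fillna() refuses unknown categories, so register the new one first.
s_nan = pd.Series([1, 2, np.nan, 2], dtype="category")
filled = s_nan.cat.add_categories([15]).fillna(15)
print(filled.dtype)             # category
```

Note that `rename_categories` raises if the new label already exists among the categories, so this sketch only covers the "unknown category" case discussed above.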

Expected Output

In [3]: pd.Series([1, 2, 3, 2], dtype='category').replace(2, 15)
Out[3]: 
0     1
1    15
2     3
3    15
dtype: category
Categories (3, int64): [1, 3, 15]

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: 9e3ad63
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+114.g9e3ad63cd
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1


jreback · 2017-11-12T21:55:17Z

yeah I think raising would be ok here; though we do tend to coerce categorical to object if the user does something w/o changing the categories. This becomes an issue not with a Series but with a whole frame where some columns are categorical and some are not.
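To illustrate the mixed-frame situation mentioned above, here is a rough sketch of handling categorical and non-categorical columns separately so that each keeps its dtype. The helper name `replace_keep_categories` is hypothetical and not part of pandas; it assumes that `new` is not already a category in any categorical column, since `rename_categories` would raise on a duplicate.

```python
import pandas as pd


def replace_keep_categories(df, old, new):
    """Hypothetical helper: replace `old` with `new` column by column."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            # Rename the category label; labels absent from the
            # mapping are passed through unchanged.
            out[col] = out[col].cat.rename_categories({old: new})
        else:
            # Ordinary columns go through the regular replace().
            out[col] = out[col].replace(old, new)
    return out


df = pd.DataFrame({
    "a": pd.Series([1, 2, 3, 2], dtype="category"),
    "b": [2, 2, 5, 6],
})
result = replace_keep_categories(df, 2, 15)
print(result.dtypes)
```

Splitting by dtype keeps the decision explicit per column, rather than relying on how a frame-wide replace() coerces each block.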

mroeschke · 2021-06-12T18:56:56Z

This looks to work on master. Could use a test

In [23]: pd.Series([1, 2, 3, 2], dtype='category').replace(2, 15)
Out[23]:
0     1
1    15
2     3
3    15
dtype: category
Categories (3, int64): [1, 15, 3]

Bhavay-2001 · 2021-08-18T14:04:06Z

hey @mroeschke, I want to work on this issue. Could you please assign this to me?

jbrockmendel · 2021-12-21T03:26:04Z

closing as tests for this exist in both tests.arrays.categorical and tests.series.methods.test_replace (xref #24971, #23305)

jreback added the Categorical and Error Reporting labels Nov 12, 2017

TomAugspurger mentioned this issue Jul 4, 2018

DataFrame.replace doesn't work with categorical data #21726

Closed

mroeschke added the Bug and replace labels Jun 28, 2020

oguzhanogreden mentioned this issue Nov 16, 2020

BUG: pd.DataFrame.replace(to_replace: list, method: str) raises TypeError: No matching signature found #37899

Closed

mroeschke added the good first issue and Needs Tests labels and removed the Bug, Categorical, Error Reporting, and replace labels Jun 12, 2021

jbrockmendel closed this as completed Dec 21, 2021
