BUG: Series(category).replace to maintain order of categories #51057

lukemanley · 2023-01-29T21:08:44Z

closes BUG: replace is changing the order of ordered categories #51016
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.

No whatsnew as this is a regression on main only.

mroeschke · 2023-01-30T18:56:18Z

pandas/tests/arrays/categorical/test_replace.py

+    # GH51016
+    dtype = pd.CategoricalDtype([0, 1, 2], ordered=True)
+    ser = pd.Series([0, 1, 2], dtype=dtype)
+    result = ser.replace(0, 2)


Do we have a test where a missing value is replaced?

lukemanley · 2023-01-31

I don't see any. Series(category).replace doesn't actually work for replacing missing values (see below). I checked 1.5 and it doesn't replace them there either.

In [13]: import pandas as pd

In [14]: import numpy as np

In [15]: dtype = pd.CategoricalDtype(["A", "B"])

In [16]: ser = pd.Series(["A", None], dtype=dtype)

In [17]: ser
Out[17]:
0      A
1    NaN
dtype: category
Categories (2, object): ['A', 'B']

In [18]: ser.replace(np.nan, "C")  # <- this doesn't actually replace the missing value
Out[18]:
0      A
1    NaN
dtype: category
Categories (2, object): ['A', 'B']

mroeschke · 2023-02-01T18:58:29Z

pandas/core/arrays/categorical.py

-        new_categories = Index(ser.drop_duplicates(keep="first"))
+
+        # GH51016: maintain order of existing categories
+        idxr = cat.categories.get_indexer_for(all_values)


Do we need to do all this if ordered=False?

lukemanley · 2023-02-01

1.5.x maintains the existing order of categories regardless of `ordered`, so I'd lean towards keeping that behavior. The cost of the new ordering logic is barely measurable relative to the whole replace operation.
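For readers following the thread, here is a pure-Python sketch of why the removed `drop_duplicates(keep="first")` line could reorder categories, and what "maintain order of existing categories" means for the test case above. The helper names are hypothetical and this is not the pandas implementation; in particular, placing values that were not previously categories at the end is an assumption of this sketch.

```python
def first_seen_order(values):
    """Unique values in order of first appearance (the drop_duplicates-style ordering)."""
    return list(dict.fromkeys(values))


def order_preserving_categories(old_categories, values):
    """Hypothetical helper, not pandas API.

    Keep surviving categories in their original relative order, then append
    values that were not categories before (assumption: new values go last).
    """
    present = set(values)
    kept = [c for c in old_categories if c in present]
    known = set(old_categories)
    added = [v for v in first_seen_order(values) if v not in known]
    return kept + added


old_categories = [0, 1, 2]
replaced = [2, 1, 2]  # values of Series([0, 1, 2]) after replace(0, 2)

print(first_seen_order(replaced))                             # [2, 1]
print(order_preserving_categories(old_categories, replaced))  # [1, 2]
```

With first-appearance ordering the surviving categories come out as `[2, 1]`, which flips the comparison order of an ordered categorical; keeping the original relative order yields `[1, 2]`.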

mroeschke · 2023-02-01T23:38:40Z

Thanks @lukemanley

BUG: Series(category).replace to maintain order of categories

2fb2fdb

lukemanley added the Bug and Categorical (Categorical Data Type) labels Jan 29, 2023

fix

c12d40b

mroeschke reviewed Jan 30, 2023

mroeschke reviewed Feb 1, 2023

mroeschke added this to the 2.0 milestone Feb 1, 2023

mroeschke approved these changes Feb 1, 2023

mroeschke merged commit 6478e70 into pandas-dev:main Feb 1, 2023

lukemanley deleted the categorical-replace-order branch February 23, 2023 01:38
