BUG: Series(category).replace to maintain order of categories #51057
# GH51016
dtype = pd.CategoricalDtype([0, 1, 2], ordered=True)
ser = pd.Series([0, 1, 2], dtype=dtype)
result = ser.replace(0, 2)
Do we have a test where a missing value is replaced?
I don't see any. Series(category).replace doesn't actually work for replacing missing values (see below). I checked 1.5 and it doesn't replace them there either.
In [13]: import pandas as pd
In [14]: import numpy as np
In [15]: dtype = pd.CategoricalDtype(["A", "B"])
In [16]: ser = pd.Series(["A", None], dtype=dtype)
In [17]: ser
Out[17]:
0      A
1    NaN
dtype: category
Categories (2, object): ['A', 'B']
In [18]: ser.replace(np.nan, "C") # <- this doesn't actually replace the missing value
Out[18]:
0      A
1    NaN
dtype: category
Categories (2, object): ['A', 'B']
new_categories = Index(ser.drop_duplicates(keep="first"))

# GH51016: maintain order of existing categories
idxr = cat.categories.get_indexer_for(all_values)
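The idea behind this hunk can be illustrated with a minimal pure-Python sketch. This is not the pandas implementation: `reorder_replaced_categories` is a hypothetical helper, and it assumes a single scalar `to_replace`/`value` pair. It applies the replacement to the categories, sorts the results by their position in the original categories (a value that was not previously a category keeps the slot it was written into), and drops duplicates.

```python
def reorder_replaced_categories(categories, to_replace, value):
    """Hypothetical helper: categories after replacing `to_replace` with
    `value`, keeping the relative order of the original categories."""
    # Position of every original category, the analogue of an indexer lookup.
    original_pos = {cat: i for i, cat in enumerate(categories)}

    # Apply the replacement to the categories themselves.
    replaced = [value if cat == to_replace else cat for cat in categories]

    # Sort by original position; values that were not categories before
    # fall back to the slot they now occupy (the "-1" case of an indexer).
    def sort_key(item):
        slot, cat = item
        return original_pos.get(cat, slot)

    ordered = [cat for _, cat in sorted(enumerate(replaced), key=sort_key)]

    # Drop duplicates, keeping the first occurrence.
    seen = set()
    result = []
    for cat in ordered:
        if cat not in seen:
            seen.add(cat)
            result.append(cat)
    return result


# Replacing 0 with 2 in [0, 1, 2] leaves 1 before 2, as in the original order.
print(reorder_replaced_categories([0, 1, 2], 0, 2))  # [1, 2]
```

Without the sort step, dropping duplicates from the replaced list `[2, 1, 2]` would give `[2, 1]`, which reverses the order of the existing categories.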
Do we need to do all this if ordered=False?
1.5.x maintains the existing order of categories regardless of ordered, so I'd lean towards keeping that behavior. The new ordering logic is barely measurable perf-wise relative to the whole replace operation.
Thanks @lukemanley
replace is changing the order of ordered categories #51016
No whatsnew as this is a regression on main only.