Fixed regression in Series.duplicated for categorical dtype with bool categories #44356
phofl commented Nov 8, 2021
- closes REGR: Series.duplicated with category dtype and nulls raises ValueError #44351
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
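For context, here is a minimal sketch of the kind of call the linked issue describes: `Series.duplicated` on a categorical Series whose categories are booleans and which contains a missing value. The input values are illustrative assumptions, not copied from the issue. On a pandas version affected by the regression this call is reported to raise `ValueError`; on a version that includes this fix it should return a boolean mask.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not the issue's exact sample):
# a boolean categorical with one missing value.
ser = pd.Series(
    pd.Categorical(
        [True, False, True, False, np.nan],
        categories=[True, False],
        ordered=True,
    )
)

# Marks every repeat of an earlier value; the missing value occurs once,
# so it is not flagged as a duplicate.
mask = ser.duplicated()
print(mask.tolist())
```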
Thanks @phofl. Generally LGTM.
I think the release note could also mention drop_duplicates, though that probably doesn't need extra tests.
However, the code sample in the issue OP is based on test_drop_duplicates_categorical_bool in pandas/tests/series/methods/test_drop_duplicates.py, so it could make sense to co-locate the tests and also test drop_duplicates.
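A rough sketch of what a co-located `drop_duplicates` check could look like, written as a standalone script rather than as the test actually added in this PR. The input values and the use of `pd.testing.assert_series_equal` are assumptions made for illustration; it assumes a pandas version that includes the fix.

```python
import numpy as np
import pandas as pd

# Illustrative input (an assumption): boolean categories plus one missing value.
ser = pd.Series(
    pd.Categorical(
        [True, False, True, False, np.nan],
        categories=[True, False],
        ordered=True,
    )
)

result = ser.drop_duplicates()

# The first occurrence of each value is kept, including the single missing value,
# so the surviving rows keep their original index labels 0, 1 and 4.
expected = pd.Series(
    pd.Categorical([True, False, np.nan], categories=[True, False], ordered=True),
    index=[0, 1, 4],
)
pd.testing.assert_series_equal(result, expected)
```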
Added the test and modified the release note.
)
result = ser.drop_duplicates()
expected = Series(
Categorical([True, False, np.nan], categories=[True, False], ordered=True),
This might be a bug, as it does not preserve NA (but it is unrelated and not a regression, so please open a new issue).
Done, #44405
The test case is actually a bit misleading. This regression was about a boolean categorical with missing values in general, not specifically with NA (if you create the data with np.nan, you get the same issue).
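To illustrate the point that the choice of missing-value marker does not matter, the sketch below builds the same boolean categorical with `np.nan`, `pd.NA` and `None` and compares the `duplicated` masks. The data and the set of markers are assumptions chosen for illustration; on a pandas version with the fix, all three are expected to behave identically, since a categorical treats each of them as a missing entry.

```python
import numpy as np
import pandas as pd

masks = {}
for label, missing in [("np.nan", np.nan), ("pd.NA", pd.NA), ("None", None)]:
    # Same illustrative values each time; only the missing-value marker changes.
    ser = pd.Series(
        pd.Categorical(
            [True, False, True, False, missing],
            categories=[True, False],
            ordered=True,
        )
    )
    masks[label] = ser.duplicated().tolist()

# True when every marker produced the same duplicated mask.
all_equal = len({tuple(m) for m in masks.values()}) == 1
print(masks, all_equal)
```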
@meeseeksdev backport 1.3.x
thanks @phofl
… categories (pandas-dev#44356) (cherry picked from commit 9f54f70)