BUG: Categorical.remove_categories(np.nan) fails when underlying dtype is float #10304
        result = result.remove_categories(np.nan)
        expected = Categorical([], categories=[1.0, 2.0])
        self.assert_categorical_equal(result, expected)

    def test_remove_unused_categories(self):
Can you add a test for this as well: pd.Categorical([], categories=[np.nan, None])? Also add a test that passes None to .remove_categories. Please test on an object dtype as well as floating-point. Bonus points if you can make this work for datetimelike (e.g. using pd.NaT).
can you update
I think these tests are what you were asking for. Let me know if otherwise.
yep, ping when green.
It turns out that this already works for datetimelike categoricals. I added some tests and squashed.
BUG: Categorical.remove_categories(np.nan) fails when underlying dtype is float
awesome @evanpw ! keep em coming
Thanks very much for fixing this. I am not sure I got this right, but wouldn't it make sense to reorder the code slightly to avoid trying to remove categories twice when
Instead of this:
In your if branch, what's the original definition of not_included and new_categories? It looks like they're defined in terms of themselves.
Fair point ;)
Fixes GH #10156. This also makes different null values indistinguishable inside of remove_categories, but they're already indistinguishable in most other contexts:
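For readers following along, here is a minimal sketch of why a float-backed container can trip up a naive membership check for np.nan, and what treating all nulls alike looks like. It is not the code from this PR: missing_removals is a hypothetical helper written for illustration, and it works on a plain NumPy array rather than on a Categorical.

```python
import numpy as np
import pandas as pd


def missing_removals(removals, categories):
    """Return the removals absent from categories, treating all nulls alike.

    Hypothetical helper for illustration only; not part of pandas.
    """
    categories = list(categories)
    # Any null category (np.nan, None, pd.NaT) counts as "the" null.
    has_null = any(pd.isnull(c) for c in categories)
    non_null = {c for c in categories if not pd.isnull(c)}

    missing = []
    for r in removals:
        if pd.isnull(r):
            if not has_null:
                missing.append(r)
        elif r not in non_null:
            missing.append(r)
    return missing


float_categories = np.array([1.0, 2.0, np.nan])

# Naive set difference: iterating a float array yields fresh np.float64
# objects, and NaN != NaN, so np.nan is reported as not included.
naive_missing = {np.nan} - set(float_categories)

# Null-aware check: np.nan and None both match the NaN category.
null_aware_missing = missing_removals([np.nan, None], float_categories)
```

With the null-aware check, np.nan, None, and pd.NaT are interchangeable as removal arguments, which mirrors the behavior described above; the naive set difference still reports np.nan as missing from the float array.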