ENH: should pandas.core.arrays.Categorical have a dropna=False option? #35162
Comments
I think having NA / NaN keys in the categories is probably a bad idea since it leads to so many downstream complexities.
agreed this is a bad idea
Indeed, if I remember correctly, the categories initially could contain a missing value as well. But that way you can have missing data in two ways: missing indicated by the codes (-1) or missing indicated by the category value. We couldn't think of a good reason to allow both, so we opted for missing data only in the codes, and not in the categories. I didn't look at the linked issue yet, but is there a use case in groupby where having missing categories would be useful?
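To make the codes-versus-categories distinction concrete, here is a minimal sketch of how a missing value is stored today. The variable names are illustrative only, and the exact error message may differ between pandas versions, so the example only checks the exception type.

```python
import numpy as np
import pandas as pd

# A Categorical built from values that include a missing entry.
cat = pd.Categorical(["a", "b", np.nan, "a"])

# The missing value is recorded as code -1 ...
codes = cat.codes.tolist()

# ... and does not appear among the categories.
categories = list(cat.categories)

# Asking for NaN as an explicit category is rejected.
try:
    pd.Categorical(["a", "b"], categories=["a", "b", np.nan])
    null_category_rejected = False
except ValueError:
    null_category_rejected = True

print(codes, categories, null_category_rejected)
```

So there is exactly one representation of "missing": the sentinel code, never a category value.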
Ok! Sounds like it's a bad idea. @jorisvandenbossche In the linked PR I'm fixing a problem with
and in the source code we get the categories by calling As an aside, #28927 seems related, although it looks like things basically work there. @jreback I don't think we have warnings when someone calls
@arw2019 what needs to be done here? Is the original post accurate for what we want?
I think if we add a note in the docs that's good enough. I'll submit a PR |
I ran into this while working on #35078. Here's a simple reproducer:
Do people think it's worth having a `dropna` argument for Categorical, so that one could do:

If we set `dropna=False` by default presumably there shouldn't be backward compatibility issues.

I can see arguments either way! In case we do want to add this I'm happy to work on it (and the solution in #35078 will be a lot cleaner)
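For context, a rough sketch of one way to get an explicit "missing" bucket with the existing API, without any new `dropna` argument. This is not the approach taken in #35078; the sentinel label `"missing"` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", np.nan])

# Register a sentinel label as a real category, then fill the
# missing entries (code -1) with that label.
filled = cat.add_categories("missing").fillna("missing")

filled_values = list(filled)
filled_codes = filled.codes.tolist()
print(filled_values, filled_codes)
```

The trade-off is that the sentinel is an ordinary category afterwards, so downstream code can no longer tell it apart from real data via `isna()`.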