BUG: Categoricals shouldn't allow non-strings when object dtype is passed (#13919) #14027
`git diff upstream/master | flake8 --diff`
Why this change is needed: categorical variables by definition hold values of a single type, so allowing them to take on values of various different kinds is misleading. Object dtypes should only be allowed when ALL strings or ALL periods are passed (because of the way these are handled internally).
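The rule above can be sketched as a standalone check. This is a minimal illustration under assumptions, not the code in this PR: `validate_object_categories` is a hypothetical helper name, not part of the pandas API. It takes the category values of an object-dtype categorical and raises `TypeError` unless they are all `str` or all `pd.Period`.

```python
import pandas as pd


def validate_object_categories(categories):
    """Hypothetical check: object-dtype categories must be all str or all Period.

    Returns "string" or "period" to report which homogeneous kind was found.
    Any other mix raises TypeError. NaN is neither a str nor a Period, so it
    also raises. An empty sequence passes vacuously and is reported as "string".
    """
    values = list(categories)
    if all(isinstance(v, str) for v in values):
        return "string"
    if all(isinstance(v, pd.Period) for v in values):
        return "period"
    # Collect the offending type names to make the error message actionable.
    type_names = sorted({type(v).__name__ for v in values})
    raise TypeError(
        "object-dtype categories must be all strings or all periods, "
        f"got types: {type_names}"
    )
```

For example, `["a", "b"]` passes as `"string"` and `[pd.Period("2016-01", freq="M"), pd.Period("2016-02", freq="M")]` passes as `"period"`. `["a", 1]` raises `TypeError`. So does `["a", float("nan")]`, which is the same situation as the `NaN`-in-categories test described in the second question below.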
The result of this PR: a `TypeError` is raised when a categorical is created that has an object dtype but doesn't contain all string or all period values.

I have a couple of questions:
When using `MultiIndex.from_arrays`, it creates categories here, which can have mixed dtypes, and unfortunately my code disallows this. Any tips on how to circumvent this?

Also, this test in `test_constructor` is problematic because it converts the categories dtype to `object` if `NaN` is in the categories (although this is deprecated). Should I change my code to allow this, or can I assert that this produces a `TypeError`?

Any feedback is appreciated, thanks!