ERR: Categoricals should not allow non-strings when an object dtype is passed #13919
Comments
This is, in general, a bad idea. Categories are by definition a single dtype (and not object), except for strings. These should actually raise on creation.
If you'd like to push a PR for error reporting in categorical creation, that would be great.
Yeah, I agree with you that it should raise on creation. I had thousands of columns loaded from CSV and was converting them automatically to categories based on something like […]. Will think about doing that PR. Thanks Jeff!
Sorry, I can't do it in the near future (say, within a month). But I am noting it down and may come back to it. If anyone wants to help here, that's very welcome.
@jreback Can you clarify what you mean by saying "Categoricals should not allow non-strings when an object dtype is passed"? My PR above takes this literally, which breaks many tests, particularly here. That list comprehension passes in […]. In my PR, should I just check if all the values are of the same type? Thanks
Hello @wcwagner. Thank you for taking a look at this. I believe it was more like the second option: individual values should be homogeneous (in terms of dtypes). So don't allow something like this: […]. If I were to implement it extremely naively, I would do something like:
# a set keeps one entry per distinct type
categories_types = {type(x) for x in categories}
if len(categories_types) > 1:
    raise ValueError('Categories must all be of the same type. They are %s' % categories_types)
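The naive check described in this comment can be sketched as a small standalone function. This is only an illustration of the idea, not pandas code: the name `check_homogeneous_categories` is hypothetical, and it compares plain Python types, which is stricter than comparing dtypes (for example, `bool` and `int` count as different types here).

```python
def check_homogeneous_categories(categories):
    """Raise ValueError if the categories contain more than one Python type.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    # A set collapses repeated types, so its size is the number of distinct types.
    category_types = {type(x) for x in categories}
    if len(category_types) > 1:
        names = sorted(t.__name__ for t in category_types)
        raise ValueError(
            "Categories must all be of the same type. Found: %s" % ", ".join(names)
        )


# Homogeneous categories pass silently.
check_homogeneous_categories(["a", "b", "c"])

# Mixed categories raise, and the message lists the offending types.
try:
    check_homogeneous_categories(["a", 1, 2.5])
except ValueError as exc:
    message = str(exc)
```

Using a set rather than a list matters: a list of types has one entry per category, so its length says nothing about how many distinct types are present.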
@hnykda we don't allow for […]. Note that this should only be done on the categories, as these are already coerced as much as possible.
Repeating what I said in the PR (#14047): personally, I don't think we should check this at Categorical construction; I would rather check for this in the HDF code itself.
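One way to picture checking at the storage boundary rather than at construction is a pre-write scan of the frame. The sketch below is an assumption about how a user might do this on their side, not what the HDF code in pandas does: `find_mixed_categorical_columns` is a hypothetical name, and it inspects only the categories of each categorical column, not every value.

```python
import pandas as pd


def find_mixed_categorical_columns(df):
    """Return names of categorical columns whose categories mix Python types.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    mixed = []
    for name in df.columns:
        col = df[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue  # only categorical columns are of interest
        # Look at the categories only; they are far fewer than the values.
        kinds = {type(c) for c in col.cat.categories}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


df = pd.DataFrame({
    "clean": pd.Categorical(["a", "b", "a"]),
    "mixed": pd.Categorical(["a", 1, "a"]),  # str and int categories
    "plain": [1, 2, 3],                      # not categorical, skipped
})
bad_columns = find_mixed_categorical_columns(df)
```

A caller could run this before `to_hdf` and either convert the flagged columns (for instance to strings) or report them, leaving Categorical construction itself unchanged.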
I don't think it's ever useful to support mixed dtypes inside a Categorical even if it's technically possible.
Given the comments on the PR, it's not technically impossible to disallow, but it would make the implementation (which is mixed with MultiIndex) more complex.
Well, we are not tested at all for mixed-type categoricals. I think it's pretty reasonable to disallow them; it makes them easier to deal with, more meaningful and pure.
I'm with @jorisvandenbossche on this one. pd.Categorical can accept pretty much anything that pd.Index can accept.
HDF Store (with table) doesn't support categories with mixed type inside a category
Even though it is possible to store category type using to_hdf with table, you can't do that when you have mixed types inside the category. It would be nice to mention this at least in the docs.
Code Sample, a copy-pastable example if possible
the error:
Expected Output
No error, saved file.
output of pd.show_versions()