BUG/API: Unordered Categorical should ignore order in comparisons? #16014
Comments
yes this seems reasonable. creating a if you change this how much breakage on tests? (IOW this is going to affect
Once upon a time, categoricals were modeled after factor in R and there, levels (=categories) were sorted:
In that case, both cases would result in the same categories. I think the sorting was removed ("categories in order of appearance") and this was overlooked? [edit] None of the commits show this (as far as my GitHub blame skills reach...), so it either happened in one of the early discussions or in my imagination, and this was simply overlooked in the implementation. [/edit]
@jreback no, when no categories are specified, the found values are sorted:
In [1]: pd.Categorical(['b', 'a'])
It just didn't happen AFAIK :-)
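A minimal sketch of the point above, not taken from the thread: it contrasts categories inferred from the values (sorted) with categories passed explicitly (kept in the given order). The variable names are only illustrative.

```python
import pandas as pd

# No categories passed: pandas infers them from the values and sorts them.
inferred = pd.Categorical(["b", "a"])
print(list(inferred.categories))

# Categories passed explicitly: the order given by the user is kept as-is.
explicit = pd.Categorical(["b", "a"], categories=["b", "a"])
print(list(explicit.categories))
```

The two objects hold the same values and the same set of categories; only the stored category order differs.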
right, forgot we sort :> so the question is why are we differentiating between passed categories and categories created from direct factorizing in the For
But these all should be equivalent
I'll play around with it when I'm doing the CategoricalDtype stuff. For ease of implementation, we'll have to do something. Having an "ordering" function that deterministically orders a given set of categories is necessary so that the codes match up.
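One way to picture "so that the codes match up" is to re-express one categorical's integer codes relative to the other's category order before comparing. The sketch below is only an illustration of that idea: `recode_to` is a hypothetical helper written for this example, not a pandas function and not necessarily how the eventual fix was implemented.

```python
import numpy as np
import pandas as pd


def recode_to(target_categories, cat):
    """Hypothetical helper: express cat's codes relative to target_categories."""
    # Position of each of cat's categories inside target_categories.
    mapping = pd.Index(target_categories).get_indexer(cat.categories)
    codes = np.asarray(cat.codes)
    # Missing values carry code -1; keep them as -1 instead of remapping.
    return np.where(codes == -1, -1, mapping[codes])


left = pd.Categorical(["a", "b", "a"], categories=["a", "b"])
right = pd.Categorical(["a", "b", "a"], categories=["b", "a"])

# The raw codes differ because the category order differs.
print(left.codes.tolist(), right.codes.tolist())

# After recoding against left's categories, the codes line up.
recoded = recode_to(left.categories, right)
print(recoded.tolist())
```

Once both sides share one category order, an element-wise comparison of the codes is equivalent to comparing the values.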
If we are comparing to R, they do not care about order of categories when they are not ordered:
but ... they also don't care when they are ordered:
If we change something, I would personally keep it so that this second case (ordered categories with different order) raises when trying to compare.
I certainly think we should not change the current behaviour, which is to have sorted categories when you do not specify them manually.
Because it is a different case. In case of passed categories, you explicitly pass them, so I find it logical that pandas respects the passed values by the user. BTW, there is no difference between the
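To make the ordered case concrete, here is a small sketch (not from the thread) of two ordered categoricals whose categories are listed in a different order. It assumes a pandas version in which such a comparison raises `TypeError`, which is the behaviour the comment above asks to keep; the variable names are illustrative.

```python
import pandas as pd

first = pd.Categorical(["a", "b"], categories=["a", "b"], ordered=True)
second = pd.Categorical(["a", "b"], categories=["b", "a"], ordered=True)

# With ordered=True the category order carries meaning (a < b vs. b < a),
# so the two categoricals are not considered comparable.
try:
    first == second
    raised = False
except TypeError:
    raised = True

print(raised)
```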
this is all fine, but then they should compare equally. The point of `ordered=False` is that we don't care about the ordering (so it's a set rather than a list comparison).
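The "set rather than list" idea can be seen at the dtype level. This sketch uses `CategoricalDtype` from `pandas.api.types`, which (as far as I know) only became public after this discussion, so treat it as an illustration on a recent pandas rather than something available to the thread at the time.

```python
from pandas.api.types import CategoricalDtype

unordered_ab = CategoricalDtype(["a", "b"], ordered=False)
unordered_ba = CategoricalDtype(["b", "a"], ordered=False)
ordered_ab = CategoricalDtype(["a", "b"], ordered=True)
ordered_ba = CategoricalDtype(["b", "a"], ordered=True)

# Unordered: categories are compared like a set, so order is ignored.
print(unordered_ab == unordered_ba)

# Ordered: categories are compared like a list, so order matters.
print(ordered_ab == ordered_ba)
```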
Agreed
Well... usually :)
In [1]: import pandas as pd
In [2]: pd.Categorical(['b', 'a', 1])
Out[2]:
[b, a, 1]
Categories (3, object): [1, a, b]
In [3]: pd.Categorical(['b', 'a', 1, pd.Timestamp('2017')])
Out[3]:
[b, a, 1, 2017-01-01 00:00:00]
Categories (4, object): [b, a, 1, 2017-01-01 00:00:00]
But I agree that the current behavior around building categoricals shouldn't change, just the behavior when comparing them. I think we're all in agreement on that.
Yep! Your corner case is a bit strange in that in the first case it seems to be sorted and in the second not, while in both cases the values are not sortable (e.g. if you put them in an index or series and sort, you get an error).
Fixes categorical comparison operations improperly considering ordering when two unordered categoricals are compared. Closes pandas-dev#16014
Fixes categorical comparison operations improperly considering ordering when two unordered categoricals are compared. Closes pandas-dev#16014 (cherry picked from commit 91e9e52)
I think that two unordered categorical-dtyped series whose categories differ only in their order should compare equal.
Code Sample, a copy-pastable example if possible
Expected Output
I think this should return True. Unordered categories shouldn't care about the order :)
cc @JanSchulz thoughts?
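The reporter's original code sample did not survive in this copy of the issue. The following is only a hedged reconstruction of the kind of comparison being described, with made-up values, run against a pandas version that includes the fix referenced in this thread; it is not the reporter's snippet.

```python
import pandas as pd

# Same values, same set of categories, but listed in a different order.
c1 = pd.Categorical(["a", "b"], categories=["a", "b"], ordered=False)
c2 = pd.Categorical(["a", "b"], categories=["b", "a"], ordered=False)

# Element-wise comparison ignores category order for unordered categoricals.
elementwise_equal = bool((c1 == c2).all())
print(elementwise_equal)

# Whole-array equality likewise treats the two as equal.
print(c1.equals(c2))
```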