BUG: groupby reorders categorical categories #49131
@mroeschke - this doesn't tackle the performance aspect of #48749, only the behavior. It's not clear to me, but it seems possible that something can still be done about performance, so I think we should leave #48749 open for now.
Agreed. Also, it looks like an ASV benchmark needs addressing: https://github.com/pandas-dev/pandas/actions/runs/3261927947/jobs/5357896059
@@ -782,7 +782,7 @@ def test_preserve_categories():
     # ordered=False
     df = DataFrame({"A": Categorical(list("ba"), categories=categories, ordered=False)})
     sort_index = CategoricalIndex(categories, categories, ordered=False, name="A")
-    nosort_index = CategoricalIndex(list("bac"), list("bac"), ordered=False, name="A")
+    nosort_index = CategoricalIndex(list("bac"), list("abc"), ordered=False, name="A")
Could you add a GH reference by these changed tests?
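For readers following the thread, here is a minimal sketch of the behavior the changed expectation in test_preserve_categories encodes. It assumes a pandas version that includes this fix (2.0 or later is an assumption here); the extra column "B" and the variable names are illustrative and not part of the test.

```python
import pandas as pd

# Assumption: pandas with this fix applied (2.0 or later).
categories = list("abc")
df = pd.DataFrame(
    {
        "A": pd.Categorical(list("ba"), categories=categories, ordered=False),
        "B": [1, 2],  # illustrative value column, not in the original test
    }
)

# sort=False keeps groups in order of appearance; observed=False also
# includes the unobserved category "c" in the result.
result = df.groupby("A", sort=False, observed=False).first()

# The index values follow order of appearance ...
print(list(result.index))             # ['b', 'a', 'c']
# ... but the categories of the index keep their original order.
print(list(result.index.categories))  # ['a', 'b', 'c']
```

Before the change, the categories of the result index were reordered to match the order of appearance, which is what the removed line of the diff expected.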
Regarding the failing ASV, I'm not sure how to approach this. Edit: I can reproduce it locally; I had just missed it in the output.
ASVs in groupby:
These benchmarks use data with length 100000 and 10000 categories.
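A rough way to time groupby on data of that shape locally is sketched below. This is not the ASV benchmark itself: the column names, the random seed, and the choice of `count` as the aggregation are assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000     # number of rows, as stated above
NCATS = 10_000  # number of categories, as stated above

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
codes = rng.integers(0, NCATS, size=N)
df = pd.DataFrame(
    {
        "a": pd.Categorical.from_codes(codes, categories=np.arange(NCATS)),
        "b": rng.random(N),
    }
)


def grouped_count(sort: bool) -> pd.Series:
    """Count rows per category, keeping unobserved categories."""
    return df.groupby("a", sort=sort, observed=False)["b"].count()


counts = grouped_count(sort=False)

for sort in (True, False):
    elapsed = timeit.timeit(lambda: grouped_count(sort), number=5)
    print(f"sort={sort}: {elapsed / 5:.4f} s per call")
```

Comparing the two timings before and after a change gives a quick signal, though the ASV suite remains the reference for regressions.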
Awesome, thanks @rhshadrach
* BUG: groupby reorders categorical categories
* Tests and whatsnew
* type-ignore
* GH#
* Add test
* Add TODO
* GH#
* fixups
* Revert test change; catch warnings
Ref: #48749
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Still needs more tests and a whatsnew.