-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: groupby.describe with as_index=False incorrect #49643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test where multiple keys are passed into groupby as well e.g. df.groupby(["A", "B"], as_index=False).describe()
?
There is now 😄. Added for Series and DataFrame. There are tests for multiple groupings, but I think all cases were not in-axis, so made sense to add these too. |
Nice thanks @rhshadrach (the groupby describe output looks more sane now) |
* BUG: groupby.describe with as_index=False incorrect * Add test for two groupings * Simplify logic
* BUG: groupby.describe with as_index=False incorrect * Add test for two groupings * Simplify logic
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.The describe bug was found while comparing differences between
_selected_obj
and_obj_with_exclusions
(#46944). By removingas_index
from the conditions that determine_selected_obj
, we fix the describe bug, the apply inconsistency (see below), and bring_selected_obj
and_obj_with_exclusions
closer so that the latter can eventually be removed.Currently
groupby(...).apply
withas_index=True
will try to use the group keys, then if that fails with a TypeError will try to redo the computation without the group keys. In main, this is also attempted whenas_index=False
, however in that case the selection context does not exclude the group keys and so it's really just attempting the same exact computation again. This PR fixes this inconsistency by makingas_index=False
adopt theas_index=True
behavior. In the one test that picked this up, we now agree with the original behavior that was desired in #7002.