-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: GroupBy.apply() throws erroneous ValueError with duplicate axes #35441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good - just some minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, one small detail I missed previously.
lgtm. pls merge master and ping on green. |
@jreback I think this is ready to merge. |
thanks @smithto1 very nice |
@@ -940,10 +940,6 @@ def test_frame_describe_multikey(tsframe): | |||
groupedT = tsframe.groupby({"A": 0, "B": 0, "C": 1, "D": 1}, axis=1) | |||
result = groupedT.describe() | |||
expected = tsframe.describe().T | |||
expected.index = pd.MultiIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One existing test previously had to manipulate the expected index, but that is no longer necessary.
@smithto1 I don't understand the change to the expected here, can you expand on why the new value is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the bugfix in this PR, I think .describe()
on axis=1
is now encountering the bug reported in #26805, #22848 #22546 where group_keys=True
is ignored.
The behaviour of the new code with group_keys=True/False
is the same as the old code with group_keys=False
.
I'll try to take a look at the 'group_keys' bug; if that is fixed I think this test will probably be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may be addressed by #34998 but for the axis=1
case.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Behavioural Change
GroupBy.apply
would sometimes throw an erroneousValueError: cannot reindex from duplicate axis
because when it checked if the output was mutated from its original shape it would only check the index, when sometimes it should check the columns.Tests
One new test added on
test_apply.py
which fails on master because of the bug; passes with this PR. One existing test was marked xfail but the xfail is now removed. One existing test previously had to manipulate the expected index, but that is no longer necessary.All of the copy-pasteable examples from #16646 are fixed with this PR.