BUG: groupby apply raises ValueError when groupby axis has duplicates and applied identity function #30679

fujiaxiang
Member

@fujiaxiang fujiaxiang commented Jan 4, 2020

This is more of a patch than a complete solution to this groupby apply paradigm.
When there are duplicates in the groupby axis, we restore the axis to its original order, but we do not guarantee that rows sharing the same axis value are restored to their original relative order.
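
For context, here is a minimal sketch of the scenario described above, assuming pandas 1.1 or later (the milestone this PR was merged into). The frame, the `keys` array, and the variable names are illustrative and are not taken from the PR's tests. Grouping by an external key array with `group_keys=False` keeps the result on the original axis labels.

```python
import numpy as np
import pandas as pd

# Illustrative data: the index (the groupby axis) contains duplicate labels.
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[0, 1, 0, 1])
keys = np.array(["b", "a", "a", "b"])  # hypothetical external grouping keys

# Identity function: each group is returned unchanged.
# Per the linked issue, versions without this patch fail here with
# "ValueError: cannot reindex from a duplicate axis".
result = df.groupby(keys, group_keys=False).apply(lambda g: g)

# All rows survive, but rows sharing an index label may not come back in
# their original relative order, so the comparison is order-insensitive.
same_rows = sorted(result["x"].tolist()) == sorted(df["x"].tolist())
same_labels = sorted(result.index.tolist()) == sorted(df.index.tolist())
print(same_rows, same_labels)
```

The order-insensitive comparison is deliberate: as noted above, the patch does not promise that rows with equal axis values keep their original relative order.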

@alimcmaster1
Member

Restarted the failing travis build

@pep8speaks

pep8speaks commented Jan 5, 2020

Hello @fujiaxiang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-20 13:53:25 UTC

@fujiaxiang fujiaxiang requested a review from jreback January 5, 2020 07:56
@fujiaxiang fujiaxiang requested a review from jreback January 10, 2020 15:04
@fujiaxiang
Member Author

ping

@fujiaxiang fujiaxiang requested a review from WillAyd January 17, 2020 01:26
@fujiaxiang
Member Author

All tests passed on my local machine which has the most recent master merged. Don't know why this fails. I see many other PRs also failing. Is this a CI issue?

@fujiaxiang fujiaxiang changed the title from "BUG: groupby apply raises ValueError when groupby axis has duplicates" to "BUG: groupby apply raises ValueError when groupby axis has duplicates and applied identity function" Jan 17, 2020
@fujiaxiang
Member Author

The CI process keeps failing. Can anyone tell me how I can retrigger CI without pushing new commits?

@jreback jreback modified the milestone: 1.0.0 Jan 20, 2020
@jreback jreback added this to the 1.1 milestone Jan 20, 2020
@jreback jreback added the Bug label Jan 20, 2020
@jreback jreback merged commit be6a3bc into pandas-dev:master Jan 20, 2020
@jreback
Contributor

jreback commented Jan 20, 2020

thanks @fujiaxiang very nice.

so a longer-term solution would be for Index.reindex(...) to handle duplicates (rather than raising), via a new keyword (e.g. errors='raise' or errors='coerce'), though we may have to think about the api a bit. this might be a fairly big change though.
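
To illustrate the limitation mentioned in this comment, the sketch below shows that reindexing an object whose axis has duplicate labels raises `ValueError`, and that `Index.get_indexer_non_unique` can be used to reorder it instead. This is only an illustration with made-up data; the `errors=` keyword proposed above is hypothetical and does not exist in pandas, so it is not used here.

```python
import pandas as pd

# Illustrative series whose index contains the duplicate label 0.
s = pd.Series([10, 20, 30], index=[0, 0, 1])

# Reindexing from a duplicate axis is rejected.
try:
    s.reindex([1, 0])
    raised = False
except ValueError:
    raised = True

# A lookup that tolerates duplicates: every position matching each target
# label is returned, so both rows labelled 0 are kept.
indexer, missing = s.index.get_indexer_non_unique(pd.Index([1, 0]))
reordered = s.take(indexer)
print(raised, reordered.index.tolist())
```

Because the lookup returns all positions for a repeated label, the reordered result keeps every row, which is the property a duplicate-aware reindex would need.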

Development

Successfully merging this pull request may close these issues.

groupby.apply fails with ValueError: cannot reindex from a duplicate axis
5 participants