
BUG: ValueError on groupby with categoricals #35253


Merged (5 commits), Jul 13, 2020


@smithto1 (Member) commented Jul 12, 2020

When DataFrameGroupBy._cython_agg_blocks aggregates a one-column DataFrame, it creates a SeriesGroupBy, calls the aggregation function on it, and takes the returned values. However, the SeriesGroupBy also performs the missing-categories reindexing. As a result, the DataFrameGroupBy ends up with values that include the unobserved categories and an index that does not. When both are passed to a BlockManager, it raises a ValueError because their lengths don't match.
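To illustrate what the missing-categories reindexing looks like from the public API, here is a small sketch. The data, the names key and ser, and the choice of sum are illustrative assumptions, not taken from the PR. With observed=False a SeriesGroupBy result covers every declared category, including the unobserved "c"; with observed=True it covers only the groups that actually occur.

```python
import pandas as pd

# Illustrative data: category "c" is declared but never observed.
key = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])
ser = pd.Series([1, 2, 3])

# observed=False: the result is reindexed to all declared categories.
all_cats = ser.groupby(key, observed=False).sum()

# observed=True: only the groups that appear in the data.
seen_only = ser.groupby(key, observed=True).sum()

print(list(all_cats.index))   # ['a', 'b', 'c']
print(list(seen_only.index))  # ['a', 'b']
```

The length difference between these two results (3 vs 2) is exactly the kind of mismatch that surfaces when values from the first are paired with an index built like the second.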

The solution is to have _cython_agg_blocks create the SeriesGroupBy with observed=True, so that it doesn't do any reindexing. The reindexing is left to the calling DataFrameGroupBy.
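The mismatch and the fix can be modeled in plain Python. This is a hypothetical, heavily simplified sketch, not pandas internals: series_agg stands in for the inner SeriesGroupBy aggregation, frame_agg stands in for the DataFrameGroupBy that builds its index from the observed groups only, and both names plus the fixed CATEGORIES list are assumptions made for illustration.

```python
# Hypothetical model of the bug and the fix; NOT pandas internals.
CATEGORIES = ["a", "b", "c"]  # declared categories; "c" is never observed below


def series_agg(keys, values, observed):
    """Sum values per key, mimicking the inner SeriesGroupBy aggregation."""
    sums = {}
    for k, v in zip(keys, values):
        sums[k] = sums.get(k, 0) + v
    if observed:
        # Only groups that occur, in category order.
        return [sums[c] for c in CATEGORIES if c in sums]
    # observed=False: reindex to every category, filling unobserved ones with 0.
    return [sums.get(c, 0) for c in CATEGORIES]


def frame_agg(keys, values, inner_observed):
    """Mimic the frame-level path: index from observed groups, values from the inner call."""
    seen = set(keys)
    index = [c for c in CATEGORIES if c in seen]
    block_values = series_agg(keys, values, observed=inner_observed)
    if len(block_values) != len(index):
        # Stands in for the length check that fails when the block is built.
        raise ValueError(
            f"length mismatch: {len(block_values)} values, {len(index)} index entries"
        )
    # The caller, not the inner aggregation, would be responsible for any
    # reindexing to all categories (omitted here).
    return dict(zip(index, block_values))


keys, values = ["a", "b", "a"], [1, 2, 3]
try:
    frame_agg(keys, values, inner_observed=False)  # the bug: 3 values vs 2 labels
except ValueError as exc:
    print("before fix:", exc)
print("after fix:", frame_agg(keys, values, inner_observed=True))
```

Running it prints the length-mismatch error for the inner observed=False path and {'a': 4, 'b': 2} once the inner aggregation uses observed=True.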

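This also explains why the error occurred only with DataFrameGroupBy and not with SeriesGroupBy: a standalone SeriesGroupBy reindexes its values and its index together, so their lengths stay consistent.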

@pep8speaks commented Jul 12, 2020

Hello @smithto1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-13 20:18:46 UTC

@jreback jreback added Bug Categorical Categorical Data Type Groupby labels Jul 13, 2020
@smithto1 smithto1 requested a review from jreback July 13, 2020 21:26
@smithto1 (Member, Author) commented:
@jreback comments addressed and all checks passing. Can you do a re-review?

@jreback jreback added this to the 1.1 milestone Jul 13, 2020
@jreback jreback merged commit 0ed1dcd into pandas-dev:master Jul 13, 2020
@jreback (Contributor) commented Jul 13, 2020

very nice @smithto1 keep em coming!

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jul 16, 2020
Development: successfully merging this pull request may close the issue "BUG: ValueError on groupby with categoricals".