BUG: Series groupby does not include nan counts for all categorical labels (#17605) #29690

Merged
merged 17 commits into from
Nov 20, 2019

Conversation

OliverHofkens

This is a simple, low-impact fix for #17605.

@OliverHofkens OliverHofkens changed the title from "Issue/17605" to "BUG: Series groupby does not include nan counts for all categorical labels (#17605)" Nov 18, 2019
@WillAyd WillAyd added the Groupby and Categorical (Categorical Data Type) labels Nov 18, 2019
@OliverHofkens
Author

FYI

Output of `python scripts/validate_docstrings.py pandas.core.groupby.GroupBy._reindex_output`:

################################################################################
########### Docstring (pandas.core.groupby.GroupBy._reindex_output) ###########
################################################################################

If we have categorical groupers, then we might want to make sure that
we have a fully re-indexed output to the levels. This means expanding
the output space to accommodate all values in the cartesian product of
our groups, regardless of whether they were observed in the data or
not. This will expand the output space if there are missing groups.

The method returns early without modifying the input if the number of
groupings is less than 2, self.observed == True or none of the groupers
are categorical.

Parameters
----------
output : Series or DataFrame
    Object resulting from grouping and applying an operation.
fill_value : scalar, default np.NaN
    Value to use for unobserved categories if self.observed is False.

Returns
-------
Series or DataFrame
    Object (potentially) re-indexed to include all possible groups.

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found:
    Summary should fit in a single line
2 Warnings found:
    See Also section not found
    No examples section found
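
To illustrate the behaviour the docstring describes, here is a minimal pure-Python sketch. It is not the pandas implementation: `expand_to_all_groups`, the dict-based `output` and the `levels` argument are hypothetical stand-ins for the grouped result and the categories of each grouper. It also only mirrors the "fewer than 2 groupings" early return, not the `self.observed` or "no categorical groupers" checks.

```python
from itertools import product
import math


def expand_to_all_groups(output, levels, fill_value=math.nan):
    """Expand `output` to the full cartesian product of `levels`.

    output : dict mapping a tuple of group keys to an aggregated value
        (hypothetical stand-in for the grouped Series or DataFrame).
    levels : list of lists, all categories of each grouper,
        whether or not they were observed in the data.
    fill_value : value used for combinations missing from `output`.
    """
    if len(levels) < 2:
        # Assumption: mirrors the early return for fewer than 2 groupings.
        return dict(output)
    # Every combination of categories gets an entry; unobserved ones
    # receive `fill_value` instead of being dropped.
    return {key: output.get(key, fill_value) for key in product(*levels)}


# Only two of the four possible (letter, symbol) groups were observed.
observed = {("a", "x"): 2, ("b", "y"): 1}
full = expand_to_all_groups(observed, [["a", "b"], ["x", "y"]], fill_value=0)
print(full)
```

For a count-like aggregation, passing `fill_value=0` makes the unobserved groups appear with a count of zero. With the default, they would be filled with NaN instead.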

@OliverHofkens
Author

Thanks for the great feedback so far @TomAugspurger, @WillAyd, @jreback! I've made the requested changes. Feel free to re-review.

@jreback jreback added this to the 1.0 milestone Nov 20, 2019
@jreback jreback merged commit c5a1f9e into pandas-dev:master Nov 20, 2019
@jreback
Contributor

jreback commented Nov 20, 2019

thanks @OliverHofkens nice patch!

Labels
Categorical Categorical Data Type Groupby
Development

Successfully merging this pull request may close these issues.

Series groupby does not include zero or nan counts for all categorical labels, unlike DataFrame groupby
4 participants