DOC: Minor improvements groupby user guide #56465
@@ -602,7 +601,7 @@ Any reduction method that pandas implements can be passed as a string to
     grouped.agg("sum")

 The result of the aggregation will have the group names as the
-new index along the grouped axis. In the case of multiple keys, the result is a
+new index. In the case of multiple keys, the result is a
Removed since `axis=1` is deprecated.
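To illustrate the sentence being edited, here is a minimal sketch of how group names end up as the index of an aggregation result. The frame and its column names (`A`, `B`, `C`) are made up for this example and are not taken from the user guide.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1, 2, 3, 4],
    }
)

# One grouping key: the group names become the index of the result.
single = df.groupby("A")["C"].agg("sum")

# Several grouping keys: by default the result is indexed by a MultiIndex.
multi = df.groupby(["A", "B"])["C"].agg("sum")

print(single)
print(multi)
```

Passing `as_index=False` to `groupby` is one way to keep the keys as ordinary columns instead.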
@@ -846,15 +845,14 @@ The following methods on GroupBy act as transformations. Of these methods, only
     :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
     :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
     :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
-    :meth:`~.DataFrameGroupBy.fillna`;Fill NA values within each group
Removed since `fillna` is deprecated.
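For readers wondering what to use instead, the table still lists `ffill` and `cumsum`. The following is a rough sketch of both acting as per-group transformations. The data and the column names `key`, `val`, and `n` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "val": [1.0, np.nan, np.nan, 4.0],
        "n": [1, 2, 3, 4],
    }
)

# Forward fill within each group: the NaN in group "a" is filled from the
# previous row, while the leading NaN in group "b" has nothing to copy from.
filled = df.groupby("key")["val"].ffill()

# Cumulative sum within each group; the result keeps the original row index.
running = df.groupby("key")["n"].cumsum()

print(filled)
print(running)
```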
-
-    # Use .agg function to aggregate over standard and "nuisance" data types
-    # at the same time
-    df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})
You used to need to do this to include nuisance columns; not anymore. Now they are included by default.
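A small sketch of the point made in this comment, assuming pandas 2.0 or later (where reductions no longer drop non-numeric columns by default). The contents of `df_dec` are invented here; only the column names come from the removed snippet.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical contents; only the column names appear in the removed example.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)

# The explicit form from the removed example.
explicit = df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})

# Assumption: on pandas >= 2.0 the object-dtype Decimal column is included
# without having to name it.
default = df_dec.groupby(["id"]).sum()

print(explicit)
print(default)
```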
@@ -1350,35 +1339,53 @@ The returned dtype of the grouped will *always* include *all* of the categories

     s = (
         pd.Series([1, 1, 1])
-        .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False)
+        .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True)
This demonstrates that unobserved categories are kept, so I think it makes more sense to show it with `observed=True`.
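A minimal sketch of why `observed=True` makes the point more clearly: the result has a row only for the observed category, yet the index dtype still carries every category. The diff does not show which reduction the guide applies, so `count()` is an assumption here.

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "a"], categories=["a", "b"])

# observed=True: only the observed group "a" gets a row in the result.
# count() is a stand-in; the reduction used in the guide is not shown.
s = pd.Series([1, 1, 1]).groupby(cat, observed=True).count()

# observed=False: the unobserved group "b" also gets a row, with a count of 0.
s_all = pd.Series([1, 1, 1]).groupby(cat, observed=False).count()

print(s.index.dtype)  # categorical dtype that still lists both "a" and "b"
print(s)
print(s_all)
```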
Just noting: feel free to liberally move examples and behavior notes to docstrings, or to remove sections that are too niche or outdated. IMO that information is best served in docstrings rather than user guides, given how outdated the guides are.
Thanks @rhshadrach
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.