DOC: Minor improvements groupby user guide #56465

Merged: 2 commits, Dec 12, 2023

rhshadrach
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@@ -602,7 +601,7 @@ Any reduction method that pandas implements can be passed as a string to
grouped.agg("sum")

The result of the aggregation will have the group names as the
-new index along the grouped axis. In the case of multiple keys, the result is a
+new index. In the case of multiple keys, the result is a
Member Author

Removed since axis=1 is deprecated
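
For reference, a minimal sketch of the behavior the edited sentence describes: the group names become the index of the aggregated result, and multiple keys produce a MultiIndex. The frame and its column names (`A`, `B`, `C`) are illustrative assumptions, not taken from the user guide.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1, 2, 3, 4],
    }
)

# One key: the group names form a flat index named after the key.
single = df.groupby("A")["C"].agg("sum")

# Several keys: the result is indexed by a MultiIndex, one level per key.
multi = df.groupby(["A", "B"])["C"].agg("sum")

print(single.index)
print(multi.index)
```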

@@ -846,15 +845,14 @@ The following methods on GroupBy act as transformations. Of these methods, only
:meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
:meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
:meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
-:meth:`~.DataFrameGroupBy.fillna`;Fill NA values within each group
Member Author

Removed since fillna is deprecated
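
As a rough illustration of one transformation that stays in the table, here is a sketch of `ffill` applied within each group. The data and the column names `key` and `val` are assumptions for the example. The point is that a fill never crosses a group boundary.

```python
import numpy as np
import pandas as pd

# Illustrative data; "key" and "val" are hypothetical column names.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "val": [1.0, np.nan, np.nan, 4.0],
    }
)

# Forward fill within each group. The leading NaN of group "b" stays NaN
# because the last value of group "a" is not carried across groups.
filled = df.groupby("key")["val"].ffill()

print(filled)
```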


# Use .agg function to aggregate over standard and "nuisance" data types
# at the same time
df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})
Member Author

You used to need to do this to include nuisance columns; not anymore. Now they are included by default.
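
A small sketch of what "included by default" looks like in practice, assuming pandas 2.0 or later, where a plain reduction no longer silently drops object-dtype columns. The `df_dec` frame is reconstructed here for illustration and its values are assumptions.

```python
from decimal import Decimal

import pandas as pd

# Illustrative frame with a Decimal (object-dtype) column; values are assumed.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)

# No explicit .agg mapping is needed: the Decimal column is summed
# alongside the integer column.
result = df_dec.groupby("id").sum()

print(result)
```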

@@ -1350,35 +1339,53 @@ The returned dtype of the grouped will *always* include *all* of the categories

 s = (
     pd.Series([1, 1, 1])
-    .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False)
+    .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True)
Member Author

This demonstrates unobserved categories are being kept, so I think it makes more sense to show with observed=True
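
To make the point concrete, here is a sketch that completes the hunk with a `count()` reduction (an assumption, since the hunk is cut off before the reduction) and adds an `observed=False` variant for contrast. With `observed=True` the unobserved group `"b"` is dropped from the result rows, yet the index dtype still lists both categories.

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "a"], categories=["a", "b"])

# observed=True: only groups that actually occur appear as rows.
s = pd.Series([1, 1, 1]).groupby(cat, observed=True).count()

# observed=False: the unobserved category "b" appears with a count of 0.
s_all = pd.Series([1, 1, 1]).groupby(cat, observed=False).count()

# In both cases the index dtype keeps every category that was grouped.
print(s.index.dtype)
print(s_all)
```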

@mroeschke
Member

Just noting: feel free to liberally move examples and behavior notes to docstrings, or to remove sections that are too niche or outdated. IMO that information is better served in docstrings than in user guides, given how outdated the user guides are.

@mroeschke mroeschke added this to the 2.2 milestone Dec 12, 2023
@mroeschke mroeschke merged commit acc395a into pandas-dev:main Dec 12, 2023
@mroeschke
Member

Thanks @rhshadrach

@rhshadrach rhshadrach deleted the doc_gb_user_guide branch December 19, 2023 11:36