Skip to content

BUG: groupby.agg doesn't include grouping columns in result when selected #51398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 24, 2023

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Bug Refactor Internal refactoring of code Groupby Apply Apply, Aggregate, Transform, Map labels Feb 15, 2023
@@ -1362,7 +1362,7 @@ Groupby/resample/rolling
- Bug in :meth:`.DataFrameGroupBy.describe` produced incorrect results when data had duplicate columns (:issue:`50806`)
- Bug in :meth:`.DataFrameGroupBy.agg` with ``engine="numba"`` failing to respect ``as_index=False`` (:issue:`51228`)
- Bug in :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, and :meth:`Resampler.agg` would ignore arguments when passed a list of functions (:issue:`50863`)
-
- Bug in :meth:`DataFrameGroupBy.agg` with multiple groupings would not include groupings in the result when they occurred in the selected columns (:issue:`51186`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with multiple groupings

this might not be clear without looking at the new test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I wasn't sure how to describe that the bug only occurs with .groupby([x, y]) and not .groupby(x). Is there better language that could be used here, or maybe just remove it altogether?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'after subsetting columns (e.g. gb[["a", "b"]].agg(...)'?

@jbrockmendel
Copy link
Member

Potential follow-up that this makes possible: _iterate_slices is only used in _python_agg_general, where it iterates over columns calling agg_series. agg_series iterates over groups. Instead we could iterate differently over groups, which would look a lot like _aggregate_frame.

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhshadrach rhshadrach added this to the 2.0 milestone Feb 17, 2023
@rhshadrach rhshadrach modified the milestones: 2.0, 2.1 Feb 23, 2023
@rhshadrach
Copy link
Member Author

Assuming no objection, going to merge when greenish. Currently targeted at 2.1.

@rhshadrach rhshadrach merged commit c239f54 into pandas-dev:main Feb 24, 2023
@rhshadrach rhshadrach deleted the iterate_slices branch February 24, 2023 18:11
@jbrockmendel jbrockmendel mentioned this pull request Feb 24, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Bug Groupby Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REF: de-duplicate iterate_slices
2 participants