Skip to content

Fixes Issues#44132, #40148, #29033, #22275, #18869: groupby #44142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 23 commits into from

Conversation

Resolution to Issue#44132.
A column of strings was being excluded from the result of .sum(). The problem is caused by the test in determining if the group being summed has only numeric columns thereby excluding the column with strings which can also be summed.
@ikramersh ikramersh changed the title Fix Issue#44132: BUG: groupby.sum removed columns in case sequential calls of several groupby.sum Fixes Issue#44132: BUG: groupby.sum removed columns in case sequential calls of several groupby.sum Oct 22, 2021
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
@pep8speaks
Copy link

pep8speaks commented Oct 23, 2021

Hello @ikramersh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-11-10 04:53:05 UTC

@mroeschke
Copy link
Member

Of note: it appears you are modifying a lot of existing tests / docs which is a sign that this fix is probably not the right approach. Usually a fix should only address the example in the associated issue.

@ikramersh
Copy link
Author

ikramersh commented Oct 24, 2021

Of note: it appears you are modifying a lot of existing tests / docs which is a sign that this fix is probably not the right approach. Usually a fix should only address the example in the associated issue.

I agree and would like some confirmation that the fix is desirable. It is a minute core change with a broad impact. To me it opens up a lot of opportunity because it's more pure Python where anything that can be summed is allowed, for example strings, tupples, lists, dictionaries etc are all included in results now. Previously strings for example were automatically excluded and not expected by the existing tests as results.

Please can I get some direction as to the desirability of this approach?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on this change

it is unexpected from a user pov as all operations know ally to numeric columns

this can easily be done in nested columns with an .apply

@ikramersh
Copy link
Author

ikramersh commented Oct 24, 2021

-1 on this change

it is unexpected from a user pov as all operations know ally to numeric columns

this can easily be done in nested columns with an .apply

That makes sense too but the fix also makes use of the numeric_only=True argument more applicable now as well as the existing warnings that non numeric columns should be filtered out prior to calling numeric operations.

Happy to abort just explaining what I have discovered on this journey.

@ikramersh
Copy link
Author

The reported issue #44132 shows that the inclusion of not only numeric but all summable types in the result of .sum() is already part of dataframe behaviour. The complaint is that behaviour is not carried through after calling groupby() which is what this fix attempts to resolve.

@jreback jreback added Groupby Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels Oct 28, 2021
@ikramersh ikramersh changed the title Fixes Issue#44132: BUG: groupby.sum removed columns in case sequential calls of several groupby.sum Fixes Issue#44132, #29033: BUG: groupby.sum Nov 5, 2021
@ikramersh ikramersh changed the title Fixes Issue#44132, #29033: BUG: groupby.sum Fixes Issue#44132, #40148, #29033: BUG: groupby.sum Nov 6, 2021
@ikramersh ikramersh changed the title Fixes Issue#44132, #40148, #29033: BUG: groupby.sum Fixes Issues#44132, #40148, #29033: groupby Nov 6, 2021
@ikramersh ikramersh changed the title Fixes Issues#44132, #40148, #29033: groupby Fixes Issues#44132, #40148, #29033, #22275: groupby Nov 7, 2021
@ikramersh ikramersh changed the title Fixes Issues#44132, #40148, #29033, #22275: groupby Fixes Issues#44132, #40148, #29033, #22275, #18869: groupby Nov 8, 2021
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 11, 2021
@jbrockmendel
Copy link
Member

I think we already have a deprecation path in place for numeric_only=None, can close this.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Dec 27, 2021
@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

closing as stale and #44142 (comment) comment

@jreback jreback closed this Jan 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Groupby Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Stale
Projects
None yet
5 participants