-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fixes Issues#44132, #40148, #29033, #22275, #18869: groupby #44142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Resolution to Issue#44132. A column of strings was being excluded from the result of .sum(). The problem is caused by the test in determining if the group being summed has only numeric columns thereby excluding the column with strings which can also be summed.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Hello @ikramersh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-11-10 04:53:05 UTC |
Fix tests to consider only columns that are in the expected results and ignore new columns produced by bug fix.
Of note: it appears you are modifying a lot of existing tests / docs which is a sign that this fix is probably not the right approach. Usually a fix should only address the example in the associated issue. |
I agree and would like some confirmation that the fix is desirable. It is a minute core change with a broad impact. To me it opens up a lot of opportunity because it's more pure Python where anything that can be summed is allowed, for example strings, tupples, lists, dictionaries etc are all included in results now. Previously strings for example were automatically excluded and not expected by the existing tests as results. Please can I get some direction as to the desirability of this approach? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 on this change
it is unexpected from a user pov as all operations know ally to numeric columns
this can easily be done in nested columns with an .apply
That makes sense too but the fix also makes use of the numeric_only=True argument more applicable now as well as the existing warnings that non numeric columns should be filtered out prior to calling numeric operations. Happy to abort just explaining what I have discovered on this journey. |
The reported issue #44132 shows that the inclusion of not only numeric but all summable types in the result of .sum() is already part of dataframe behaviour. The complaint is that behaviour is not carried through after calling groupby() which is what this fix attempts to resolve. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
I think we already have a deprecation path in place for numeric_only=None, can close this. |
closing as stale and #44142 (comment) comment |
groupby.sum
removed columns in case sequential calls of severalgroupby.sum
#44132