-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: handle multi-dimensional grouping better #36605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR!
Could you add tests illustrating the new behavior? That's the first thing we look for when reviewing
Bug in issue pandas-dev#36158 appeared both for indices and get_group().
Hi @arw2019 and thanks for initial review! |
All tasks successful, tests passed, ready for review. |
I thought I would mention this here, for visibility: The error also occurs with regular functions, not just |
@AlexanderCecile thank you very much for your input, highly appreciated. Fair point, behavior is the same both for lambda and defined function. Nature of a bug: |
Changes [from comment](pandas-dev#36605 (review)) by @rhshadrach
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@dimithras - Sorry, I missed your ping. If this is read for review, can you merge master and take this off draft status? |
Hi @rhshadrach , sorry for letting this hanged. I looked at the failing test and issues connected to it, I'm more or less sure it should not be a problem to remove the test. Yet would be better to have a green light from you or someone else more familiar with pandas dev than me. I'll remove the failing test tomorrow (CEST) so that it will be ready to merge. |
Bug in issue pandas-dev#36158 appeared both for indices and get_group().
Missed `self.` during making PR, tests failed. Now corrected!
Changes [from comment](pandas-dev#36605 (review)) by @rhshadrach
As described in issue pandas-dev#36158 multi-dimensional groups prepared with lambda raise error on indices as isnan() is not defined for MultiIndex. Grouping already has the check for MultiIndex on line 440, but it's not triggered on __init__ when grouper is represented by lambda function and gets casted to MultiIndex later where MT stays as is.
Previous assertion replaced with a new check. Added description to linking issue. Co-Authored-By: Richard Shadrach <[email protected]>
Bug in issue pandas-dev#36158 for indices and get_group().
Changes [from comment](pandas-dev#36605 (review)) by @rhshadrach
Previous assertion replaced with a new check. Added description to linking issue. Co-Authored-By: Richard Shadrach <[email protected]>
Master merged with no conflicts, I also did a rebase to shrink the number of commits. |
Converted to draft, test_counting.py |
After merging latest master issue is gone without applying the suggested fixes both by me and @rhshadrach . Removing all changes yet leaving newly added tests referring to original issue.
After merging with latest master the issue is gone without making changes to grouper. @rhshadrach unfortunately All changes reverted, PR left with added tests for original issue. |
py37_32bit failed from out of memory error, Windows py38_np18 has one unrelated failure:
|
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating, changes look good.
@jreback will you have a look if you have time? |
thanks @dimithras |
As described in issue #36158 multi-dimensional groups prepared with lambda or named function raise error on indices as isnan() is not defined for MultiIndex.
Grouping already has the check for MultiIndex on line 440, but it's not triggered on init when grouper is represented by lambda function and gets casted to MultiIndex later where MultiIndex stays as is.
Tested on the code provided in #36158, yet not sure if significant testing is needed as this is a relatively small change to code.Grouper code has not been changed for a while, should be save to merge.
Tests for this particular issue provided.
Concerns: probably it will be better to makeisinstance(grouper, MultiIndex):
a function as after this change this piece of code occurs twice in init.Those operate on different grouper object. First one is passed on init, second one is
self
, where grouper is changed from lambda to MultiIndex and needs further change added in PR.black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff