-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrameGroupby std/sem modify grouped column when as_index=False #33630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a0cd28c
to
042d52b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR - I think this is a good start
use .get_indexer in the columbus |
@jreback tests now passing |
@WillAyd thanks for the comments, changes made. |
"size", | ||
"skew", | ||
): | ||
pytest.skip("Skip until #5755 is resolved") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give some more text on what is the issue in 5755
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I broke out the tests which I think have a clear desired behavior but fail from the ones where 5755 needs a resolution, and added a description for each.
Yes, I confirmed on master using this MWE:
This is issue #22046. It is also mentioned in the docstring of iLocIndexer._setitem_with_indexer:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. if you make those comment updates and ping on green.
pandas/core/groupby/groupby.py
Outdated
cols = result.columns.get_indexer_for( | ||
result.columns.difference(self.exclusions).unique() | ||
) | ||
# .values to remove labels; iLocIndexer._setitem_with_indexer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, pls add the issue number here that you found, along with a TODO
pandas/core/groupby/groupby.py
Outdated
@@ -1301,6 +1311,7 @@ def var(self, ddof: int = 1): | |||
"var", alt=lambda x, axis: Series(x).var(ddof=ddof) | |||
) | |||
else: | |||
# TODO: implement at Cython level? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment not needed
pandas/core/groupby/groupby.py
Outdated
cols = result.columns.get_indexer_for( | ||
result.columns.difference(self.exclusions).unique() | ||
) | ||
# .values in both numerator and denominator to remove labels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above, TODO with an issue number
34fcda3
to
76cd654
Compare
@jreback Changes made, and mostly green. One Travis build failed because of a broken connection. Should I force push in this situation to have the check run again? |
@jreback Merged to fix conflicts, all green now. |
thanks @rhshadrach very nice |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
When working on this, I noticed an unrelated line of code that could be moved inside an if-block and made the change. Should unrelated cleanups like this be left to a separate PR? Can revert if that's the case.