BUG: DataFrameGroupby std/sem modify grouped column when as_index=False #33630

Merged
merged 2 commits into from
May 19, 2020

rhshadrach
Member

@rhshadrach rhshadrach commented Apr 18, 2020

When working on this, I noticed an unrelated line of code that could be moved inside an if-block and made the change. Should unrelated cleanups like this be left to a separate PR? Can revert if that's the case.
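For context, the behavior named in the PR title can be sketched as follows. This is a minimal illustration, assuming a pandas version that includes this fix; the frame and the column names `a` and `b` are made up for the example.

```python
import pandas as pd

# Hypothetical data: "a" is the grouping column, "b" holds the values.
df = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1.0, 2.0, 3.0, 5.0, 8.0, 13.0]}
)

# With as_index=False the grouping column comes back as a regular column.
# It should hold the group keys, not a statistic computed from them.
result = df.groupby("a", as_index=False).std()

expected_keys = [1, 2, 3]
keys_preserved = result["a"].tolist() == expected_keys

print(result)
print("grouping column preserved:", keys_preserved)
```

The same check can be repeated with `.sem()` in place of `.std()`.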

Member

@WillAyd WillAyd left a comment

Thanks for the PR - I think this is a good start

@jreback
Contributor

jreback commented Apr 19, 2020

use .get_indexer on the columns
and then use .iloc to avoid all issues
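A rough sketch of that suggestion, not the code merged in this PR: look up the integer positions of the non-grouping columns with `Index.get_indexer`, then write through `.iloc` so only those positions change. The frame `result` and the list `exclusions` are hypothetical stand-ins for the aggregated output and the grouping columns.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregated frame: "a" holds group keys, "b" and "c" hold variances.
result = pd.DataFrame(
    {"a": [1, 2, 3], "b": [0.5, 2.0, 12.5], "c": [4.0, 9.0, 16.0]}
)
exclusions = ["a"]  # assumed stand-in for the grouping column(s)

# Translate the labels of the non-grouping columns into integer positions.
value_labels = result.columns.difference(exclusions)
positions = result.columns.get_indexer(value_labels)

# Assign through .iloc so that only those positions are transformed.
result.iloc[:, positions] = np.sqrt(result.iloc[:, positions].to_numpy())

print(result)
```

Working with positions rather than labels means the grouping column is never touched by the square root.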

@rhshadrach
Member Author

@jreback tests now passing

@rhshadrach
Member Author

@WillAyd thanks for the comments, changes made.

        "size",
        "skew",
    ):
        pytest.skip("Skip until #5755 is resolved")
Contributor

can you give some more text on what the issue in #5755 is?

Member Author

I broke out the tests which I think have a clear desired behavior but fail from the ones where 5755 needs a resolution, and added a description for each.

@rhshadrach
Member Author

@jreback

> can you reproduce this bug independently of this PR? please open a separate issue, this is a problem that needs fixing.

Yes, I confirmed on master using this MWE:

import pandas as pd

# Two columns that share the same MultiIndex label
idx = pd.MultiIndex.from_tuples((("a", "a"), ("a", "a")))
df = pd.DataFrame([[1, 1]], columns=idx)
df.iloc[:, [0, 1]] = df.iloc[:, [0, 1]]

This is issue #22046. It is also mentioned in the docstring of iLocIndexer._setitem_with_indexer:

This method is currently broken when dealing with non-unique Indexes,
since it goes from positional indexers back to labels when calling
BlockManager methods, see GH#12991, GH#22046, GH#15686.
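To illustrate why going from positions back to labels is a problem with non-unique columns, here is a small sketch. It does not reproduce the linked issues; it only shows that a duplicated label matches several columns while a position matches exactly one. The column names are made up.

```python
import pandas as pd

columns = pd.Index(["a", "a", "b"])  # non-unique labels
df = pd.DataFrame([[1, 2, 3]], columns=columns)

# Label-based selection returns every column that shares the label...
n_label_matches = df.loc[:, "a"].shape[1]

# ...while a position addresses exactly one column.
n_position_matches = df.iloc[:, [0]].shape[1]

# get_indexer_for is documented to return an indexer even for non-unique
# indexes, so each duplicate gets its own position.
positions = columns.get_indexer_for(["a", "b"])

print(n_label_matches, n_position_matches, positions)
```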

Contributor

@jreback jreback left a comment

looks good. please make those comment updates and ping on green.

cols = result.columns.get_indexer_for(
    result.columns.difference(self.exclusions).unique()
)
# .values to remove labels; iLocIndexer._setitem_with_indexer
Contributor

ok, pls add the issue number here that you found, along with a TODO

@@ -1301,6 +1311,7 @@ def var(self, ddof: int = 1):
        "var", alt=lambda x, axis: Series(x).var(ddof=ddof)
    )
else:
    # TODO: implement at Cython level?
Contributor

comment not needed

cols = result.columns.get_indexer_for(
    result.columns.difference(self.exclusions).unique()
)
# .values in both numerator and denominator to remove labels;
Contributor

same comment as above, TODO with an issue number
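The snippet under review can be approximated outside of pandas internals as follows. This is a sketch under assumptions, not the merged implementation: it computes the standard error as std divided by the square root of the group counts, restricted to the non-grouping columns, and uses plain arrays on both sides of the division so that no label alignment takes place. The frame and the literal `["a"]` (standing in for `self.exclusions`) are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1.0, 3.0, 2.0, 6.0]})
grouped = df.groupby("a", as_index=False)

result = grouped.std()
counts = grouped.count()

# Positions of the non-grouping columns; ["a"] stands in for self.exclusions.
cols = result.columns.get_indexer_for(
    result.columns.difference(["a"]).unique()
)

# Plain arrays in numerator and denominator, written back by position.
numerator = result.iloc[:, cols].to_numpy()
denominator = np.sqrt(counts.iloc[:, cols].to_numpy())
result.iloc[:, cols] = numerator / denominator

# Cross-check against the built-in sem on the value column.
reference = df.groupby("a")["b"].sem().to_numpy()
print(result)
print(np.allclose(result["b"].to_numpy(), reference))
```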

@rhshadrach rhshadrach force-pushed the std_as_index_false branch from 34fcda3 to 76cd654 Compare May 10, 2020 16:58
@rhshadrach
Member Author

@jreback Changes made, and mostly green. One Travis build failed because of a broken connection. Should I force push in this situation to have the check run again?

@rhshadrach
Member Author

@jreback Merged to fix conflicts, all green now.

@jreback jreback added this to the 1.1 milestone May 19, 2020
@jreback jreback merged commit 0babe10 into pandas-dev:master May 19, 2020
@jreback
Contributor

jreback commented May 19, 2020

thanks @rhshadrach very nice

@rhshadrach rhshadrach deleted the std_as_index_false branch July 11, 2020 16:02
Successfully merging this pull request may close these issues.

BUG: STD modifies groupby target column when as_index=False
3 participants