BUG: DataFrameGroupby std/sem modify grouped column when as_index=False #33630

rhshadrach · 2020-04-18T15:18:20Z

closes BUG: STD modifies groupby target column when as_index=False #10355
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

While working on this, I noticed an unrelated line of code that could be moved inside an if-block, and made that change. Should unrelated cleanups like this be left to a separate PR? I can revert it if that's the case.
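For reference, here is a minimal pure-Python sketch of the behaviour this PR is after. With `as_index=False`, the grouping column should come back holding the group keys, and only the remaining columns should hold the sample (`ddof=1`) standard deviation. The function name and the list-of-dicts input are hypothetical stand-ins for a DataFrame, not pandas API.

```python
import math
import statistics


def groupby_std_as_index_false(rows, key):
    """Hypothetical stand-in for df.groupby(key, as_index=False).std().

    rows: list of dicts with identical keys; key: the grouping column.
    Groups are returned sorted by key, mirroring pandas' default sort=True.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    result = []
    for group_key in sorted(groups):
        members = groups[group_key]
        # The grouping column must hold the key itself, untouched.
        out = {key: group_key}
        for col in members[0]:
            if col == key:
                continue
            values = [m[col] for m in members]
            # statistics.stdev is the sample std (ddof=1); it needs >= 2 values.
            out[col] = statistics.stdev(values) if len(values) > 1 else math.nan
        result.append(out)
    return result


rows = [
    {"a": 1, "b": 2.0},
    {"a": 1, "b": 4.0},
    {"a": 4, "b": 1.0},
    {"a": 4, "b": 3.0},
]
result = groupby_std_as_index_false(rows, "a")
# "a" keeps the keys 1 and 4; "b" holds sqrt(2) for both groups.
```

Keys 1 and 4 make the bug in the title easy to spot: if the square root were also applied to the grouping column, `a` would read 1.0 and 2.0 instead of the original keys.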

WillAyd

Thanks for the PR - I think this is a good start

pandas/core/groupby/groupby.py

pandas/tests/groupby/test_function.py

pandas/core/groupby/groupby.py

jreback · 2020-04-19T23:16:46Z

use .get_indexer on the columns
and then use .iloc to avoid all issues
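The idea behind this suggestion is to translate the target column labels into integer positions once, and from then on address columns only by position, which stays unambiguous even when labels repeat. Below is a pure-Python sketch of that lookup step, loosely modelled on `Index.get_indexer_for` (which, for a non-unique index, returns every matching position). The helper name is hypothetical, and the sketch assumes the labels are sortable.

```python
def positions_excluding(columns, exclusions):
    """Hypothetical helper: integer positions of every column whose label is
    not in `exclusions`, counting each copy of a duplicated label.

    Loosely mirrors columns.get_indexer_for(columns.difference(exclusions)).
    """
    # Index.difference returns unique labels, sorted when possible; this
    # sketch assumes the labels are sortable.
    wanted = sorted(set(columns) - set(exclusions))
    positions = []
    for label in wanted:
        # A non-unique lookup yields every position holding the label.
        positions.extend(i for i, col in enumerate(columns) if col == label)
    return positions


print(positions_excluding(["a", "b", "b", "c"], {"a"}))  # [1, 2, 3]
```

Once these positions are known, every later read or write can go through positional indexing (`.iloc` in pandas), so duplicate labels never have to be resolved again.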

doc/source/whatsnew/v1.1.0.rst

pandas/core/groupby/groupby.py

rhshadrach · 2020-04-21T23:05:59Z

@jreback tests now passing

pandas/core/groupby/groupby.py

doc/source/whatsnew/v1.1.0.rst

pandas/core/groupby/groupby.py

rhshadrach · 2020-04-25T12:15:07Z

@WillAyd thanks for the comments, changes made.

pandas/core/groupby/groupby.py

jreback · 2020-04-25T21:56:18Z

pandas/tests/groupby/test_function.py

+        "size",
+        "skew",
+    ):
+        pytest.skip("Skip until #5755 is resolved")


can you add some more text on what the issue in #5755 is?

I broke out the tests which I think have a clear desired behavior but fail from the ones where 5755 needs a resolution, and added a description for each.

pandas/core/groupby/groupby.py

rhshadrach · 2020-05-02T17:51:25Z

@jreback

Can you reproduce this bug independently of this PR? Please open a separate issue; this is a problem that needs fixing.

Yes, I confirmed on master using this MWE:

import pandas as pd

# Duplicate column labels: ("a", "a") appears twice
idx = pd.MultiIndex.from_tuples((("a", "a"), ("a", "a")))
df = pd.DataFrame([[1, 1]], columns=idx)
df.iloc[:, [0, 1]] = df.iloc[:, [0, 1]]

This is issue #22046. It is also mentioned in the docstring of iLocIndexer._setitem_with_indexer:

This method is currently broken when dealing with non-unique Indexes,
since it goes from positional indexers back to labels when calling
BlockManager methods, see GH#12991, GH#22046, GH#15686.
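To see why a positional indexer that is converted back into labels breaks on duplicate columns, here is a toy pure-Python model (not pandas internals). With two columns both labelled `("a", "a")`, each position maps to a label that matches both columns, so every write lands on both. This only models the failure mode the docstring describes; the actual symptom in pandas may differ (for example, an exception rather than silently wrong values). Both function names are hypothetical.

```python
def set_by_position(data, positions, values):
    """Write one value per column, addressing columns purely by position."""
    for pos, value in zip(positions, values):
        data[pos] = value


def set_via_labels(columns, data, positions, values):
    """Toy model of a setter that turns positions back into labels and
    then looks those labels up again."""
    for pos, value in zip(positions, values):
        label = columns[pos]
        # Every column carrying this label matches, not just `pos`.
        for i, col in enumerate(columns):
            if col == label:
                data[i] = value


columns = [("a", "a"), ("a", "a")]

by_position = [1, 1]
set_by_position(by_position, [0, 1], [10, 20])

via_labels = [1, 1]
set_via_labels(columns, via_labels, [0, 1], [10, 20])

# by_position == [10, 20]; via_labels == [20, 20]
```

The positional path gives each column its own value, while the label round-trip lets the second write overwrite the first, which is why the PR keeps everything positional.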

jreback

Looks good. Please make those comment updates and ping on green.

jreback · 2020-05-10T14:56:44Z

pandas/core/groupby/groupby.py

+            cols = result.columns.get_indexer_for(
+                result.columns.difference(self.exclusions).unique()
+            )
+            # .values to remove labels; iLocIndexer._setitem_with_indexer


ok, pls add the issue number here that you found, along with a TODO
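In the `std` hunk above, the positions of the non-grouping columns are computed with `get_indexer_for`, and, per its comment, `.values` strips the labels before assigning back so that `_setitem_with_indexer` does not try to resolve them. The excerpt is truncated, so the assumption here is that the square root is then applied only at those positions. A pure-Python sketch of that positional idea, using a hypothetical column-oriented table (a list of labels plus a list of value lists) in place of a DataFrame:

```python
import math


def std_from_var(columns, var_columns, exclusions):
    """Hypothetical sketch: turn a variance table into a std table by taking
    square roots only at positions whose label is not excluded.

    columns: column labels (may contain duplicates).
    var_columns: one list of values per column, aligned with `columns`.
    exclusions: labels of the grouping columns, which must stay unchanged.
    """
    out = [list(values) for values in var_columns]  # copy; do not mutate input
    for pos, label in enumerate(columns):
        if label not in exclusions:
            out[pos] = [math.sqrt(v) for v in out[pos]]
    return out


columns = ["a", "b"]
var_columns = [[1, 4], [2.0, 9.0]]  # "a" holds group keys, "b" variances
std_columns = std_from_var(columns, var_columns, {"a"})
# "a" keeps [1, 4]; "b" becomes [sqrt(2), 3.0]
```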

jreback · 2020-05-10T14:56:58Z

pandas/core/groupby/groupby.py

@@ -1301,6 +1311,7 @@ def var(self, ddof: int = 1):
                "var", alt=lambda x, axis: Series(x).var(ddof=ddof)
            )
        else:
+            # TODO: implement at Cython level?


comment not needed
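The fallback in this hunk is `Series(x).var(ddof=ddof)`: the sum of squared deviations divided by `n - ddof`, so the default `ddof=1` gives the sample variance and `ddof=0` the population variance. A small pure-Python check of that formula against the standard library; the helper name is hypothetical and this is not the pandas implementation.

```python
import math
import statistics


def var(values, ddof=1):
    """Variance with divisor n - ddof (hypothetical helper, not pandas API)."""
    n = len(values)
    if n - ddof <= 0:
        # pandas returns NaN when there are not enough observations.
        return math.nan
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sample = var(data)              # ddof=1: 32 / 7
population = var(data, ddof=0)  # ddof=0: 32 / 8 = 4.0

# Sanity checks against the standard library's sample/population variance.
assert math.isclose(sample, statistics.variance(data))
assert math.isclose(population, statistics.pvariance(data))
```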

jreback · 2020-05-10T14:57:18Z

pandas/core/groupby/groupby.py

+            cols = result.columns.get_indexer_for(
+                result.columns.difference(self.exclusions).unique()
+            )
+            # .values in both numerator and denominator to remove labels;


same comment as above, TODO with an issue number
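For `sem`, the comment in this hunk refers to a numerator and a denominator: the standard deviation divided by the square root of the group size, again only for the non-grouping columns. A pure-Python sketch of that formula on a list-of-dicts table; the function name and input shape are hypothetical, this is not the PR's code, and it ignores missing values (which pandas' `count` would exclude).

```python
import math
import statistics


def groupby_sem_as_index_false(rows, key):
    """Hypothetical stand-in for df.groupby(key, as_index=False).sem():
    std (ddof=1) / sqrt(group size) for every column except `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    result = []
    for group_key in sorted(groups):
        members = groups[group_key]
        out = {key: group_key}  # grouping column is left as the key
        for col in members[0]:
            if col == key:
                continue
            values = [m[col] for m in members]
            if len(values) < 2:
                out[col] = math.nan  # sample std is undefined for n < 2
            else:
                out[col] = statistics.stdev(values) / math.sqrt(len(values))
        result.append(out)
    return result


rows = [
    {"a": 1, "b": 1.0},
    {"a": 1, "b": 3.0},
    {"a": 9, "b": 5.0},
    {"a": 9, "b": 5.0},
]
result = groupby_sem_as_index_false(rows, "a")
# "a" keeps [1, 9]; "b" is sqrt(2) / sqrt(2) = 1.0, then 0.0
```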

rhshadrach · 2020-05-10T18:33:10Z

@jreback Changes made, and mostly green. One Travis build failed because of a broken connection. Should I force push in this situation to have the check run again?

rhshadrach · 2020-05-16T14:36:45Z

@jreback Merged upstream/master to fix conflicts; all green now.

jreback · 2020-05-19T12:56:55Z

thanks @rhshadrach very nice

rhshadrach force-pushed the std_as_index_false branch from a0cd28c to 042d52b (April 18, 2020 15:27)

WillAyd requested changes Apr 18, 2020

pandas/core/groupby/groupby.py (resolved)

pandas/core/groupby/groupby.py (resolved)

pandas/tests/groupby/test_function.py (outdated, resolved)

WillAyd added the Groupby label Apr 18, 2020

jreback requested changes Apr 19, 2020

pandas/core/groupby/groupby.py (outdated, resolved)

pandas/core/groupby/groupby.py (outdated, resolved)

jreback requested changes Apr 21, 2020

doc/source/whatsnew/v1.1.0.rst (resolved)

pandas/core/groupby/groupby.py (outdated, resolved)

WillAyd requested changes Apr 22, 2020

pandas/core/groupby/groupby.py (outdated, resolved)

doc/source/whatsnew/v1.1.0.rst (outdated, resolved)

pandas/core/groupby/groupby.py (resolved)

jreback requested changes Apr 25, 2020

jreback requested changes May 2, 2020

pandas/core/groupby/groupby.py (resolved)

pandas/core/groupby/groupby.py (resolved)

jreback requested changes May 10, 2020

Commit 76cd654: BUG: DataFrameGroupby std/sem modify grouped column when as_index=False

rhshadrach force-pushed the std_as_index_false branch from 34fcda3 to 76cd654 (May 10, 2020 16:58)

Commit b4bf083: Merge remote-tracking branch 'upstream/master' into std_as_index_false

jreback added this to the 1.1 milestone May 19, 2020

jreback approved these changes May 19, 2020

jreback merged commit 0babe10 into pandas-dev:master May 19, 2020

rhshadrach mentioned this pull request May 25, 2020 in CLN: Unify signatures in _libs.groupby #34372 (merged)

rhshadrach deleted the std_as_index_false branch July 11, 2020 16:02