Bug: Grouping by index and column fails on DataFrame with single index (GH14327) #14333

jonmmease · 2016-10-02T01:31:15Z

closes Grouping by index and column fails on DataFrame with single index #14327
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Don't know if this is too late for 0.19.0 but I went ahead and added the whatsnew entry there for now.

The existing logic under "if level is not None:" assumed that the index was a MultiIndex. We now check the index type and also handle the case where a plain Index is passed in with a None grouper. This resolves GH 14327.

codecov-io · 2016-10-02T01:58:22Z

Current coverage is 85.26% (diff: 95.83%)

Merging #14333 into master will increase coverage by <.01%

@@             master     #14333   diff @@
==========================================
  Files           140        140          
  Lines         50630      50639     +9   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43169      43177     +8   
- Misses         7461       7462     +1   
  Partials          0          0

Powered by Codecov. Last update 6dcc238...33eb725

jreback · 2016-10-03T18:04:13Z

doc/source/whatsnew/v0.19.0.txt

@@ -1584,3 +1584,4 @@ Bug Fixes
 - ``PeridIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`)
 - Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`)
 - Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`)
+- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index (:issue:`14327`)


move to 0.20.0

jonmmease · 2016-10-03T19:33:36Z

whatsnew moved to 0.20.0 as @jreback requested

jorisvandenbossche · 2016-10-04T14:39:16Z

Looks good to me

jreback · 2016-10-04T14:41:34Z

pandas/core/groupby.py

            if self.name is None:
                self.name = index.names[level]

-            # XXX complete hack
+            if isinstance(index, MultiIndex):
+                inds = index.labels[level]


I think this should be a private method on Index instead (and overridden in MultiIndex).

jonmmease · 2016-10-04

Do you mean everything inside the if isinstance(index, MultiIndex): block? If so, it looks like the MultiIndex override would need to take grouper, index, and level as input and return a tuple of labels, level_index, and grouper. This seems a little messy to me, since the parent Index method would have no use for the level parameter and would need to return None for the labels and level_index values.

jreback · 2016-10-04

Just take args and return a tuple of things; no state is kept.

jonmmease · 2016-10-05

Got it, thanks for the clarification.
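The shape discussed in this thread can be sketched without pandas. This is an illustrative sketch only, not the code from this PR: FlatIndex, NestedIndex, and grouper_for_level are hypothetical names, and plain lists stand in for pandas objects. It shows a stateless method that takes arguments and returns a tuple, defined on a base class and overridden in a subclass, so the caller needs no isinstance check.

```python
class FlatIndex:
    """Hypothetical stand-in for a single-level index."""

    def __init__(self, values):
        self.values = list(values)

    def grouper_for_level(self, grouper, level):
        # A flat index has no per-level codes, so the last two
        # slots of the returned tuple are always None.
        if grouper is None:
            grouper = self.values
        return grouper, None, None


class NestedIndex(FlatIndex):
    """Hypothetical stand-in for a multi-level index."""

    def __init__(self, levels, labels):
        # Rebuild the row tuples from the per-level values and positions.
        rows = zip(*([lev[i] for i in lab] for lev, lab in zip(levels, labels)))
        super().__init__(rows)
        self.levels = levels  # unique values per level
        self.labels = labels  # integer positions into each level

    def grouper_for_level(self, grouper, level):
        labels = self.labels[level]
        level_index = self.levels[level]
        if grouper is None:
            grouper = [level_index[i] for i in labels]
        return grouper, labels, level_index


flat = FlatIndex(["a", "b", "a"])
nested = NestedIndex(levels=[["a", "b"], ["x", "y"]],
                     labels=[[0, 0, 1], [0, 1, 0]])

# The caller unpacks the same three-element tuple for either class.
for index, level in ((flat, None), (nested, 1)):
    grouper, labels, level_index = index.grouper_for_level(None, level)
    print(grouper, labels, level_index)
```

Neither method stores anything on the instance, which matches the "no state is kept" point: the caller decides what to do with the returned values.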


TomAugspurger

Looks good overall, just a very minor comment. We might be asking to change the whatsnew to v0.19.1, but we need to sort that out first; will keep you posted.

TomAugspurger · 2016-10-05T20:58:15Z

pandas/indexes/base.py

@@ -432,6 +432,13 @@ def _update_inplace(self, result, **kwargs):
        # guard when called from IndexOpsMixin
        raise TypeError("Index can't be updated inplace")

+    def _get_grouper_for_level(self, grouper, level):
+        # return grouper if grouper is not None else self


This comment isn't all that helpful :) Could you maybe change it to explain what the returned tuple is (the two Nones are labels and level_index), and maybe note that the MultiIndex version is the useful one?

jonmmease · 2016-10-05

Yeah, good call :-)

TomAugspurger · 2016-10-05T21:03:03Z

pandas/core/groupby.py

-                self._labels = labels
-                self._group_index = level_index
-                self.grouper = level_index.take(labels)
+            self.grouper, self._labels, self._group_index = index._get_grouper_for_level(self.grouper, level)


I think this is the line causing the Travis failure. You'll need to wrap it.

jonmmease · 2016-10-05

Thanks for catching that! Guess I forgot to run flake8.

jorisvandenbossche · 2016-10-06T09:22:10Z

pandas/indexes/multi.py

@@ -524,6 +524,39 @@ def _format_native_types(self, na_rep='nan', **kwargs):

        return mi.values

+    def _get_grouper_for_level(self, grouper, level):
+


Can you add a docstring here?

jreback · 2016-10-06

Yep, same as above for styling.

jreback · 2016-10-06T10:24:59Z

pandas/indexes/base.py

@@ -432,6 +432,17 @@ def _update_inplace(self, result, **kwargs):
        # guard when called from IndexOpsMixin
        raise TypeError("Index can't be updated inplace")

+    def _get_grouper_for_level(self, grouper, level):
+        # Use self (Index) as grouper if None was passed


Can you add a Parameters section to the docstring, and move the inline comment to the Returns part?

jreback · 2016-10-06T10:26:52Z

pandas/indexes/multi.py

+
+        # XXX complete hack
+
+        if grouper is not None:


It would be great if you could put a comment here explaining what is going on (for future readers).

Further, you can just return if grouper is not None (and then not use an else); I think that makes the code read slightly better.
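The early-return suggestion can be illustrated with a small pandas-free sketch. The function name level_grouper, its arguments, and the use of a plain callable as the mapper are assumptions made for this example; it is not the code under review.

```python
def level_grouper(level_values, codes, mapper=None):
    """Return (grouper, codes, uniques) for one level of a nested index.

    level_values: unique values of the level
    codes: integer positions into level_values, one per row
    mapper: optional callable applied to each row's value
    """
    if mapper is not None:
        # A mapper defines its own groups, so the level's codes and
        # uniques no longer describe the result. Return right away.
        row_values = [level_values[i] for i in codes]
        return [mapper(v) for v in row_values], None, None

    # No mapper: group directly on the level's own values.
    grouper = [level_values[i] for i in codes]
    return grouper, codes, level_values
```

Returning inside the if keeps the common no-mapper path at the top indentation level, which is the readability gain the comment is after.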

jreback · 2016-10-06T10:27:32Z

pandas/tests/test_groupby.py

+                                 'B': ['one', 'one', 'two',
+                                       'two', 'one', 'one']},
+                                index=idx)
+        result = df_multi.groupby(['B', pd.Grouper(level='inner')]).mean()


Can you try with these reversed as well, e.g. [pd.Grouper(....), 'B']?
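A check along these lines might look like the following sketch. The frame contents are made up for illustration (they are not the data from the PR's test), and it assumes a pandas version that includes this fix. It groups a single-index frame by a column and by the index level, in both orders, and compares the results.

```python
import pandas as pd

# Single (non-Multi) index with a name, plus a column to group on.
idx = pd.Index(["a", "b", "a", "b", "a", "b"], name="inner")
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6],
     "B": ["one", "one", "two", "two", "one", "one"]},
    index=idx,
)

# Column first, then index level.
by_col_then_level = df.groupby(["B", pd.Grouper(level="inner")]).mean()

# Index level first, then column.
by_level_then_col = df.groupby([pd.Grouper(level="inner"), "B"]).mean()

# Both orders should hold the same group means once the levels are aligned.
aligned = by_level_then_col.swaplevel().sort_index()
pd.testing.assert_frame_equal(by_col_then_level, aligned)
```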

jonmmease · 2016-10-06T19:07:47Z

Thanks for the feedback @jreback, @jorisvandenbossche, and @TomAugspurger. Writing the docstrings and working through the type signatures sure helped me better understand the logic I was refactoring!

jonmmease · 2016-10-13T01:16:09Z

@TomAugspurger @jorisvandenbossche @jreback I've noticed that there's now a 0.19.1 whatsnew file. Should I move the whatsnew entry there?

jorisvandenbossche · 2016-10-13T07:20:06Z

Yes, as it seems a straightforward bug fix, you can put it in 0.19.1

jonmmease · 2016-10-13T23:56:21Z

@jorisvandenbossche Hmm, I rebased my branch on upstream/master so that I have a copy of the 0.19.1 whatsnew file to work with. After making the change I did a force push which successfully updated the branch in my fork (https://github.com/jmmease/pandas/tree/bug_14327), but it didn't update this pull request.

Should I have merged upstream/master instead of rebasing?

jorisvandenbossche · 2016-10-14T00:33:14Z

Well, something is going wrong since we moved the repo from the pydata org to the pandas-dev org. PRs that were already open before seem to have this problem.

@wesm @TomAugspurger @jreback does anybody have experience with this problem, or an idea of what could be going on?

jreback · 2016-10-14T20:50:30Z

pandas/indexes/base.py

+        ----------
+        group_mapper: Group mapping function or None
+            Function mapping index values to groups
+        level : int


Just make this level=None by default and assert it is None for Index. Don't include the "Only used" phrase.

jreback · 2016-10-14T20:50:48Z

pandas/indexes/base.py

+
+        Returns
+        -------
+        grouper : Index


It always needs to return a tuple (it doesn't matter whether the elements are used or not).

jreback · 2016-10-14T20:51:24Z

pandas/indexes/base.py

@@ -432,6 +432,35 @@ def _update_inplace(self, result, **kwargs):
        # guard when called from IndexOpsMixin
        raise TypeError("Index can't be updated inplace")

+    def _get_grouper_for_level(self, group_mapper, level):


call this mapper
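Taken together, the comments on pandas/indexes/base.py above point toward a contract like the one sketched here. This is a pandas-free illustration under assumptions: flat_grouper_for_level is a hypothetical free function, a plain list stands in for the index, and the mapper is assumed to be a callable.

```python
def flat_grouper_for_level(values, mapper=None, level=None):
    """Sketch of the flat-index case: always return a 3-tuple."""
    # A flat index has only one level, so callers should not pass one.
    assert level is None, "level is only meaningful for a multi-level index"
    grouper = list(values) if mapper is None else [mapper(v) for v in values]
    # The last two slots are unused here but keep the shape uniform.
    return grouper, None, None


# The caller can unpack three values without checking the index type.
grouper, labels, uniques = flat_grouper_for_level(["a", "b", "a"])
```

Keeping the arity fixed means the same unpacking line works for the multi-level override, which fills in the last two slots.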

jreback · 2016-10-14T20:52:08Z

pandas/indexes/multi.py

+        level_index : Index or None
+            Index of unique values for level
+        """
+        inds = self.labels[level]


call this indexer

jreback · 2016-10-14T20:53:22Z

pandas/indexes/multi.py

+            Index of values to group on
+        labels : ndarray of int or None
+            Array of locations in level_index
+        level_index : Index or None


call this uniques (these are not actually used anywhere, but are descriptive). We use certain terms in the codebase. This will make it consistent.

jreback · 2016-10-14T20:54:00Z

@jmmease lgtm. just some doc consistency changes. ping on green.

jonmmease · 2016-10-14T21:11:22Z

@jreback Thanks, I'll make these updates this evening. Do you have any guidance on how I should proceed since this PR didn't update with my rebase+force push from yesterday? (See comment from @jorisvandenbossche above) I could try again with a merge instead of rebase or I could open a new PR and reference this one.

jorisvandenbossche · 2016-10-14T21:45:15Z

In the short term, opening a new PR is probably the easiest (if you close this one, you can just create a new one from the same branch).

jonmmease · 2016-10-14T23:41:45Z

PR reopened in #14428

Jon M. Mease added 3 commits October 1, 2016 21:14

Added failing test case of GH 14327

a421a52

Handle specification of level for non-MultiIndex in Grouping constructor

ec9340f

Existing logic under "if level is not None:" assumed that index was a MultiIndex. Now we check and also handle the case where an Index is passed in with a None grouper. This resolves GH 14327

Release notes for fix to GH 14327

848c9bb

jonmmease changed the title ~~Bug 14327~~ Bug: Grouping by index and column fails on DataFrame with single index (GH14327) Oct 2, 2016

jreback requested changes Oct 3, 2016


Moved whatsnew to 0.20.0

0f95bca

jreback requested changes Oct 4, 2016


Moved grouper level handling logic to methods on Index/MultiIndex (GH…

75a0390

…14327)

TomAugspurger reviewed Oct 5, 2016


Jon M. Mease added 2 commits October 5, 2016 19:38

Improve comments for Index._get_grouper_for_level

6b37bd4

Wrap line violating PEP8

897ec1c

jorisvandenbossche added Bug Groupby labels Oct 6, 2016

jorisvandenbossche reviewed Oct 6, 2016


jreback reviewed Oct 6, 2016


Jon M. Mease added 2 commits October 6, 2016 07:44

Added test cases that group by column then index

05e6557

Cleaned up _get_grouper_for_level implementations and added docstrings

33eb725

jorisvandenbossche added this to the 0.19.1 milestone Oct 13, 2016

jonmmease force-pushed the bug_14327 branch from 33eb725 to 7b29c66 Compare October 13, 2016 23:39

jreback reviewed Oct 14, 2016


jreback approved these changes Oct 14, 2016


jonmmease closed this Oct 14, 2016

jonmmease mentioned this pull request Oct 14, 2016

Bug: Grouping by index and column fails on DataFrame with single index (GH14327) #14428

Merged

4 tasks
