Bug in groupby.get_group on categoricalindex #15163

watercrossing · 2017-01-19T11:41:28Z

closes Groupby level fails to enumerate groups #15155
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

TomAugspurger

Thanks for the PR, a few comments inline.

TomAugspurger · 2017-01-19T13:43:03Z

doc/source/whatsnew/v0.20.0.txt

@@ -389,7 +389,7 @@ Bug Fixes

 - Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
 - Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
-
+- Bug in ``CategoricalIndex``, a missing method is causing ``groupby`` to fail (:issue:`15155`)


Can you re-phrase this as "Bug in ``GroupBy.get_group`` failing with a categorical grouper (:issue:`15155`)"?

TomAugspurger · 2017-01-19T13:48:03Z

pandas/tests/groupby/test_categorical.py

@@ -48,8 +48,9 @@ def setUp(self):
             'D': np.array(
                 np.random.randn(8), dtype='float32')})

-        index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two',
-                                                                  'three']],
+        index = MultiIndex(levels=[pd.CategoricalIndex(


Changing this could break other tests that use self.mframe. Better to just change this in your tests.

    def test_level_groupby_get_group(self):
        df = self.mframe.copy()
        index = ...  # your new index w/ a Categorical
        df.index = index
        ...  # your test

Well my test is the only one using self.mframe - in fact quite a few tests could use self.mframe now. I can change it back, but this feels more natural.

you can also simply construct the MI you need (e.g. from the original issue) directly in the test

no, .mframe is used by several tests, create a new df local to the test, this is very specific.
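A minimal sketch of what a test-local frame could look like, assuming a recent pandas release. The helper name `make_categorical_level_frame`, the sample data, and the `observed=True` argument are illustrative assumptions, not code from this PR.

```python
import numpy as np
import pandas as pd


def make_categorical_level_frame():
    """Build a small frame whose first index level is categorical (illustrative data)."""
    first = pd.CategoricalIndex(
        ["a", "a", "b", "b"], categories=["a", "b"], name="first"
    )
    second = pd.Index(["one", "two", "one", "two"], name="second")
    # from_arrays picks up the level names from the arrays' own names.
    index = pd.MultiIndex.from_arrays([first, second])
    return pd.DataFrame({"A": np.arange(4.0)}, index=index)


df = make_categorical_level_frame()

# observed=True is an assumption for recent pandas releases; it only limits
# the groups to categories that actually occur in the data.
grouped = df.groupby(level="first", observed=True)
result = grouped.get_group("a")
```

Building the frame inside the test keeps the shared `self.mframe` fixture untouched, so other tests that rely on it are unaffected.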

TomAugspurger · 2017-01-19T13:49:23Z

pandas/tests/groupby/test_categorical.py

+    def test_level_groupby_get_group(self):
+        # gh15155
+        testGroupBy = self.mframe.groupby(level=["first"])
+        assert_numpy_array_equal(testGroupBy.get_group("foo").values,


Not a huge deal since this is mostly a smoke test, but better to use assert_frame_equal(a, b) so that we test the indexes and names line up too.

don't use assert_numpy_array_equal, rather directly test the get_group and assert the results

I can't use assert_frame_equal(a,b) since the two have different structures, at least not as it is:

    # testGroupBy.get_group("foo")
                         A         B         C
    first second
    foo   one    -0.529506 -1.282295  0.521247
          two     0.982640  0.428228  1.031026
          three  -0.753013  0.087009  1.082119

    # self.mframe.loc["foo"]
                   A         B         C
    second
    one    -0.529506 -1.282295  0.521247
    two     0.982640  0.428228  1.031026
    three  -0.753013  0.087009  1.082119

I agree that testing frames would be nicer, but how?

    In [3]: mframe.loc[['foo']]
    Out[3]:
                         A         B         C
    first second
    foo   one    -0.859892  0.873341  0.679335
          two    -1.751616  0.678056  0.118863
          three   0.055330  0.550937  1.545533

That still fails (now using the code from the example to generate the test data):

    # assert_frame_equal(testGroupedBy.get_group("a"),test.loc[["a"]])
    MultiIndex level [0] classes are not equivalent
    [left]:  CategoricalIndex([u'a', u'a', u'a', u'a', u'a'], categories=[u'a', u'b'], ordered=False, name=u'Index1', dtype='category')
    [right]: Index([u'a', u'a', u'a', u'a', u'a'], dtype='object', name=u'Index1')

Is this another bug?

this is a separate issue. please post an issue with a simple repro.

for now you can simply construct the expected frame directly for comparison, and put a note with a reference to the new issue.
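One way to read that suggestion, sketched under assumptions: build the expected frame with an explicitly constructed categorical `MultiIndex` rather than deriving it via `.loc`, then compare with `assert_frame_equal`. The helper `categorical_index`, the sample data, the `observed=True` argument, and the `pandas.testing` import path (the public location in recent pandas) are illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def categorical_index(first_values, second_values):
    """Hypothetical helper: MultiIndex whose first level is categorical."""
    first = pd.CategoricalIndex(
        first_values, categories=["a", "b"], name="first"
    )
    second = pd.Index(second_values, name="second")
    return pd.MultiIndex.from_arrays([first, second])


df = pd.DataFrame(
    {"A": np.arange(4.0)},
    index=categorical_index(["a", "a", "b", "b"], ["one", "two", "one", "two"]),
)

# observed=True is an assumption for recent pandas releases.
result = df.groupby(level="first", observed=True).get_group("a")

# Expected frame built directly, so the comparison does not depend on how
# .loc treats the categorical level (the separate issue discussed above).
expected = pd.DataFrame(
    {"A": [0.0, 1.0]},
    index=categorical_index(["a", "a"], ["one", "two"]),
)

assert_frame_equal(result, expected)
```

Compared with `assert_numpy_array_equal` on `.values`, this also checks that the index values, level names, and categorical dtype line up.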

codecov-io · 2017-01-19T20:10:24Z

Current coverage is 85.54% (diff: 100%)

No coverage report found for master at 77518d8.

Powered by Codecov. Last update 77518d8...742d4a5

jreback · 2017-01-19T20:22:30Z

thanks @watercrossing

if you are interested in working on #15166 would be great!

closes pandas-dev#15155

Author: watercrossing <[email protected]>

Closes pandas-dev#15163 from watercrossing/indexgroup and squashes the following commits:

742d4a5 [watercrossing] BUG: GroupBy.get_group failing with a categorical grouper (pandas-dev#15155)

TomAugspurger requested changes Jan 19, 2017

jreback changed the title ~~Fix #15155~~ Bug in groupby on categoricalindex Jan 19, 2017

jreback added the Bug, Groupby, and Indexing labels Jan 19, 2017

jreback changed the title ~~Bug in groupby on categoricalindex~~ Bug in groupby.get_group on categoricalindex Jan 19, 2017

jreback mentioned this pull request Jan 19, 2017

BUG: .loc indexing not preserving Index type #15166

Closed

BUG: GroupBy.get_group failing with a categorical grouper (pandas-dev…

742d4a5

…#15155)

watercrossing force-pushed the indexgroup branch from 6a99731 to 742d4a5 Compare January 19, 2017 16:40

jreback closed this in 4c65d5f Jan 19, 2017

jreback added this to the 0.20.0 milestone Jan 19, 2017

watercrossing deleted the indexgroup branch November 10, 2017 11:11
