Skip to content

Bug in groupby.get_group on categoricalindex #15163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

watercrossing
Copy link
Contributor

@watercrossing watercrossing commented Jan 19, 2017

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, a few comments inline.

@@ -389,7 +389,7 @@ Bug Fixes

- Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
- Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)

- Bug in ``CategoricalIndex``, a missing method is causing ``groupby`` to fail (:issue:`15155`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you re-phrase this as "Bug in GroupBy.get_group failing with a categorical grouper (:issue15155)"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -48,8 +48,9 @@ def setUp(self):
'D': np.array(
np.random.randn(8), dtype='float32')})

index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two',
'three']],
index = MultiIndex(levels=[pd.CategoricalIndex(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this could break other tests that use self.mframe. Better to just change this in your tests.

def test_level_groupby_get_group(self):
    df = self.mframe.copy()
    index = ... # your new index w/ a Categorial
    df.index = index
    ...  # your test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well my test is the only one using self.mframe - in fact quite a few tests could use self.mframe now. I can change it back, but this feels more natural.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also simply contruct the MI you need (e.g. from the original issue) directly in the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, .mframe is used by several tests, create a new df local to the test, this is very specific.

def test_level_groupby_get_group(self):
# gh15155
testGroupBy = self.mframe.groupby(level=["first"])
assert_numpy_array_equal(testGroupBy.get_group("foo").values,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal since this is mostly a smoke test, but better to use assert_frame_equal(a, b) so that we test the indexes and names line up too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use assert_numpy_array_equal, rather directly test the get_group and assert the results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't use assert_frame_equal(a,b) since the two have different structures, at least not as it is:

# testGroupBy.get_group("foo")
                     A         B         C
first second
foo   one    -0.529506 -1.282295  0.521247
      two     0.982640  0.428228  1.031026
      three  -0.753013  0.087009  1.082119

# self.mframe.loc["foo"]
               A         B         C
second
one    -0.529506 -1.282295  0.521247
two     0.982640  0.428228  1.031026
three  -0.753013  0.087009  1.082119

I agree that testing frames would be nicer, but how?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [3]: mframe.loc[['foo']]
Out[3]: 
                     A         B         C
first second                              
foo   one    -0.859892  0.873341  0.679335
      two    -1.751616  0.678056  0.118863
      three   0.055330  0.550937  1.545533

Copy link
Contributor Author

@watercrossing watercrossing Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That still fails (now using the code from the example to generate the test data):

# assert_frame_equal(testGroupedBy.get_group("a"),test.loc[["a"]])
MultiIndex level [0] classes are not equivalent
[left]:  CategoricalIndex([u'a', u'a', u'a', u'a', u'a'], categories=[u'a', u'b'], ordered=False, name=u'Index1', dtype='category')
[right]: Index([u'a', u'a', u'a', u'a', u'a'], dtype='object', name=u'Index1')

Is this another bug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a separate issue. please post an issue with a simpe repro.

for now you can simply construct the expected frame directly for comparision, and put a note with a reference to the new issue.

@jreback jreback changed the title Fix #15155 Bug in groupby on categoricalindex Jan 19, 2017
@jreback jreback added Bug Groupby Indexing Related to indexing on series/frames, not to indexes themselves labels Jan 19, 2017
@jreback jreback changed the title Bug in groupby on categoricalindex Bug in groupby.get_group on categoricalindex Jan 19, 2017
@codecov-io
Copy link

Current coverage is 85.54% (diff: 100%)

No coverage report found for master at 77518d8.

Powered by Codecov. Last update 77518d8...742d4a5

@jreback jreback closed this in 4c65d5f Jan 19, 2017
@jreback jreback added this to the 0.20.0 milestone Jan 19, 2017
@jreback
Copy link
Contributor

jreback commented Jan 19, 2017

thanks @watercrossing

if you are interested in working on #15166 would be great!

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15155

Author: watercrossing <[email protected]>

Closes pandas-dev#15163 from watercrossing/indexgroup and squashes the following commits:

742d4a5 [watercrossing] BUG: GroupBy.get_group failing with a categorical grouper (pandas-dev#15155)
@watercrossing watercrossing deleted the indexgroup branch November 10, 2017 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Groupby level fails to enumerate groups
4 participants