BUG: get_group fails when multi-grouping with a categorical #10132

Merged (1 commit), Jun 5, 2015

evanpw
Contributor

@evanpw evanpw commented May 14, 2015

Example:

>>> df = pd.DataFrame({'a' : pd.Categorical(list('xyxy')), 'b' : 1, 'c' : 2})
>>> df.groupby(['a', 'b']).get_group(('x', 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/evanpw/Workspace/pandas/pandas/core/groupby.py", line 601, in get_group
    inds = self._get_index(name)
  File "/home/evanpw/Workspace/pandas/pandas/core/groupby.py", line 429, in _get_index
    sample = next(iter(self.indices))
  File "/home/evanpw/Workspace/pandas/pandas/core/groupby.py", line 414, in indices
    return self.grouper.indices
  File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:41912)
  File "/home/evanpw/Workspace/pandas/pandas/core/groupby.py", line 1305, in indices
    return _get_indices_dict(label_list, keys)
  File "/home/evanpw/Workspace/pandas/pandas/core/groupby.py", line 3762, in _get_indices_dict
    return lib.indices_fast(sorter, group_index, keys, sorted_labels)
  File "pandas/lib.pyx", line 1385, in pandas.lib.indices_fast (pandas/lib.c:23843)
TypeError: Cannot convert Categorical to numpy.ndarray
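The traceback ends in the `indices` property, which maps each combination of group keys to the row positions in that group. As a rough illustration only, here is a naive pure-Python equivalent. The function name `indices_dict` and its signature are hypothetical. The real path (`_get_indices_dict` calling the Cython `lib.indices_fast`, per the traceback) is faster and, unlike this sketch, requires each entry of `keys` to be a plain ndarray. That requirement is why a `Categorical` there raised the `TypeError`.

```python
import numpy as np


def indices_dict(label_list, keys):
    """Hypothetical sketch: map each combination of group keys to its rows.

    label_list: one integer code array per grouping (row -> position in keys)
    keys: one ndarray of distinct group values per grouping
    """
    n_rows = len(label_list[0])
    groups = {}
    for row in range(n_rows):
        # Translate this row's integer codes into the actual key values
        key = tuple(k[labels[row]] for labels, k in zip(label_list, keys))
        groups.setdefault(key, []).append(row)
    # Store the row positions for each group as integer arrays
    return {key: np.array(rows, dtype=np.intp) for key, rows in groups.items()}


# Codes for the example frame: column 'a' is x,y,x,y and column 'b' is all 1.
# object dtype keeps plain Python scalars so the printed keys stay readable.
label_list = [np.array([0, 1, 0, 1]), np.array([0, 0, 0, 0])]
keys = [np.array(['x', 'y'], dtype=object), np.array([1], dtype=object)]
result = indices_dict(label_list, keys)
print({key: rows.tolist() for key, rows in result.items()})
# {('x', 1): [0, 2], ('y', 1): [1, 3]}
```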

The problem is that Grouping.group_index is a CategoricalIndex, so calling get_values() on it returns a Categorical, which needs a second get_values() call to produce an ndarray.
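To make the extra wrapper layer concrete, here is a small sketch against a recent pandas. get_values() no longer exists in recent releases, so this uses `.values` and `np.asarray` as stand-ins for the two unwrapping steps. `to_ndarray` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def to_ndarray(index):
    """Hypothetical helper: return an Index's values as a plain ndarray.

    For a CategoricalIndex, .values is a Categorical (one wrapper layer),
    so a second conversion is needed before code that expects an ndarray.
    """
    values = index.values
    if isinstance(values, pd.Categorical):
        # Second unwrap: materialize the category label for every row
        values = np.asarray(values)
    return values


ci = pd.CategoricalIndex(['x', 'y', 'x', 'y'])
print(type(ci.values).__name__)       # Categorical
print(type(to_ndarray(ci)).__name__)  # ndarray
```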

@jreback
Contributor

jreback commented May 14, 2015

can you put the self-contained example at the top of the PR?

@jreback jreback added Bug Groupby Categorical Categorical Data Type labels May 14, 2015
pandas/core/groupby.py
 if len(self.groupings) == 1:
     return self.groupings[0].indices
 else:
     label_list = [ping.labels for ping in self.groupings]
-    keys = [_values_from_object(ping.group_index) for ping in self.groupings]
+    keys = [extract_values(ping.group_index) for ping in self.groupings]
Contributor

Instead of this, I think .get_values() needs to be defined for CategoricalIndex (.get_values() is defined for a regular Index, and CategoricalIndex is just inheriting it).
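The suggestion is ordinary method overriding. Rather than having groupby unwrap twice, CategoricalIndex supplies its own get_values() so that a single call always yields an ndarray. The toy classes below are hypothetical stand-ins, not pandas' real Index, CategoricalIndex, or Categorical. They only show the shape of the change.

```python
import numpy as np


class ToyCategorical:
    """Hypothetical stand-in for a Categorical: integer codes + categories."""

    def __init__(self, values):
        self.categories = np.array(sorted(set(values)), dtype=object)
        lookup = {cat: i for i, cat in enumerate(self.categories)}
        self.codes = np.array([lookup[v] for v in values], dtype=np.int64)

    def get_values(self):
        # Densify: map each code back to its category label
        return self.categories.take(self.codes)


class ToyIndex:
    """Hypothetical stand-in for a base Index."""

    def __init__(self, data):
        self._data = data

    def get_values(self):
        # Base behavior: hand back the underlying storage unchanged
        return self._data


class ToyCategoricalIndex(ToyIndex):
    """Hypothetical stand-in for CategoricalIndex, backed by ToyCategorical."""

    def __init__(self, values):
        super().__init__(ToyCategorical(values))

    def get_values(self):
        # Override: unwrap the categorical so callers always get an ndarray
        return self._data.get_values()


idx = ToyCategoricalIndex(['x', 'y', 'x', 'y'])
# The inherited base behavior would leak the wrapper object
print(type(ToyIndex.get_values(idx)).__name__)  # ToyCategorical
# The override returns an ndarray in a single call
print(list(idx.get_values()))  # ['x', 'y', 'x', 'y']
```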

@jreback
Contributor

jreback commented Jun 2, 2015

can you update according to the comments?

@evanpw
Contributor Author

evanpw commented Jun 3, 2015

Done, and moved to 0.16.2 whatsnew. Does this need an entry in the API changes section now, or is this change too minor?

@@ -66,7 +66,9 @@ Bug Fixes
 - Bug in getting timezone data with ``dateutil`` on various platforms ( :issue:`9059`, :issue:`8639`, :issue:`9663`, :issue:`10121`)
 - Bug in display datetimes with mixed frequencies uniformly; display 'ms' datetimes to the proper precision. (:issue:`10170`)
 
-- Bung in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)
+- Bun in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)
Member

you removed the wrong character :-)

Contributor Author

Incremental progress :)

@jorisvandenbossche
Member

This seems a bug fix to me, so the whatsnew entry is fine!

@jreback
Contributor

jreback commented Jun 3, 2015

yep, pls squash. ping when green.

@evanpw
Contributor Author

evanpw commented Jun 4, 2015

Tests are green
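A regression test along these lines exercises the example from the top of this PR. It is a hypothetical sketch, not the test added in commit 08b1511. It assumes a recent pandas where groupby accepts `observed=`. Passing `observed=True` explicitly avoids the default-value warning that some versions emit for categorical keys.

```python
import pandas as pd


def test_get_group_multikey_with_categorical():
    # Hypothetical regression test mirroring the example in this PR
    df = pd.DataFrame({'a': pd.Categorical(list('xyxy')), 'b': 1, 'c': 2})
    # observed=True: only key combinations present in the data form groups
    grouped = df.groupby(['a', 'b'], observed=True)
    result = grouped.get_group(('x', 1))
    # Rows 0 and 2 hold 'x' in column 'a' (and 1 in column 'b')
    assert list(result.index) == [0, 2]
    assert (result['c'] == 2).all()


test_get_group_multikey_with_categorical()
```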

@jreback
Contributor

jreback commented Jun 4, 2015

@evanpw thanks. Will merge soon; waiting on Travis to finish its builds of a bunch of stuff.

@jreback jreback added this to the 0.16.2 milestone Jun 5, 2015
jreback added a commit that referenced this pull request Jun 5, 2015
BUG: get_group fails when multi-grouping with a categorical
@jreback jreback merged commit 08b1511 into pandas-dev:master Jun 5, 2015
@jreback
Contributor

jreback commented Jun 5, 2015

@evanpw thanks!

@evanpw evanpw deleted the cat_multigroup branch June 5, 2015 22:08
@evanpw evanpw restored the cat_multigroup branch September 19, 2015 00:34
@evanpw evanpw deleted the cat_multigroup branch September 19, 2015 00:38
3 participants