groupby categorical column fails with unstack #11558

mikepqr · 2015-11-09T04:32:58Z

Replicating example

In [1]: df = pd.DataFrame([[1,2],[3,4]],columns=pd.CategoricalIndex(list('AB')))

In [2]: df.describe()
AttributeError: 'DataFrame' object has no attribute 'value_counts'

The behaviour in this notebook seems like a bug to me. This is pandas 0.17.0.

In it, g and gcat are the results of two df.groupby(['medium', 'artist']).count().unstack() operations. The only difference is that one of those operations is on df where one of the columns that the groupby operates over has been converted to Categorical.

g and gcat behave very differently. I've tried to pin this down to the exact operation in the split-apply-combine that causes the problem without much luck.

Slicing a column out of g returns a Series as expected, while slicing a column out of gcat returns a DataFrame (see cells 4 and 5).

g.describe() works as expected, but gcat.describe() raises the exception

AttributeError: 'DataFrame' object has no attribute 'value_counts'

and g['painting'] + g['sculpture'] works as expected, but gcat['painting'] + gcat['sculpture'] raises

Exception: Data must be 1-dimensional
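
For readers without the notebook, the comparison can be sketched roughly as follows. The data values and the use of size() with unstack("medium") are assumptions made for illustration, not the notebook's actual code. The observed=True argument also assumes a pandas version that supports it. The loop reports what column selection returns for each result, which is where g and gcat diverged on 0.17.0.

```python
import pandas as pd

# Hypothetical data standing in for the notebook's df.
df = pd.DataFrame({
    "medium": ["painting", "painting", "sculpture", "sculpture"],
    "artist": ["A", "B", "A", "B"],
})

# Same frame, but with one grouping column converted to Categorical.
dfcat = df.copy()
dfcat["medium"] = dfcat["medium"].astype("category")

# Count rows per (artist, medium) and pivot medium into the columns.
g = df.groupby(["artist", "medium"]).size().unstack("medium")
# observed=True keeps only the category combinations present in the data.
gcat = dfcat.groupby(["artist", "medium"], observed=True).size().unstack("medium")

# Report the column index type and what selecting one column returns.
for name, frame in [("g", g), ("gcat", gcat)]:
    print(name, type(frame.columns).__name__, type(frame["painting"]).__name__)

# The addition that raised "Data must be 1-dimensional" for gcat.
total = g["painting"] + g["sculpture"]
total_cat = gcat["painting"] + gcat["sculpture"]
```

If the second line of output shows DataFrame for gcat, the installed pandas still has the behaviour described above.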


jreback · 2015-11-09T12:05:47Z

this is actually a tricky bug: when indexing into a frame whose columns have duplicates (or are a CategoricalIndex), you get a frame back from .iteritems even though the labels may be unique. So there are 2 paths here that need checking.
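
The duplicate-label half of that can be seen with a small sketch. The frame and its labels are made up for illustration. Selecting a label that occurs twice has to return a DataFrame, while a label that occurs once returns a Series.

```python
import pandas as pd

# Hypothetical frame in which the column label "A" appears twice.
dup = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

dup_a = dup["A"]  # two columns share the label, so a DataFrame comes back
dup_b = dup["B"]  # the label occurs once, so a Series comes back

print(type(dup_a).__name__, dup_a.shape)
print(type(dup_b).__name__, dup_b.shape)
```

The problem in this issue is that a CategoricalIndex took the first path even when every label occurred only once.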

luispedro · 2016-01-06T20:57:15Z

This seems related to this incongruency I also ran into:

>>> data = pd.DataFrame([[1,2,3],[3,4,5]], index=['one', 'two'])
>>> print(data.ix['one'].shape)
(3,)
>>> data = pd.DataFrame([[1,2,3],[3,4,5]], index=pd.Categorical(['one', 'two']))
>>> print(data.ix['one'].shape)
(1, 3)

If this dataframe is coming from a groupby, then it's guaranteed to be uniquely indexed, so it's doubly inconsistent.

jreback · 2016-01-06T21:05:37Z

this has to do with how we handle uniques vs non-uniques. A CategoricalIndex is treated as non-unique by definition (even though it's actually unique in this case).

But this might be a bug.

In [36]: data1 = pd.DataFrame([[1,2,3],[3,4,5]], index=['one', 'two'])

In [37]: data2 = pd.DataFrame([[1,2,3],[3,4,5]], index=pd.Categorical(['one', 'two']))

In [40]: data1.ix['one']
Out[40]: 
0    1
1    2
2    3
Name: one, dtype: int64

In [41]: data2.ix['one']
Out[41]: 
     0  1  2
one  1  2  3
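
A minimal sketch for checking the same thing on a newer pandas, assuming a release that includes the CategoricalIndex.get_loc change and no longer ships .ix (removed in pandas 1.0). It uses .loc as the label-based replacement. On such a release both row lookups are expected to give a 1-dimensional result.

```python
import pandas as pd

data1 = pd.DataFrame([[1, 2, 3], [3, 4, 5]], index=["one", "two"])
data2 = pd.DataFrame([[1, 2, 3], [3, 4, 5]], index=pd.Categorical(["one", "two"]))

# .loc stands in for the .ix calls used in the transcript above.
row1 = data1.loc["one"]
row2 = data2.loc["one"]

# Index type and uniqueness of the categorical-indexed frame.
print(type(data2.index).__name__, data2.index.is_unique)

# Shapes of the two row lookups; they differed on 0.17.x.
print(row1.shape, row2.shape)

# Position lookup on the index itself, the call at the root of the problem.
print(data2.index.get_loc("one"))
```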

BUG: CategoricalIndex.get_loc returns array even if it is unique

related to #11558

Author: sinhrks <[email protected]>

Closes #12531 from sinhrks/cat_get_loc and squashes the following commits:

2749b62 [sinhrks] BUG: CategoricalIndex.get_loc returns array even if it is unique

jreback added the Bug, Prio-medium, and Indexing (related to indexing on series/frames, not to indexes themselves) labels Nov 9, 2015

jreback added this to the 0.17.1 milestone Nov 9, 2015

jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015

jreback modified the milestones: 0.18.1, Next Major Release Mar 12, 2016

sinhrks mentioned this issue Mar 14, 2016

BUG: CategoricalIndex.get_loc returns array even if it is unique #12531 (Closed)

sinhrks mentioned this issue Mar 20, 2016

BUG: .describe lost CategoricalIndex info #12675 (Closed)

jreback closed this as completed in 896454e Mar 29, 2016
