BUG: loc against CategoricalIndex may result in normal Index #11607

Merged: 1 commit merged on Nov 23, 2015

sinhrks
Member

@sinhrks sinhrks commented Nov 14, 2015

Closes #11586.

After the PR:

import numpy as np
import pandas as pd
index = pd.CategoricalIndex(list('aabbca'), categories=list('cabe'))
df = pd.DataFrame({'A' : np.arange(6,dtype='int64')}, index=index)

# OK (not changed)
df.loc[['a', 'b']].index
# CategoricalIndex([u'a', u'a', u'a', u'b', u'b'], categories=[u'c', u'a', u'b', u'e'], ordered=False, dtype='category')

# Fixed to return CategoricalIndex if the value exists in the categories
df.loc[['a', 'b', 'e']].index
# CategoricalIndex([u'a', u'a', u'a', u'b', u'b', u'e'], categories=[u'a', u'b', u'e'], ordered=False, dtype='category')

# raise KeyError otherwise (not changed)
df.loc[['a', 'b', 'x']].index
# KeyError: 'a list-indexer must only include values that are in the categories'

There are separate code paths for when the CategoricalIndex values (codes) are unique and when they are non-unique; both are fixed.
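As a rough check of the two paths, the sketch below builds one frame with a non-unique CategoricalIndex and one with a unique CategoricalIndex, then confirms that list-based .loc keeps the index categorical in both. The frame and variable names are illustrative only, and the exact repr and categories of the result may differ between pandas versions.

```python
import numpy as np
import pandas as pd

categories = list("cabe")

# Non-unique labels: 'a' and 'b' each appear more than once.
non_unique = pd.DataFrame(
    {"A": np.arange(6, dtype="int64")},
    index=pd.CategoricalIndex(list("aabbca"), categories=categories),
)

# Unique labels: each label appears exactly once.
unique = pd.DataFrame(
    {"A": np.arange(3, dtype="int64")},
    index=pd.CategoricalIndex(list("abc"), categories=categories),
)

non_unique_result = non_unique.loc[["a", "b"]]
unique_result = unique.loc[["a", "b"]]

# Both paths should keep a CategoricalIndex rather than fall back to Index.
print(type(non_unique_result.index).__name__)
print(type(unique_result.index).__name__)

# A label outside the categories still raises KeyError.
try:
    non_unique.loc[["a", "b", "x"]]
except KeyError as exc:
    print("KeyError:", exc)
```

Only labels that are present in the index are selected here, since the handling of a category that has no matching rows has changed across pandas releases.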

Also, .reindex intentionally returns a normal Index when the passed values are not Categorical, so I kept that behavior as it is. If .reindex could always return a CategoricalIndex, the two separate fixes above would not be required (fixing .reindex to return a CategoricalIndex would cover both paths).

# returns a normal Index (not changed)
df.index.reindex(['a', 'b'])
# (Index([u'a', u'a', u'a', u'b', u'b'], dtype='object'), array([0, 1, 5, 2, 3]))

# returns a CategoricalIndex (not changed)
df.index.reindex(pd.Categorical(['a', 'b']))
# (CategoricalIndex([u'a', u'a', u'a', u'b', u'b'], categories=[u'a', u'b'], ordered=False, dtype='category'), array([0, 1, 5, 2, 3]))

@jreback
Contributor

jreback commented Nov 14, 2015

reindex should give back an index like the target, meaning if u put in a Categorical we try to give u a Categorical; if u put in an Index u get an Index

note that what u get is not dependent on what the actual index is! this is on purpose

this is essentially the difference between .loc and .reindex
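A minimal sketch of that rule, assuming a unique CategoricalIndex (recent pandas versions refuse to reindex an index with duplicate labels): the type of the index returned by .reindex follows the target that is passed in, not the index being reindexed. The variable names are illustrative.

```python
import pandas as pd

# Unique CategoricalIndex; the labels and categories are illustrative.
index = pd.CategoricalIndex(list("abc"), categories=list("cabe"))

# Plain list target: the returned index is a regular Index.
from_list, list_indexer = index.reindex(["a", "b"])

# Categorical target: the returned index is a CategoricalIndex.
from_cat, cat_indexer = index.reindex(pd.Categorical(["a", "b"]))

print(type(from_list).__name__)
print(type(from_cat).__name__)
```

By contrast, .loc on a frame with the same index keeps the CategoricalIndex even when the key is a plain list.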

@jreback added the Bug, Indexing, and Categorical labels Nov 15, 2015
@jreback added this to the 0.18.0 milestone Nov 20, 2015
@sinhrks
Member Author

sinhrks commented Nov 23, 2015

Ah yes, thus the .reindex behavior is unchanged (as in the example at the bottom of the body). The fix should only be applied to .loc.

jreback added a commit that referenced this pull request Nov 23, 2015
BUG: loc against CategoricalIndex may results in normal Index
@jreback jreback merged commit ae212ff into pandas-dev:master Nov 23, 2015
@jreback
Contributor

jreback commented Nov 23, 2015

thanks!

@sinhrks sinhrks deleted the loc_dtype_cat branch November 23, 2015 01:37