Index.astype('category') does not work #20843

Closed
h-vetinari opened this issue Apr 27, 2018 · 2 comments
Labels: Duplicate Report (duplicate issue or pull request)

Comments

@h-vetinari
Contributor

h-vetinari commented Apr 27, 2018

It's easy to make a Series categorical by using .astype('category'), and this yields the same result as calling the constructor with dtype='category'.

The same call fails for Index:

s = pd.Series(['a', 'b', 'a'])
sc = pd.Series(['a', 'b', 'a'], dtype='category')
s.astype('category')
# 0    a
# 1    b
# 2    a
# dtype: category
# Categories (2, object): [a, b]

# # both ways yield the same result
sc.equals(s.astype('category'))
# True

# # For Index ...
t = pd.Index(s.values)
t
# Index(['a', 'b', 'a'], dtype='object')
# # ... the direct constructor works
tc = pd.Index(['a', 'b', 'a'], dtype='category')
# # ... but not the conversion
t.astype('category')
# TypeError: data type "category" not understood
@jschendel
Member

If I understand correctly, this looks to be fixed on master by #18677:

In [2]: pd.__version__
Out[2]: '0.23.0.dev0+807.g563a6ad'

In [3]: idx = pd.Index(list('aba'))

In [4]: idx
Out[4]: Index(['a', 'b', 'a'], dtype='object')

In [5]: idx.astype('category')
Out[5]: CategoricalIndex(['a', 'b', 'a'], categories=['a', 'b'], ordered=False, dtype='category')

The same procedure fails on 0.22.0:

In [2]: pd.__version__
Out[2]: '0.22.0'

In [3]: idx = pd.Index(list('aba'))

In [4]: idx
Out[4]: Index(['a', 'b', 'a'], dtype='object')

In [5]: idx.astype('category')
---------------------------------------------------------------------------
TypeError: data type "category" not understood
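
For code that has to run on both sides of this fix, one option is to try the conversion and fall back to the constructor when it raises. This is only a sketch: `to_categorical_index` is a hypothetical helper name, not part of pandas, and it assumes that `pd.CategoricalIndex(...)` accepts an existing `Index` on the older release, in the same way the direct constructor call works in the original report.

```python
import pandas as pd


def to_categorical_index(idx):
    """Hypothetical helper: return a categorical version of an Index."""
    try:
        # Works on versions that include the fix referenced above.
        return idx.astype('category')
    except TypeError:
        # 0.22.0 raises TypeError here; build the index directly instead.
        return pd.CategoricalIndex(idx)


idx = pd.Index(list('aba'))
cidx = to_categorical_index(idx)
print(type(cidx).__name__)
print(list(cidx.categories))
```

On a version with the fix, the `try` branch succeeds and the fallback is never reached, so the helper can be dropped once 0.22.0 no longer needs to be supported.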

@h-vetinari
Contributor Author

@jschendel
Cool, didn't know this was already fixed in v0.23!

@TomAugspurger added the Duplicate Report label on Apr 30, 2018
@TomAugspurger added this to the No action milestone on Apr 30, 2018