Skip to content

Index(data=..., dtype=CategoricalDtype(...)) doesn't maintain dtype #19032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
topper-123 opened this issue Jan 2, 2018 · 7 comments · Fixed by #19048
Closed

Index(data=..., dtype=CategoricalDtype(...)) doesn't maintain dtype #19032

topper-123 opened this issue Jan 2, 2018 · 7 comments · Fixed by #19048
Labels
Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@topper-123
Copy link
Contributor

topper-123 commented Jan 2, 2018

Code Sample, a copy-pastable example if possible

>>> c = pd.api.types.CategoricalDtype(categories=['German', 'English', 'French'],
...                                   ordered=True)
>>> pd.Index(['German', 'English', 'French'], dtype=c)
CategoricalIndex(['German', 'English', 'French'],
                 categories=['English', 'French', 'German'],
                 ordered=False, dtype='category')

Problem description

Notice that the index isn't ordered and the categories have changed location.

If construction goes through CategoricalIndex, everything is ok.

Expected Output

Expected output is CategoricalIndex(['German', 'English', 'French'], categories=['German', 'English', 'French'], ordered=True).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.1
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170922
Cython: 0.26.1
numpy: 1.11.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@topper-123 topper-123 reopened this Jan 2, 2018
@topper-123 topper-123 changed the title Index(data, dtype=CategoricalDtype()) doesn't maintain dtype Index(data=..., dtype=CategoricalDtype(...)) doesn't maintain dtype Jan 2, 2018
@jreback
Copy link
Contributor

jreback commented Jan 2, 2018

thought we had an issue for this one, ok, IIRC cc @jschendel @jorisvandenbossche @TomAugspurger
discussed this on the last Cat PR.

@jreback jreback added Categorical Categorical Data Type Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 2, 2018
@jreback jreback added this to the Next Major Release milestone Jan 2, 2018
@jschendel
Copy link
Member

My previous PR's related to CDT were in the context of .astype(CDT), so this case wasn't something that was covered. Fix looks super simple though, dtype just isn't being passed through to the CI constructor here:

if is_categorical_dtype(data) or is_categorical_dtype(dtype):
from .category import CategoricalIndex
return CategoricalIndex(data, copy=copy, name=name, **kwargs)

Will open a PR to fix it later today.

@topper-123
Copy link
Contributor Author

Could you maybe also do the same for IntervalIndex? That is, if the dtype is a pd.api.types.IntervalDtype, the constructor should be passed to IntervalIndex. (this doesn't do anything useful ATM, but as a principle (IMO) if dtype=IntervalDtype() should delegate to IntervalIndex.

This should only be 2 lines more....

@TomAugspurger
Copy link
Contributor

Could you maybe also do the same for IntervalIndex?

FWIW, I'm thinking through this type of issue with #18767

I'd recommend avoiding more

if isinstance(foo, extension_type_1): ...
elif isinstance(foo, extension_type_2): ...

blocks for now.

@jschendel
Copy link
Member

I'd recommend avoiding more blocks for now

The block is actually already there, so it's really just a matter of passing dtype through to the IntervalIndex constructor. Won't do anything right now, as the IntervalIndex constructor accepts a dtype kwarg, but never actually uses it.

Planning to open an issue in the near future to allow subtype conversion for IntervalIndex, e.g. converting interval[int64] to interval[float64], which currently isn't possible via the constructor or astype. I believe this same issue would come into play then, so it'd need to be addressed at some point in the near term. Can do it in this PR or later, doesn't really matter to me.

@TomAugspurger
Copy link
Contributor

The block is actually already there, so it's really just a matter of passing dtype through to the IntervalIndex constructor

👍 then.

@topper-123
Copy link
Contributor Author

topper-123 commented Jan 2, 2018

Great, thanks.

FYI, my use case is combining several series of the same dtype, and there may be several such groups. If I can pre-register the dtype, I can covert to the result index simply by Index(..., dtype=my_type), while if I explicitly have to supply CategoricalIndex, IntervalIndex also, it gets more complicated to do this kind of thing.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants