BUG: Index constructor does not maintain CategoricalDtype #19048

Merged: 1 commit, Jan 3, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -300,6 +300,7 @@ Conversion
 - Bug in :class:`FY5253` where ``datetime`` addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (:issue:`18854`)
 - Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18849`)
 - Bug in :class:`Series` floor-division where operating on a scalar ``timedelta`` raises an exception (:issue:`18846`)
+- Bug in :class:`Index` constructor with ``dtype=CategoricalDtype(...)`` where ``categories`` and ``ordered`` are not maintained (:issue:`19032`)


Indexing
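
For illustration, the behaviour described in the new whatsnew entry can be sketched as follows. This is a minimal example, assuming a pandas version that includes this fix; the sample values mirror the ones in the PR's test, and the variable names are only illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Sample data mirroring the PR's test; the variable names are illustrative.
data = ["a", "a", "b", "b"]
dtype = CategoricalDtype(categories=["c", "b", "a"], ordered=True)

# With the fix, the generic Index constructor forwards ``dtype`` to
# CategoricalIndex instead of re-inferring the categories from the data.
idx = pd.Index(data, dtype=dtype)

print(type(idx).__name__)    # class of the resulting index
print(list(idx.categories))  # categories carried over from ``dtype``
print(idx.ordered)           # ordered flag carried over from ``dtype``
```

Before the fix, the same call would have inferred the categories from `data` alone, so the unused category `'c'` and the `ordered` flag would not have survived.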
8 changes: 4 additions & 4 deletions pandas/core/indexes/base.py
@@ -197,13 +197,13 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
         # categorical
         if is_categorical_dtype(data) or is_categorical_dtype(dtype):
             from .category import CategoricalIndex
-            return CategoricalIndex(data, copy=copy, name=name, **kwargs)
+            return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
+                                    **kwargs)

         # interval
-        if is_interval_dtype(data):
+        if is_interval_dtype(data) or is_interval_dtype(dtype):
             from .interval import IntervalIndex
-            return IntervalIndex.from_intervals(data, name=name,
-                                                copy=copy)
+            return IntervalIndex(data, dtype=dtype, name=name, copy=copy)
Member Author

The IntervalIndex changes are mostly future-proofing: the dtype kwarg currently doesn't do anything for IntervalIndex, so there aren't really any tests that can be added. I plan to create a PR to allow conversion between subtypes and initializing with a non-inferred subtype (e.g. interval[int64] to interval[float64]), where I believe the same issue will come into play, so I figured I'd put this code in place now while it's fresh in my mind.

Also, I switched from IntervalIndex.from_intervals to the IntervalIndex constructor, since the constructor does everything from_intervals does but has more features and better testing. I would eventually like to either remove from_intervals or just redirect it to the constructor.
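
As a rough illustration of the constructor path mentioned in this comment, the sketch below builds an `IntervalIndex` directly and through the generic `Index` constructor. It assumes a pandas version in which `Index` infers an `IntervalIndex` from `Interval` objects; the sample intervals are made up for the example and are not taken from the PR.

```python
import pandas as pd

# Illustrative data; these values are not taken from the PR.
intervals = [pd.Interval(0, 1), pd.Interval(1, 2)]

# Direct construction, which the patched branch of Index.__new__ calls
# in place of IntervalIndex.from_intervals.
via_constructor = pd.IntervalIndex(intervals)

# Interval data passed to the generic Index constructor is dispatched
# to IntervalIndex as well.
via_index = pd.Index(intervals)

print(type(via_index).__name__)
print(via_index.equals(via_constructor))
```

The sketch deliberately avoids `from_intervals` and the `dtype` keyword, since the comment notes that `dtype` has no effect for `IntervalIndex` at this point and that `from_intervals` may be removed.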


         # index-like
         elif isinstance(data, (np.ndarray, Index, ABCSeries)):
29 changes: 19 additions & 10 deletions pandas/tests/indexes/test_category.py
@@ -137,18 +137,27 @@ def test_construction_with_categorical_dtype(self):
         data, cats, ordered = 'a a b b'.split(), 'c b a'.split(), True
         dtype = CategoricalDtype(categories=cats, ordered=ordered)

-        result = pd.CategoricalIndex(data, dtype=dtype)
-        expected = pd.CategoricalIndex(data, categories=cats,
-                                       ordered=ordered)
+        result = CategoricalIndex(data, dtype=dtype)
+        expected = CategoricalIndex(data, categories=cats, ordered=ordered)
         tm.assert_index_equal(result, expected, exact=True)

-        # error to combine categories or ordered and dtype keywords args
-        with pytest.raises(ValueError, match="Cannot specify both `dtype` and "
-                                             "`categories` or `ordered`."):
-            pd.CategoricalIndex(data, categories=cats, dtype=dtype)
-        with pytest.raises(ValueError, match="Cannot specify both `dtype` and "
-                                             "`categories` or `ordered`."):
-            pd.CategoricalIndex(data, ordered=ordered, dtype=dtype)
+        # GH 19032
+        result = Index(data, dtype=dtype)
+        tm.assert_index_equal(result, expected, exact=True)
+
+        # error when combining categories/ordered and dtype kwargs
+        msg = 'Cannot specify both `dtype` and `categories` or `ordered`.'
+        with pytest.raises(ValueError, match=msg):
+            CategoricalIndex(data, categories=cats, dtype=dtype)
+
+        with pytest.raises(ValueError, match=msg):
+            Index(data, categories=cats, dtype=dtype)
+
+        with pytest.raises(ValueError, match=msg):
+            CategoricalIndex(data, ordered=ordered, dtype=dtype)
+
+        with pytest.raises(ValueError, match=msg):
+            Index(data, ordered=ordered, dtype=dtype)

     def test_create_categorical(self):
         # https://github.com/pandas-dev/pandas/pull/17513
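
The error checks in the updated `test_construction_with_categorical_dtype` can be approximated outside the test suite as below. This is a sketch under stated assumptions: the exact error message may differ between pandas versions, so only the exception type is checked, and `raises_value_error` is a hypothetical helper written for this example. Whether `Index` itself accepts `categories` or `ordered` keywords may also depend on the pandas version, so the sketch only exercises `CategoricalIndex`.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

data = ["a", "a", "b", "b"]
cats = ["c", "b", "a"]
dtype = CategoricalDtype(categories=cats, ordered=True)


def raises_value_error(**kwargs):
    """Hypothetical helper: True if CategoricalIndex rejects these kwargs."""
    try:
        pd.CategoricalIndex(data, **kwargs)
    except ValueError:
        return True
    return False


# Passing ``dtype`` together with ``categories`` or ``ordered`` is ambiguous.
print(raises_value_error(categories=cats, dtype=dtype))
print(raises_value_error(ordered=True, dtype=dtype))

# Passing ``dtype`` on its own is the supported spelling.
print(raises_value_error(dtype=dtype))
```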