
Commit 4479cfe

topper-123 authored and yeshsurya committed
BUG: Categorical.unique should keep dtype unchanged (pandas-dev#38140)
1 parent e56ee2e commit 4479cfe


2 files changed: +35 -6 lines


doc/source/whatsnew/v1.3.0.rst (+32)
@@ -230,6 +230,38 @@ Notable bug fixes
 
 These are bug fixes that might have notable behavior changes.
 
+``Categorical.unique`` now always maintains same dtype as original
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, when calling :meth:`~Categorical.unique` with categorical data, unused categories in the new array
+would be removed, meaning that the dtype of the new array would be different from the
+original if some categories were not present in the unique array (:issue:`18291`).
+
+As an example of this, given:
+
+.. ipython:: python
+
+    dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
+    cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
+    original = pd.Series(cat)
+    unique = original.unique()
+
+*pandas < 1.3.0*:
+
+.. code-block:: ipython
+
+    In [1]: unique
+    ['good', 'bad']
+    Categories (2, object): ['bad' < 'good']
+    In [2]: original.dtype == unique.dtype
+    False
+
+*pandas >= 1.3.0*:
+
+.. ipython:: python
+
+    unique
+    original.dtype == unique.dtype
 
 Preserve dtypes in :meth:`~pandas.DataFrame.combine_first`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
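The whatsnew example can be replayed as a plain script. This is a minimal sketch assuming pandas 1.3.0 or later is installed. The `remove_unused_categories()` step is not part of this commit; it is shown only as one way to recover the pre-1.3.0 (trimmed) categories if downstream code depended on them.

```python
import pandas as pd

# Same setup as the whatsnew example: 'neutral' is a category that never occurs.
dtype = pd.CategoricalDtype(["bad", "neutral", "good"], ordered=True)
original = pd.Series(pd.Categorical(["good", "good", "bad", "bad"], dtype=dtype))

# Series.unique() on categorical data returns a Categorical.
unique = original.unique()

# Values come back in order of appearance...
print(list(unique))  # ['good', 'bad']
# ...and the dtype, including the unused 'neutral' category, is unchanged.
print(unique.dtype == original.dtype)  # True

# Dropping unused categories explicitly reproduces the old, trimmed dtype.
trimmed = unique.remove_unused_categories()
print(list(trimmed.categories))  # ['bad', 'good']
print(trimmed.dtype == original.dtype)  # False
```

Because the dtype is now stable, results of `unique()` can be compared or concatenated with the original data without first reconciling categories.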

pandas/core/series.py (+3 -6)
@@ -1993,15 +1993,12 @@ def unique(self) -> ArrayLike:
         ['2016-01-01 00:00:00-05:00']
         Length: 1, dtype: datetime64[ns, US/Eastern]
 
-        An unordered Categorical will return categories in the order of
-        appearance.
+        A Categorical will return unique values in the order of
+        appearance and with the same dtype.
 
         >>> pd.Series(pd.Categorical(list('baabc'))).unique()
         ['b', 'a', 'c']
-        Categories (3, object): ['b', 'a', 'c']
-
-        An ordered Categorical preserves the category ordering.
-
+        Categories (3, object): ['a', 'b', 'c']
         >>> pd.Series(pd.Categorical(list('baabc'), categories=list('abc'),
         ...                          ordered=True)).unique()
         ['b', 'a', 'c']
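The revised docstring examples can be checked the same way. This is a minimal sketch assuming pandas 1.3.0 or later: the unique values keep their order of appearance, while the categories (inferred and sorted in the unordered case, given explicitly in the ordered case) and the `ordered` flag stay exactly as they were on the input.

```python
import pandas as pd

# Unordered: categories are inferred from the data (sorted), and unique()
# keeps them while returning values in order of appearance.
s = pd.Series(pd.Categorical(list("baabc")))
u = s.unique()
print(list(u))             # ['b', 'a', 'c']
print(list(u.categories))  # ['a', 'b', 'c']

# Ordered: the explicit category order and ordered=True both survive.
s_ord = pd.Series(
    pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
)
u_ord = s_ord.unique()
print(list(u_ord))                 # ['b', 'a', 'c']
print(u_ord.dtype == s_ord.dtype)  # True
```

Before this change, the unordered example reported categories `['b', 'a', 'c']`, so its dtype differed from the input's.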
