REGR: Series.duplicated with category dtype and nulls raises ValueError #44351

simonjayhawkins · 2021-11-08T16:06:29Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

print(pd.__version__)
tc = pd.Series(
    pd.Categorical(
        [True, False, True, False, pd.NA], categories=[True, False], ordered=True
    )
)
print(tc.duplicated())

Issue Description

xref #44292 (comment)

The code sample is based on test_drop_duplicates_categorical_bool.

On 1.3.4 (and master), the code sample raises:

1.3.4
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_47357/1277064552.py in <module>
      7     )
      8 )
----> 9 print(tc.duplicated())

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/series.py in duplicated(self, keep)
   2215         dtype: bool
   2216         """
-> 2217         res = self._duplicated(keep=keep)
   2218         result = self._constructor(res, index=self.index)
   2219         return result.__finalize__(self, method="duplicated")

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/base.py in _duplicated(self, keep)
   1230         self, keep: Literal["first", "last", False] = "first"
   1231     ) -> np.ndarray:
-> 1232         return duplicated(self._values, keep=keep)

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/algorithms.py in duplicated(values, keep)
    925     duplicated : ndarray[bool]
    926     """
--> 927     values, _ = _ensure_data(values)
    928     return htable.duplicated(values, keep=keep)
    929 

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/algorithms.py in _ensure_data(values)
    139             # i.e. all-bool Categorical, BooleanArray
    140             try:
--> 141                 return np.asarray(values).astype("uint8", copy=False), values.dtype
    142             except TypeError:
    143                 # GH#42107 we have pd.NAs present

ValueError: cannot convert float NaN to integer
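
The traceback suggests why the error escapes: the handler in _ensure_data catches only TypeError, while the missing value appears to surface as a float NaN, whose integer conversion raises ValueError. The following is a minimal pure-Python sketch of that mismatch, not the pandas implementation. The helper name `to_uint8` is hypothetical, and a plain list stands in for the array that `np.asarray(values)` would produce.

```python
def to_uint8(values):
    """Hypothetical stand-in for the bool -> uint8 conversion step.

    Like the handler shown in the traceback, it catches only TypeError.
    """
    try:
        return [int(v) for v in values]
    except TypeError:
        # Reached for values such as None that int() rejects with TypeError.
        return None


# Assumption: the missing entry reaches the conversion as a float NaN.
values = [True, False, True, False, float("nan")]

try:
    to_uint8(values)
    outcome = "converted"
except ValueError as exc:
    # int(float("nan")) raises ValueError, so the TypeError handler is bypassed.
    outcome = f"ValueError: {exc}"

print(outcome)
```

Under these assumptions, the conversion fails with the same message as in the traceback, because ValueError is not covered by the `except TypeError` clause.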

Expected Behavior

On 1.2.5, the code sample gives:

1.2.5
0    False
1    False
2     True
3     True
4    False
dtype: bool

Installed Versions

.

simonjayhawkins added the Bug, Missing-data, Regression, and Categorical labels Nov 8, 2021

simonjayhawkins added this to the 1.3.5 milestone Nov 8, 2021

simonjayhawkins mentioned this issue Nov 8, 2021

TYP: _ensure_data and infer_dtype_from_array #44292

Closed

phofl mentioned this issue Nov 8, 2021

Fixed regression in Series.duplicated for categorical dtype with bool categories #44356

Merged

jreback closed this as completed in #44356 Nov 12, 2021
