REGR: Series.duplicated with category dtype and nulls raises ValueError #44351

Closed
3 tasks done
simonjayhawkins opened this issue Nov 8, 2021 · 0 comments · Fixed by #44356
Labels
Bug Categorical Categorical Data Type Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version

@simonjayhawkins
Member

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

print(pd.__version__)
tc = pd.Series(
    pd.Categorical(
        [True, False, True, False, pd.NA], categories=[True, False], ordered=True
    )
)
print(tc.duplicated())

Issue Description

xref #44292 (comment)

The code sample is based on test_drop_duplicates_categorical_bool.

On 1.3.4 (and master), the code sample raises:

1.3.4
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_47357/1277064552.py in <module>
      7     )
      8 )
----> 9 print(tc.duplicated())

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/series.py in duplicated(self, keep)
   2215         dtype: bool
   2216         """
-> 2217         res = self._duplicated(keep=keep)
   2218         result = self._constructor(res, index=self.index)
   2219         return result.__finalize__(self, method="duplicated")

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/base.py in _duplicated(self, keep)
   1230         self, keep: Literal["first", "last", False] = "first"
   1231     ) -> np.ndarray:
-> 1232         return duplicated(self._values, keep=keep)

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/algorithms.py in duplicated(values, keep)
    925     duplicated : ndarray[bool]
    926     """
--> 927     values, _ = _ensure_data(values)
    928     return htable.duplicated(values, keep=keep)
    929 

~/miniconda3/envs/pandas-1.3.4/lib/python3.9/site-packages/pandas/core/algorithms.py in _ensure_data(values)
    139             # i.e. all-bool Categorical, BooleanArray
    140             try:
--> 141                 return np.asarray(values).astype("uint8", copy=False), values.dtype
    142             except TypeError:
    143                 # GH#42107 we have pd.NAs present

ValueError: cannot convert float NaN to integer
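The traceback suggests an exception-type mismatch: the conversion is guarded by `except TypeError`, but a float NaN fails integer conversion with `ValueError`, so the error escapes the handler. Below is a minimal pure-Python sketch of that shape. The helper `to_uint8` is hypothetical and only imitates the try/except structure visible in the traceback; it is not pandas code.

```python
def to_uint8(values):
    """Hypothetical stand-in for the try/except shape seen in the traceback."""
    try:
        return [int(v) for v in values]
    except TypeError:
        # Only TypeError is handled here (e.g. int(None)).
        # int(float("nan")) raises ValueError, which is NOT caught.
        return None


# A value that raises TypeError is handled by the fallback branch.
handled = to_uint8([True, None])

# A float NaN raises ValueError, which propagates to the caller.
try:
    to_uint8([True, False, float("nan")])
    escaped = None
except ValueError as exc:
    escaped = str(exc)

print(handled)
print(escaped)
```

If this reading is right, the handler would need to cover `ValueError` as well, or the missing values would need to be dealt with before the integer cast.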

Expected Behavior

On 1.2.5, the code sample gives:

1.2.5
0    False
1    False
2     True
3     True
4    False
dtype: bool
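As a possible workaround on affected versions (an assumption on my part, not something proposed in this issue), the duplicate check can be run on the integer category codes instead of the categorical values. Missing values are encoded as `-1` in the codes, so the boolean cast that fails above should not be involved.

```python
import pandas as pd

tc = pd.Series(
    pd.Categorical(
        [True, False, True, False, pd.NA], categories=[True, False], ordered=True
    )
)

# .cat.codes is an integer Series; missing entries are encoded as -1.
codes = tc.cat.codes

# Checking duplicates on the codes avoids converting the values to uint8.
result = codes.duplicated()
print(result)
```

This treats all missing entries as equal to each other, which matches the 1.2.5 output above for this sample.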

Installed Versions

.

@simonjayhawkins simonjayhawkins added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version Categorical Categorical Data Type labels Nov 8, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3.5 milestone Nov 8, 2021