Fixed regression in Series.duplicated for categorical dtype with bool categories #44356

phofl · 2021-11-08T22:10:59Z

closes REGR: Series.duplicated with category dtype and nulls raises ValueError #44351
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry
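
For context, a minimal reproduction sketch of the linked issue, assuming the failure mode named in its title (a `ValueError` from `Series.duplicated` on a boolean categorical that contains a null). The variable names are illustrative and not taken from the PR; on a pandas build that includes this fix the `else` branch should run.

```python
import pandas as pd

# Boolean categories plus one missing value: the shape of data named in the issue title.
ser = pd.Series(
    pd.Categorical(
        [True, False, True, False, pd.NA],
        categories=[True, False],
        ordered=True,
    )
)

try:
    mask = ser.duplicated()
except ValueError as exc:
    # Assumed behaviour on affected releases, per the issue title.
    mask = None
    print(f"regression present: {exc}")
else:
    # With the fix: the second True and the second False are flagged as duplicates.
    print(mask.tolist())
```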

simonjayhawkins

Thanks @phofl. Generally LGTM.

I think we could also mention drop_duplicates in the release note, though we probably don't need extra tests for it.

However, the code sample in the issue OP is based on test_drop_duplicates_categorical_bool in pandas/tests/series/methods/test_drop_duplicates.py, so it could make sense to co-locate the tests and also test drop_duplicates.

phofl · 2021-11-09T19:51:16Z

Added the test and modified the release note

jreback · 2021-11-12T03:09:17Z

pandas/tests/series/methods/test_drop_duplicates.py

+        )
+        result = ser.drop_duplicates()
+        expected = Series(
+            Categorical([True, False, np.nan], categories=[True, False], ordered=True),


This might be a bug, as it does not preserve NA (but it is unrelated and not a regression, so please open a new issue).

phofl · Nov 12, 2021

Done, #44405

jorisvandenbossche · Nov 12, 2021

The test case is actually a bit misleading. This regression was just about a boolean categorical with missing values, not specifically with NA (you get the same issue if you create the data with np.nan).
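
To illustrate that last point, here is a small sketch (not part of the PR) that builds the same boolean categorical with several null sentinels and compares the results of `duplicated` and `drop_duplicates`. The helper name `bool_categorical` and the `results` dict are hypothetical; the outputs in the comments assume a pandas build that includes this fix.

```python
import numpy as np
import pandas as pd


def bool_categorical(null):
    """Build a boolean categorical Series whose last element is `null`."""
    return pd.Series(
        pd.Categorical(
            [True, False, True, False, null],
            categories=[True, False],
            ordered=True,
        )
    )


results = {}
for name, null in [("np.nan", np.nan), ("pd.NA", pd.NA), ("None", None)]:
    ser = bool_categorical(null)
    # Record the duplicate mask and the index labels kept by drop_duplicates.
    results[name] = (
        ser.duplicated().tolist(),
        ser.drop_duplicates().index.tolist(),
    )

for name, (mask, kept) in results.items():
    print(name, mask, kept)
```

Each sentinel should give the same mask (`[False, False, True, True, False]`) and keep the same rows (`[0, 1, 4]`), which is consistent with the regression depending on the missing value rather than on `pd.NA` specifically.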

jreback · 2021-11-12T03:09:45Z

@meeseeksdev backport 1.3.x

jreback · 2021-11-12T03:09:50Z

thanks @phofl

Fixed regression in Series.duplicated for categorical dtype with bool categories

fc73f90

phofl added the Categorical, Missing-data, and Regression labels Nov 8, 2021

phofl added this to the 1.3.5 milestone Nov 8, 2021

simonjayhawkins reviewed Nov 9, 2021

Add test

eeba436

simonjayhawkins approved these changes Nov 9, 2021

Merge remote-tracking branch 'upstream/master' into 44351_new

7a93843

jreback approved these changes Nov 12, 2021

jreback merged commit 9f54f70 into pandas-dev:master Nov 12, 2021

lumberbot-app bot added the Still Needs Manual Backport label Nov 12, 2021

phofl added a commit to phofl/pandas that referenced this pull request Nov 12, 2021

Fixed regression in Series.duplicated for categorical dtype with bool categories (pandas-dev#44356) (cherry picked from commit 9f54f70)

8face04

phofl deleted the 44351_new branch November 12, 2021 08:33

phofl removed the Still Needs Manual Backport label Nov 12, 2021

simonjayhawkins mentioned this pull request Nov 12, 2021

Backport PR #44356 on branch 1.3.x (Fixed regression in Series.duplicated for categorical dtype with bool categories) #44402

Merged

simonjayhawkins pushed a commit that referenced this pull request Nov 12, 2021

Fixed regression in Series.duplicated for categorical dtype with bool categories (#44356) (#44402) (cherry picked from commit 9f54f70)

16d51c2

nickleus27 pushed a commit to nickleus27/pandas that referenced this pull request Nov 28, 2021

Fixed regression in Series.duplicated for categorical dtype with bool categories (pandas-dev#44356)

eca9f6c

phofl commented Nov 8, 2021

simonjayhawkins left a comment

phofl commented Nov 9, 2021

jreback Nov 12, 2021

phofl Nov 12, 2021

jorisvandenbossche Nov 12, 2021

jreback commented Nov 12, 2021

jreback commented Nov 12, 2021
