Commit 28a7ec7

Backport PR #57329 on branch 2.2.x (REGR: CategoricalIndex.difference with null values) (#57336)
Backport PR #57329: REGR: CategoricalIndex.difference with null values

Co-authored-by: Luke Manley <[email protected]>
1 parent 0443427 commit 28a7ec7

3 files changed (+24 -2 lines)

doc/source/whatsnew/v2.2.1.rst

+1
@@ -20,6 +20,7 @@ Fixed regressions
 - Fixed regression in :func:`wide_to_long` raising an ``AttributeError`` for string columns (:issue:`57066`)
 - Fixed regression in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmax` ignoring the ``skipna`` argument (:issue:`57040`)
 - Fixed regression in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmax` where values containing the minimum or maximum value for the dtype could produce incorrect results (:issue:`57040`)
+- Fixed regression in :meth:`CategoricalIndex.difference` raising ``KeyError`` when other contains null values other than NaN (:issue:`57318`)
 - Fixed regression in :meth:`DataFrame.loc` raising ``IndexError`` for non-unique, masked dtype indexes where result has more than 10,000 rows (:issue:`57027`)
 - Fixed regression in :meth:`DataFrame.sort_index` not producing a stable sort for a index with duplicates (:issue:`57151`)
 - Fixed regression in :meth:`DataFrame.to_dict` with ``orient='list'`` and datetime or timedelta types returning integers (:issue:`54824`)

pandas/core/indexes/base.py

+5 -2
@@ -3663,9 +3663,12 @@ def difference(self, other, sort=None):
 
     def _difference(self, other, sort):
         # overridden by RangeIndex
+        this = self
+        if isinstance(self, ABCCategoricalIndex) and self.hasnans and other.hasnans:
+            this = this.dropna()
         other = other.unique()
-        the_diff = self[other.get_indexer_for(self) == -1]
-        the_diff = the_diff if self.is_unique else the_diff.unique()
+        the_diff = this[other.get_indexer_for(this) == -1]
+        the_diff = the_diff if this.is_unique else the_diff.unique()
         the_diff = _maybe_try_sort(the_diff, sort)
         return the_diff
 
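The patched `_difference` can be modelled on plain Python lists to make its control flow easier to follow. This is a hypothetical, simplified sketch, not pandas code: `is_null`, `unique` and `difference_sketch` are illustrative names, a dict lookup stands in for `get_indexer_for`, a plain `try`/`sorted` stands in for `_maybe_try_sort`, and, unlike the real patch (which drops nulls only when `self` is a `CategoricalIndex`), it applies the null-dropping step to any input.

```python
import math
from collections.abc import Hashable
from typing import Any, List


def is_null(value: Any) -> bool:
    """Treat None and float NaN as missing (a simplification of pandas' isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def unique(values: List[Hashable]) -> List[Hashable]:
    """Order-preserving de-duplication that counts every null as the same value."""
    seen = set()
    seen_null = False
    out = []
    for v in values:
        if is_null(v):
            if not seen_null:
                seen_null = True
                out.append(v)
        elif v not in seen:
            seen.add(v)
            out.append(v)
    return out


def difference_sketch(left: List[Hashable], right: List[Hashable], sort: bool = True) -> List[Hashable]:
    """Hypothetical model of the patched Index._difference: values of `left` not in `right`."""
    this = left
    # Mirrors the patch: if both sides contain nulls, drop them from `left` up front,
    # so nulls never have to be matched through the position lookup below.
    if any(is_null(v) for v in left) and any(is_null(v) for v in right):
        this = [v for v in left if not is_null(v)]
    # Stand-in for other.unique() + other.get_indexer_for(this):
    # a missing key plays the role of the -1 "not found" marker.
    positions = {v: i for i, v in enumerate(unique(right))}
    the_diff = [v for v in this if positions.get(v, -1) == -1]
    # Stand-in for the_diff.unique(); a no-op when `this` is already unique.
    the_diff = unique(the_diff)
    # Rough stand-in for _maybe_try_sort: sort only if the values are orderable.
    if sort:
        try:
            the_diff = sorted(the_diff)
        except TypeError:
            pass  # e.g. None mixed with strings: keep the original order
    return the_diff
```

For example, `difference_sketch(["a", "b", "c", None], ["c", None])` drops the null from the left side before the lookup and leaves `["a", "b"]`, which is the shape of result the test below expects. Because the nulls are removed from `this` whenever `other` has any null, it does not matter whether that null is `None` or `NaN`.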
@@ -0,0 +1,18 @@
+import numpy as np
+import pytest
+
+from pandas import (
+    CategoricalIndex,
+    Index,
+)
+import pandas._testing as tm
+
+
+@pytest.mark.parametrize("na_value", [None, np.nan])
+def test_difference_with_na(na_value):
+    # GH 57318
+    ci = CategoricalIndex(["a", "b", "c", None])
+    other = Index(["c", na_value])
+    result = ci.difference(other)
+    expected = CategoricalIndex(["a", "b"], categories=["a", "b", "c"])
+    tm.assert_index_equal(result, expected)
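The `expected` value in the test keeps `"c"` among the categories even though it no longer appears as a value: `difference` removes values, not categories. A small sketch, assuming pandas 2.2.1 or later and using only public API, that checks this and then trims the leftover category with `remove_unused_categories`:

```python
import numpy as np
import pandas as pd

ci = pd.CategoricalIndex(["a", "b", "c", None])
result = ci.difference(pd.Index(["c", np.nan]))

# The removed value "c" is still listed as a category of the result.
print(list(result), list(result.categories))

# Drop categories that no longer have any values, if that is what you want.
trimmed = result.remove_unused_categories()
print(list(trimmed), list(trimmed.categories))
```

Keeping the original categories preserves the dtype of `ci`, which is why the test compares against an explicit `categories=["a", "b", "c"]`.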
