Commit 35b20d8

jankatins authored and jreback committed
BUG: Fix for comparisons of categorical and a scalar not in categories, xref GH9836
Until now, an ordering comparison (e.g. `<` or `>`) of categorical data with a scalar that is not in the categories returned `False` for all elements when it should raise a `TypeError`, which it now does. Also fix `!=` comparisons with such a scalar, which returned `False` for all elements when the more logical result is `True`.
1 parent 3e7f21c commit 35b20d8

3 files changed: +37 -1 lines changed

doc/source/whatsnew/v0.16.1.txt

+2
@@ -124,3 +124,5 @@ Bug Fixes
 - Bug in which ``SparseDataFrame`` could not take `nan` as a column name (:issue:`8822`)

 - Bug in unequal comparisons between a ``Series`` of dtype `"category"` and a scalar (e.g. ``Series(Categorical(list("abc"), categories=list("cba"), ordered=True)) > "b"``, which wouldn't use the order of the categories but use the lexicographical order. (:issue:`9848`)
+
+- Bug in unequal comparisons between categorical data and a scalar, which was not in the categories (e.g. ``Series(Categorical(list("abc"), ordered=True)) > "d"``. This returned ``False`` for all elements, but now raises a ``TypeError``. Equality comparisons also now return ``False`` for ``==`` and ``True`` for ``!=``. (:issue:`9848`)

pandas/core/categorical.py

+8-1
@@ -61,7 +61,14 @@ def f(self, other):
                 i = self.categories.get_loc(other)
                 return getattr(self._codes, op)(i)
             else:
-                return np.repeat(False, len(self))
+                if op == '__eq__':
+                    return np.repeat(False, len(self))
+                elif op == '__ne__':
+                    return np.repeat(True, len(self))
+                else:
+                    msg = "Cannot compare a Categorical for op {op} with a scalar, " \
+                          "which is not a category."
+                    raise TypeError(msg.format(op=op))
         else:

             # allow categorical vs object dtype array comparisons for equality
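
The new fallback branch can be sketched outside pandas as a plain function. This is only an illustration under stated assumptions, not the pandas implementation: `compare_unknown_scalar` and its `length` parameter are hypothetical names, and plain Python lists stand in for the arrays built with `np.repeat` in the hunk above.

```python
def compare_unknown_scalar(op, length):
    """Compare `length` categorical values with a scalar that is not a category.

    Hypothetical helper that mirrors the branch added in this commit.
    """
    if op == '__eq__':
        # No element can equal a value that is not among the categories.
        return [False] * length
    elif op == '__ne__':
        # For the same reason, every element differs from it.
        return [True] * length
    else:
        # Ordering comparisons have no meaningful answer, so refuse.
        msg = ("Cannot compare a Categorical for op {op} with a scalar, "
               "which is not a category.")
        raise TypeError(msg.format(op=op))


print(compare_unknown_scalar('__eq__', 3))  # [False, False, False]
print(compare_unknown_scalar('__ne__', 3))  # [True, True, True]
try:
    compare_unknown_scalar('__lt__', 3)
except TypeError as exc:
    print(exc)
```

Raising for ordering operators rather than returning all `False` keeps a typo in the scalar from silently producing an empty selection.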

pandas/tests/test_categorical.py

+27
@@ -1087,6 +1087,20 @@ def test_reflected_comparison_with_scalars(self):
         self.assert_numpy_array_equal(cat > cat[0], [False, True, True])
         self.assert_numpy_array_equal(cat[0] < cat, [False, True, True])

+    def test_comparison_with_unknown_scalars(self):
+        # https://github.com/pydata/pandas/issues/9836#issuecomment-92123057 and following
+        # comparisons with scalars not in categories should raise for unequal comps, but not for
+        # equal/not equal
+        cat = pd.Categorical([1, 2, 3], ordered=True)
+
+        self.assertRaises(TypeError, lambda: cat < 4)
+        self.assertRaises(TypeError, lambda: cat > 4)
+        self.assertRaises(TypeError, lambda: 4 < cat)
+        self.assertRaises(TypeError, lambda: 4 > cat)
+
+        self.assert_numpy_array_equal(cat == 4 , [False, False, False])
+        self.assert_numpy_array_equal(cat != 4 , [True, True, True])
+

 class TestCategoricalAsBlock(tm.TestCase):
     _multiprocess_can_split_ = True
@@ -2440,6 +2454,19 @@ def f():
             cat > "b"
         self.assertRaises(TypeError, f)

+        # https://github.com/pydata/pandas/issues/9836#issuecomment-92123057 and following
+        # comparisons with scalars not in categories should raise for unequal comps, but not for
+        # equal/not equal
+        cat = Series(Categorical(list("abc"), ordered=True))
+
+        self.assertRaises(TypeError, lambda: cat < "d")
+        self.assertRaises(TypeError, lambda: cat > "d")
+        self.assertRaises(TypeError, lambda: "d" < cat)
+        self.assertRaises(TypeError, lambda: "d" > cat)
+
+        self.assert_series_equal(cat == "d" , Series([False, False, False]))
+        self.assert_series_equal(cat != "d" , Series([True, True, True]))
+

         # And test NaN handling...
         cat = Series(Categorical(["a","b","c", np.nan]))
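
The tests above check both `cat < "d"` and `"d" < cat` because Python falls back to the right operand's reflected method when the left operand does not know how to compare. A rough sketch of that dispatch, using a hypothetical `ToyOrdered` class that is not pandas code and only imitates the raising behaviour:

```python
class ToyOrdered:
    """Hypothetical stand-in for an ordered categorical (illustration only)."""

    def __init__(self, values, categories):
        self.values = list(values)
        self.categories = list(categories)

    def _compare(self, other, op):
        if other not in self.categories:
            raise TypeError("Cannot compare for op {op} with a scalar, "
                            "which is not a category.".format(op=op))
        # Compare positions in the category order, not the raw values.
        pos = self.categories.index(other)
        codes = [self.categories.index(v) for v in self.values]
        return [getattr(code, op)(pos) for code in codes]

    def __lt__(self, other):
        return self._compare(other, '__lt__')

    def __gt__(self, other):
        return self._compare(other, '__gt__')


cat = ToyOrdered("abc", "abc")
# str.__lt__ returns NotImplemented here, so Python calls cat.__gt__("a").
print("a" < cat)  # [False, True, True]
try:
    "d" < cat  # reflected to cat.__gt__("d"), which raises
except TypeError as exc:
    print(exc)
```

Since the reflected call reaches the same method as the direct one, the scalar-on-the-left form raises the same `TypeError`.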
