Skip to content

BUG: Fix __ne__ comparison for Categorical #32304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 3, 2020
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ Bug fixes
Categorical
^^^^^^^^^^^
- Bug when passing categorical data to :class:`Index` constructor along with ``dtype=object`` incorrectly returning a :class:`CategoricalIndex` instead of object-dtype :class:`Index` (:issue:`32167`)
-
- Bug where :class:`Categorical` comparison operator ``__ne__`` would incorrectly evaluate to ``False`` when either element was missing (:issue:`32276`)
-

Datetimelike
Expand Down
5 changes: 4 additions & 1 deletion pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,10 @@ def func(self, other):
mask = (self._codes == -1) | (other_codes == -1)
if mask.any():
# In other series, the leads to False, so do that here too
ret[mask] = False
if opname == "__ne__":
ret[mask & (self._codes == other_codes)] = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This redundancy was intentional and an attempt to "reuse" the mask calculation above since I think it may short-circuit when mask is False, but could just make it (self._codes == -1) & (other_codes == -1) as well. I also think __ne__ is the only situation that'd have to be special-cased here out of the comparison operators. Also worth pointing out that this is assuming we're dealing with NaN rather than NA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self._codes == -1) & (other_codes == -1)

I think this would be more clear. Also there might be a cached _isnan attribute for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsaxton im working on cleaning up these comparisons, am confused by this line. can we make this use the more common pattern

fill_value = True if op is operator.ne else False
[...]

ret = op(self._codes, other_codes)
mask = (self._codes == -1) | (other_codes == -1)
ret[mask] = fill_value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel That is much simpler and seems correct to me

else:
ret[mask] = False
return ret

if is_scalar(other):
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/extension/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,16 @@ def _compare_other(self, s, data, op_name, other):
with pytest.raises(TypeError, match=msg):
op(data, other)

def test_not_equal_with_na(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we get this right in cases with e.g. datetime64 or datetime64tz categories?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work, added some parameterization over other category types

# https://github.com/pandas-dev/pandas/issues/32276
categories = ["a", "b"]
c1 = Categorical([None, "a"], categories=categories)
c2 = Categorical(["a", "b"], categories=categories)

result = c1 != c2

assert result.all()


class TestParsing(base.BaseParsingTests):
pass