Skip to content

Simplify bool ops closures, improve tests #19795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -866,3 +866,4 @@ Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Boolean operations ``&, |, ^`` between a ``Series`` and an ``Index`` will no longer raise ``TypeError`` (:issue:`19792`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a bug, rather an API breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move.

71 changes: 39 additions & 32 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -1058,69 +1058,76 @@ def _bool_method_SERIES(cls, op, special):
Wrapper function for Series arithmetic operations, to avoid
code duplication.
"""
name = _get_op_name(op, special)

def na_op(x, y):
try:
result = op(x, y)
except TypeError:
if isinstance(y, list):
y = construct_1d_object_array_from_listlike(y)

if isinstance(y, (np.ndarray, ABCSeries)):
if (is_bool_dtype(x.dtype) and is_bool_dtype(y.dtype)):
result = op(x, y) # when would this be hit?
else:
x = _ensure_object(x)
y = _ensure_object(y)
result = lib.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is oddly specific. what about a tuple?

basicaly you are trying to catch ndarray or non-list-like right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is codifying existing behavior. ATM the line following this checks for isinstance(y, (np.ndarray, ABCSeries)) and the line above converts lists to ndarray. I removed the ABCSeries part, converted lists a step earlier, and added the assertion to make sure we don't accidentally miss either of these cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this and let's instead qualify the below with

if isinstance(y, np.ndarray):
   ...
elif is_list_like(.....):

else:
    # scalar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tuple here, or assert is_list_like and not np.ndarray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe just remove the assertion and make the else into an is_scalar and make an else that raises an AssertionError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I’m not inclined to change any behavior until the docs/tests are fleshed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's do that first (in this PR or a pre-cursor)

if isinstance(y, np.ndarray):
# This next assertion is here because there used to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a useful comment. imagine a new reader coming along with no history

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, not sure what to put here. ATM there is a non-reachable branch with a TODO comment specifically asking how it would be reached. This seemed worth documenting somehow since someone at some point found it worth special-casing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the assertion pretty much explains this, though maybe add some text that we are trying to compare non-bool things here

# a branch that specifically caught this case.
assert not (is_bool_dtype(x) and is_bool_dtype(y))
x = _ensure_object(x)
y = _ensure_object(y)
result = lib.vec_binop(x, y, op)
else:
# let null fall thru
assert is_scalar(y)
if not isna(y):
y = bool(y)
try:
result = lib.scalar_binop(x, y, op)
except:
msg = ("cannot compare a dtyped [{dtype}] array "
"with a scalar of type [{type}]"
).format(dtype=x.dtype, type=type(y).__name__)
raise TypeError(msg)
raise TypeError("cannot compare a dtyped [{dtype}] array "
"with a scalar of type [{typ}]"
.format(dtype=x.dtype,
typ=type(y).__name__))

return result

fill_int = lambda x: x.fillna(0)
fill_bool = lambda x: x.fillna(False).astype(bool)

def wrapper(self, other):
is_self_int_dtype = is_integer_dtype(self.dtype)

fill_int = lambda x: x.fillna(0)
fill_bool = lambda x: x.fillna(False).astype(bool)

self, other = _align_method_SERIES(self, other, align_asobject=True)

res_name = _get_series_op_result_name(self, other)

if isinstance(other, ABCDataFrame):
# Defer to DataFrame implementation; fail early
return NotImplemented

elif isinstance(other, ABCSeries):
name = com._maybe_match_name(self, other)
elif isinstance(other, (ABCSeries, pd.Index)):
is_other_int_dtype = is_integer_dtype(other.dtype)
other = fill_int(other) if is_other_int_dtype else fill_bool(other)

filler = (fill_int if is_self_int_dtype and is_other_int_dtype
else fill_bool)

res_values = na_op(self.values, other.values)
unfilled = self._constructor(res_values,
index=self.index, name=name)
return filler(unfilled)
ovalues = other.values

else:
# scalars, list, tuple, np.array
filler = (fill_int if is_self_int_dtype and
is_integer_dtype(np.asarray(other)) else fill_bool)
is_other_int_dtype = is_integer_dtype(np.asarray(other))
if isinstance(other, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a list only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM lists are handled inside na_op on 1066-1067. This is just moving the same two lines up the call stack (and outside of an exception-handling block)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment also includes tuple and np.array, should not this check for also for tuple?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on if we want to treat tuples as "an array the user forgot to wrap in an array". For lists we do that pretty consistently (e.g. on line 1034 and 1172 in core.ops), but not so much for tuples. I don't have a strong opinion as to what the ideal behavior is, but would prefer to address it separately from this cleanup/testing PR. (it could go in the same Issue as the TODO comment just below this isinstance check)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be is_list_like

# TODO: Can we do this before the is_integer_dtype check?
# could the is_integer_dtype check be checking the wrong
# thing? e.g. other = [[0, 1], [2, 3], [4, 5]]?
other = construct_1d_object_array_from_listlike(other)
ovalues = other

res_values = na_op(self.values, other)
unfilled = self._constructor(res_values, index=self.index)
return filler(unfilled).__finalize__(self)
filler = (fill_int if is_self_int_dtype and is_other_int_dtype
else fill_bool)

res_values = na_op(self.values, ovalues)
unfilled = self._constructor(res_values, index=self.index,
name=res_name)
result = filler(unfilled)
if not isinstance(other, ABCSeries):
result = result.__finalize__(self)
return result

wrapper.__name__ = name
return wrapper


Expand Down
Loading