Skip to content

REF: Simplify _comp_method_SERIES #27803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 8, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 27 additions & 38 deletions pandas/core/ops/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -1029,6 +1029,15 @@ def wrapper(self, other, axis=None):
self._get_axis_number(axis)

res_name = get_op_result_name(self, other)
other = lib.item_from_zerodim(other)

# TODO: shouldn't we be applying finalize whenever
# not isinstance(other, ABCSeries)?
finalizer = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the renaming issue I commented about below.
IIUC, the rename is needed because the finalize sets back the original Series name, which might not be desired.
I think it would be good to make that clearer (eg in the comments)

shouldn't we be applying finalize whenever not isinstance(other, ABCSeries)?

Yes, that seems correct to me

lambda x: x.__finalize__(self)
if isinstance(other, (np.ndarray, ABCIndexClass))
else x
)

if isinstance(other, list):
# TODO: same for tuples?
Expand All @@ -1041,80 +1050,60 @@ def wrapper(self, other, axis=None):
elif isinstance(other, ABCSeries) and not self._indexed_same(other):
raise ValueError("Can only compare identically-labeled Series objects")

elif is_categorical_dtype(self):
elif (
is_list_like(other)
and len(other) != len(self)
and not isinstance(other, frozenset)
):
# TODO: why are we treating len-1 frozenset differently?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this coming from (a test?) I don't see this check in the existing code, it feels a bad idea to introduce such a check if we don't know for which reason

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test, yes. I dont know which off the top of my head, but this was needed to maintain existing behavior. I'd like to change it to maintain/achieve consistency with do things elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test that makes this necessary is in tests.series.test_operators.TestSeriesComparisons.test_comparison_tuples

s = Series([frozenset([1]), frozenset([1, 2])])
result = s == frozenset([1])

Supported listlikes in object-dtype is a hassle

Copy link
Member

@jorisvandenbossche jorisvandenbossche Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so the frozenset was an example test case, but there might be others that previously worked like that. So I don't think the solution is to add an explicit frozenset check as done now. Do you know how it got through the checks in the previous version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously the length check was only done on ndarray/Index/Series

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously the length check was only done on ndarray/Index/Series

The let's go back to that logic? (or eg at least when the dtype is object?) As this will now break comparison of object dtypes that hold list-like scalar objects

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. we should have a larger discussion on how we can handle this more consistently across the codebase

raise ValueError("Lengths must match to compare")

if is_categorical_dtype(self):
# Dispatch to Categorical implementation; CategoricalIndex
# behavior is non-canonical GH#19513
res_values = dispatch_to_extension_op(op, self, other)
return self._constructor(res_values, index=self.index, name=res_name)

elif is_datetime64_dtype(self) or is_datetime64tz_dtype(self):
# Dispatch to DatetimeIndex to ensure identical
# Series/Index behavior
from pandas.core.arrays import DatetimeArray

res_values = dispatch_to_extension_op(op, DatetimeArray(self), other)
return self._constructor(res_values, index=self.index, name=res_name)

elif is_timedelta64_dtype(self):
from pandas.core.arrays import TimedeltaArray

res_values = dispatch_to_extension_op(op, TimedeltaArray(self), other)
return self._constructor(res_values, index=self.index, name=res_name)

elif is_extension_array_dtype(self) or (
is_extension_array_dtype(other) and not is_scalar(other)
):
# Note: the `not is_scalar(other)` condition rules out
# e.g. other == "category"
# e.g. other == "category"
res_values = dispatch_to_extension_op(op, self, other)
return self._constructor(res_values, index=self.index).rename(res_name)

elif isinstance(other, ABCSeries):
# By this point we have checked that self._indexed_same(other)
res_values = na_op(self.values, other.values)
# rename is needed in case res_name is None and res_values.name
# is not.
return self._constructor(
res_values, index=self.index, name=res_name
).rename(res_name)

elif isinstance(other, (np.ndarray, ABCIndexClass)):
# do not check length of zerodim array
# as it will broadcast
if other.ndim != 0 and len(self) != len(other):
raise ValueError("Lengths must match to compare")

res_values = na_op(self.values, np.asarray(other))
result = self._constructor(res_values, index=self.index)
# rename is needed in case res_name is None and self.name
# is not.
return result.__finalize__(self).rename(res_name)

elif is_scalar(other) and isna(other):
# numpy does not like comparisons vs None
if op is operator.ne:
res_values = np.ones(len(self), dtype=bool)
else:
res_values = np.zeros(len(self), dtype=bool)
return self._constructor(
res_values, index=self.index, name=res_name, dtype="bool"
)

else:
values = self.to_numpy()
lvalues = extract_array(self, extract_numpy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extract_numpy=True needed?

rvalues = extract_array(other, extract_numpy=True)

with np.errstate(all="ignore"):
res = na_op(values, other)
if is_scalar(res):
res_values = na_op(lvalues, rvalues)
if is_scalar(res_values):
raise TypeError(
"Could not compare {typ} type with Series".format(typ=type(other))
)

# always return a full value series here
res_values = extract_array(res, extract_numpy=True)
return self._constructor(
res_values, index=self.index, name=res_name, dtype="bool"
)
result = self._constructor(res_values, index=self.index)
# rename is needed in case res_name is None and result.name
# is not.
return finalizer(result).rename(res_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case can res_values have a name? (that seems the only reason that result can have a name)

I think we should try to avoid doing this rename in general (it makes yet another copy of the data, which could of course also be avoided by using an inplace method, but I would rather avoid doing the rename in general)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that is the wrong comment (there were multiple similar in the original code, but they all differed slightly. Here it is not about result.name but self.name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could put logic for check-if-rename-is-necessary into something resembling _construct_result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__finalize__ is used to propagate metadata of subclasses, so I suppose we should keep it. In this case, we somehow need a way to signal to finalize to not set the name again ..


wrapper.__name__ = op_name
return wrapper
Expand Down