-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: Simplify _comp_method_SERIES #27803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1029,6 +1029,15 @@ def wrapper(self, other, axis=None): | |
self._get_axis_number(axis) | ||
|
||
res_name = get_op_result_name(self, other) | ||
other = lib.item_from_zerodim(other) | ||
|
||
# TODO: shouldn't we be applying finalize whenever | ||
# not isinstance(other, ABCSeries)? | ||
finalizer = ( | ||
lambda x: x.__finalize__(self) | ||
if isinstance(other, (np.ndarray, ABCIndexClass)) | ||
else x | ||
) | ||
|
||
if isinstance(other, list): | ||
# TODO: same for tuples? | ||
|
@@ -1041,80 +1050,60 @@ def wrapper(self, other, axis=None): | |
elif isinstance(other, ABCSeries) and not self._indexed_same(other): | ||
raise ValueError("Can only compare identically-labeled Series objects") | ||
|
||
elif is_categorical_dtype(self): | ||
elif ( | ||
is_list_like(other) | ||
and len(other) != len(self) | ||
and not isinstance(other, frozenset) | ||
): | ||
# TODO: why are we treating len-1 frozenset differently? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where is this coming from (a test?) I don't see this check in the existing code, it feels a bad idea to introduce such a check if we don't know for which reason There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a test, yes. I dont know which off the top of my head, but this was needed to maintain existing behavior. I'd like to change it to maintain/achieve consistency with do things elsewhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like the test that makes this necessary is in tests.series.test_operators.TestSeriesComparisons.test_comparison_tuples
Supported listlikes in object-dtype is a hassle There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, so the frozenset was an example test case, but there might be others that previously worked like that. So I don't think the solution is to add an explicit frozenset check as done now. Do you know how it got through the checks in the previous version? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. previously the length check was only done on ndarray/Index/Series There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The let's go back to that logic? (or eg at least when the dtype is object?) As this will now break comparison of object dtypes that hold list-like scalar objects There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure. we should have a larger discussion on how we can handle this more consistently across the codebase |
||
raise ValueError("Lengths must match to compare") | ||
|
||
if is_categorical_dtype(self): | ||
# Dispatch to Categorical implementation; CategoricalIndex | ||
# behavior is non-canonical GH#19513 | ||
res_values = dispatch_to_extension_op(op, self, other) | ||
return self._constructor(res_values, index=self.index, name=res_name) | ||
|
||
elif is_datetime64_dtype(self) or is_datetime64tz_dtype(self): | ||
# Dispatch to DatetimeIndex to ensure identical | ||
# Series/Index behavior | ||
from pandas.core.arrays import DatetimeArray | ||
|
||
res_values = dispatch_to_extension_op(op, DatetimeArray(self), other) | ||
return self._constructor(res_values, index=self.index, name=res_name) | ||
|
||
elif is_timedelta64_dtype(self): | ||
from pandas.core.arrays import TimedeltaArray | ||
|
||
res_values = dispatch_to_extension_op(op, TimedeltaArray(self), other) | ||
return self._constructor(res_values, index=self.index, name=res_name) | ||
|
||
elif is_extension_array_dtype(self) or ( | ||
is_extension_array_dtype(other) and not is_scalar(other) | ||
): | ||
# Note: the `not is_scalar(other)` condition rules out | ||
# e.g. other == "category" | ||
# e.g. other == "category" | ||
res_values = dispatch_to_extension_op(op, self, other) | ||
return self._constructor(res_values, index=self.index).rename(res_name) | ||
|
||
elif isinstance(other, ABCSeries): | ||
# By this point we have checked that self._indexed_same(other) | ||
res_values = na_op(self.values, other.values) | ||
# rename is needed in case res_name is None and res_values.name | ||
# is not. | ||
return self._constructor( | ||
res_values, index=self.index, name=res_name | ||
).rename(res_name) | ||
|
||
elif isinstance(other, (np.ndarray, ABCIndexClass)): | ||
# do not check length of zerodim array | ||
# as it will broadcast | ||
if other.ndim != 0 and len(self) != len(other): | ||
raise ValueError("Lengths must match to compare") | ||
|
||
res_values = na_op(self.values, np.asarray(other)) | ||
result = self._constructor(res_values, index=self.index) | ||
# rename is needed in case res_name is None and self.name | ||
# is not. | ||
return result.__finalize__(self).rename(res_name) | ||
|
||
elif is_scalar(other) and isna(other): | ||
# numpy does not like comparisons vs None | ||
if op is operator.ne: | ||
res_values = np.ones(len(self), dtype=bool) | ||
else: | ||
res_values = np.zeros(len(self), dtype=bool) | ||
return self._constructor( | ||
res_values, index=self.index, name=res_name, dtype="bool" | ||
) | ||
|
||
else: | ||
values = self.to_numpy() | ||
lvalues = extract_array(self, extract_numpy=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the |
||
rvalues = extract_array(other, extract_numpy=True) | ||
|
||
with np.errstate(all="ignore"): | ||
res = na_op(values, other) | ||
if is_scalar(res): | ||
res_values = na_op(lvalues, rvalues) | ||
if is_scalar(res_values): | ||
raise TypeError( | ||
"Could not compare {typ} type with Series".format(typ=type(other)) | ||
) | ||
|
||
# always return a full value series here | ||
res_values = extract_array(res, extract_numpy=True) | ||
return self._constructor( | ||
res_values, index=self.index, name=res_name, dtype="bool" | ||
) | ||
result = self._constructor(res_values, index=self.index) | ||
# rename is needed in case res_name is None and result.name | ||
# is not. | ||
return finalizer(result).rename(res_name) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In what case can I think we should try to avoid doing this rename in general (it makes yet another copy of the data, which could of course also be avoided by using an inplace method, but I would rather avoid doing the rename in general) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also think that is the wrong comment (there were multiple similar in the original code, but they all differed slightly. Here it is not about result.name but self.name) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could put logic for check-if-rename-is-necessary into something resembling _construct_result There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
wrapper.__name__ = op_name | ||
return wrapper | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the renaming issue I commented about below.
IIUC, the rename is needed because the finalize sets back the original Series name, which might not be desired.
I think it would be good to make that clearer (eg in the comments)
Yes, that seems correct to me