-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: Simplify _comp_method_SERIES #27803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
thanks @jbrockmendel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean-up! Some comments / questions
result = self._constructor(res_values, index=self.index) | ||
# rename is needed in case res_name is None and result.name | ||
# is not. | ||
return finalizer(result).rename(res_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case can res_values
have a name? (that seems the only reason that result
can have a name)
I think we should try to avoid doing this rename in general (it makes yet another copy of the data, which could of course also be avoided by using an inplace method, but I would rather avoid doing the rename in general)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that is the wrong comment (there were multiple similar in the original code, but they all differed slightly. Here it is not about result.name but self.name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could put logic for check-if-rename-is-necessary into something resembling _construct_result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__finalize__
is used to propagate metadata of subclasses, so I suppose we should keep it. In this case, we somehow need a way to signal to finalize to not set the name again ..
and len(other) != len(self) | ||
and not isinstance(other, frozenset) | ||
): | ||
# TODO: why are we treating len-1 frozenset differently? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this coming from (a test?) I don't see this check in the existing code, it feels a bad idea to introduce such a check if we don't know for which reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a test, yes. I dont know which off the top of my head, but this was needed to maintain existing behavior. I'd like to change it to maintain/achieve consistency with do things elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the test that makes this necessary is in tests.series.test_operators.TestSeriesComparisons.test_comparison_tuples
s = Series([frozenset([1]), frozenset([1, 2])])
result = s == frozenset([1])
Supported listlikes in object-dtype is a hassle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the frozenset was an example test case, but there might be others that previously worked like that. So I don't think the solution is to add an explicit frozenset check as done now. Do you know how it got through the checks in the previous version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously the length check was only done on ndarray/Index/Series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously the length check was only done on ndarray/Index/Series
The let's go back to that logic? (or eg at least when the dtype is object?) As this will now break comparison of object dtypes that hold list-like scalar objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. we should have a larger discussion on how we can handle this more consistently across the codebase
|
||
else: | ||
values = self.to_numpy() | ||
lvalues = extract_array(self, extract_numpy=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the extract_numpy=True
needed?
|
||
# TODO: shouldn't we be applying finalize whenever | ||
# not isinstance(other, ABCSeries)? | ||
finalizer = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the renaming issue I commented about below.
IIUC, the rename is needed because the finalize sets back the original Series name, which might not be desired.
I think it would be good to make that clearer (eg in the comments)
shouldn't we be applying finalize whenever not isinstance(other, ABCSeries)?
Yes, that seems correct to me
na_op
is only called in one place, meaning it will be easier to make further de-nesting/de-closuring simplifications in follow-ups.