Skip to content

REF: Simplify _comp_method_SERIES #27803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 8, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

  • Do raising checks earlier on, matching how this is done elsewhere (moving towards REF/CLN: ops boilerplate #23853)
  • Comment on line 1034 notes possibly-inconsistent behavior that we should either change or add an explanatory comment for.
  • Ditto line 1058
  • Before long we'll be able to collapse all of the dispatch_to_extension_op cases into one case; I'm still troubleshooting some corner cases of this on another branch.
  • After this, na_op is only called in one place, meaning it will be easier to make further de-nesting/de-closuring simplifications in follow-ups.
  • Putting all the finalize/_constructor business at the end will make it easier to separate out the array-specific middle and apply it to a) PandasArray and b) Block-wise.

@jreback jreback added the Clean label Aug 8, 2019
@jreback jreback added this to the 1.0 milestone Aug 8, 2019
@jreback jreback merged commit f00905e into pandas-dev:master Aug 8, 2019
@jreback
Copy link
Contributor

jreback commented Aug 8, 2019

thanks @jbrockmendel

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean-up! Some comments / questions

result = self._constructor(res_values, index=self.index)
# rename is needed in case res_name is None and result.name
# is not.
return finalizer(result).rename(res_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case can res_values have a name? (that seems the only reason that result can have a name)

I think we should try to avoid doing this rename in general (it makes yet another copy of the data, which could of course also be avoided by using an inplace method, but I would rather avoid doing the rename in general)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that is the wrong comment (there were multiple similar in the original code, but they all differed slightly. Here it is not about result.name but self.name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could put logic for check-if-rename-is-necessary into something resembling _construct_result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__finalize__ is used to propagate metadata of subclasses, so I suppose we should keep it. In this case, we somehow need a way to signal to finalize to not set the name again ..

and len(other) != len(self)
and not isinstance(other, frozenset)
):
# TODO: why are we treating len-1 frozenset differently?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this coming from (a test?) I don't see this check in the existing code, it feels a bad idea to introduce such a check if we don't know for which reason

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test, yes. I dont know which off the top of my head, but this was needed to maintain existing behavior. I'd like to change it to maintain/achieve consistency with do things elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test that makes this necessary is in tests.series.test_operators.TestSeriesComparisons.test_comparison_tuples

s = Series([frozenset([1]), frozenset([1, 2])])
result = s == frozenset([1])

Supported listlikes in object-dtype is a hassle

Copy link
Member

@jorisvandenbossche jorisvandenbossche Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so the frozenset was an example test case, but there might be others that previously worked like that. So I don't think the solution is to add an explicit frozenset check as done now. Do you know how it got through the checks in the previous version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously the length check was only done on ndarray/Index/Series

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously the length check was only done on ndarray/Index/Series

The let's go back to that logic? (or eg at least when the dtype is object?) As this will now break comparison of object dtypes that hold list-like scalar objects

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. we should have a larger discussion on how we can handle this more consistently across the codebase


else:
values = self.to_numpy()
lvalues = extract_array(self, extract_numpy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extract_numpy=True needed?


# TODO: shouldn't we be applying finalize whenever
# not isinstance(other, ABCSeries)?
finalizer = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the renaming issue I commented about below.
IIUC, the rename is needed because the finalize sets back the original Series name, which might not be desired.
I think it would be good to make that clearer (eg in the comments)

shouldn't we be applying finalize whenever not isinstance(other, ABCSeries)?

Yes, that seems correct to me

@jbrockmendel jbrockmendel deleted the boiler1 branch August 8, 2019 15:19
quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants