REF: Simplify _comp_method_SERIES #27803

jbrockmendel · 2019-08-07T16:04:41Z

Do raising checks earlier on, matching how this is done elsewhere (moving towards REF/CLN: ops boilerplate #23853)
Comment on line 1034 notes possibly-inconsistent behavior that we should either change or add an explanatory comment for.
Ditto line 1058
Before long we'll be able to collapse all of the dispatch_to_extension_op cases into one case; I'm still troubleshooting some corner cases of this on another branch.
After this, na_op is only called in one place, meaning it will be easier to make further de-nesting/de-closuring simplifications in follow-ups.
Putting all the finalize/_constructor business at the end will make it easier to separate out the array-specific middle and apply it to a) PandasArray and b) Block-wise.

jreback · 2019-08-08T12:35:35Z

jorisvandenbossche

Nice clean-up! Some comments / questions

jorisvandenbossche · 2019-08-08T10:20:04Z

pandas/core/ops/__init__.py

+        result = self._constructor(res_values, index=self.index)
+        # rename is needed in case res_name is None and result.name
+        #  is not.
+        return finalizer(result).rename(res_name)


In what case can res_values have a name? (that seems the only reason that result can have a name)

I think we should try to avoid doing this rename in general (it makes yet another copy of the data, which could of course also be avoided by using an inplace method, but I would rather avoid doing the rename in general)

I also think that is the wrong comment (there were multiple similar in the original code, but they all differed slightly. Here it is not about result.name but self.name)

could put logic for check-if-rename-is-necessary into something resembling _construct_result

__finalize__ is used to propagate metadata of subclasses, so I suppose we should keep it. In this case, we somehow need a way to signal to finalize to not set the name again ..
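
To make the name handling in this thread concrete, here is a minimal sketch of why a name fix-up follows the finalizer: `__finalize__` copies the attributes listed in `_metadata` (the name among them) from the source Series onto the result, so the name computed for the operation has to be restored afterwards. `finalize_and_name` is a hypothetical helper, not code from this PR; it only illustrates the "check if a rename is necessary" idea under the assumption that setting the name in place is acceptable.

```python
import pandas as pd


def finalize_and_name(result, source, res_name):
    """Hypothetical helper: propagate metadata, then fix the name only if needed."""
    out = result.__finalize__(source)
    if out.name != res_name:
        # __finalize__ copied source.name onto the result; restore res_name.
        # Setting the attribute in place sidesteps the extra rename() call
        # that the review comment above is concerned about.
        out.name = res_name
    return out


left = pd.Series([1, 2, 3], name="a")
res_values = [True, False, True]

# A freshly constructed result has no name ...
result = pd.Series(res_values, index=left.index)
assert result.name is None

# ... but __finalize__ copies the name over from `left`.
finalized = pd.Series(res_values, index=left.index).__finalize__(left)
assert finalized.name == "a"

# When the operands' names differ, res_name is None and must be restored.
fixed = finalize_and_name(pd.Series(res_values, index=left.index), left, None)
assert fixed.name is None
```

Whether an in-place name assignment or some flag passed to `__finalize__` is the better design is exactly the open question raised above; the sketch takes no position on it.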

jorisvandenbossche · 2019-08-08T12:37:26Z

pandas/core/ops/__init__.py

+            and len(other) != len(self)
+            and not isinstance(other, frozenset)
+        ):
+            # TODO: why are we treating len-1 frozenset differently?


Where is this coming from (a test?)? I don't see this check in the existing code, and it feels like a bad idea to introduce such a check if we don't know the reason for it.

A test, yes. I don't know which one off the top of my head, but this was needed to maintain existing behavior. I'd like to change it to be consistent with how we do things elsewhere.

Looks like the test that makes this necessary is in tests.series.test_operators.TestSeriesComparisons.test_comparison_tuples

s = Series([frozenset([1]), frozenset([1, 2])])
result = s == frozenset([1])

Supported listlikes in object-dtype is a hassle

OK, so the frozenset was an example test case, but there might be others that previously worked like that. So I don't think the solution is to add an explicit frozenset check as done now. Do you know how it got through the checks in the previous version?

previously the length check was only done on ndarray/Index/Series

> previously the length check was only done on ndarray/Index/Series

Then let's go back to that logic? (Or, e.g., at least when the dtype is object?) As written, this will break comparisons of object-dtype Series that hold list-like scalar objects.

sure. we should have a larger discussion on how we can handle this more consistently across the codebase
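
The earlier logic described in this thread can be sketched roughly as follows, assuming the length check applies only to ndarray/Index/Series and every other list-like is treated as a scalar. `check_lengths` is a hypothetical stand-in for illustration, not the function in `pandas/core/ops/__init__.py`.

```python
import numpy as np
import pandas as pd


def check_lengths(left, other):
    """Hypothetical sketch: only array-like containers are length-checked."""
    if isinstance(other, (np.ndarray, pd.Index, pd.Series)):
        if len(other) != len(left):
            raise ValueError("Lengths must match to compare")
    # Anything else (tuple, frozenset, ...) falls through and is compared
    # against each element as if it were a scalar.


s = pd.Series([frozenset([1]), frozenset([1, 2])])

# A frozenset is list-like but is not length-checked, so this does not raise,
# and the comparison is done element by element.
check_lengths(s, frozenset([1]))
result = s == frozenset([1])
print(result.tolist())

# An ndarray of the wrong length is rejected.
try:
    check_lengths(s, np.array([1, 2, 3]))
except ValueError as err:
    print(err)
```

With this shape no frozenset-specific branch is needed, which is the point made above; the cost is that list-likes other than ndarray/Index/Series are never validated by length.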

jorisvandenbossche · 2019-08-08T12:39:32Z

pandas/core/ops/__init__.py


        else:
-            values = self.to_numpy()
+            lvalues = extract_array(self, extract_numpy=True)


Is the extract_numpy=True needed?
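
For context on the question, here is a small sketch of what the flag does, assuming `extract_array` is importable from the private module `pandas.core.construction` (an internal path that may move between versions). It does not settle whether the flag is needed in this particular branch; it only contrasts `extract_array` with the `to_numpy()` call it replaces in the diff.

```python
import numpy as np
import pandas as pd

# NOTE: private module; the import path is an assumption and may change.
from pandas.core.construction import extract_array

int_ser = pd.Series([1, 2, 3])
cat_ser = pd.Series(pd.Categorical(["a", "b", "a"]))

# NumPy-backed data comes back as a plain ndarray when extract_numpy=True.
int_values = extract_array(int_ser, extract_numpy=True)
assert isinstance(int_values, np.ndarray)

# Extension-backed data is returned as the ExtensionArray itself ...
cat_values = extract_array(cat_ser, extract_numpy=True)
assert isinstance(cat_values, pd.Categorical)

# ... whereas to_numpy() always converts to an ndarray.
assert isinstance(cat_ser.to_numpy(), np.ndarray)
```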

jorisvandenbossche · 2019-08-08T13:01:09Z

pandas/core/ops/__init__.py

+
+        # TODO: shouldn't we be applying finalize whenever
+        #  not isinstance(other, ABCSeries)?
+        finalizer = (


This is related to the renaming issue I commented about below.
IIUC, the rename is needed because the finalize sets back the original Series name, which might not be desired.
I think it would be good to make that clearer (eg in the comments)

shouldn't we be applying finalize whenever not isinstance(other, ABCSeries)?

Yes, that seems correct to me
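
As a quick illustration of the naming behavior being discussed, the checks below show the result names that comparisons produce through the public API. This only demonstrates observable behavior; it makes no claim about how the finalizer is wired internally.

```python
import pandas as pd

a = pd.Series([1, 2, 3], name="a")
b = pd.Series([1, 0, 3], name="b")

# Comparing against a non-Series operand keeps the Series name.
scalar_name = (a == 1).name
list_name = (a == [1, 0, 3]).name
assert scalar_name == "a"
assert list_name == "a"

# Two Series with different names produce an unnamed result, which is the
# case where copying self.name onto the result would be wrong.
mixed_name = (a == b).name
assert mixed_name is None

# Two Series sharing a name keep it.
same_name = (a == a.copy()).name
assert same_name == "a"
```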

jbrockmendel added 2 commits August 7, 2019 08:53

refactor comp_method_SERIES to de-duplicate

ed6f36b

Remove not-yet-necessary

d095d60

jbrockmendel mentioned this pull request Aug 8, 2019

REF: implement should_extension_dispatch #27815

Merged

jreback added the Clean label Aug 8, 2019

jreback added this to the 1.0 milestone Aug 8, 2019

jreback merged commit f00905e into pandas-dev:master Aug 8, 2019

jorisvandenbossche reviewed Aug 8, 2019


jbrockmendel deleted the boiler1 branch August 8, 2019 15:19

jbrockmendel mentioned this pull request Aug 12, 2019

requested edit to comparison method #27873

Merged

quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019

REF: Simplify _comp_method_SERIES (pandas-dev#27803)

3196af4
