REF: implement arithmetic, comparison, logical ops on arrays #28431

jbrockmendel · 2019-09-13T16:20:35Z

Following #28413, #28396, and #28395, these are all pretty straightforward refactors.

arithmetic_op, comparison_op, and logical_op are going to become the ops that we call Block-wise. Hopefully also PandasArray will use them directly.
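A rough sketch of what calling these ops block-wise could look like. It assumes a frame held as a list of homogeneous 2-D NumPy blocks and a scalar right operand. `array_arith_op` and `blockwise` are hypothetical names, not pandas internals:

```python
import operator

import numpy as np


def array_arith_op(left: np.ndarray, right, op):
    # Hypothetical array-level op: it only ever sees ndarrays or scalars,
    # never a Series or DataFrame.
    return op(left, right)


def blockwise(blocks, right, op):
    # Apply the array-level op once per homogeneous 2-D block
    # rather than once per column.
    return [array_arith_op(block, right, op) for block in blocks]


blocks = [np.arange(6).reshape(2, 3), np.ones((1, 3))]
result = blockwise(blocks, 10, operator.add)
# result[0] -> [[10, 11, 12], [13, 14, 15]]; result[1] -> [[11.0, 11.0, 11.0]]
```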

Other follow-ups I have in mind:

Move should_extension_dispatch and dispatch_to_extension_op so we don't need runtime imports
get rid of eval_kwargs entirely, they're not really necessary at this point
see if we can use numexpr for comparison/logical ops
docstrings
address inconsistencies in when we call extract_array
address inconsistencies in how Series methods handle __finalize__ and alignment
simplify logical_op's na_op

TomAugspurger · 2019-09-13T20:47:47Z

pandas/core/ops/array_ops.py

@@ -132,3 +152,171 @@ def na_arithmetic_op(left, right, op, str_rep, eval_kwargs):
        result = masked_arith_op(left, right, op)

    return missing.dispatch_fill_zeros(op, left, right, result)
+
+
+def arithmetic_op(left, right, op, str_rep, eval_kwargs):


a docstring would be nice.

Can you explain the flow a bit here?
When do you end up here? What has already been done?
E.g. if you have a left and right where left.__op__(right) would return NotImplemented, but right.__op__(left) works, do you end up here and in what form?

Maybe a few examples of left and right types and how they are extracted before ending up here would help.

Can you explain the flow a bit here?
When do you end up here? What has already been done?

This will all go in a docstring before long (hopefully in a separate pass, as this is a blocker for arithmetic-perf which I'm eager to get done). The idea:

Everything Series-specific is done outside of this func (same with the comparison and logical funcs): potentially returning NotImplemented, finding res_name, checking/ensuring Series alignment (comparison is different from the other two), calling _constructor and __finalize__.

We should never get here (for any of these funcs) with a DataFrame, Series, or Index for left. We should never get here with a DataFrame for right. We can almost rule out Series or Index for right, the unfortunate exception being L176-180.
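A toy sketch of that layering, assuming a minimal list-backed container in place of Series. `ToySeries`, `array_arith_op`, and the length check standing in for alignment are all hypothetical. The point is only which steps happen in the wrapper and which in the array-level function:

```python
import operator


def array_arith_op(left: list, right, op):
    # Array-level op (hypothetical): `left` is already a plain list and
    # `right` is a list or a scalar, never a container type.
    if isinstance(right, list):
        return [op(a, b) for a, b in zip(left, right)]
    return [op(a, right) for a in left]


class ToySeries:
    """Minimal list-backed container standing in for Series (hypothetical)."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def _arith(self, other, op):
        # Container-specific work happens here, before the array-level op.
        if not isinstance(other, (ToySeries, list, int, float)):
            return NotImplemented  # let Python try other.__radd__
        if isinstance(other, ToySeries):
            if len(other.values) != len(self.values):  # stand-in for alignment
                raise ValueError("cannot align")
            res_name = self.name if self.name == other.name else None
            rvalues = other.values  # stand-in for extract_array(right)
        else:
            res_name = self.name
            rvalues = other
        lvalues = self.values  # stand-in for extract_array(left)
        result = array_arith_op(lvalues, rvalues, op)
        # Stand-in for _constructor / __finalize__.
        return ToySeries(result, name=res_name)

    def __add__(self, other):
        return self._arith(other, operator.add)


s = ToySeries([1, 2, 3], name="a")
r = s + 1  # values [2, 3, 4], name "a"
```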

This will all go in a docstring before long (hopefully in a separate pass, as this is a blocker for arithmetic-perf which I'm eager to get done)

Why not in this PR? Writing and reviewing a docstring should not take weeks, and it makes reviewing it here easier.

Doing a simple docstring would be easy, but since we are dealing with very central methods with a number of corner cases, doing a thorough docstring is not. This PR is intended as a just-refactor, with a number of follow-ups when this blocker is done.

TomAugspurger · 2019-09-13T20:49:50Z

pandas/core/ops/array_ops.py

+        ),
+    )
+
+    lvalues = extract_array(left, extract_numpy=True)


So we're completely done with deferring via NotImplemented by the time we get here?

Yes. So for e.g. PandasArray we will be able to use these directly only after #28037 is in.
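For reference, this is the standard Python binary-operator protocol the wrappers rely on. If `left.__add__(right)` returns `NotImplemented`, Python calls `right.__radd__(left)`, so the deferral is settled before any array-level function runs. `Left` and `Right` are hypothetical classes:

```python
class Left:
    def __add__(self, other):
        if isinstance(other, Right):
            # Signal "I don't know how to do this"; Python then
            # tries the reflected method on the right operand.
            return NotImplemented
        return "Left.__add__"


class Right:
    def __radd__(self, other):
        return "Right.__radd__"


result = Left() + Right()  # "Right.__radd__"
```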

jbrockmendel · 2019-09-13T22:18:47Z

Added requested comments and types. Holding off on docstrings since doing those right will be non-trivial.

jbrockmendel · 2019-09-14T00:35:03Z

reverted types because mypy was complaining about indexes.base

jbrockmendel · 2019-09-16T14:18:56Z

@jreback gentle ping. This is the last blocker before I can implement block-wise ops for the scalar case (and one of the two Series cases)

jorisvandenbossche

This is really nice! Added a bunch of questions / comments

jorisvandenbossche · 2019-09-17T12:02:44Z

pandas/core/ops/array_ops.py

@@ -132,3 +152,171 @@ def na_arithmetic_op(left, right, op, str_rep, eval_kwargs):
        result = masked_arith_op(left, right, op)

    return missing.dispatch_fill_zeros(op, left, right, result)
+
+
+def arithmetic_op(left, right, op, str_rep, eval_kwargs):


Can you explain the flow a bit here?
When do you end up here? What has already been done?
E.g. if you have a left and right where left.__op__(right) would return NotImplemented, but right.__op__(left) works, do you end up here and in what form?

Maybe a few examples of left and right types and how they are extracted before ending up here would help.

jorisvandenbossche · 2019-09-17T12:03:58Z

pandas/core/ops/array_ops.py

+    #  cannot make the same assumption about `right`.  This is because we need
+    #  to define `keep_null_freq` before calling extract_array on it.
+    lvalues = left
+    rvalues = extract_array(right, extract_numpy=True)


But DatetimelikeArrays keep the freq? So why is it not possible to extract up to the array level for those?

And for a scalar Timestamp, does this freq actually matter?

And for a scalar Timestamp, does this freq actually matter?

The behavior is deprecated, but ts + 5 currently behaves like ts + 5*ts.freq, so yes, freq does matter.
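A stdlib-only sketch of the deprecated integer-addition semantics described above. It assumes `datetime` and `timedelta` as stand-ins for `Timestamp` and its `freq`. `add_int_with_freq` and `NullFrequencyErrorSketch` are hypothetical names, not pandas API:

```python
from datetime import datetime, timedelta
from typing import Optional


class NullFrequencyErrorSketch(ValueError):
    """Hypothetical stand-in for pandas' NullFrequencyError."""


def add_int_with_freq(ts: datetime, n: int, freq: Optional[timedelta]) -> datetime:
    # Deprecated semantics: `ts + n` is interpreted as `ts + n * ts.freq`.
    if freq is None:
        # Without a frequency there is no unit to multiply `n` by.
        raise NullFrequencyErrorSketch("cannot add an integer without a freq")
    return ts + n * freq


ts = datetime(2016, 1, 1)
shifted = add_int_with_freq(ts, 5, timedelta(hours=1))  # 2016-01-01 05:00
```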

But DatetimelikeArrays keep the freq? So why is it not possible to extract up to the array level for those?

Suppose we have self = Series(range(5)), so lvalues = np.array(range(5)). Consider two cases for other that we are going to add to self:

other = DatetimeIndex(["2016-01-01"] * 5, tz="UTC")

other = Series(DatetimeIndex(["2016-01-01"] * 5, tz="UTC"))

In the first case, other has a freq attribute that is None, so the correct thing to do is raise NullFrequencyError (btw this behavior is deprecated as with Timestamp). In the second, there is no freq attribute, so the correct thing to do is raise TypeError. If we did the unwrapping before getting to arithmetic_op, we would incorrectly raise NullFrequencyError in case 2.
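A sketch of why the unwrapping order matters, using hypothetical stand-in classes for DatetimeArray, DatetimeIndex, and Series. It keys the decision off whether `other` exposes `freq=None`, which is a simplification of whatever check pandas actually performs:

```python
class ArrayStandIn:
    """Stand-in for DatetimeArray: always carries a `freq` attribute."""

    def __init__(self, freq=None):
        self.freq = freq


class IndexStandIn:
    """Stand-in for DatetimeIndex: wraps an array and exposes its `freq`."""

    def __init__(self, arr):
        self._values = arr

    @property
    def freq(self):
        return self._values.freq


class SeriesStandIn:
    """Stand-in for Series: wraps the same array but has no `freq`."""

    def __init__(self, arr):
        self._values = arr


def unwrap(obj):
    # Hypothetical stand-in for extract_array: return the wrapped array.
    return getattr(obj, "_values", obj)


def error_for_int_addition(other, unwrap_first: bool) -> str:
    """Name the error raised when adding an integer array to `other`."""
    if unwrap_first:
        other = unwrap(other)
    # Simplification: decide based on whether `other` exposes freq=None.
    keep_null_freq = getattr(other, "freq", "absent") is None
    return "NullFrequencyError" if keep_null_freq else "TypeError"


arr = ArrayStandIn(freq=None)
error_for_int_addition(IndexStandIn(arr), unwrap_first=False)   # "NullFrequencyError"
error_for_int_addition(SeriesStandIn(arr), unwrap_first=False)  # "TypeError"
error_for_int_addition(SeriesStandIn(arr), unwrap_first=True)   # "NullFrequencyError" (wrong)
```

Once both inputs are unwrapped to the same array, the Series case can no longer be told apart from the Index case, which is the incorrect NullFrequencyError described above.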

Can you move both extract_array's in here? Would make grokking this simpler.

Can you move both extract_array's in here? Would make grokking this simpler.

I think the idea is to move both out longer term (once the freq business is fixed). If that is the case, I think it is fine for now to already have extract_array of left higher up.

In the first case, other has a freq attribute that is None, so the correct thing to do is raise NullFrequencyError (btw this behavior is deprecated as with Timestamp). In the second, there is no freq attribute, so the correct thing to do is raise TypeError. If we did the unwrapping before getting to arithmetic_op, we would incorrectly raise NullFrequencyError in case 2.

OK, I see. But once the deprecation is removed, this will always be a type error? And then this does not need a special case here? (just for understanding, not expecting it to be done in this PR ;))

But once the deprecation is removed, this will always be a type error? And then this does not need a special case here? (just for understanding, not expecting it to be done in this PR ;))

Correct x 2.

jorisvandenbossche · 2019-09-17T12:12:44Z

pandas/core/ops/array_ops.py

+    lvalues = left
+    rvalues = right
+
+    rvalues = lib.item_from_zerodim(rvalues)


This isn't done yet higher up? (arithmetic_op is not doing this?)

arithmetic_op doesn't, but probably should. Things like this are the motivation for #28037.
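A minimal NumPy sketch of what unwrapping a zero-dimensional array looks like. It assumes `lib.item_from_zerodim` returns the scalar held by a 0-d ndarray and passes other objects through unchanged. This is not pandas' Cython implementation:

```python
import numpy as np


def item_from_zerodim_sketch(val):
    """Return the scalar inside a 0-d ndarray; pass anything else through."""
    if isinstance(val, np.ndarray) and val.ndim == 0:
        # Indexing a 0-d array with an empty tuple yields a NumPy scalar.
        return val[()]
    return val


item_from_zerodim_sketch(np.array(5))   # NumPy integer scalar 5
item_from_zerodim_sketch(np.arange(3))  # the same 1-D array, unchanged
```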

jorisvandenbossche · 2019-09-17T12:16:51Z

pandas/core/ops/array_ops.py

+    rvalues = maybe_upcast_for_op(rvalues, lvalues.shape)
+
+    if should_extension_dispatch(left, rvalues) or isinstance(
+        rvalues, (ABCTimedeltaArray, ABCDatetimeArray, Timestamp)


Why is the check for rvalues being ABCTimedeltaArray or ABCDatetimeArray not included in should_extension_dispatch?

historically should_extension_dispatch was more focused on dispatching to EA implementations, while the TDA/DTA/Timestamp check is specific to handling the freq attribute. Combining them wouldn't be unreasonable.

jorisvandenbossche · 2019-09-17T12:25:00Z

pandas/core/ops/array_ops.py

+def logical_op(left, right, op):
+    from pandas.core.ops import should_extension_dispatch, dispatch_to_extension_op
+
+    def na_op(x, y):


Make this a non-nested function like na_arithmetic_op?

Good idea. The logical op is kind of a mess internal-consistency-wise, so I'm planning to do this in a dedicated PR.

Can you do that here, since you are already doing the move from the other file in this PR? (it's just a copy paste I assume?)
Then a next PR editing it will be easier to review.
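A sketch of the suggested shape: a module-level function that takes `op` as an argument instead of a closure nested inside `logical_op`. The fallback policy (pairs involving `None` evaluate to `False`) and the 1-D-only loop are assumptions for illustration, not the behaviour of pandas' `na_logical_op`:

```python
import operator

import numpy as np


def na_logical_op(x: np.ndarray, y, op):
    """Apply a logical op, falling back to an elementwise loop on TypeError."""
    try:
        # Fast path: let NumPy vectorize the operation.
        return op(x, y)
    except TypeError:
        # Slow path: object-dtype data holding values (e.g. None) that the
        # operator cannot handle. Assumption: such pairs evaluate to False.
        x_obj = np.asarray(x, dtype=object)
        y_obj = np.broadcast_to(np.asarray(y, dtype=object), x_obj.shape)
        result = np.empty(x_obj.shape, dtype=bool)
        for i, (a, b) in enumerate(zip(x_obj, y_obj)):  # 1-D input only
            result[i] = False if a is None or b is None else bool(op(a, b))
        return result


na_logical_op(np.array([True, None], dtype=object), True, operator.and_)
# -> array([ True, False])
```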

jbrockmendel · 2019-09-18T00:03:45Z

updated with types and bare-bones docstrings

WillAyd · 2019-09-18T00:11:30Z

pandas/core/ops/array_ops.py

+
+
+def comparison_op(
+    left: Union[np.ndarray, ABCExtensionArray], right: Any, op


Haven't reviewed method body in detail but you might want to use ArrayLike from pandas._typing instead of the Union here. The former is a TypeVar which maintains type generically through the function

For example, if left is an ndarray, using the TypeVar would mean that the function also returns an ndarray. By contrast, a Union would allow left to be an ndarray but the function to return either an ndarray or an ExtensionArray.

NB: ndarray and ABC* classes both resolve to Any so this helps documentation, but doesn't actually type check

For example, if left is an ndarray, using the TypeVar would mean that the function also returns an ndarray

We do not have that guarantee for these functions.
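A stdlib sketch of the difference, using hypothetical stand-in classes in place of `np.ndarray` and `ExtensionArray`. A constrained `TypeVar` ties the return type to the argument type; a `Union` lets them vary independently, which matches the reply that these functions make no such guarantee. Neither annotation is checked at runtime:

```python
from typing import TypeVar, Union


class NDArrayStandIn:
    """Hypothetical stand-in for np.ndarray."""


class ExtensionArrayStandIn:
    """Hypothetical stand-in for ExtensionArray."""


# Constrained TypeVar: a type checker binds the return type to the input type.
ArrayLikeT = TypeVar("ArrayLikeT", NDArrayStandIn, ExtensionArrayStandIn)


def same_kind(values: ArrayLikeT) -> ArrayLikeT:
    # same_kind(NDArrayStandIn()) is inferred as NDArrayStandIn.
    return type(values)()


# Union: input and output types are independent of each other.
AnyArray = Union[NDArrayStandIn, ExtensionArrayStandIn]


def maybe_other_kind(values: AnyArray, wants_extension: bool) -> AnyArray:
    # An ndarray input may legitimately produce an extension-array result,
    # which the TypeVar signature above could not express.
    return ExtensionArrayStandIn() if wants_extension else NDArrayStandIn()
```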

jreback

this is mostly just a cut/paste yes?

jreback · 2019-09-18T12:29:56Z

pandas/core/ops/array_ops.py

+    #  cannot make the same assumption about `right`.  This is because we need
+    #  to define `keep_null_freq` before calling extract_array on it.
+    lvalues = left
+    rvalues = extract_array(right, extract_numpy=True)


Can you move both extract_array's in here? Would make grokking this simpler.

jreback · 2019-09-18T12:30:31Z

pandas/core/ops/array_ops.py

+    """
+    from pandas.core.ops import should_extension_dispatch, dispatch_to_extension_op
+
+    # NB: We assume extract_array has already been called on left and right


same comment as above about extract_array

jbrockmendel · 2019-09-18T13:50:43Z

this is mostly just a cut/paste yes?

Correct. types and basic docstrings have been added.

jbrockmendel · 2019-09-18T13:56:46Z

Can you move both extract_array's in here? Would make grokking this simpler.

I agree that having extract_array in exactly one place is weird. We'll be able to get rid of it after the integer-addition deprecation is enforced (and keep_null_freq becomes unnecessary).

The tradeoff is that doing all 6 extract_array calls inside the functions would break the typing "guarantees", which I'm starting to like.

jorisvandenbossche · 2019-09-18T13:56:39Z

pandas/core/ops/array_ops.py

+    #  cannot make the same assumption about `right`.  This is because we need
+    #  to define `keep_null_freq` before calling extract_array on it.
+    lvalues = left
+    rvalues = extract_array(right, extract_numpy=True)


Can you move both extract_array's in here? Would make grokking this simpler.

I think the idea is to move both out longer term (once the freq business is fixed). If that is the case, I think it is fine for now to already have extract_array of left higher up.

jorisvandenbossche · 2019-09-18T13:58:14Z

pandas/core/ops/array_ops.py

+    #  cannot make the same assumption about `right`.  This is because we need
+    #  to define `keep_null_freq` before calling extract_array on it.
+    lvalues = left
+    rvalues = extract_array(right, extract_numpy=True)


In the first case, other has a freq attribute that is None, so the correct thing to do is raise NullFrequencyError (btw this behavior is deprecated as with Timestamp). In the second, there is no freq attribute, so the correct thing to do is raise TypeError. If we did the unwrapping before getting to arithmetic_op, we would incorrectly raise NullFrequencyError in case 2.

OK, I see. But once the deprecation is removed, this will always be a type error? And then this does not need a special case here? (just for understanding, not expecting it to be done in this PR ;))

jorisvandenbossche · 2019-09-18T14:01:40Z

pandas/core/ops/array_ops.py

+    ----------
+    left : np.ndarray or ExtensionArray
+    right : object
+        Cannot be a DataFrame, Series, or Index.


Is it possible to be more precise and say that it is a ndarray, ExtensionArray or scalar?
Or are there still other types of objects that can be passed?

any wacky thing that you can pass to Series.__add__ can end up here with the exception of those three classes.

(maybe __eq__ would be a better example)

But e.g. a list: is that already coerced to an array by the time it is here?
Depending on what work has been done before this, the "any wacky thing you can pass" can still be considered as "a scalar"

Some coercion (including list) is done a few lines down on L245-248.

Yes to everything up through "applied consistently". For the "or scalar" I'm reticent to make a definitive statement because we can brainstorm scalar-like things that wouldn't satisfy lib.is_scalar (possibly also some special cases for tuple).

scalar-like things that wouldn't satisfy lib.is_scalar

That's only a problem for object / generic EAs (ones we don't know) right? I mean for all other types you could have that guarantee at this location in the code?

No. A user can pass something like ser + type(ser) or ser + pd and we can get here.

But for all built-in types except object dtype, we know that such objects are invalid, so that could (consistently) be checked before getting here? (similarly like you want to move the length check out I thought)

(again, about long term plan, not this PR)

I guess we could in principle. I don't see it as a priority.
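A minimal sketch of the kind of coercion discussed in this thread. It assumes a list is converted to an ndarray and array-likes are length-checked against the left operand, while anything else (including objects like a module) passes through untouched. `coerce_right_operand` is a hypothetical name and the error message is illustrative:

```python
import numpy as np


def coerce_right_operand(lvalues: np.ndarray, rvalues):
    """Hypothetical helper: normalize `rvalues` before an array-level comparison."""
    # Lists are turned into ndarrays so they can be compared elementwise.
    if isinstance(rvalues, list):
        rvalues = np.asarray(rvalues)
    # Non-scalar arrays must line up with the left operand.
    if isinstance(rvalues, np.ndarray) and rvalues.ndim > 0:
        if len(rvalues) != len(lvalues):
            raise ValueError("Lengths must match to compare")
    # Scalars and arbitrary objects (e.g. a module) pass through unchanged.
    return rvalues
```

Because arbitrary objects pass through, rejecting things like `pd` or `type(ser)` would need a separate, dtype-aware check before this point, which is the long-term option raised above.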

jorisvandenbossche · 2019-09-18T14:03:24Z

pandas/core/ops/array_ops.py

+def logical_op(left, right, op):
+    from pandas.core.ops import should_extension_dispatch, dispatch_to_extension_op
+
+    def na_op(x, y):


Can you do that here, since you are already doing the move from the other file in this PR? (it's just a copy paste I assume?)
Then a next PR editing it will be easier to review.

jorisvandenbossche · 2019-09-18T18:42:40Z

[@jreback] Can you move both extract_array's in here? Would make grokking this simpler.

I agree that having extract_array in exactly one place is weird. We'll be able to get rid of it after the integer-addition deprecation is enforced (and keep_null_freq becomes unnecessary).

The tradeoff is that doing all 6 extract_array calls inside the functions would break the typing "guarantees", which I'm starting to like.

I agree with @jbrockmendel that it is nice to already know what types are passed to those functions. So I think what is done in the PR (do extract_array outside, except for the one case where it is needed for the deprecated functionality) is fine. It's only temporary (we should remove that deprecation for 1.0), and gives the least code flux afterwards (only that one case can be moved out afterwards).

jorisvandenbossche

This looks good to me

TomAugspurger

Agreed. Let's give @jreback a day or two to see if he has time to look and then merge otherwise.

jbrockmendel · 2019-09-21T21:56:14Z

rebased+green

jreback

just some minor comments, ping on green.

jreback · 2019-09-23T11:54:20Z

pandas/core/ops/__init__.py

 from pandas.core.ops.docstrings import (
    _arith_doc_FRAME,
    _flex_comp_doc_FRAME,
    _make_flex_doc,
    _op_descriptions,
 )
-from pandas.core.ops.invalid import invalid_comparison
+from pandas.core.ops.invalid import invalid_comparison  # noqa:F401


is this still needed? (also on comp_method_OBJECT_ARRAY)

Yes. They are imported elsewhere. This will get cleaned up as we get more stuff out of __init__

jreback · 2019-09-23T11:56:29Z

pandas/core/ops/array_ops.py

+
+
+def na_logical_op(x, y, op):
+    try:


if you can type / doc-string at some point

Sure. This function is headed for a pretty major re-write in follow-ups.

jreback · 2019-09-23T11:57:12Z

pandas/core/ops/array_ops.py

+    if should_extension_dispatch(lvalues, rvalues):
+        res_values = dispatch_to_extension_op(op, lvalues, rvalues)
+
+    else:


can be direct else/if

Not without duplicating L378-383, I think.

jreback · 2019-09-23T12:55:13Z

thanks @jbrockmendel couple of followup requests noted above

jbrockmendel added 4 commits September 12, 2019 15:25

REF: implement logical and comparison array ops

9a617e3

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

3f414b1

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

396b4a8

implement arithmetic_op

56dff20

TomAugspurger reviewed Sep 13, 2019

jbrockmendel added 2 commits September 13, 2019 15:03

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

77e3241

add comments, types

148a8e8

jbrockmendel added 2 commits September 13, 2019 15:48

typo fixup

fcf9735

revert types

fec86de

gfyoung added Clean and Internals labels Sep 15, 2019

jorisvandenbossche reviewed Sep 17, 2019

TomAugspurger added this to the 1.0 milestone Sep 17, 2019

jbrockmendel added 5 commits September 17, 2019 15:48

add types

2abdccb

docstrings

121d783

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

5faa820

ignore type

267c7ca

revert technically-incorrect type

0b5aa34

WillAyd reviewed Sep 18, 2019

jreback requested changes Sep 18, 2019

jorisvandenbossche reviewed Sep 18, 2019

jbrockmendel added 2 commits September 18, 2019 07:17

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

ad6da57

REF: move na_op out

8ced97b

jorisvandenbossche approved these changes Sep 18, 2019

jbrockmendel added 2 commits September 18, 2019 16:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

6b9bce0

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

b0d6263

TomAugspurger approved these changes Sep 19, 2019

jbrockmendel mentioned this pull request Sep 20, 2019

TST: parametrize test_expressions #28493

Merged

jbrockmendel added 2 commits September 20, 2019 13:20

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

3226401

Merge branch 'master' of https://github.com/pandas-dev/pandas into array_ops

dc003e4

jreback reviewed Sep 23, 2019

jreback approved these changes Sep 23, 2019

jreback merged commit b106108 into pandas-dev:master Sep 23, 2019

jbrockmendel deleted the array_ops branch September 23, 2019 14:29

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 24, 2019

update pandas.core.ops.array_ops following pandas-dev#28431

dec5be7

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: implement arithmetic, comparison, logical ops on arrays (pandas-dev#28431)

80126f9

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: implement arithmetic, comparison, logical ops on arrays (pandas-dev#28431)

2eea9d3

bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020

REF: implement arithmetic, comparison, logical ops on arrays (pandas-dev#28431)

ffaabb8


