Skip to content

COMPAT: Return NotImplemented for subclassing #31136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 28, 2020

Conversation

TomAugspurger
Copy link
Contributor

This changes index ops to check the type of the argument in index
ops, rather than just the dtype. This lets index subclasses take control
of binary ops when they know better what the result should be.

Closes #31109

cc @jbrockmendel, I've fixed this as close to the root of the problem as I could, but perhaps this case should be checked elsewhere? As written currently, it only affects add & sub between an Index and a datelike index, but perhaps it should be done more generally?

This changes index ops to check the *type* of the argument in index
ops, rather than just the dtype. This lets index subclasses take control
of binary ops when they know better what the result should be.

Closes pandas-dev#31109
@TomAugspurger TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jan 19, 2020
@TomAugspurger TomAugspurger added this to the 1.0.0 milestone Jan 19, 2020
@TomAugspurger
Copy link
Contributor Author

Mmm, this is the wrong place to fix it. We've already unboxed the array in other by the time we get to my changes... Will take another crack.

@jbrockmendel
Copy link
Member

LGTM, azure error looks unrelated

@TomAugspurger
Copy link
Contributor Author

Just a regression on master.

@jreback
Copy link
Contributor

jreback commented Jan 20, 2020

k merge on green

@@ -110,6 +114,15 @@ def wrapper(self, other):

def make_wrapped_arith_op(opname):
def method(self, other):
if (
isinstance(other, Index)
and is_object_dtype(other.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only object dtype? You could have Index subclasses with a different dtype? (or is that not realistic?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my suspicion. It's at least true for the xarray case.

@dcherian does xarray have any Index subclasses that have a non object dtype?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a question for @shoyer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xarray only has one Index subclass, which is of object dtype. It's definitely a little fragile, though, so we would like to eventually get rid of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we would like to eventually get rid of it.

IIRC CFTime is for the "UT" timezone, which is always within 1 second of UTC, right? Can this be represented as a tzinfo object?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@@ -323,3 +324,51 @@ def test_rsub_object(self):

with pytest.raises(TypeError):
np.array([True, pd.Timestamp.now()]) - index

@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should move this to:

tests/indexes/test_subclass.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an indexes/test_subclass.py. Start a new file for this?

@TomAugspurger
Copy link
Contributor Author

Moving the conversation in https://github.com/pandas-dev/pandas/pull/31136/files/35549e681845d8505250dda00a9c22cff915f67f#diff-8ccfea8b354e9d2156ebc3150f69a18b out of inline.

Python has a behavior I hadn't realized before. Typically, for A + B, that calls A.__add__(B), and calls B.__radd__(A) if needed. However, if B is a subclass of A, then B.__radd__(A) is called first.

In [8]: class A:
   ...:     def __add__(self, other):
   ...:         print('a-add')
   ...:         return 0
   ...:     def __radd__(self, other):
   ...:         print('a-radd')
   ...:         return 1
   ...:

In [9]: class B:
   ...:     def __add__(self, other):
   ...:         print('b-add')
   ...:         return 2
   ...:     def __radd__(self, other):
   ...:         print("b-radd")
   ...:         return 3
   ...:

In [10]: A() + B()
a-add
Out[10]: 0

In [11]: class B(A):  # subclass
    ...:     def __add__(self, other):
    ...:         print('b-add')
    ...:         return 2
    ...:     def __radd__(self, other):
    ...:         print("b-radd")
    ...:         return 3
    ...:

In [12]: A() + B()
b-radd
Out[12]: 3

This means that an object dtype Index would have deferred to an Index subclass, just implicitly since Python does that. So numeric is the only behavior change we need to consider.

@jbrockmendel
Copy link
Member

So numeric is the only behavior change we need to consider.

Numeric shouldn't be relevant for the CFDatetimeIndex case of interest right? I think the fix here should stay narrow and targeted at that for 1.0.

@TomAugspurger
Copy link
Contributor Author

@jorisvandenbossche can you comment on #31136 (comment)? Does it address your concerns in #31136 (comment)?

@TomAugspurger
Copy link
Contributor Author

Merging later today if there aren't any objections.

@jorisvandenbossche
Copy link
Member

This means that an object dtype Index would have deferred to an Index subclass, just implicitly since Python does that. So numeric is the only behavior change we need to consider.

You mean because if you subclass Index, ops with Index class (i.e. object dtype) already get dispatched fine by python's mechanism. But for ops with numeric indices, the Index subclass is not a subclass of the numeric Index in that case, so therefore we need to handle it specifically there?

@TomAugspurger
Copy link
Contributor Author

But for ops with numeric indices, the Index subclass is not a subclass of the numeric Index in that case[...]

All this is correct. As an example, if you have

class MyIndex(Index):
    pass

Then Int64Index() + MyIndex() will use Int64Index.__add__ since MyIndex isn't a subclass of Int64Index.

[...], so therefore we need to handle it specifically there?

This is where I think we disagree. In general, I don't think that all our Index subclasses can / should be doing these checks.

Ideally in the future we have an ExtensionIndex API and CFTimeIndex is an instance of that. But for now I think we should just restore the 0.25 behavior.

@jorisvandenbossche
Copy link
Member

[...], so therefore we need to handle it specifically there?

This is where I think we disagree. In general, I don't think that all our Index subclasses can / should be doing these checks.

Ah, sorry, I thought this was what you were doing in this PR. So now you do the check not in general for numeric index, but only for EA-based indexes (to have it for datetime-like indexes) ?

@TomAugspurger
Copy link
Contributor Author

So now you do the check not in general for numeric index, but only for EA-based indexes

Anything using make_wrapped_arith_op, which in practice is just the Datetimelike indexes.

@jorisvandenbossche
Copy link
Member

OK, I was just going from the fact it being in pandas/core/indexes/extension.py.

Anyway, I am fine with focusing on getting the compatibility with 0.25.
And you are right that longer term the better way is to allow storing general ExtensionArrays in the Index.

@TomAugspurger TomAugspurger merged commit 3dc59d4 into pandas-dev:master Jan 28, 2020
@TomAugspurger TomAugspurger deleted the 31109-object-ops branch January 28, 2020 20:57
@TomAugspurger
Copy link
Contributor Author

@meeseeksdev backport to 1.0.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return NotImpelmented for arith ops with object-dtype Index subclasses
6 participants