-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix Series[timedelta64]+DatetimeIndex[tz] bugs #18884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ser + index lossed timezone closes pandas-dev#13905 index + ser retained timezone but returned a DatetimeIndex
pandas/core/ops.py
Outdated
result = op(pd.TimedeltaIndex(left), right) | ||
return construct_result(left, result, | ||
index=left.index, name=left.name, | ||
dtype=result.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should name
be passed using com._maybe_match_name
here? Not sure what the convention is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
pandas/core/ops.py
Outdated
result = op(pd.TimedeltaIndex(left), right) | ||
return construct_result(left, result, | ||
index=left.index, name=left.name, | ||
dtype=result.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
pandas/core/ops.py
Outdated
@@ -709,6 +709,15 @@ def wrapper(left, right, name=name, na_op=na_op): | |||
|
|||
left, right = _align_method_SERIES(left, right) | |||
|
|||
if is_timedelta64_dtype(left) and isinstance(right, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't special case things here, this is the point of the _TimeOp class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the appropriate place. Adding additional wrapping/unwrapping in _TimeOp obscures whats going on. Even if it wasn't a third(!) level of wrapping, b/c of namespacing/closure issues it is essentially impossible to handle overflow checks correctly with the current setup.
Eventually this is going to have to look like:
def wrapper(...)
if isinstance(right, ABCDataFrame):
return NotImplemented
elif is_timedelta64_dtype(left):
[dispatch to TimedeltaIndex op]
elif (is_datetime64_dtype(left) or is_datetime64tz_dtype(left)):
[dispatch to DatetimeIndex op]
[everything else]
Everything _TimeOp does is done (better) in the index classes. Anything other than dispatching to those methods is unnecessary duplication and an invitation to inconsistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so let's untangle that first. This bug fix just addsmore things to undo later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug fix just addsmore things to undo later.
This won't need to be undone. Eventually the and isinstance(right, pd.DatetimeIndex)
part of if is_timedelta64_dtype(left) and isinstance(right, pd.DatetimeIndex):
is removed and the timedelta64_dtype case is complete.
I've given the order of edits quite a bit of thought. Transitioning towards the dispatch approach case-by-case and implementing corresponding tests along the way is the way to go.
so let's untangle that first.
Untangle which first? The mess of closures that prevents overflow checks?
I guess if you want to untangle _TimeOp independently, #18832 is a step in that direction, is orthogonal to the other outstanding PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, until #18832 this needs to change as I have indicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#18832 is orthogonal.
What change have you indicated? Not clear on what "untangle this first" refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I don't want this here. put it where the other conversions are. that can be reactored at some point if its worthile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if I squint it kind of makes sense to avoid fixing the overflow-check bug in this PR and instead do it separately. Will change.
index=index, name=names[1]) | ||
expected = pd.Series(index + pd.Timedelta(seconds=5), | ||
index=index, name=names[2]) | ||
# passing name arg isn't enough when names[2] is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add blank lines before comments
expected.name = names[2] | ||
assert expected.dtype == index.dtype | ||
res = ser + index | ||
tm.assert_series_equal(res, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use result=
Codecov Report
@@ Coverage Diff @@
## master #18884 +/- ##
==========================================
+ Coverage 91.57% 91.57% +<.01%
==========================================
Files 150 150
Lines 48941 48948 +7
==========================================
+ Hits 44817 44824 +7
Misses 4124 4124
Continue to review full report at Codecov.
|
pandas/core/ops.py
Outdated
@@ -709,6 +709,15 @@ def wrapper(left, right, name=name, na_op=na_op): | |||
|
|||
left, right = _align_method_SERIES(left, right) | |||
|
|||
if is_timedelta64_dtype(left) and isinstance(right, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I don't want this here. put it where the other conversions are. that can be reactored at some point if its worthile.
I'm in the process of implementing this, but please reconsider. Moving this into |
The most recent attempt to kludge this into TimeOp still got the dtypes wrong. The way presented is the right way to do this. |
then let’s refactor this first |
Huh? The change here is in the direction of the long-term solution, is not going to be removed, is not sub-optimal. |
Pls reopen. |
Got the requested (but much worse) version working. Please reopen. |
ok let's see what you have. |
Pushed yesterday, looks like CI just finished. |
pandas/core/ops.py
Outdated
dtype=dtype, | ||
) | ||
|
||
return wrapper | ||
|
||
|
||
def _get_series_result_name(left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than right new code, this should be threaded into the existing routine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Threading it in separates name-convention logic into multiple places and makes the existing kludge kludgier, but will change.
pandas/core/ops.py
Outdated
@@ -515,7 +515,8 @@ def _convert_to_array(self, values, name=None, other=None): | |||
|
|||
# a datelike | |||
elif isinstance(values, pd.DatetimeIndex): | |||
values = values.to_series() | |||
# TODO: why are we casting to_series in the first place? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if you change this does it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tentative yes, but only with the change made by _get_series_result_name
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: tinkering with this and doing the threading requested below don't play nicely together. We should keep this as-is and address separately. This fixes a bug and we should call it a win. There are a bunch of these bug-fixes to get in before we can clean up the mess that is _TimeOp.
pandas/core/ops.py
Outdated
@@ -515,7 +515,8 @@ def _convert_to_array(self, values, name=None, other=None): | |||
|
|||
# a datelike | |||
elif isinstance(values, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to use ABCDatetimeIndex
pandas/core/ops.py
Outdated
dtype=dtype, | ||
) | ||
|
||
return wrapper | ||
|
||
|
||
def _get_series_op_result_name(left, right): | ||
# `left` is always a Series object | ||
if isinstance(right, (pd.Series, pd.Index)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use generic instance checks, thought why can't you just duck type, e.g.
name = _maybe_match_name(left, getattr(right, 'name', None))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the generics, sure. On the ducktype:
-
pretty sure what you have in mind is
name = _maybe_match_name(left, right)
-
That would represent a non-trivial change in pandas convention/policy for name propagation. Up until a few days ago even
Index
names were ignored (except for unintentional corner cases caused by conversion within_TimeOp
). Allowing through anything with aname
attribute would include e.g. mostDateOffset
subclasses, which I don't think is desired. If this is something you'd like to see changed, I'd ask you to open an issue or something and consider it out of scope for this bug-fixing PR.
@jbrockmendel FYI I cancelled a couple of your builds on travis. trying to get 0.22 out. most of these PR's have comments anyhow. |
Sounds good. I'll flag any comments that need clarification. BTW appveyor usually seems to be the constraining factor when the pipeline gets clogged. In cases where I screw up is there a way to cancel build for my own PRs there? |
when you push again appveyor (and travis) both cancel the previous. so it doesn't matter really. travis does it as soon as you push, while appveyor won't cancel until it actually runs (so it looks like its taking longer). |
Just pushed fixes, including for #18989. You may need to cancel the build again. |
Repush or hang tight? |
all good now |
pandas/core/ops.py
Outdated
@@ -535,6 +536,11 @@ def _convert_to_array(self, values, name=None, other=None): | |||
elif inferred_type in ('timedelta', 'timedelta64'): | |||
# have a timedelta, convert to to ns here | |||
values = to_timedelta(values, errors='coerce', box=False) | |||
if isinstance(other, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ABC here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change.
pandas/core/ops.py
Outdated
dtype=dtype, | ||
) | ||
|
||
return wrapper | ||
|
||
|
||
def _get_series_op_result_name(left, right): | ||
# `left` is always a Series object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i suggested a simplification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, rather than creating another function like this, just in-line it as its only used once. .
tz=tz, name=names[0]) | ||
ser = pd.Series([pd.Timedelta(seconds=5)] * 2, | ||
index=index, name=names[1]) | ||
expected = pd.Series(index + pd.Timedelta(seconds=5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convention is not to use pd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change.
pandas/core/ops.py
Outdated
dtype=dtype, | ||
) | ||
|
||
return wrapper | ||
|
||
|
||
def _get_series_op_result_name(left, right): | ||
# `left` is always a Series object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, rather than creating another function like this, just in-line it as its only used once. .
|
||
# passing name arg isn't enough when names[2] is None | ||
expected.name = names[2] | ||
assert expected.dtype == index.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test adding with ser.values
as well (obviously will be index result). was part of the OP as an example.
rebase and push again, fixed some hanging by Travis CI |
This should be ready, is orthogonal to #19024. |
thanks! |
ser + index lost timezone
index + ser retained timezone but returned a DatetimeIndex
git diff upstream/master -u -- "*.py" | flake8 --diff