-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX BUG: Timestamp __add__/__sub__ DateOffset with nanoseconds lost. #43968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX BUG: Timestamp __add__/__sub__ DateOffset with nanoseconds lost. #43968
Conversation
tushushu
commented
Oct 11, 2021
•
edited
Loading
edited
- closes BUG: DateOffset with nanoseconds does not apply to Timestamp #43892
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Hey guys, not only |
Doing it in the same PR is fine |
pandas/_libs/tslibs/offsets.pyx
Outdated
@@ -83,6 +83,10 @@ from .timestamps import Timestamp | |||
# --------------------------------------------------------------------- | |||
# Misc Helpers | |||
|
|||
cdef bint is_dateoffset_object(object obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a docstring, specifically make it clear when to use this vs is_offset_object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment you have here is not useful, add a doc-string not dependent on a particular patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this function is not that necessary. It could be enough by using is_offset_object(obj) and hasattr(obj, "nanoseconds")
to determine the need to deal with "nanoseconds".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When calling _TimeStamp.__add__(self, other)
with other is an Offset object, it can not meet any condition of the if-else chain. And it will call _TimeStamp.__add__(self, other._offset)
instead. I cannot find where this logic is implemented, but I guess we could explicitly check the Offset type in the if-else chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs tests
pandas/_libs/tslibs/offsets.pyx
Outdated
@@ -83,6 +83,10 @@ from .timestamps import Timestamp | |||
# --------------------------------------------------------------------- | |||
# Misc Helpers | |||
|
|||
cdef bint is_dateoffset_object(object obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment you have here is not useful, add a doc-string not dependent on a particular patch.
I am going to put some tests later. |
Found another bug: the from pandas.tseries.offsets import DateOffset
print(DateOffset(nanoseconds=1)._offset.__repr__())
# datetime.timedelta(1) Which should be "datetime.timedelta(0)". |
pandas/_libs/tslibs/timestamps.pyx
Outdated
|
||
if is_any_td_scalar(other): | ||
nanos = delta_to_nanoseconds(other) | ||
nanos += delta_to_nanoseconds(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any cases that go through the hasattr(other, "nanoseconds")
and also through here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a case go through the hasattr(other, "nanoseconds")
, it must go through here. For example, Timestampe(0) + DateOffset(nanoseconds=1)
will go through both branches. Because line 289 other = other._offset
converts other to a td_scalar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it must go through here [...] Because line 289 other = other._offset converts other to a td_scalar.
Doesn't it sometimes convert to a relativedelta object (which isnt a td_scalar)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel According to the codes below, we can see self._offset is a relativedelta object only if there is no nanosecond / nanoseconds keyword parameter.
kwds_no_nanos = dict(
(k, v) for k, v in kwds.items()
if k not in ('nanosecond', 'nanoseconds')
)
# TODO: Are nanosecond and nanoseconds allowed somewhere?
_kwds_use_relativedelta = ('years', 'months', 'weeks', 'days',
'year', 'month', 'week', 'day', 'weekday',
'hour', 'minute', 'second', 'microsecond')
use_relativedelta = False
if len(kwds_no_nanos) > 0:
if any(k in _kwds_use_relativedelta for k in kwds_no_nanos):
offset = relativedelta(**kwds_no_nanos)
use_relativedelta = True
pandas/_libs/tslibs/timestamps.pyx
Outdated
@@ -282,9 +282,17 @@ cdef class _Timestamp(ABCTimestamp): | |||
def __add__(self, other): | |||
cdef: | |||
int64_t nanos = 0 | |||
# In order to correctly handle DateOffset object with nanoseconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: can you put this inside the if is_offset_object
and make clear it is a RelativeDelta offset, not any DateOffset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines have been moved to RelativeDelta.apply
method.
pandas/_libs/tslibs/timestamps.pyx
Outdated
other = other._offset | ||
else: | ||
nanos -= other.nanoseconds | ||
other = -other._offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in principle Timestamp.__add__
shouldn't need to know anything about the RelativeDeltaOffset implementation. Could we change RelativeDeltaOffset.apply
to handle nanoseconds correctly instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let me read the RelativeDeltaOffset
implementation to see how should I move the logic away from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the RelativeDeltaOffset.apply
can handle nanoseconds correctly.
) | ||
def test_dateoffset_add_sub(cases): | ||
time_unit, expected = cases | ||
offset = DateOffset(**{time_unit: 1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if im reading RelativeDeltaOffset.apply right, i think this might be off with n!=1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me read the RelativeDeltaOffset.apply
codes first, and make more test cases with n!=1 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the codes, aslo add test case with n == 5, and it works.
@jbrockmendel over to you, merge when ready |
({"months": -1}, 0, "1969-12-01 00:00:00"), | ||
({"years": 1}, 0, "1971-01-01 00:00:00"), | ||
({"years": -1}, 0, "1969-01-01 00:00:00"), | ||
({"minutes": 2, "nanoseconds": 9}, 4, "1970-01-01 00:02:00.000000013"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel Here is the test case you suggest. The test can be passed in my local machine, but the travis is too slow, will look at the result later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: id make a separate test for this specific case (could have useful name/comment) and avoid the extra clutter in the parametrize here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, just done.
hmm tests seems to be failing, can you merge master. these don't seem related, but it is possible something changed. |
yah i think merging master should fix most/all of the CI complaints |
After merging the master PR to current, the previous tests passed, but got a new one
|
…pandas into fix-bug-dateoffset-nanoseconds
hmm lots of things are failing |
…pandas into fix-bug-dateoffset-nanoseconds
I accidentally deleted a colon -_- |
hmm something must have changed recently as https://github.com/pandas-dev/pandas/runs/4457135516?check_suite_focus=true are failing and look related |
actually @tushushu these are failing elsewhere so nothing to do rn. will have you rebase once this is ironed out. |
@tushushu the py310 failures were also unrelated, have now been fixed. one more rebase pls. |
Let me have a try~ |
thanks @tushushu for sticking with this! |
Thanks @jreback and @jbrockmendel for reviewing this PR. |