Skip to content

FIX BUG: Timestamp __add__/__sub__ DateOffset with nanoseconds lost. #43968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Dec 17, 2021

Conversation

tushushu
Copy link
Contributor

@tushushu tushushu commented Oct 11, 2021

@tushushu
Copy link
Contributor Author

Hey guys, not only Timestamp.__add__(DateOffset), but also Timestamp.__sub__(DateOffset) not works. Should I fix the __sub__ method in this PR or should I open another PR? Thanks~

@jbrockmendel
Copy link
Member

Should I fix the sub method in this PR or should I open another PR

Doing it in the same PR is fine

@@ -83,6 +83,10 @@ from .timestamps import Timestamp
# ---------------------------------------------------------------------
# Misc Helpers

cdef bint is_dateoffset_object(object obj):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a docstring, specifically make it clear when to use this vs is_offset_object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment you have here is not useful, add a doc-string not dependent on a particular patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this function is not that necessary. It could be enough by using is_offset_object(obj) and hasattr(obj, "nanoseconds") to determine the need to deal with "nanoseconds".

Copy link
Contributor Author

@tushushu tushushu Oct 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When calling _TimeStamp.__add__(self, other) with other is an Offset object, it can not meet any condition of the if-else chain. And it will call _TimeStamp.__add__(self, other._offset) instead. I cannot find where this logic is implemented, but I guess we could explicitly check the Offset type in the if-else chain.

@tushushu tushushu changed the title FIX BUG: Timestamp adds DateOffset with nanoseconds lost. FIX BUG: Timestamp __add__/__sub__ DateOffset with nanoseconds lost. Oct 16, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs tests

@@ -83,6 +83,10 @@ from .timestamps import Timestamp
# ---------------------------------------------------------------------
# Misc Helpers

cdef bint is_dateoffset_object(object obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment you have here is not useful, add a doc-string not dependent on a particular patch.

@jreback jreback added the Frequency DateOffsets label Oct 16, 2021
@tushushu
Copy link
Contributor Author

this needs tests

I am going to put some tests later.

@tushushu
Copy link
Contributor Author

tushushu commented Oct 17, 2021

Found another bug: the DateOffset(nanoseconds=1) object is not constructed correctly.

from pandas.tseries.offsets import DateOffset

print(DateOffset(nanoseconds=1)._offset.__repr__())

# datetime.timedelta(1)

Which should be "datetime.timedelta(0)".


if is_any_td_scalar(other):
nanos = delta_to_nanoseconds(other)
nanos += delta_to_nanoseconds(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any cases that go through the hasattr(other, "nanoseconds") and also through here?

Copy link
Contributor Author

@tushushu tushushu Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a case go through the hasattr(other, "nanoseconds"), it must go through here. For example, Timestampe(0) + DateOffset(nanoseconds=1) will go through both branches. Because line 289 other = other._offset converts other to a td_scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it must go through here [...] Because line 289 other = other._offset converts other to a td_scalar.

Doesn't it sometimes convert to a relativedelta object (which isnt a td_scalar)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel According to the codes below, we can see self._offset is a relativedelta object only if there is no nanosecond / nanoseconds keyword parameter.

    kwds_no_nanos = dict(
        (k, v) for k, v in kwds.items()
        if k not in ('nanosecond', 'nanoseconds')
    )
    # TODO: Are nanosecond and nanoseconds allowed somewhere?

    _kwds_use_relativedelta = ('years', 'months', 'weeks', 'days',
                               'year', 'month', 'week', 'day', 'weekday',
                               'hour', 'minute', 'second', 'microsecond')

    use_relativedelta = False
    if len(kwds_no_nanos) > 0:
        if any(k in _kwds_use_relativedelta for k in kwds_no_nanos):
            offset = relativedelta(**kwds_no_nanos)
            use_relativedelta = True

@@ -282,9 +282,17 @@ cdef class _Timestamp(ABCTimestamp):
def __add__(self, other):
cdef:
int64_t nanos = 0
# In order to correctly handle DateOffset object with nanoseconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: can you put this inside the if is_offset_object and make clear it is a RelativeDelta offset, not any DateOffset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines have been moved to RelativeDelta.applymethod.

other = other._offset
else:
nanos -= other.nanoseconds
other = -other._offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in principle Timestamp.__add__ shouldn't need to know anything about the RelativeDeltaOffset implementation. Could we change RelativeDeltaOffset.apply to handle nanoseconds correctly instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me read the RelativeDeltaOffset implementation to see how should I move the logic away from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the RelativeDeltaOffset.apply can handle nanoseconds correctly.

)
def test_dateoffset_add_sub(cases):
time_unit, expected = cases
offset = DateOffset(**{time_unit: 1})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if im reading RelativeDeltaOffset.apply right, i think this might be off with n!=1?

Copy link
Contributor Author

@tushushu tushushu Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me read the RelativeDeltaOffset.apply codes first, and make more test cases with n!=1 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the codes, aslo add test case with n == 5, and it works.

@jreback
Copy link
Contributor

jreback commented Dec 3, 2021

@jbrockmendel over to you, merge when ready

({"months": -1}, 0, "1969-12-01 00:00:00"),
({"years": 1}, 0, "1971-01-01 00:00:00"),
({"years": -1}, 0, "1969-01-01 00:00:00"),
({"minutes": 2, "nanoseconds": 9}, 4, "1970-01-01 00:02:00.000000013"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel Here is the test case you suggest. The test can be passed in my local machine, but the travis is too slow, will look at the result later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: id make a separate test for this specific case (could have useful name/comment) and avoid the extra clutter in the parametrize here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, just done.

@tushushu tushushu requested a review from jbrockmendel December 4, 2021 14:53
@jreback
Copy link
Contributor

jreback commented Dec 5, 2021

hmm tests seems to be failing, can you merge master. these don't seem related, but it is possible something changed.

@jbrockmendel
Copy link
Member

yah i think merging master should fix most/all of the CI complaints

@tushushu
Copy link
Contributor Author

tushushu commented Dec 7, 2021

After merging the master PR to current, the previous tests passed, but got a new one

'Hypothesis test_where_inplace_casting(data=[None, None, -4399835, -5869]) produces unreliable results: 
Falsified on the first call but did not on a subsequent one

@jreback
Copy link
Contributor

jreback commented Dec 7, 2021

hmm lots of things are failing

@tushushu
Copy link
Contributor Author

tushushu commented Dec 8, 2021

I accidentally deleted a colon -_-

@jreback
Copy link
Contributor

jreback commented Dec 8, 2021

hmm something must have changed recently as https://github.com/pandas-dev/pandas/runs/4457135516?check_suite_focus=true are failing and look related

@jreback
Copy link
Contributor

jreback commented Dec 8, 2021

actually @tushushu these are failing elsewhere so nothing to do rn. will have you rebase once this is ironed out.

@jbrockmendel
Copy link
Member

@tushushu the py310 failures were also unrelated, have now been fixed. one more rebase pls.

@tushushu
Copy link
Contributor Author

@tushushu the py310 failures were also unrelated, have now been fixed. one more rebase pls.

Let me have a try~

@jreback jreback merged commit cd5a124 into pandas-dev:master Dec 17, 2021
@jreback
Copy link
Contributor

jreback commented Dec 17, 2021

thanks @tushushu for sticking with this!

@tushushu
Copy link
Contributor Author

Thanks @jreback and @jbrockmendel for reviewing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DateOffset with nanoseconds does not apply to Timestamp
4 participants