Skip to content

FIX BUG: Timestamp __add__/__sub__ DateOffset with nanoseconds lost. #43968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Dec 17, 2021
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
2208507
fix Timestamp + Dateoffset bugs.
tushushu Oct 11, 2021
ee19a4d
Add doc string and fix UT.
tushushu Oct 16, 2021
3e38ab2
fix UT
tushushu Oct 16, 2021
5718e4f
Trim Trailing Whitespace
tushushu Oct 16, 2021
e1ae149
remove is_dateoffset_obj func
tushushu Oct 17, 2021
326790b
explicitly check the Offset type
tushushu Oct 17, 2021
83453da
fix bugs.
tushushu Oct 17, 2021
ae4b133
fix bugs
tushushu Oct 17, 2021
a43030d
try to fix __sub__ method.
tushushu Oct 17, 2021
22cadbf
fix flake8 check.
tushushu Oct 17, 2021
1c89c03
add tests for dateoffset add method.
tushushu Oct 17, 2021
a2ad274
fix offset init.
tushushu Oct 17, 2021
9bc6e97
fix black code style check.
tushushu Oct 18, 2021
c9ddf14
test sub method.
tushushu Oct 18, 2021
4780316
add an comment.
tushushu Oct 19, 2021
1fecfa6
move the nano handling codes to Offsets.
tushushu Oct 23, 2021
e92c1a9
remove unnecessary codes.
tushushu Oct 23, 2021
006c073
fix bugs
tushushu Oct 24, 2021
ab064b4
more test cases for dateoffset add/sub
tushushu Oct 24, 2021
8e42c64
test radd.
tushushu Oct 24, 2021
5ca3472
code style.
tushushu Oct 25, 2021
37d19b5
remove unused line.
tushushu Oct 30, 2021
5acad37
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 2, 2021
9cda54a
what's new
tushushu Dec 2, 2021
0ee9a6f
remove unnecessary brackets
tushushu Dec 2, 2021
860a62b
what's new.
tushushu Dec 3, 2021
4600a64
makes the comment more detailed.
tushushu Dec 3, 2021
8cf5746
test case as jbrockmendel demand
tushushu Dec 4, 2021
a210baf
trigger the CI without a commit
tushushu Dec 5, 2021
98646cd
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 5, 2021
e899fbe
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 6, 2021
906f546
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 7, 2021
24fe768
split dateoffset test func
tushushu Dec 7, 2021
209ba9e
Merge branch 'fix-bug-dateoffset-nanoseconds' of github.com:tushushu/…
tushushu Dec 7, 2021
f56f52b
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 7, 2021
3f49876
fix typo
tushushu Dec 8, 2021
b7297ef
Merge branch 'fix-bug-dateoffset-nanoseconds' of github.com:tushushu/…
tushushu Dec 8, 2021
5cbae96
Merge branch 'pandas-dev:master' into fix-bug-dateoffset-nanoseconds
tushushu Dec 17, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/offsets.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -326,7 +326,7 @@ cdef _determine_offset(kwds):
# sub-daily offset - use timedelta (tz-aware)
offset = timedelta(**kwds_no_nanos)
else:
offset = timedelta(1)
offset = timedelta(0)
return offset, use_relativedelta


Expand Down
14 changes: 11 additions & 3 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -282,9 +282,17 @@ cdef class _Timestamp(ABCTimestamp):
def __add__(self, other):
cdef:
int64_t nanos = 0
# In order to correctly handle DateOffset object with nanoseconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: can you put this inside the if is_offset_object and make clear it is a RelativeDelta offset, not any DateOffset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines have been moved to RelativeDelta.applymethod.

if is_offset_object(other) and hasattr(other, "nanoseconds"):
if other.n > 0:
nanos += other.nanoseconds
other = other._offset
else:
nanos -= other.nanoseconds
other = -other._offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in principle Timestamp.__add__ shouldn't need to know anything about the RelativeDeltaOffset implementation. Could we change RelativeDeltaOffset.apply to handle nanoseconds correctly instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me read the RelativeDeltaOffset implementation to see how should I move the logic away from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the RelativeDeltaOffset.apply can handle nanoseconds correctly.


if is_any_td_scalar(other):
nanos = delta_to_nanoseconds(other)
nanos += delta_to_nanoseconds(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any cases that go through the hasattr(other, "nanoseconds") and also through here?

Copy link
Contributor Author

@tushushu tushushu Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a case go through the hasattr(other, "nanoseconds"), it must go through here. For example, Timestampe(0) + DateOffset(nanoseconds=1) will go through both branches. Because line 289 other = other._offset converts other to a td_scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it must go through here [...] Because line 289 other = other._offset converts other to a td_scalar.

Doesn't it sometimes convert to a relativedelta object (which isnt a td_scalar)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel According to the codes below, we can see self._offset is a relativedelta object only if there is no nanosecond / nanoseconds keyword parameter.

    kwds_no_nanos = dict(
        (k, v) for k, v in kwds.items()
        if k not in ('nanosecond', 'nanoseconds')
    )
    # TODO: Are nanosecond and nanoseconds allowed somewhere?

    _kwds_use_relativedelta = ('years', 'months', 'weeks', 'days',
                               'year', 'month', 'week', 'day', 'weekday',
                               'hour', 'minute', 'second', 'microsecond')

    use_relativedelta = False
    if len(kwds_no_nanos) > 0:
        if any(k in _kwds_use_relativedelta for k in kwds_no_nanos):
            offset = relativedelta(**kwds_no_nanos)
            use_relativedelta = True

result = type(self)(self.value + nanos, tz=self.tzinfo)
if result is not NaT:
result._set_freq(self._freq) # avoid warning in constructor
Expand All @@ -307,12 +315,12 @@ cdef class _Timestamp(ABCTimestamp):
elif not isinstance(self, _Timestamp):
# cython semantics, args have been switched and this is __radd__
return other.__add__(self)

return NotImplemented

def __sub__(self, other):

if is_any_td_scalar(other) or is_integer_object(other):
if (is_any_td_scalar(other) or is_integer_object(other) or
is_offset_object(other)):
neg_other = -other
return self + neg_other

Expand Down
39 changes: 31 additions & 8 deletions pandas/tests/tseries/offsets/test_offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -657,14 +657,6 @@ def test_rule_code(self):
assert alias == (_get_offset(alias) * 5).rule_code


def test_dateoffset_misc():
oset = offsets.DateOffset(months=2, days=4)
# it works
oset.freqstr

assert not offsets.DateOffset(months=2) == 2


def test_freq_offsets():
off = BDay(1, offset=timedelta(0, 1800))
assert off.freqstr == "B+30Min"
Expand Down Expand Up @@ -780,6 +772,29 @@ def test_tick_normalize_raises(tick_classes):
cls(n=3, normalize=True)


@pytest.mark.parametrize(
"cases",
[
("nanoseconds", Timestamp("1970-01-01 00:00:00.000000001")),
("microseconds", Timestamp("1970-01-01 00:00:00.000001")),
("seconds", Timestamp("1970-01-01 00:00:01")),
("minutes", Timestamp("1970-01-01 00:01:00")),
("hours", Timestamp("1970-01-01 01:00:00")),
("days", Timestamp("1970-01-02 00:00:00")),
("weeks", Timestamp("1970-01-08 00:00:00")),
("months", Timestamp("1970-02-01 00:00:00")),
("years", Timestamp("1971-01-01 00:00:00")),
],
)
def test_dateoffset_add_sub(cases):
time_unit, expected = cases
offset = DateOffset(**{time_unit: 1})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if im reading RelativeDeltaOffset.apply right, i think this might be off with n!=1?

Copy link
Contributor Author

@tushushu tushushu Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me read the RelativeDeltaOffset.apply codes first, and make more test cases with n!=1 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the codes, aslo add test case with n == 5, and it works.

ts = Timestamp(0) + offset
assert ts == expected
ts -= offset
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel For the question "do we have any test cases with both nano !=0 and hasattr(self, "nanoseconds")?"
When the test case is ("nanoseconds", 1, Timestamp("1970-01-01 00:00:00.000000001")), running L816, thus Timestamp("1970-01-01 00:00:00.000000001") - DateOffset(nanoseconds=1) >>> nano == 1 and hasattr(self, "nanoseconds")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The case I think is uncovered is where both have a nanosecond and the DateOffset has a relativedelta component:

ts = pd.Timestamp(4)
off = pd.DateOffset(minute=2, nanoseconds=9)

>>> ts + off
Timestamp('1970-01-01 00:02:00')  # <- PR
Timestamp('1970-01-01 00:02:00.000000004')  # <- master
Timestamp('1970-01-01 00:02:00.000000013')   # <- expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and I think the PR can pass this test and let me add a new test case for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failed, maybe I need rebase my local branch in docker from master to see why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to see why

In DateOffset._apply we call _as_datetime which drops nanos, so we need to either a) not do that or b) add them back in at some point

Copy link
Contributor Author

@tushushu tushushu Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In DateOffset._apply we call _as_datetime which drops nanos, so we need to either a) not do that or b) add them back in at some point
I read the error message carefully today, the CI failed not because of the unit tests. This PR could handle the nano correctly.

The building doc process failed maybe because it ran the command with flag --warnings-are-errors as below:
doc/make.py --warnings-are-errors,

and there is a warning:
warning in /home/runner/work/pandas/pandas/doc/source/whatsnew/v0.10.0.rst at block ending on line 209

Copy link
Contributor Author

@tushushu tushushu Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the checks again, 2 of them failed because of Extension Array, which seems to have nothing to do with current PR.
I actually did not change anything, but the checks failed due to building doc first and then due to Extension Array.

Copy link
Contributor Author

@tushushu tushushu Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased from master branch, the errors listed as below:

  1. The CI still failed due to Extension Array does not have _data attribute.
  2. "FAILED pandas/tests/io/formats/test_to_excel.py::test_css_to_excel_good_colors[#708090-7080901]."
  3. "E FutureWarning: The pandas.SparseArray class is deprecated and will be removed from pandas in a future version. Use pandas.arrays.SparseArray instead."

@jbrockmendel It seems that all the errors above are not related with my PR, could you take a look?

assert ts == Timestamp(0)


@pytest.mark.parametrize(
"attribute",
[
Expand All @@ -795,3 +810,11 @@ def test_dateoffset_immutable(attribute):
msg = "DateOffset objects are immutable"
with pytest.raises(AttributeError, match=msg):
setattr(offset, attribute, 5)


def test_dateoffset_misc():
oset = offsets.DateOffset(months=2, days=4)
# it works
oset.freqstr

assert not offsets.DateOffset(months=2) == 2