-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fixes rounding error in Timestamp.floor() #19240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #19240 +/- ##
==========================================
+ Coverage 91.6% 91.62% +0.02%
==========================================
Files 150 150
Lines 48750 48742 -8
==========================================
+ Hits 44657 44661 +4
+ Misses 4093 4081 -12
Continue to review full report at Codecov.
|
Timestamp.floor()
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -462,6 +461,7 @@ Numeric | |||
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` addition and subtraction of zero-dimensional integer arrays gave incorrect results (:issue:`19012`) | |||
- Bug in :func:`Series.__add__` adding Series with dtype ``timedelta64[ns]`` to a timezone-aware ``DatetimeIndex`` incorrectly dropped timezone information (:issue:`13905`) | |||
- Bug in :func:`Timedelta.__floordiv__` and :func:`Timedelta.__rfloordiv__` dividing by many incompatible numpy objects was incorrectly allowed (:issue:`18846`) | |||
- Bug in :func:`Timestamp.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below, need to add for ceil and for DatetimeIndex floor/ceil as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need tests for DTI as well (these are in pandas/tests/indexes/datetimes/test_ops pls parametrize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you need a separate fix there as well.
pandas/_libs/tslibs/timestamps.pyx
Outdated
divisor = 10 ** int(np.log10(unit / 1e7)) | ||
else: | ||
divisor = 1 | ||
|
||
if unit < 1000 and unit % 1000 != 0: | ||
# for nano rounding, work with the last 6 digits separately | ||
# due to float precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this case be incorporated as well in the same divisor check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe set both divisor & buff in a single if that handles really small, really large and other units (divisor=1, buff=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a simple way of doing this. We would essentially have to move the logic into the arithmetic, which I think would make the code more opaque. I can move the divisor check into the case were unit >= 1000 and refactor some of it to make it a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's ok, just trying to make it more clear.
@@ -695,6 +695,27 @@ def test_round(self): | |||
expected = Timestamp('20130101') | |||
assert result == expected | |||
|
|||
# GH 19206 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
result = dt.floor('10ns') | ||
expected = Timestamp('1823-01-01 00:00:01.000000010') | ||
assert result == expected | ||
|
||
# ceil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test for ceil as well. If you can parametrize this all the better (if not ok too).
Also ok with splitting this giant test to test_round, test_floor, test_ceil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working on other change requests, but just wanted to get feedback on refactoring for timestamp tests. There are so many tests that it seemed like a mixed approach of splitting tests so that failed assertions could be easily traced, and parametrizing to consolidate common code was the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below. we need to parameterize as much as possible to cover the cases. and you can still run them individually as well (optionally you can pass ids
to create them)
assert result == expected | ||
|
||
dt = Timestamp('1823-01-01 00:00:01') | ||
result = dt.floor('1s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls test for datetimeindex as well
result = dt.round('D') | ||
expected = Timestamp('20130202') | ||
assert result == expected | ||
TStestcase = namedtuple('Tstestcase', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use parametrize rather rolling your own:http://pandas.pydata.org/pandas-docs/stable/contributing.html#using-pytest
pandas/_libs/tslibs/timestamps.pyx
Outdated
divisor = 10 ** int(np.log10(unit / 1e7)) | ||
else: | ||
divisor = 1 | ||
|
||
if unit < 1000 and unit % 1000 != 0: | ||
# for nano rounding, work with the last 6 digits separately | ||
# due to float precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this?
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -462,6 +461,7 @@ Numeric | |||
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` addition and subtraction of zero-dimensional integer arrays gave incorrect results (:issue:`19012`) | |||
- Bug in :func:`Series.__add__` adding Series with dtype ``timedelta64[ns]`` to a timezone-aware ``DatetimeIndex`` incorrectly dropped timezone information (:issue:`13905`) | |||
- Bug in :func:`Timedelta.__floordiv__` and :func:`Timedelta.__rfloordiv__` dividing by many incompatible numpy objects was incorrectly allowed (:issue:`18846`) | |||
- Bug in :func:`Timestamp.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need tests for DTI as well (these are in pandas/tests/indexes/datetimes/test_ops pls parametrize
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -462,6 +461,7 @@ Numeric | |||
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` addition and subtraction of zero-dimensional integer arrays gave incorrect results (:issue:`19012`) | |||
- Bug in :func:`Series.__add__` adding Series with dtype ``timedelta64[ns]`` to a timezone-aware ``DatetimeIndex`` incorrectly dropped timezone information (:issue:`13905`) | |||
- Bug in :func:`Timedelta.__floordiv__` and :func:`Timedelta.__rfloordiv__` dividing by many incompatible numpy objects was incorrectly allowed (:issue:`18846`) | |||
- Bug in :func:`Timestamp.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you need a separate fix there as well.
68dd7da
to
5eb3ae3
Compare
Probably a separate issue, but there appears to be no warning if you specify a Timestamp with a string that causes the underlying int64 to overflow. This works:
but this is silent:
Shouldn't this throw an exception? |
Hello @cbertinato! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 07, 2018 at 14:29 Hours UTC |
@cbertinato this is all correct
[138] & [139] might be some nanosecond rounding; I suppose you could make an issue about it |
pandas/core/indexes/datetimelike.py
Outdated
# GH19206 | ||
# to deal with round-off when unit is large | ||
if unit >= 1e9: | ||
divisor = 10 ** int(np.log10(unit / 1e7)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably pull out all of this rounding logic to a separate function that can be called on a scalar & and array, so we can use it both for Timestamp and DTI (could do it in this PR or as a followup)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good idea. We can do it here while this is still open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
'1823-01-01 00:00:01.000000020') | ||
]) | ||
def test_round_ops(self, test_input, rounder, freq, expected): | ||
for tz in self.tz: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why tz was not made into a fixture, but I guess ok here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take care of that too. I don't know why either. Must have overlooked it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
|
||
@pytest.mark.parametrize('test_input, rounder, freq, expected, kw', [ | ||
('20130101 09:10:11', 'floor', 'D', '20130101', {}), | ||
# GH 19206 - times far in the future and past rounding incorrectly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just pass None for kw (and then turn it in it into an empty dict in the test)
2f5e9a8
to
e0aadac
Compare
…r dates far in the future and past (GH19206)
e0aadac
to
414a2d5
Compare
Moved the function common to both Timestamps and DTIs out of the Timestamp round function, but left it in timestamp.pyx. Not sure whether to move it out of there completely. Also moved all tests into the refactored test modules. |
'1823-01-01 00:00:01.000000020'), | ||
('1823-01-01 00:00:01', 'floor', '1s', '1823-01-01 00:00:01'), | ||
('1823-01-01 00:00:01', 'ceil', '1s', '1823-01-01 00:00:01')]) | ||
def test_ceil_floor_edge(self, tz, test_input, rounder, freq, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a NaT and another element here (the 2nd can be the same as the first)
@@ -93,6 +93,22 @@ def test_round_frequencies(self, freq, expected): | |||
result = stamp.round(freq=freq) | |||
assert result == expected | |||
|
|||
@pytest.mark.parametrize('test_input, rounder, freq, expected', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a NaT here as well
thanks @cbertinato nice patch! keep em coming! |
git diff upstream/master -u -- "*.py" | flake8 --diff