BUG: date_range issue with sub-second granularity #24129
Conversation
Hello @gabrielreid! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #24129 +/- ##
==========================================
- Coverage 92.2% 92.2% -0.01%
==========================================
Files 162 162
Lines 51701 51701
==========================================
- Hits 47672 47671 -1
- Misses 4029 4030 +1
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #24129 +/- ##
=======================================
Coverage 92.21% 92.21%
=======================================
Files 162 162
Lines 51721 51721
=======================================
Hits 47692 47692
Misses 4029 4029
Continue to review full report at Codecov.
pandas/core/arrays/datetimes.py
Outdated
    # pass to np.linspace as much as possible
    arr = np.linspace(
        0, end.value - start.value,
        periods).astype('int64') + start.value
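The rounding problem this hunk works around is visible with plain NumPy: `np.linspace` computes its values in float64, whose 53-bit mantissa cannot represent every int64 nanosecond offset. A minimal illustration (the variable names here are illustrative, not pandas code):

```python
import numpy as np

# float64 has a 53-bit mantissa, so integers above 2**53 cannot all be
# represented exactly; 2**53 + 1 is the smallest int64 that rounds away.
big = 2**53 + 1
assert float(big) != big  # the trailing +1 is lost in the float cast

# linspace works in float64 internally, so an endpoint this large comes
# back off by one nanosecond after converting to int64:
arr = np.linspace(0, big, 2).astype('int64')
print(arr[-1] - big)  # prints -1
```

2**53 nanoseconds is only about 104 days, so realistic date ranges with sub-second endpoints can hit this.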
You can just pass `int64` into the `dtype` argument of `linspace` instead of calling `astype` here.
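A minimal sketch of this suggestion (the values and variable names are illustrative, not from the PR). Note that `linspace` still computes in float64 internally either way; only the final conversion step changes:

```python
import numpy as np

# Cast afterwards, as the PR originally did:
a = np.linspace(0, 100, 5).astype('int64')

# Let linspace emit int64 directly via its dtype argument:
b = np.linspace(0, 100, 5, dtype='int64')

print(a)  # [  0  25  50  75 100]
print(b)  # [  0  25  50  75 100]
```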
I’ve been wrestling with something similar (#24123). Does this get fixed if you cast “endpoint” to int64 in generate_range_overflow_safe?
@jbrockmendel generate_range_overflow_safe isn't getting called in this situation, because no frequency has been supplied (which I assume is why np.linspace is used).
Looks pretty good overall. This also needs a whatsnew entry (v0.24.0).
You mentioned that this fix doesn't cover all resolutions. Would you mind adding another test where this still fails, and xfail-ing that test? We can create a follow-up issue to address it, but overall I think this is a net improvement.
Force-pushed from d5d936f to 56510e3
Updated this PR to include the whatsnew entry. I also tried to write a test that demonstrates a failing scenario, but it appears that it's not possible to create Timestamp objects far enough in the future with sub-second granularity to trigger this issue, which means that this fix seems to cover all scenarios. I've updated the commit message to reflect this as well.
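For context on why such far-future Timestamps can't be constructed: pandas stores a Timestamp as int64 nanoseconds since the epoch, which bounds the representable range. A quick check:

```python
import pandas as pd

# pandas timestamps are int64 nanoseconds since 1970-01-01, so the
# representable range is hard-bounded:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```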
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1305,6 +1305,7 @@ Datetimelike
 - Bug in :class:`DatetimeIndex` where calling ``np.array(dtindex, dtype=object)`` would incorrectly return an array of ``long`` objects (:issue:`23524`)
 - Bug in :class:`Index` where passing a timezone-aware :class:`DatetimeIndex` and `dtype=object` would incorrectly raise a ``ValueError`` (:issue:`23524`)
 - Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
+- Bug in :func:`date_range` where using dates with sub-second granularity could return incorrect values or the wrong number of values in the index (:issue:`24110`)
Instead of "sub-second granularity", could you say "microsecond resolution or higher"?
After that LGTM
@@ -769,3 +769,11 @@ def test_all_custom_freq(self, freq):
         msg = 'invalid custom frequency string: {freq}'
         with pytest.raises(ValueError, match=msg.format(freq=bad_freq)):
             bdate_range(START, END, freq=bad_freq)
+
+    def test_range_with_millisecond_granularity(self):
Can you add all 3 of the cases from the OP via parametrization here?
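A sketch of what the requested parametrization might look like. The concrete start/end/periods values below are illustrative stand-ins, not the actual three cases from issue #24110:

```python
import pandas as pd
import pytest

# Hypothetical parametrized version of the new test; the real values
# in the PR come from the cases reported in issue #24110.
@pytest.mark.parametrize('start,end,periods', [
    ('2018-01-01 00:00:00.010', '2018-01-03 00:00:00.010', 287),
    ('2018-01-01 00:00:00.010', '2018-01-03 00:00:00.010', 289),
    ('2018-01-01 00:00:00.010', '2018-01-03 00:00:00.010', 577),
])
def test_range_with_millisecond_resolution(start, end, periods):
    # date_range with start/end/periods and no freq takes the
    # linspace code path that this PR fixes.
    result = pd.date_range(start=start, end=end, periods=periods)
    assert len(result) == periods
    assert result[0] == pd.Timestamp(start)
    assert result[-1] == pd.Timestamp(end)
```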
lgtm. @mroeschke over to you.
Fixes pandas-dev#24110 by avoiding floating-point rounding issues with millisecond resolution or higher timestamps when creating a date range.
Force-pushed from 56510e3 to 3918053
Updated the whatsnew entry as requested. Also updated the test as requested, parametrizing it with all three cases as reported in the original issue.
Thanks @gabrielreid! Great pull request.
Improves (but doesn't completely resolve) #24110, to avoid rounding issues with sub-second granularity timestamps when creating a date range.
git diff upstream/master -u -- "*.py" | flake8 --diff