BUG: date_range issue with sub-second granularity #24129

Merged 1 commit into pandas-dev:master on Dec 9, 2018

Conversation

@gabrielreid (Contributor) commented Dec 6, 2018

Improves (but doesn't completely resolve) #24110 by avoiding rounding
issues with sub-second granularity timestamps when creating a
date range.
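For illustration, a minimal sketch of the kind of call that hits the rounding path (the endpoints here are made up; the exact reproducer is in #24110, and closed= is the keyword from the pandas of this PR's vintage):

import pandas as pd

# start/end with millisecond components and no freq route through
# np.linspace internally. Before this change the interpolation ran on
# the raw nanosecond values and could round an endpoint, so
# closed='left' returned the wrong number of elements.
idx = pd.date_range(start='2018-01-01 00:00:00.010',
                    end='2018-01-03 00:00:00.010',
                    periods=2, closed='left')
print(idx)  # expected: exactly one element, the start timestamp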

@pep8speaks

Hello @gabrielreid! Thanks for submitting the PR.

codecov bot commented Dec 6, 2018

Codecov Report

Merging #24129 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24129      +/-   ##
==========================================
- Coverage    92.2%    92.2%   -0.01%     
==========================================
  Files         162      162              
  Lines       51701    51701              
==========================================
- Hits        47672    47671       -1     
- Misses       4029     4030       +1
Flag        Coverage        Δ
#multiple   90.6% <100%>    (ø) ⬆️
#single     43.02% <0%>     (-0.01%) ⬇️

Impacted Files                    Coverage         Δ
pandas/core/arrays/datetimes.py   98.56% <100%>    (ø) ⬆️
pandas/io/json/json.py            92.61% <0%>      (-0.48%) ⬇️
pandas/util/testing.py            87.51% <0%>      (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b78aa8d...d5d936f. Read the comment docs.

codecov bot commented Dec 6, 2018

Codecov Report

Merging #24129 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #24129   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         162      162           
  Lines       51721    51721           
=======================================
  Hits        47692    47692           
  Misses       4029     4029
Flag        Coverage         Δ
#multiple   90.61% <100%>    (ø) ⬆️
#single     43.03% <0%>      (ø) ⬆️

Impacted Files                    Coverage         Δ
pandas/core/arrays/datetimes.py   98.56% <100%>    (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ac1197...3918053. Read the comment docs.

# pass to np.linspace as much as possible
arr = np.linspace(
    0, end.value - start.value,
    periods).astype('int64') + start.value
Member

You can just pass int64 into the dtype argument of linspace instead of calling astype here.
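A hedged sketch of why interpolating the offset rather than the raw nanosecond values avoids the rounding, and of the dtype variant suggested above (the start value and 10 ms span are made-up numbers; np.linspace's dtype parameter is standard NumPy):

import numpy as np

start = 1514764800000000000       # illustrative: ~2018-01-01 as int64 ns
end = start + 10_000_000          # 10 milliseconds later

# Interpolating the raw values goes through float64, whose spacing near
# 1.5e18 is a few hundred nanoseconds, so points can come back slightly
# off before truncation to int64:
old = np.linspace(start, end, 11).astype('int64')

# The PR's approach: interpolate the small offset (exact in float64),
# then add the exact int64 start back:
new = np.linspace(0, end - start, 11).astype('int64') + start

# The suggested variant: let linspace cast via its dtype argument:
alt = np.linspace(0, end - start, 11, dtype='int64') + start

print(old - new)  # nonzero entries show the rounding the fix avoids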

Member

I've been wrestling with something similar (#24123). Does this get fixed if you cast endpoint to int64 in generate_range_overflow_safe?

Contributor Author

@jbrockmendel generate_range_overflow_safe isn't getting called in this situation, because no frequency has been supplied (which I assume is why np.linspace is used).
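For illustration, a hedged sketch of the two call shapes being discussed (timestamps and the 'ms' alias are illustrative; the internal routing is as described above):

import pandas as pd

# With an explicit freq, values are produced by offset arithmetic, the
# code path where generate_range_overflow_safe is consulted:
pd.date_range('2018-01-01', periods=3, freq='ms')

# With start, end and periods but no freq, the values are interpolated
# instead, which is the np.linspace path this PR touches:
pd.date_range('2018-01-01 00:00:00.010',
              '2018-01-01 00:00:00.030', periods=3)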

@mroeschke (Member) left a comment

Looks pretty good overall. This also needs a whatsnew entry (v0.24.0).

You mentioned that this fix doesn't solve all resolutions. Do you mind adding another test where this still fails, and xfailing that test? We can create a followup issue to address that, but overall I think this is a net improvement.
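For reference, a hedged sketch of the kind of xfail marker being requested here; the test name and reason text are hypothetical, and as the next comment notes, no such failing case ended up existing:

import pytest

# Illustrative only: an expected-failure marker for a resolution that
# would still round incorrectly:
@pytest.mark.xfail(reason='date_range still rounds at this resolution; '
                          'tracked in a followup to GH#24110')
def test_range_with_finer_resolution():
    ...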

@mroeschke added the Datetime (Datetime data dtype) label on Dec 6, 2018
@gabrielreid (Contributor Author) commented

Updated this PR to include the whatsnew entry.

I also tried to write a test that demonstrates a failing scenario, but it appears that it isn't possible to create Timestamp objects far enough in the future with sub-second granularity to trigger this issue, which means that this fix seems to cover all scenarios. I've updated the commit message to reflect this as well.
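A hedged back-of-the-envelope check of that claim: timestamps are int64 nanoseconds since the epoch, so the offset handed to np.linspace is bounded by the representable span, where float64 is still exact to within a few microseconds, comfortably below one millisecond:

import numpy as np
import pandas as pd

# .value returns Python ints, so this subtraction cannot overflow:
span = pd.Timestamp.max.value - pd.Timestamp.min.value

print(pd.Timestamp.max)         # 2262-04-11 23:47:16.854775807
print(np.spacing(float(span)))  # ~4096.0, i.e. one float64 ULP is ~4 microseconds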

@@ -1305,6 +1305,7 @@ Datetimelike
- Bug in :class:`DatetimeIndex` where calling ``np.array(dtindex, dtype=object)`` would incorrectly return an array of ``long`` objects (:issue:`23524`)
- Bug in :class:`Index` where passing a timezone-aware :class:`DatetimeIndex` and ``dtype=object`` would incorrectly raise a ``ValueError`` (:issue:`23524`)
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
- Bug in :func:`date_range` where using dates with sub-second granularity could return incorrect values or the wrong number of values in the index (:issue:`24110`)
Member

Instead of "sub-second granularity", could you say "microsecond resolution or higher"?

After that LGTM

@jreback added this to the 0.24.0 milestone on Dec 7, 2018
@@ -769,3 +769,11 @@ def test_all_custom_freq(self, freq):
        msg = 'invalid custom frequency string: {freq}'
        with pytest.raises(ValueError, match=msg.format(freq=bad_freq)):
            bdate_range(START, END, freq=bad_freq)

    def test_range_with_millisecond_granularity(self):
Contributor

Can you add all 3 of the cases from the OP via parametrization here?
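A hedged sketch of what the requested parametrization could look like; the case values below are illustrative stand-ins for the three reports in the OP, not copied from it:

import pandas as pd
import pytest

@pytest.mark.parametrize('start,end', [
    ('2018-01-01 00:00:01.000', '2018-01-03 00:00:01.000'),
    ('2018-01-01 00:00:00.010', '2018-01-03 00:00:00.010'),
    ('2001-01-01 00:00:00.010', '2001-01-03 00:00:00.010'),
])
def test_range_with_millisecond_granularity(start, end):
    # closed='left' keeps only the start endpoint when periods=2
    result = pd.date_range(start=start, end=end, periods=2, closed='left')
    expected = pd.DatetimeIndex([start])
    pd.testing.assert_index_equal(result, expected)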

@jreback (Contributor) commented Dec 9, 2018

lgtm. @mroeschke over to you.

Fixes pandas-dev#24110 by avoiding floating-point rounding issues
with millisecond resolution or higher timestamps when creating a
date range.
@gabrielreid (Contributor Author) commented

Updated the whatsnew entry as requested, but specified "millisecond resolution or higher", as millisecond resolution is sufficient to trigger this issue (and is what was used in the original report).

Also updated the test as requested, parametrizing it over all three cases reported in the original issue.

@mroeschke merged commit 77278ba into pandas-dev:master on Dec 9, 2018
@mroeschke (Member) commented

Thanks @gabrielreid! Great pull request.

Pingviinituutti pushed commits to Pingviinituutti/pandas that referenced this pull request on Feb 28, 2019, with the commit message:

Fixes pandas-dev#24110 by avoiding floating-point rounding issues
with millisecond resolution or higher timestamps when creating a
date range.
Labels
Datetime (Datetime data dtype)

Successfully merging this pull request may close these issues:

date_range() with closed=left and sub-second granularity returns wrong number of elements

5 participants