Skip to content

[BUG]: Fix regression when adding timeldeta_range to timestamp #36582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 24, 2020

Conversation

phofl
Copy link
Member

@phofl phofl commented Sep 23, 2020

@phofl phofl added Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type labels Sep 23, 2020
@jbrockmendel
Copy link
Member

This is half of the fix i suggested in #35897, but we need both halves.

@phofl
Copy link
Member Author

phofl commented Sep 23, 2020

@jbrockmendel Why is the freq incorrect? We get 0.033333333, what would you expect here?

@jbrockmendel
Copy link
Member

Because of rounding error

tdi = pd.timedelta_range('0s','1s', periods=31)

>>> tdi - tdi.shift(1)
TimedeltaIndex(['-1 days +23:59:59.966666667', '-1 days +23:59:59.966666667',
                '-1 days +23:59:59.966666667', '-1 days +23:59:59.966666668',
                '-1 days +23:59:59.966666668', '-1 days +23:59:59.966666668',
                '-1 days +23:59:59.966666669', '-1 days +23:59:59.966666669',
                '-1 days +23:59:59.966666669', '-1 days +23:59:59.966666670',
                '-1 days +23:59:59.966666670', '-1 days +23:59:59.966666670',
                '-1 days +23:59:59.966666671', '-1 days +23:59:59.966666671',
                '-1 days +23:59:59.966666671', '-1 days +23:59:59.966666672',
                '-1 days +23:59:59.966666672', '-1 days +23:59:59.966666672',
                '-1 days +23:59:59.966666673', '-1 days +23:59:59.966666673',
                '-1 days +23:59:59.966666673', '-1 days +23:59:59.966666674',
                '-1 days +23:59:59.966666674', '-1 days +23:59:59.966666674',
                '-1 days +23:59:59.966666675', '-1 days +23:59:59.966666675',
                '-1 days +23:59:59.966666675', '-1 days +23:59:59.966666676',
                '-1 days +23:59:59.966666676', '-1 days +23:59:59.966666676',
                '-1 days +23:59:59.966666677'],
               dtype='timedelta64[ns]', freq=None)

If tdi.freq is non-None, then we would expect to see tdi - tdi.shift(1) to be constant

@jreback jreback added this to the 1.1.3 milestone Sep 24, 2020
@simonjayhawkins
Copy link
Member

If tdi.freq is non-None, then we would expect to see tdi - tdi.shift(1) to be constant

so are there cases where a frequency should not be inferred by timedelta_range?

if so, maybe revert #32377 and open a issue for discussion on how inference should work?

The docs begin 'Return a fixed frequency TimedeltaIndex...'

so having None as in 1.0.5 maybe not desirable, but can perhaps also discuss the api breaking implications.

for instance should pd.timedelta_range('0s','1s', periods=31) perhaps raise with a message along the lines 'ValueError: cannot infer a fixed frequency from start, end and periods due to rounding errors` or is a fractional freq a feasible option?

did #32377 actually fix/address any reported bugs/enhancement requests? It appears it was a pre-cursor to a PR that was closed without merging #31195.

This is half of the fix i suggested in #35897, but we need both halves.

is it worth splitting #35897

or is the inferred freq for pd.timedelta_range('0s','1s', periods=31) really the only issue here?

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Sep 24, 2020

is the solution to only infer freq if

> c:\users\simon\pandas\pandas\core\arrays\timedeltas.py(271)_generate_range()
-> freq = to_offset(td)
(Pdb) td
Timedelta('0 days 00:00:00.033333333')
(Pdb) td.value
33333333
(Pdb) start.value
0
(Pdb) end.value
1000000000
(Pdb) periods
31
(Pdb) assert start.value + td.value * (periods - 1) == end.value
*** AssertionError
(Pdb) start.value + td.value * (periods -1)
999999990

@jreback jreback merged commit dee2c55 into pandas-dev:master Sep 24, 2020
@jreback
Copy link
Contributor

jreback commented Sep 24, 2020

thanks @phofl good patch.

@simonjayhawkins can certainly discuss on the issue but agree inferring None here is right.

@jbrockmendel
Copy link
Member

@jreback pls revert, this is 100% wrong, #36595 is the correct solution.

@jreback
Copy link
Contributor

jreback commented Sep 25, 2020

ok sure (pls push it if you can).

jbrockmendel added a commit that referenced this pull request Sep 25, 2020
@jorisvandenbossche jorisvandenbossche modified the milestones: 1.1.3, 1.2 Sep 25, 2020
simonjayhawkins pushed a commit that referenced this pull request Sep 25, 2020
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 25, 2020
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 25, 2020
@phofl phofl deleted the 35897 branch September 29, 2020 21:07
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in pandas 1.1.0: no longer allows adding a timestamp to timedelta range to create a datetime range.
5 participants