Skip to content

Strange output from timedelta_range for integer start/end #8886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
shoyer opened this issue Nov 24, 2014 · 6 comments
Open

Strange output from timedelta_range for integer start/end #8886

shoyer opened this issue Nov 24, 2014 · 6 comments
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Timedelta Timedelta data type

Comments

@shoyer
Copy link
Member

shoyer commented Nov 24, 2014

Here I expected a TimedeltaIndex with 11 elements from 0 hours through 11 hours:

>>> pd.timedelta_range(0, 10, freq='H')
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['0 days']
Length: 1, Freq: <Hour>

It appears that freq is taken to refer only to the step size, not the start/stop, and the unit defaults to ns like Timedelta itself? I think we should either:

  1. add a separate unit argument -- but this is a subtle distinction
  2. use freq to also imply the unit
  3. raise on integer input (possibly in addition to 1?)

Similarly strange and somewhat inconsistent with the above example:

>>> pd.timedelta_range(0, 10, periods=100)
ValueError: Must specify two of start, end, or periods
@jorisvandenbossche
Copy link
Member

I think your expected output is this, correct? (but then with specifying the end instead of periods)

In [129]: pd.timedelta_range(0, periods=11, freq='H')
Out[129]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '10:00:00']
Length: 11, Freq: <Hour>

I think adding a unit kwarg could be useful, so you can do unit='h' instead of multiplying:

In [127]: pd.timedelta_range(0, 10*3600*1000000000, freq='H')
Out[127]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '10:00:00']
Length: 11, Freq: <Hour>

I don't think using freq for this would be a good idea, as eg what should be done if you specify freq=2H? How to interpret the integer then?

In [128]: pd.timedelta_range(0, 10*3600*1000000000, freq='2H')
Out[128]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '10:00:00']
Length: 6, Freq: <2 * Hours>

Also raising on integer input is not a good idea IMHO, as it is valid input (although maybe not used that much).

Possible counterargument: date_range does not have the unit kwarg that Timestamp does have, but I think for timedelta's it is much more natural to specify it as an integer.

@jreback
Copy link
Contributor

jreback commented Nov 24, 2014

I think specifying an integer and a unit is ok, as it is allowed in pd.to_timedelta. But would have to be somewhat restricted and apply ONLY to integer input (so it would default to 'ns'). The start/end just need to be convertible to Timedelta, so can be datetime.timedelta/np.timedelta64/Timedelta/string-like/offset. So integer need a helper.

Though not sure how harder it is to do:

pd.timedelta_range(Timedelta(0,unit='d'),Timedelta(10,unit='d'),freq='d')

@jreback jreback added this to the 0.16.0 milestone Nov 24, 2014
@shoyer
Copy link
Member Author

shoyer commented Nov 24, 2014

There are definitely other ways to get this done, but it is (in my opinion) rather confusing that integers arguments are treated as offsets in ns, especially given that the default frequency is different. The use of ns internally is a rather arcane implementation detail...

So my vote is that this should raise and we don't need any other changes. The main reason I like having a unit argument is that it makes it clear that the units for the bounds are not the same thing as the frequency.

@jreback
Copy link
Contributor

jreback commented Nov 24, 2014

ok on raising for integers. The problem with a sane default is that not everyone thinks is sane. ns is quite sane as that is the actual impl (and how numpy does it too).

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Nov 24, 2014
@jorisvandenbossche
Copy link
Member

In my opinion raising on integers in not an option. It is a perfectly valid input (as it is also the default for Timedelta), so why would we raise? OK, because we think it will confuse a lot of users. And in that case it is a bit of a balance we should seek here, but I lean to better documenting it (and adding the unit kwarg).

Also, when would you raise? Only if the end is an integer? Or also if the start is an integer? Because I don't think the following should raise:

In [9]: pd.timedelta_range(0, periods=10, freq='H')
Out[9]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '09:00:00']
Length: 10, Freq: <Hour>

@jreback
Copy link
Contributor

jreback commented Nov 24, 2014

I think you have to have start and end specified with no freq THEN it will require a unit (or raise)
as unspecified/ambiguous operation (e.g. would be ns by default but that should be explicit).

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

4 participants