Skip to content

BUG: regression in Timedelta.__rfloordiv__ with integer data #19761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Feb 19, 2018 · 12 comments · Fixed by #21036
Closed

BUG: regression in Timedelta.__rfloordiv__ with integer data #19761

jorisvandenbossche opened this issue Feb 19, 2018 · 12 comments · Fixed by #21036
Labels
Blocker Blocking issue or pull request for an upcoming release Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Feb 19, 2018

From a failing example in the docs (http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#from-timestamps-to-epoch):

In [46]: stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

In [47]: stamps.view('int64') // pd.Timedelta(1, unit='s')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-9893b9bae6cf> in <module>()
----> 1 stamps.view('int64') // pd.Timedelta(1, unit='s')

/home/joris/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__rfloordiv__()

TypeError: Invalid dtype int64 for __floordiv__

This was working correctly in 0.22, and is mentioned in the docs explicitly as a way to get unix epoch data.

The previously working example (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#from-timestamps-to-epoch) looks like:

In [42]: stamps.view('int64') // pd.Timedelta(1, unit='s')
Out[42]: array([1349720105, 1349806505, 1349892905, 1349979305])

cc @jbrockmendel

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type labels Feb 19, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Feb 19, 2018
@jorisvandenbossche
Copy link
Member Author

I think this was introduced (on purpose, tests were added) in #18961.
I agree the above example that was included in the docs is a bit strange, as when dividing 'int / timedelta', the timedelta is implicitly cast to an integer as it nanoseconds. But, on the other hand, this was clearly in our docs as a way to get epoch data, so we cannot just break it I think

@TomAugspurger TomAugspurger added the Blocker Blocking issue or pull request for an upcoming release label Mar 28, 2018
@jorisvandenbossche
Copy link
Member Author

So we need to make a decision on this one. Options:

  1. Keep the behaviour on master (= keep limitation of behaviour of Timedelta.__rfloordiv__ ) and update the docs with an alternative solution. What would this alternative be?

  2. Undo the regression and re-enable the previous behaviour of Timedelta.__rfloordiv__

  3. Re-enable the previous behaviour of Timedelta.__rfloordiv__ but with a deprecation warning it will be removed in the future. Provide alternative solution in the warning and docs.

cc @jreback @TomAugspurger

Given that the method is not very intuitive, I would be in favor of option 3 if we have a good alternative.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 29, 2018 via email

@jbrockmendel
Copy link
Member

For 1, the alternative solution would be to also use the integer value for the Timedelta

stamps.view('int64') // pd.Timedelta(1, unit='s').value

It's also a bit weird to have __floordiv__ work but not __div__ (as is the case in 0.22.0)

@jreback
Copy link
Contributor

jreback commented Apr 15, 2018

so either use .value on the rhs operand (as @jbrockmendel suggest) or

In [12]: (stamps-pd.Timestamp(0)) // pd.Timedelta(1, 's')
Out[12]: Int64Index([1349720105, 1349806505, 1349892905, 1349979305], dtype='int64')

is a reasonable soln (This is more natural as you are dividing a TDI / Timedelta which is a bit more clear)

@jreback jreback modified the milestones: 0.23.0, 0.24.0 Apr 24, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.24.0, 0.23.0 Apr 24, 2018
@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Apr 24, 2018

Are we comfortable with advertising the use of .value for Timestamp/Timedeltas ?

In that case, the stamps.view('int64') // pd.Timedelta(1, unit='s').value seems maybe best. Although we also had discussion regarding deprecating view (of course we can also use .astype('int') (but that does an extra copy) or .asi8)

@jreback
Copy link
Contributor

jreback commented Apr 24, 2018

you can use .view if u want it’s ok here

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Apr 24, 2018

@jreback note that you actually proposed to deprecate view: #20251. If we still plan to do that, we should not use it here. If we think this is actually a good use case for view, we can also close that issue.

@jreback
Copy link
Contributor

jreback commented Apr 24, 2018

i know

we have .asi8 which is slightly non standard
maybe just .astype(‘i8’) is canonical

@TomAugspurger
Copy link
Contributor

I have a fix for this using .value, but I think we should just implement #14772 quick (to_epoch). Will give that a shot now.

@TomAugspurger
Copy link
Contributor

We'll still need the warning for backwards compat, but we'll recommend to_epoch instead of doing a floordiv. I assume epoch conversion is the main use case for integer array / Timedelta.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 14, 2018
Closes pandas-dev#19761.

```python
In [2]: pd.DatetimeIndex(['1931', '1970', '2017']).view('i8') // pd.Timedelta(1, unit='s')
pandas-dev/bin/ipython:1: FutureWarning: Floor division between integer array and Timedelta is
deprecated. Use 'array // timedelta.value' instead.

Out[2]: array([-1230768000,           0,  1483228800])
```
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 14, 2018
Closes pandas-dev#19761.

```python
In [2]: pd.DatetimeIndex(['1931', '1970', '2017']).view('i8') // pd.Timedelta(1, unit='s')
pandas-dev/bin/ipython:1: FutureWarning: Floor division between integer array and Timedelta is
deprecated. Use 'array // timedelta.value' instead.

Out[2]: array([-1230768000,           0,  1483228800])
```
@jorisvandenbossche
Copy link
Member Author

I was thinking, would actually allowing a `Timestamp(..) / Timedelta(..) = number" operation be another option?
Probably not, because it is not a reversible operation, and it also not necessarily clear that such an operation interprets the Timestamp as timedelta since 1970, but just throwing out here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants