BUG: GH11349 where Series.apply and Series.map did not box timedelta64 #11564

kawochen · 2015-11-10T05:53:32Z

closes #11349
closes #11925

kawochen · 2015-11-10T15:10:12Z

One test failing test_timedelta64_conversions (pandas.tests.test_series.TestSeries)

OK:

import pandas as pd
import numpy as np
startdate = pd.Series(pd.date_range('2013-01-01', '2013-01-03'))
enddate = pd.Series(pd.date_range('2013-03-01', '2013-03-03'))
s1 = enddate - startdate
s1[2] = np.nan
s1.apply(lambda x: x / pd.Timedelta(1, 'D'))

Fail:

s1.apply(lambda x: x / np.timedelta64(1, 'D'))

I think because
pd.NaT/np.timedelta64(1, 'D') fails

Solutions I can think of all seem quite intrusive:

define __truediv__ for Timestamp (maybe too small a use case to justify the change IMO)
define __truediv__ for NaTType (confusing because time series operations might succeed or fail depending on whether it is all NaT)

Besides this, is there any need to have NaT support other np.timedelta or pd.Timedelta operations?

jreback · 2015-11-11T03:17:06Z

why would this be failing?

In [23]: s1
Out[23]: 
0   59 days
1   59 days
2       NaT
dtype: timedelta64[ns]

In [22]: s1.apply(lambda x: x / pd.Timedelta(1, 'D'))
Out[22]: 
0    59
1    59
2   NaN
dtype: float64

In [25]: s1.apply(lambda x: x / np.timedelta64(1, 'D'))
Out[25]: 
0    59
1    59
2   NaN
dtype: float64

kawochen · 2015-11-11T04:44:41Z

I meant if you are to box timedelta64 on apply.
It would fail for the same reason this fails in master

In [9]: s1[2]/np.timedelta64(1, 'D')
TypeError: ufunc true_divide cannot use operands with types dtype('O') and dtype('<m8[D]')

jreback · 2015-11-11T12:42:46Z

@kawochen I c.

yes, I think that NaT should have these methods (__div__/__floordiv__/__truediv__/__mul__), and either raise TypeError (e.g. if the rhs is a datetime64[ns]/Timestamp) or return a NaT if its a timedelta64[ns]/Timedelta

In [17]: s1[2]/np.timedelta64(1,'s')
TypeError: ufunc divide cannot use operands with types dtype('O') and dtype('<m8[s]')

In [18]: s1[2]/pd.Timedelta(np.timedelta64(1,'s'))
Out[18]: NaT

I think what happens now is that the reversed methods are called and for example Timedelta knows how to handle NaT so it works, but of course timedeltat64[ns] has no clue.

jreback · 2015-11-11T12:46:57Z

Addtl cases which should all be the same (e.g. should all raise a TypeError). I think [20] is wrong. We don't allow division for Timestamps at all. But this is incorrectly converted to Timedelta division.

Ideally also would have more uniform / better error messages (e.g. [21]) is hard to understand and [22] is deferring to numpy and even more tricky to understand.

In [20]: Timestamp('nat')/Timedelta('1s')
Out[20]: NaT

In [21]: Timestamp('nat')/Timedelta('nat')
TypeError: unsupported operand type(s) for /: 'NaTType' and 'NaTType'

In [22]: Timestamp('nat')/np.timedelta64(1,'s')
TypeError: ufunc divide cannot use operands with types dtype('O') and dtype('<m8[s]')

Now [20] actually may be impossible to fix as we have the same NaT for both Timestamp/Timedelta.

jreback · 2015-11-13T19:04:55Z

@kawochen lmk progress on this.

jreback · 2015-11-29T18:32:13Z

@kawochen can you update

kawochen · 2015-11-30T02:57:31Z

will update soon. but I think __mul__ should never succeed, and the divisions should return nan if Timedelta/timedelta64?

jreback · 2015-11-30T11:52:32Z

@kawochen these interactions are already defined in core/ops.py/_Timeop, which handle Series interactions. So these should all be uniform (prob best to look at the tests for this, mostly in tseries/tests_timeseries.py and test_series.py

kawochen · 2015-11-30T18:29:26Z

seems to depend on whether NaT is the divisor or the dividend.

#timedelta/NaT -> float
In [19]: pandas.Series(pandas.Timedelta('0 days 00:00:04'))/pandas.NaT
Out[19]:
0   NaN
dtype: float64

#NaT/Timedelta -> NaT
In [20]: pandas.NaT/pandas.Series(pandas.Timedelta('0 days 00:00:04'))
Out[20]:
0   NaT
dtype: timedelta64[ns]

#Timedelta/Timedelta -> float
In [25]: pandas.Series(pandas.Timedelta('0 days 00:00:08'))/pandas.Series(pandas.Timedelta('0 days 00:00:04'))
Out[25]:
0    2
dtype: float64

#Timedelta/NaT -> float
In [26]: pandas.Series(pandas.Timedelta('0 days 00:00:04'))/pandas.Series(pandas.NaT)
Out[26]:
0   NaN
dtype: float64

In [27]: pandas.Series(pandas.NaT) / pandas.Timedelta('0 days 00:00:02')
TypeError: can only operate on a datetime with a rhs of a timedelta/DateOffset for addition and subtraction, but the operator [__truediv__]
was passed

Maybe this is better, since a Timedelta divided by a Timedelta is a float now.

/	`Timedelta`	`Timestamp`	`NaT`
`Timedelta`	`float`	`TypeError`	`nan`
`Timestamp`	`TypeError`	`TypeError`	`TypeError`
`NaT`	`nan`	`TypeError`	?

Since NaT plays the dual role of Timestamp null and Timedelta null, it's not clear whether NaT/NaT should error or not.

jreback · 2015-12-01T02:52:51Z

NaT / NaT should prob be nan or ok with error as well. its a bit inconsitent but not should can avoid that.

kawochen · 2015-12-01T13:57:45Z

OK. How about these? Should all of them be NaN?

#all current behavior
In [83]: Timedelta('1s')/NaT
Out[83]: NaT

In [84]: NaT/Timedelta('1s')
Out[84]: NaT

In [85]: Series([Timedelta('1s')])/Series([NaT])
Out[85]:
0   NaN
dtype: float64

In [86]: Series([NaT])/Series([Timedelta('1s')])
TypeError: ...

jreback · 2015-12-01T14:31:56Z

So if we have timedelta/timedelta it returns a float, otherwise tries to preserve type.

In [2]: pd.Timedelta('1s')/10
Out[2]: Timedelta('0 days 00:00:00.100000')

In [3]: pd.Timedelta('1s')/np.nan
Out[3]: NaT

In [5]: Timedelta('1s')/Timedelta('2s')
Out[5]: 0.5

I think would be ok to make [3] return np.nan though (and all of your examples). Generally if that is intermixed with other Timedeltas it would be eventually coerced as well, e.g.

kawochen · 2015-12-03T04:20:43Z

In [18]: pandas.Series([pandas.Timedelta('1s')])/2
Out[18]: 
0   00:00:00.500000
dtype: timedelta64[ns]

In [19]: pandas.Series([pandas.Timedelta('1s')])/2.0
TypeError: ...

Should [19] work?

Should Series([pd.NaT])*2 , Series([pd.NaT])*2.0, Series([pd.NaT])/2.0 work? (I think they should, since the analogous numpy ops work)

jreback · 2015-12-03T04:24:47Z

yes in all

kawochen · 2015-12-03T04:34:32Z

OK. Lots to change 😄

kawochen · 2015-12-03T04:51:26Z

Just realized pd.Timedelta('1s')/2.0 doesn't work. Should it?

jreback · 2015-12-03T11:20:08Z

yep suppose that dividing by floats should also work

kawochen · 2015-12-06T04:30:51Z

pandas/tseries/tests/test_timedeltas.py

@@ -879,10 +879,10 @@ def test_timedelta_ops_with_missing_values(self):
        actual = s1 - timedelta_NaT
        assert_series_equal(actual, sn)

-        actual = s1 + NA


I don't see why this should work? Does it make certain operations convenient?

this just proves coercion of nan -> NaT when the dtype is appropriate (e.g. M8[ns] or m8[ns].

I would expect this to work (as opposed to raising an error)

-> actual = s1 + NA (Pdb) p s1 0 00:00:01 dtype: timedelta64[ns] (Pdb) p NA nan (Pdb) p s1 + NA 0 NaT dtype: timedelta64[ns] (Pdb) p s1 + pd.NaT 0 NaT dtype: timedelta64[ns]

with dates

(Pdb) Series(pd.date_range('20130101',periods=2)) 0 2013-01-01 1 2013-01-02 dtype: datetime64[ns] (Pdb) Series(pd.date_range('20130101',periods=2))+pd.NaT *** TypeError: can only operate on a datetimes for subtraction, but the operator [__add__] was passed (Pdb) Series(pd.date_range('20130101',periods=2))+np.nan *** TypeError: can only operate on a datetimes for subtraction, but the operator [__add__] was passed (Pdb) Series(pd.date_range('20130101',periods=2))-np.nan 0 NaT 1 NaT dtype: timedelta64[ns] (Pdb) Series(pd.date_range('20130101',periods=2))-pd.NaT 0 NaT 1 NaT dtype: timedelta64[ns]

kawochen · 2015-12-07T01:25:19Z

So a naked pd.NaT or np.nan is always the same as Series([pd.NaT], dtype='timedelta64[ns]')?

jreback · 2015-12-16T14:04:56Z

pandas/tests/test_series.py

                    result = s1.astype("timedelta64[{0}]".format(unit))
                    assert_series_equal(result, expected)

                # reverse op
-                expected = s1.apply(lambda x: np.timedelta64(m,unit) / x)
+                expected = s1.apply(lambda x: Timedelta(np.timedelta64(m,unit)) / x)


should you just add a test here? (e.g. for both wrapped np.timedelta64 and Timedelta)

without wrapping it doesn't work, since np.timedelta64 doesn't return NotImplemented -- it raises TypeError directly

kawochen · 2015-12-28T21:22:36Z

updated

jreback · 2015-12-29T21:45:33Z

doc/source/whatsnew/v0.18.0.txt

@@ -167,6 +167,56 @@ Backwards incompatible API changes
 - The parameter ``out`` has been removed from the ``Series.round()`` method. (:issue:`11763`)
 - ``DataFrame.round()`` leaves non-numeric columns unchanged in its return, rather than raises. (:issue:`11885`)

+NaT and timedelta64 operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^


make underline same len as title (avoids build errors)

jreback · 2015-12-29T22:27:08Z

@kawochen really just some minor comments. looks really good! lots of edge cases you figured out!

jreback · 2015-12-29T22:38:34Z

@kawochen can you check this:

This is on master

In [12]: s2
Out[12]:
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]

In [9]: pd.Timestamp('2012-01-01') + s2
Out[9]:
0   2012-01-02
1   2012-01-03
2   2012-01-04
dtype: datetime64[ns]

In [10]: pd.Timestamp('2012-01-01') - s2
TypeError: can only operate on a timedelta/DateOffset and a datetime for addition, but the operator [__rsub__] was passed

In [11]: pd.Timestamp('2012-01-01') + (-s2)
Out[11]:
0   2011-12-31
1   2011-12-30
2   2011-12-29
dtype: datetime64[ns]

jreback · 2015-12-29T22:41:57Z

this last fails, xref #11925

kawochen · 2015-12-30T07:32:31Z

updated

jreback · 2015-12-30T15:36:32Z

pandas/core/ops.py

@@ -299,7 +300,18 @@ def __init__(self, left, right, name, na_op):
        self.is_datetime64tz_rhs = is_datetime64tz_dtype(rvalues)
        self.is_datetime_rhs = self.is_datetime64_rhs or self.is_datetime64tz_rhs
        self.is_timedelta_rhs = is_timedelta64_dtype(rvalues)
+        if not (isinstance(rvalues, pd.Series) or


can you put a comment indicating why this is necessary here (the expression to determine if its a timedelta)

updated. the checking was actually not needed, so I removed it

jreback · 2015-12-30T15:38:28Z

@kawochen very minor comment. ping when pushed.

BUG: GH11349 where Series.apply and Series.map did not box timedelta64

jreback · 2015-12-31T13:46:51Z

@kawochen awesome fixes! This really cleaned lots of edge cases.

keep em coming!

jreback · 2016-01-05T11:54:15Z

http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#nat-and-timedelta-operations

There is this naked exception.

In [3]: pd.Timestamp('1990315') + pd.Timestamp('19900315')
ValueError:

can you raise an informative message here?

kawochen · 2016-01-05T18:44:24Z

yes sorry I made a mistake in the example. will get to it tonight.

jreback added Bug Timedelta Timedelta data type labels Nov 11, 2015

jreback added this to the 0.17.1 milestone Nov 13, 2015

jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015

jreback mentioned this pull request Nov 30, 2015

Timestamp subtraction of NaT with timezones #11718

Closed

kawochen force-pushed the BUG-FIX-11349 branch from c2e6ae5 to fd4f128 Compare December 5, 2015 15:17

kawochen reviewed Dec 6, 2015
View reviewed changes

jreback mentioned this pull request Dec 9, 2015

Series.apply loses timezone information #11800

Closed

jreback reviewed Dec 16, 2015
View reviewed changes

kawochen force-pushed the BUG-FIX-11349 branch from fd4f128 to 18dd422 Compare December 28, 2015 20:54

kawochen force-pushed the BUG-FIX-11349 branch from 18dd422 to 46bd000 Compare December 29, 2015 21:17

jreback reviewed Dec 29, 2015
View reviewed changes

kawochen force-pushed the BUG-FIX-11349 branch 2 times, most recently from 8f43e6d to 7f07269 Compare December 30, 2015 05:14

jreback reviewed Dec 30, 2015
View reviewed changes

jreback modified the milestones: 0.18.0, Next Major Release Dec 30, 2015

kawochen force-pushed the BUG-FIX-11349 branch from 7f07269 to 296c042 Compare December 31, 2015 06:25

BUG: GH11349 where Series.apply and Series.map did not box timedelta64

513c5c8

kawochen force-pushed the BUG-FIX-11349 branch from 296c042 to 513c5c8 Compare December 31, 2015 06:37

jreback added a commit that referenced this pull request Dec 31, 2015

Merge pull request #11564 from kawochen/BUG-FIX-11349

a89f489

BUG: GH11349 where Series.apply and Series.map did not box timedelta64

jreback merged commit a89f489 into pandas-dev:master Dec 31, 2015

rockg mentioned this pull request Jan 25, 2016

BUG: .date and timezones #11757

Closed

Uh oh!

BUG: GH11349 where Series.apply and Series.map did not box timedelta64 #11564

BUG: GH11349 where Series.apply and Series.map did not box timedelta64 #11564

Uh oh!

Conversation

kawochen commented Nov 10, 2015

Uh oh!

kawochen commented Nov 10, 2015

Uh oh!

jreback commented Nov 11, 2015

Uh oh!

kawochen commented Nov 11, 2015

Uh oh!

jreback commented Nov 11, 2015

Uh oh!

jreback commented Nov 11, 2015

Uh oh!

jreback commented Nov 13, 2015

Uh oh!

jreback commented Nov 29, 2015

Uh oh!

kawochen commented Nov 30, 2015

Uh oh!

jreback commented Nov 30, 2015

Uh oh!

kawochen commented Nov 30, 2015

Uh oh!

jreback commented Dec 1, 2015

Uh oh!

kawochen commented Dec 1, 2015

Uh oh!

jreback commented Dec 1, 2015

Uh oh!

kawochen commented Dec 3, 2015

Uh oh!

jreback commented Dec 3, 2015

Uh oh!

kawochen commented Dec 3, 2015

Uh oh!

kawochen commented Dec 3, 2015

Uh oh!

jreback commented Dec 3, 2015

Uh oh!

kawochen Dec 6, 2015

Choose a reason for hiding this comment

Uh oh!

jreback Dec 6, 2015

Choose a reason for hiding this comment

Uh oh!

kawochen commented Dec 7, 2015

Uh oh!

jreback Dec 16, 2015

Choose a reason for hiding this comment

Uh oh!

kawochen Dec 28, 2015

Choose a reason for hiding this comment

Uh oh!

kawochen commented Dec 28, 2015

Uh oh!

jreback Dec 29, 2015

Choose a reason for hiding this comment

Uh oh!

jreback commented Dec 29, 2015

Uh oh!

jreback commented Dec 29, 2015

Uh oh!

jreback commented Dec 29, 2015

Uh oh!

kawochen commented Dec 30, 2015

Uh oh!

jreback Dec 30, 2015

Choose a reason for hiding this comment

Uh oh!

kawochen Dec 31, 2015

Choose a reason for hiding this comment

Uh oh!

jreback commented Dec 30, 2015

Uh oh!

jreback commented Dec 31, 2015

Uh oh!