Skip to content

BUG: GH11349 where Series.apply and Series.map did not box timedelta64 #11564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 31, 2015

Conversation

kawochen
Copy link
Contributor

closes #11349
closes #11925

@kawochen
Copy link
Contributor Author

One test failing test_timedelta64_conversions (pandas.tests.test_series.TestSeries)

OK:

import pandas as pd
import numpy as np
startdate = pd.Series(pd.date_range('2013-01-01', '2013-01-03'))
enddate = pd.Series(pd.date_range('2013-03-01', '2013-03-03'))
s1 = enddate - startdate
s1[2] = np.nan
s1.apply(lambda x: x / pd.Timedelta(1, 'D'))

Fail:

s1.apply(lambda x: x / np.timedelta64(1, 'D'))

I think because
pd.NaT/np.timedelta64(1, 'D') fails

Solutions I can think of all seem quite intrusive:

  1. define __truediv__ for Timestamp (maybe too small a use case to justify the change IMO)
  2. define __truediv__ for NaTType (confusing because time series operations might succeed or fail depending on whether it is all NaT)

Besides this, is there any need to have NaT support other np.timedelta or pd.Timedelta operations?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2015

why would this be failing?

In [23]: s1
Out[23]: 
0   59 days
1   59 days
2       NaT
dtype: timedelta64[ns]

In [22]: s1.apply(lambda x: x / pd.Timedelta(1, 'D'))
Out[22]: 
0    59
1    59
2   NaN
dtype: float64

In [25]: s1.apply(lambda x: x / np.timedelta64(1, 'D'))
Out[25]: 
0    59
1    59
2   NaN
dtype: float64

@kawochen
Copy link
Contributor Author

I meant if you are to box timedelta64 on apply.
It would fail for the same reason this fails in master

In [9]: s1[2]/np.timedelta64(1, 'D')
TypeError: ufunc true_divide cannot use operands with types dtype('O') and dtype('<m8[D]')

@jreback
Copy link
Contributor

jreback commented Nov 11, 2015

@kawochen I c.

yes, I think that NaT should have these methods (__div__/__floordiv__/__truediv__/__mul__), and either raise TypeError (e.g. if the rhs is a datetime64[ns]/Timestamp) or return a NaT if its a timedelta64[ns]/Timedelta

In [17]: s1[2]/np.timedelta64(1,'s')
TypeError: ufunc divide cannot use operands with types dtype('O') and dtype('<m8[s]')

In [18]: s1[2]/pd.Timedelta(np.timedelta64(1,'s'))
Out[18]: NaT

I think what happens now is that the reversed methods are called and for example Timedelta knows how to handle NaT so it works, but of course timedeltat64[ns] has no clue.

@jreback
Copy link
Contributor

jreback commented Nov 11, 2015

Addtl cases which should all be the same (e.g. should all raise a TypeError). I think [20] is wrong. We don't allow division for Timestamps at all. But this is incorrectly converted to Timedelta division.

Ideally also would have more uniform / better error messages (e.g. [21]) is hard to understand and [22] is deferring to numpy and even more tricky to understand.

In [20]: Timestamp('nat')/Timedelta('1s')
Out[20]: NaT

In [21]: Timestamp('nat')/Timedelta('nat')
TypeError: unsupported operand type(s) for /: 'NaTType' and 'NaTType'

In [22]: Timestamp('nat')/np.timedelta64(1,'s')
TypeError: ufunc divide cannot use operands with types dtype('O') and dtype('<m8[s]')

Now [20] actually may be impossible to fix as we have the same NaT for both Timestamp/Timedelta.

@jreback jreback added Bug Timedelta Timedelta data type labels Nov 11, 2015
@jreback jreback added this to the 0.17.1 milestone Nov 13, 2015
@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@kawochen lmk progress on this.

@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback
Copy link
Contributor

jreback commented Nov 29, 2015

@kawochen can you update

@kawochen
Copy link
Contributor Author

will update soon. but I think __mul__ should never succeed, and the divisions should return nan if Timedelta/timedelta64?

@jreback
Copy link
Contributor

jreback commented Nov 30, 2015

@kawochen these interactions are already defined in core/ops.py/_Timeop, which handle Series interactions. So these should all be uniform (prob best to look at the tests for this, mostly in tseries/tests_timeseries.py and test_series.py

@kawochen
Copy link
Contributor Author

seems to depend on whether NaT is the divisor or the dividend.

#timedelta/NaT -> float
In [19]: pandas.Series(pandas.Timedelta('0 days 00:00:04'))/pandas.NaT
Out[19]:
0   NaN
dtype: float64

#NaT/Timedelta -> NaT
In [20]: pandas.NaT/pandas.Series(pandas.Timedelta('0 days 00:00:04'))
Out[20]:
0   NaT
dtype: timedelta64[ns]

#Timedelta/Timedelta -> float
In [25]: pandas.Series(pandas.Timedelta('0 days 00:00:08'))/pandas.Series(pandas.Timedelta('0 days 00:00:04'))
Out[25]:
0    2
dtype: float64

#Timedelta/NaT -> float
In [26]: pandas.Series(pandas.Timedelta('0 days 00:00:04'))/pandas.Series(pandas.NaT)
Out[26]:
0   NaN
dtype: float64

In [27]: pandas.Series(pandas.NaT) / pandas.Timedelta('0 days 00:00:02')
TypeError: can only operate on a datetime with a rhs of a timedelta/DateOffset for addition and subtraction, but the operator [__truediv__]
was passed 

Maybe this is better, since a Timedelta divided by a Timedelta is a float now.

/ Timedelta Timestamp NaT
Timedelta float TypeError nan
Timestamp TypeError TypeError TypeError
NaT nan TypeError ?

Since NaT plays the dual role of Timestamp null and Timedelta null, it's not clear whether NaT/NaT should error or not.

@jreback
Copy link
Contributor

jreback commented Dec 1, 2015

NaT / NaT should prob be nan or ok with error as well. its a bit inconsitent but not should can avoid that.

@kawochen
Copy link
Contributor Author

kawochen commented Dec 1, 2015

OK. How about these? Should all of them be NaN?

#all current behavior
In [83]: Timedelta('1s')/NaT
Out[83]: NaT

In [84]: NaT/Timedelta('1s')
Out[84]: NaT

In [85]: Series([Timedelta('1s')])/Series([NaT])
Out[85]:
0   NaN
dtype: float64

In [86]: Series([NaT])/Series([Timedelta('1s')])
TypeError: ...

@jreback
Copy link
Contributor

jreback commented Dec 1, 2015

So if we have timedelta/timedelta it returns a float, otherwise tries to preserve type.

In [2]: pd.Timedelta('1s')/10
Out[2]: Timedelta('0 days 00:00:00.100000')

In [3]: pd.Timedelta('1s')/np.nan
Out[3]: NaT

In [5]: Timedelta('1s')/Timedelta('2s')
Out[5]: 0.5

I think would be ok to make [3] return np.nan though (and all of your examples). Generally if that is intermixed with other Timedeltas it would be eventually coerced as well, e.g.

@kawochen
Copy link
Contributor Author

kawochen commented Dec 3, 2015

In [18]: pandas.Series([pandas.Timedelta('1s')])/2
Out[18]: 
0   00:00:00.500000
dtype: timedelta64[ns]

In [19]: pandas.Series([pandas.Timedelta('1s')])/2.0
TypeError: ...

Should [19] work?

Should Series([pd.NaT])*2 , Series([pd.NaT])*2.0, Series([pd.NaT])/2.0 work? (I think they should, since the analogous numpy ops work)

@jreback
Copy link
Contributor

jreback commented Dec 3, 2015

yes in all

@kawochen
Copy link
Contributor Author

kawochen commented Dec 3, 2015

OK. Lots to change 😄

@kawochen
Copy link
Contributor Author

kawochen commented Dec 3, 2015

Just realized pd.Timedelta('1s')/2.0 doesn't work. Should it?

@jreback
Copy link
Contributor

jreback commented Dec 3, 2015

yep suppose that dividing by floats should also work

@@ -879,10 +879,10 @@ def test_timedelta_ops_with_missing_values(self):
actual = s1 - timedelta_NaT
assert_series_equal(actual, sn)

actual = s1 + NA
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should work? Does it make certain operations convenient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just proves coercion of nan -> NaT when the dtype is appropriate (e.g. M8[ns] or m8[ns].

I would expect this to work (as opposed to raising an error)

-> actual = s1 + NA
(Pdb) p s1
0   00:00:01
dtype: timedelta64[ns]
(Pdb) p NA   
nan
(Pdb) p s1 + NA
0   NaT
dtype: timedelta64[ns]
(Pdb) p s1 + pd.NaT
0   NaT
dtype: timedelta64[ns]

with dates

(Pdb) Series(pd.date_range('20130101',periods=2))        
0   2013-01-01
1   2013-01-02
dtype: datetime64[ns]
(Pdb) Series(pd.date_range('20130101',periods=2))+pd.NaT
*** TypeError: can only operate on a datetimes for subtraction, but the operator [__add__] was passed
(Pdb) Series(pd.date_range('20130101',periods=2))+np.nan
*** TypeError: can only operate on a datetimes for subtraction, but the operator [__add__] was passed
(Pdb) Series(pd.date_range('20130101',periods=2))-np.nan
0   NaT
1   NaT
dtype: timedelta64[ns]
(Pdb) Series(pd.date_range('20130101',periods=2))-pd.NaT
0   NaT
1   NaT
dtype: timedelta64[ns]

@kawochen
Copy link
Contributor Author

kawochen commented Dec 7, 2015

So a naked pd.NaT or np.nan is always the same as Series([pd.NaT], dtype='timedelta64[ns]')?

result = s1.astype("timedelta64[{0}]".format(unit))
assert_series_equal(result, expected)

# reverse op
expected = s1.apply(lambda x: np.timedelta64(m,unit) / x)
expected = s1.apply(lambda x: Timedelta(np.timedelta64(m,unit)) / x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you just add a test here? (e.g. for both wrapped np.timedelta64 and Timedelta)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without wrapping it doesn't work, since np.timedelta64 doesn't return NotImplemented -- it raises TypeError directly

@kawochen
Copy link
Contributor Author

updated

@@ -167,6 +167,56 @@ Backwards incompatible API changes
- The parameter ``out`` has been removed from the ``Series.round()`` method. (:issue:`11763`)
- ``DataFrame.round()`` leaves non-numeric columns unchanged in its return, rather than raises. (:issue:`11885`)

NaT and timedelta64 operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make underline same len as title (avoids build errors)

@jreback
Copy link
Contributor

jreback commented Dec 29, 2015

@kawochen really just some minor comments. looks really good! lots of edge cases you figured out!

@jreback
Copy link
Contributor

jreback commented Dec 29, 2015

@kawochen can you check this:

This is on master

In [12]: s2
Out[12]:
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]

In [9]: pd.Timestamp('2012-01-01') + s2
Out[9]:
0   2012-01-02
1   2012-01-03
2   2012-01-04
dtype: datetime64[ns]

In [10]: pd.Timestamp('2012-01-01') - s2
TypeError: can only operate on a timedelta/DateOffset and a datetime for addition, but the operator [__rsub__] was passed

In [11]: pd.Timestamp('2012-01-01') + (-s2)
Out[11]:
0   2011-12-31
1   2011-12-30
2   2011-12-29
dtype: datetime64[ns]

@jreback
Copy link
Contributor

jreback commented Dec 29, 2015

this last fails, xref #11925

@kawochen kawochen force-pushed the BUG-FIX-11349 branch 2 times, most recently from 8f43e6d to 7f07269 Compare December 30, 2015 05:14
@kawochen
Copy link
Contributor Author

updated

@@ -299,7 +300,18 @@ def __init__(self, left, right, name, na_op):
self.is_datetime64tz_rhs = is_datetime64tz_dtype(rvalues)
self.is_datetime_rhs = self.is_datetime64_rhs or self.is_datetime64tz_rhs
self.is_timedelta_rhs = is_timedelta64_dtype(rvalues)
if not (isinstance(rvalues, pd.Series) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a comment indicating why this is necessary here (the expression to determine if its a timedelta)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. the checking was actually not needed, so I removed it

@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 30, 2015
@jreback
Copy link
Contributor

jreback commented Dec 30, 2015

@kawochen very minor comment. ping when pushed.

jreback added a commit that referenced this pull request Dec 31, 2015
BUG: GH11349 where Series.apply and Series.map did not box timedelta64
@jreback jreback merged commit a89f489 into pandas-dev:master Dec 31, 2015
@jreback
Copy link
Contributor

jreback commented Dec 31, 2015

@kawochen awesome fixes! This really cleaned lots of edge cases.

keep em coming!

@jreback
Copy link
Contributor

jreback commented Jan 5, 2016

http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#nat-and-timedelta-operations

There is this naked exception.

In [3]: pd.Timestamp('1990315') + pd.Timestamp('19900315')
ValueError:

can you raise an informative message here?

@kawochen
Copy link
Contributor Author

kawochen commented Jan 5, 2016

yes sorry I made a mistake in the example. will get to it tonight.

@rockg rockg mentioned this pull request Jan 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: subtraction of datetime / timedelta should be allowed BUG: boxing Timedeltas on .apply
2 participants