BUG: boxing Timedeltas on .apply #11349

Closed
amelio-vazquez-reina opened this issue Oct 16, 2015 · 2 comments · Fixed by #11564
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type

@amelio-vazquez-reina

Consider the following Series:

object_id
0CKVYKjyFn    76 days
0CrPL2QKH3   -15 days
0CrVStlVrg    23 days
0Cc5ZvS67u    76 days
0CTOk5OdtI    76 days
0CTSWtTzBa    76 days
0CwBqVeNCX    76 days
0CIRJFIOcD    58 days
0CRQPCxzQe   350 days
0CAq4m9Nru    15 days
0C617yvXBj    76 days
0CzUUJNKX9   -16 days
Name: days_left, dtype: timedelta64[ns]

I am hoping to convert the above to hours.

If I do:

my_series.dt.hours

I get:

AttributeError: 'Series' object has no attribute 'hours'

What's even more strange is that if I do:

> my_series[0].total_seconds()/3600
1824.0

it works for one element, but if I do:

> my_series.apply(lambda x: x.total_seconds())

I get:

AttributeError: 'numpy.timedelta64' object has no attribute 'total_seconds'

I thought apply would run the function I pass it item by item in the series. Why does total_seconds() work for a single item, but not with apply?

@chris-b1

As outlined in the docs, the way to do conversions is via astype (which truncates to the target unit) or by dividing by the appropriate delta (which doesn't truncate).

In [8]: s.astype('m8[h]')
Out[8]: 
0     1824
1     -360
2      552
3     1824
4     1824
5     1824
6     1824
7     1392
8     8400
9      360
10    1824
11    -384
Name: 1, dtype: float64

In [9]: s / np.timedelta64(1, 'h')
Out[9]: 
0     1824
1     -360
2      552
3     1824
4     1824
5     1824
6     1824
7     1392
8     8400
9      360
10    1824
11    -384
Name: 1, dtype: float64

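You're seeing that result with apply because a single element is boxed in a Timedelta when accessed (and Timedelta has extra properties and methods such as total_seconds), but the underlying storage is an array of np.timedelta64 values, which don't have them. apply currently passes those raw values to your function.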
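A minimal sketch of the non-truncating conversions described above, using a small Series rebuilt from the first three values in the report. The variable names are illustrative and not from the thread. It sticks to division and the .dt accessor, since astype behaviour for timedelta units may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Small stand-in for the reported Series (first three values only).
s = pd.Series(pd.to_timedelta([76, -15, 23], unit="D"), name="days_left")

# Dividing by a one-hour delta yields float hours without truncation.
hours_by_division = s / np.timedelta64(1, "h")

# Same result via the vectorized .dt accessor, avoiding apply entirely.
hours_by_seconds = s.dt.total_seconds() / 3600

print(hours_by_division.tolist())
print(hours_by_seconds.tolist())
```

Both routes operate on the whole Series at once, so they do not depend on how individual elements are boxed.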

@jreback

jreback commented Oct 17, 2015

In [5]: s = Series(pd.timedelta_range('1 day 1 s',periods=5,freq='h'))

In [6]: s
Out[6]: 
0   1 days 00:00:01
1   1 days 01:00:01
2   1 days 02:00:01
3   1 days 03:00:01
4   1 days 04:00:01
dtype: timedelta64[ns]

In [7]: s.dt.components
Out[7]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        1             0             0            0
1     1      1        0        1             0             0            0
2     1      2        0        1             0             0            0
3     1      3        0        1             0             0            0
4     1      4        0        1             0             0            0

In [8]: s.dt.
s.dt.components      s.dt.days            s.dt.freq            s.dt.microseconds    s.dt.nanoseconds     s.dt.seconds         s.dt.to_pytimedelta  s.dt.total_seconds   

@amelio-vazquez-reina the reason we don't support hours/minutes is for compatibility with datetime.timedelta, and to make it slightly less confusing.

datetime.timedelta gives you days, seconds, and microseconds, which are the TOTAL amounts (which IMHO is actually confusing, but that is what the API is).

.components will give you the 'displayed' values (i.e. the components of the timedeltas), which you can then access.
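To make the distinction concrete, here is a small sketch comparing the datetime.timedelta attributes with the pandas components for one value. The example value is made up for illustration.

```python
from datetime import timedelta

import pandas as pd

# Illustrative value: 1 day, 2 hours, 0 minutes, 1 second.
py_td = timedelta(days=1, hours=2, seconds=1)
pd_td = pd.Timedelta(py_td)

# datetime.timedelta folds hours and minutes into .seconds.
print(py_td.days, py_td.seconds)      # 1 7201
print(py_td.total_seconds())          # 93601.0

# pandas mirrors those attributes on Timedelta ...
print(pd_td.days, pd_td.seconds)      # 1 7201

# ... and exposes the displayed breakdown through .components.
comps = pd_td.components
print(comps.hours, comps.minutes, comps.seconds)  # 2 0 1
```

Here .seconds holds the whole sub-day remainder (2 h 0 min 1 s = 7201 s), while .components splits the same remainder into the fields shown when the value is printed.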

So s.apply(....) should actually box these into Timedelta objects (rather than leaving them as np.timedelta64), as we already do for .apply on a datetime64[ns] Series:

In [9]: s.apply(lambda x: type(x))
Out[9]: 
0    <type 'numpy.timedelta64'>
1    <type 'numpy.timedelta64'>
2    <type 'numpy.timedelta64'>
3    <type 'numpy.timedelta64'>
4    <type 'numpy.timedelta64'>
dtype: object

In [10]: Series(pd.date_range('20130101',periods=3)).apply(lambda x: type(x))
Out[10]: 
0    <class 'pandas.tslib.Timestamp'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object

So this is a bug. The fix should be something like what happens in __iter__, where needs_i8_conversion and i8_boxer are called. I am going to repurpose this issue.
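Until apply boxes the elements itself, one possible workaround (a sketch, not the fix proposed for pandas internals) is to wrap each element in pd.Timedelta inside the function. This assumes pd.Timedelta accepts both a numpy.timedelta64 and an existing Timedelta, so it should behave the same whether or not the installed version already boxes.

```python
import pandas as pd

s = pd.Series(pd.timedelta_range("1 day 1 s", periods=3, freq="h"))

# Explicit boxing: works whether apply hands over a raw
# numpy.timedelta64 or an already-boxed Timedelta.
seconds = s.apply(lambda x: pd.Timedelta(x).total_seconds())
print(seconds.tolist())

# Check what apply actually passes in on the installed pandas version.
element_types = s.apply(lambda x: type(x).__name__)
print(element_types.unique())
```

For plain unit conversion the vectorized s.dt.total_seconds() is still the simpler choice; the explicit boxing is mainly useful when the applied function needs other Timedelta attributes.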

pull-requests welcome!

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type labels Oct 17, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 17, 2015
@jreback jreback changed the title Operations with Series holding Timedeltas BUG: boxing Timedeltas on .apply Oct 17, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 30, 2015