BUG/ENH: timedelta fillna should work intuitively #3371

Closed
jreback opened this issue Apr 15, 2013 · 0 comments · Fixed by #4684

Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Enhancement Groupby Timedelta Timedelta data type

jreback commented Apr 15, 2013

http://stackoverflow.com/questions/16023584/pandas-time-delta-from-grouped-neighbors

Setup:

txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
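import io
import pandas

df = pandas.read_csv(io.StringIO(txt))
# DataFrame.sort() was the API when this was filed; it is sort_values() in later pandas
df = df.sort_values('DATE')
df.DATE = pandas.to_datetime(df.DATE)
# group on the ID column (the frame has no column named 'A')
grouped = df.groupby('ID')
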
  1. This should probably work (fill the NaT entries with a zero timedelta):
    grouped.apply(lambda g: g['DATE']-g['DATE'].shift()).fillna(0)

  2. This returns timedelta64[us] instead of timedelta64[ns], which looks like the same odd numpy behavior again:

In [34]: grouped.apply(lambda g: g['DATE']-g['DATE'].shift())
Out[34]: 
8                   NaT
0                   NaT
1              00:00:00
4                   NaT
5              00:00:00
2                   NaT
3     27 days, 00:00:00
6                   NaT
7     14 days, 00:00:00
9   2544 days, 00:00:00
Name: DATE, dtype: timedelta64[us]
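
For comparison, here is a minimal sketch of the intended result (per-group gaps with the missing first entry of each group filled by a zero timedelta). It assumes a reasonably recent pandas; the shortened IDs `a` and `b` and the names `csv_text`, `gap`, and `gap_filled` are made up for illustration and are not part of the original report. It passes an explicit `pd.Timedelta(0)` so that nothing depends on how a bare `0` is interpreted.

```python
import io

import pandas as pd

# Two illustrative groups; the dates are taken from the setup data above.
csv_text = """ID,DATE
a,2006-05-07 00:00:00
a,2006-06-03 00:00:00
b,2007-10-13 00:00:00
b,2007-10-27 00:00:00
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"])
df = df.sort_values("DATE")

# Gap to the previous row within the same ID; the first row of each group is NaT.
gap = df["DATE"] - df.groupby("ID")["DATE"].shift()

# Fill the per-group NaT entries with an explicit zero timedelta.
gap_filled = gap.fillna(pd.Timedelta(0))
print(gap_filled)
```

The subtraction mirrors the `g['DATE'] - g['DATE'].shift()` lambda, but keeps the result aligned with the original row index instead of going through `apply`.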

This DOES work (after casting to timedelta64[ns]):

In [57]: df['X_SEQUENCE_GAP'].sort_index().astype('timedelta64[ns]').fillna(0)
Out[57]: 
0              00:00:00
1              00:00:00
2              00:00:00
3     27 days, 00:00:00
4              00:00:00
5              00:00:00
6              00:00:00
7     14 days, 00:00:00
8              00:00:00
9   2544 days, 00:00:00
Name: X_SEQUENCE_GAP, dtype: timedelta64[ns]

This does NOT work (the values need to be viewed as i8 first):

In [58]: df['X_SEQUENCE_GAP'].sort_index().astype('timedelta64[ns]').ffill()
ValueError: Invalid dtype for pad_1d [timedelta64[ns]]
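
The "view of i8" workaround can be sketched roughly as follows. This is only an illustration of the idea, not the fix that was eventually merged: `ffill_timedelta_via_i8` is a hypothetical helper, and it does the forward fill in plain numpy on the int64 representation of the values rather than through any pandas internals.

```python
import numpy as np
import pandas as pd


def ffill_timedelta_via_i8(s):
    """Forward-fill a timedelta Series via its int64 view (hypothetical helper)."""
    # Copy into a plain numpy timedelta64[ns] array.
    td = s.to_numpy().astype("m8[ns]")
    missing = np.isnat(td)
    # Reinterpret the same bytes as int64 nanosecond counts.
    i8 = td.view("i8")
    # For each slot, the position of the last non-missing value at or before it.
    last_valid = np.where(~missing, np.arange(len(i8)), 0)
    np.maximum.accumulate(last_valid, out=last_valid)
    filled = i8[last_valid]
    # View the integers as timedeltas again; leading NaT entries stay NaT.
    return pd.Series(filled.view("m8[ns]"), index=s.index, name=s.name)


gaps = pd.Series(
    [pd.NaT, pd.Timedelta(days=27), pd.NaT, pd.Timedelta(days=14), pd.NaT],
    name="X_SEQUENCE_GAP",
)
padded = ffill_timedelta_via_i8(gaps)
print(padded)
```

A leading missing value has nothing before it to copy from, so it is left as `NaT`, which matches the usual forward-fill convention.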