Skip to content

issue when shifting with Timedelta in a groupby #20492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
randomgambit opened this issue Mar 26, 2018 · 12 comments
Open

issue when shifting with Timedelta in a groupby #20492

randomgambit opened this issue Mar 26, 2018 · 12 comments

Comments

@randomgambit
Copy link

randomgambit commented Mar 26, 2018

Hello the awesome Pandas team!

Consider the example below

data = pd.DataFrame({'mydate' : [pd.to_datetime('2016-06-06'),
                                 pd.to_datetime('2016-06-08'),
                                 pd.to_datetime('2016-06-09'),
                                 pd.to_datetime('2016-06-10'),
                                 pd.to_datetime('2016-06-12'),
                                 pd.to_datetime('2016-06-13')],
                     'myvalue' : [1, 2, 3, 4, 5, 6],
                     'group' : ['A', 'A', 'A', 'B', 'B', 'B']})

data.set_index('mydate', inplace = True)
Out[58]: 
           group  myvalue
mydate                   
2016-06-06     A        1
2016-06-08     A        2
2016-06-09     A        3
2016-06-10     B        4
2016-06-12     B        5
2016-06-13     B        6

Now I need to compute the difference between the current value of myvalue and its lagged value, where by lagged I actually mean lagged by 1 day (if possible).

So this returns a result, but its not what I need

data['delta_one'] = data.groupby('group').myvalue.transform(lambda x: x - x.shift(1))

data
Out[56]: 
           group  myvalue  delta_one
mydate                              
2016-06-06     A        1        nan
2016-06-08     A        2     1.0000
2016-06-09     A        3     1.0000
2016-06-10     B        4        nan
2016-06-12     B        5     1.0000
2016-06-13     B        6     1.0000

This is what I need, but it does not work

data['delta_two'] = data.groupby('group').myvalue.transform(lambda x: x - x.shift(1, pd.Timedelta('1 days')))

  File "C:\Users\john\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 120, in __init__
    len(self.mgr_locs)))

ValueError: Wrong number of items passed 4, placement implies 3

Any ideas? Is this a bug? I think I am using the correct pandonic syntax here.

Thanks!

@chris-b1
Copy link
Contributor

I think apply works here?

In [14]: data.groupby('group').myvalue.apply(lambda x: x - x.shift(1, pd.Timedelta('1 days')))
Out[14]: 
group  mydate    
A      2016-06-06    NaN
       2016-06-07    NaN
       2016-06-08    NaN
       2016-06-09    1.0
       2016-06-10    NaN
B      2016-06-10    NaN
       2016-06-11    NaN
       2016-06-12    NaN
       2016-06-13    1.0
       2016-06-14    NaN
Name: myvalue, dtype: float64

@randomgambit
Copy link
Author

@chris-b1 thanks yes, I saw that but you cannot directly assign this data to the dataframe

data['delta_two'] = data.groupby('group').myvalue.apply(lambda x: x - x.shift(1, pd.Timedelta('1 days')))

  File "C:\Users\john\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 2734, in reindexer
    raise TypeError('incompatible index of inserted column '

TypeError: incompatible index of inserted column with frame index

@mroeschke
Copy link
Member

You can do some reshaping and remerge the result of the groupby.apply to your original data

In [5]: res = data.groupby('group').myvalue.apply(lambda x: x - x.shift(1, pd.Timedelta('1 days')))

In [6]: res.name = 'delta_one'

In [7]: data.reset_index().merge(res.reset_index(), how='left', on=['mydate', 'group']).set_index('mydate')
Out[7]:
           group  myvalue  delta_one
mydate
2016-06-06     A        1        NaN
2016-06-08     A        2        NaN
2016-06-09     A        3        1.0
2016-06-10     B        4        NaN
2016-06-12     B        5        NaN
2016-06-13     B        6        1.0

@randomgambit
Copy link
Author

@mroeschke thanks buddy, yes I ended up doing something very much like that :) I wonder if this behavior is expected or not

@WillAyd
Copy link
Member

WillAyd commented Mar 27, 2018

My $.02 is that if you wanted dates to be sequential by day then you should be reindexing and filling the group before any groupby. So something along the lines of:

>>> dt_rng = pd.date_range(data.index.min(), data.index.max())
>>> data = data.reindex(rng)
>>> data['group'] = data['group'].ffill()
>>> data.groupby('group')['myvalue'].transform(lambda x: x-x.shift())
2016-06-06    NaN
2016-06-07    NaN
2016-06-08    NaN
2016-06-09    1.0
2016-06-10    NaN
2016-06-11    NaN
2016-06-12    NaN
2016-06-13    1.0
Freq: D, Name: myvalue, dtype: float64

Probably a little more verbose than you want to be and gives a different result than what you have, but seems the most "pandonic" to me

@jreback
Copy link
Contributor

jreback commented Mar 27, 2018

lots of good answers here. .shift() does care whether an index is complete or not, just that it is ordered. You must resample / reindex-fill yourself.

I suppose adding this example to the doc-string (or maybe in the docs might be easier) could be informative. Going to repurpose this issue for that enhancement.

@jreback jreback added Docs Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Frequency DateOffsets Resample resample method Effort Low good first issue labels Mar 27, 2018
@jreback jreback added this to the Next Major Release milestone Mar 27, 2018
@randomgambit
Copy link
Author

Hi @jeffreback thanks!

but what do you mean by shift does not care whether and index is ordered or not? Using timedelta works with shift if I am not mistaken.

essentially, I ended up writing a function that is very similar to what @chris-b1 is doing. I hope all is good

@jreback
Copy link
Contributor

jreback commented Mar 30, 2018

@randomgambit

I said

does care whether an index is complete or not, just that it is ordered. You must resample / reindex-fill yourself.

it doesn't care whether its complete, just ordered.

@PyJay
Copy link

PyJay commented May 14, 2018

I will add this example to the documentation

@PyJay PyJay mentioned this issue May 14, 2018
4 tasks
@PyJay
Copy link

PyJay commented May 14, 2018

PR here - #21039

@randomgambit
Copy link
Author

randomgambit commented Oct 11, 2018

Hello guys, as the original OP I would be happy to contribute here!

From what I see above, you want to get rid of the pd.Timedelta shift? But that is the essence of the question! and the solution proposed by @mroeschke in #20492 works like a charm!

Here is a variant

df = pd.DataFrame({'mytime' : [pd.to_datetime('2018-01-01 14:34:12'),
                             pd.to_datetime('2018-01-01 14:34:13'),
                             pd.to_datetime('2018-01-01 14:34:15'),
                             pd.to_datetime('2018-01-01 14:34:16'),
                             pd.to_datetime('2018-01-01 14:34:28')],
                    'myvalue' : [1,2,np.NaN,3,1],
                    'mygroup' : ['a','a','a','b','b']})

df.set_index('mytime', inplace = True)

shifted_df =  df.groupby('mygroup').myvalue.apply(lambda x: x - x.shift(1, pd.Timedelta('1 sec')))
shifted_df.name = 'shifted'
df = df.reset_index().merge(shifted_df.reset_index(), on = ['mytime','mygroup'], how = 'left')
 
df
Out[108]: 
               mytime mygroup  myvalue  shifted
0 2018-01-01 14:34:12       a  1.00000      nan
1 2018-01-01 14:34:13       a  2.00000  1.00000
2 2018-01-01 14:34:15       a      nan      nan
3 2018-01-01 14:34:16       b  3.00000      nan
4 2018-01-01 14:34:28       b  1.00000      nan


Why dont you just use that in the docs? Pretty nice IMHO

@mroeschke mroeschke removed Frequency DateOffsets Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Resample resample method labels Jun 19, 2021
@jbrockmendel jbrockmendel added the Timedelta Timedelta data type label Oct 29, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Copy link
Member

@randomgambit the OP example now doesn't raise, can you try it and see if the result it gives is what you expect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants