issue when shifting with Timedelta in a groupby #20492
Comments
I think
|
@chris-b1 thanks yes, I saw that but you cannot directly assign this data to the dataframe
|
You can do some reshaping and remerge the result of the
|
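For readers without the original snippet, the reshape-and-remerge idea can be sketched roughly as follows. This is only an illustration under assumptions: the data is made up, the `date` column and the helper names `lagged`, `myvalue_lag1d` and `diff_1d` are hypothetical, and only `group` and `myvalue` come from the thread.

```python
import pandas as pd

# Hypothetical data with gaps in the dates (not the data from the issue)
data = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2016-06-06", "2016-06-08", "2016-06-09",
             "2016-06-10", "2016-06-12", "2016-06-13"]
        ),
        "group": ["A", "A", "A", "B", "B", "B"],
        "myvalue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# Copy the values and move every date forward by one day, so a row dated D
# in `lagged` carries the value observed on D minus one day
lagged = data[["date", "group", "myvalue"]].copy()
lagged["date"] = lagged["date"] + pd.Timedelta(days=1)
lagged = lagged.rename(columns={"myvalue": "myvalue_lag1d"})

# Merge back on group and date; days with no observation the day before get NaN
result = data.merge(lagged, on=["group", "date"], how="left")
result["diff_1d"] = result["myvalue"] - result["myvalue_lag1d"]
print(result)
```

Because the lag is expressed as a join on `(group, date)`, the result can be assigned straight back to the frame, and a gap in the dates yields NaN instead of a difference against the previous row.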
@mroeschke thanks buddy, yes I ended up doing something very much like that :) I wonder if this behavior is expected or not |
My $.02 is that if you wanted dates to be sequential by day then you should be reindexing and filling the group before any groupby. So something along the lines of:
>>> dt_rng = pd.date_range(data.index.min(), data.index.max())
>>> data = data.reindex(dt_rng)
>>> data['group'] = data['group'].ffill()
>>> data.groupby('group')['myvalue'].transform(lambda x: x-x.shift())
2016-06-06 NaN
2016-06-07 NaN
2016-06-08 NaN
2016-06-09 1.0
2016-06-10 NaN
2016-06-11 NaN
2016-06-12 NaN
2016-06-13 1.0
Freq: D, Name: myvalue, dtype: float64
Probably a little more verbose than you want to be and gives a different result than what you have, but seems the most "pandonic" to me |
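Since the original frame is not shown in this thread, here is a self-contained version of the reindex-then-groupby approach. The data below is hypothetical (chosen only so that each group has a one-day gap); the column names `group` and `myvalue` follow the snippet above.

```python
import pandas as pd

# Hypothetical data: two groups, each missing one calendar day
data = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "myvalue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    },
    index=pd.to_datetime(
        ["2016-06-06", "2016-06-08", "2016-06-09",
         "2016-06-10", "2016-06-12", "2016-06-13"]
    ),
)

# Make the index complete by day, then carry the group label into the new rows
dt_rng = pd.date_range(data.index.min(), data.index.max())
data = data.reindex(dt_rng)
data["group"] = data["group"].ffill()

# With a complete daily index, a positional shift of 1 is a shift of one day
diff = data.groupby("group")["myvalue"].transform(lambda x: x - x.shift())
print(diff)
```

Only the rows whose previous calendar day was actually observed in the same group get a number; everything else stays NaN.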
lots of good answers here. I suppose adding this example to the doc-string (or maybe in the docs might be easier) could be informative. Going to repurpose this issue for that enhancement. |
Hi @jeffreback thanks! But what do you mean by shift does not care whether an index is ordered or not? Using timedelta works with shift, if I am not mistaken. Essentially, I ended up writing a function that is very similar to what @chris-b1 is doing. I hope all is good |
I said
it doesn't care whether it's complete, just ordered. |
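A small sketch of that distinction, using a made-up series with one missing day (the variable names are hypothetical). A plain `shift(1)` moves values by position and ignores the gap, while `shift(freq="D")` leaves the values alone and moves the index labels forward by one day.

```python
import pandas as pd

# Hypothetical series: 2016-06-07 is missing
s = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2016-06-06", "2016-06-08", "2016-06-09"]),
)

# Positional shift: each value moves down one row, whatever the dates are
positional = s.shift(1)

# Frequency shift: values stay put, index labels move forward one day
by_time = s.shift(freq="D")

# Differences against each kind of lag, aligned on the original index
diff_positional = s - positional
diff_by_time = s - by_time.reindex(s.index)

print(diff_positional)
print(diff_by_time)
```

With the positional shift, 2016-06-08 is compared against 2016-06-06, two days earlier. With the frequency shift, 2016-06-08 gets NaN because there is no observation for 2016-06-07.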
I will add this example to the documentation |
PR here - #21039 |
Hello guys, as the original OP I would be happy to contribute here! From what I see above, you want to get rid of the pd.Timedelta shift? But that is the essence of the question! and the solution proposed by @mroeschke in #20492 works like a charm! Here is a variant
Why don't you just use that in the docs? Pretty nice IMHO |
@randomgambit the OP example now doesn't raise, can you try it and see if the result it gives is what you expect |
Hello the awesome Pandas team!
Consider the example below
Now I need to compute the difference between the current value of myvalue and its lagged value, where by lagged I actually mean lagged by 1 day (if possible).
So this returns a result, but it's not what I need
This is what I need, but it does not work
Any ideas? Is this a bug? I think I am using the correct pandonic syntax here.
Thanks!