forward rolling functions #6772
Is anybody still working on it?
This is one I actually miss quite often. With a datetime index you can't just reverse the DataFrame, since you get an error.
A workaround for this issue for now:

If you have a time series on a fixed interval, say every second, you can combine shifting with rolling: `shift(-30)` moves the data 30 seconds back, and a rolling window then finds the mean, max, min, etc. over that window:

```python
df_rol['.......'] = df['........'].shift(-1, freq=dt).rolling(dt).min()
```

If you don't have a fixed interval, try `truncate` (`truncate()` will ask you to `sort_index()` first):

```python
df['....'] = df.index.map(
    lambda x: df.truncate(before=x, after=x + pd.Timedelta(dt, unit='s'))['....'].max())
```

With `truncate`, the computation time grows steeply with the number of rows: roughly 2 min for 1 million rows and 10 min for 2 million. You can also slice on the numeric position, which is sometimes faster than comparing datetime indexes:

```python
init = df.index.get_loc(date)
end = df.index.get_loc(date + pd.Timedelta(dt, unit='s'), method='nearest')
df.iloc[init:end]['column'].max()
```

A loop like the following is not optimal, but may be useful for any custom forward rolling (also consider `iterrows()` or `itertuples()`):

```python
def window(deltaseconds, df):
    colmax = '{}maxPrice'.format(deltaseconds)
    colmin = '{}minPrice'.format(deltaseconds)
    df_max = pd.DataFrame(columns=[colmax], index=df.index)
    df_min = pd.DataFrame(columns=[colmin], index=df.index)
    for i in range(df.shape[0]):
        end = df.index[i] + pd.Timedelta(deltaseconds, unit='s')
        maxPrice = float('-inf')  # was 0, which breaks min/max tracking
        minPrice = float('inf')
        for j in range(i, df.shape[0]):
            if df.index[j] <= end:  # was df.index[i], which never ended the window
                price = df.iloc[j]['LastPrice']
                if price > maxPrice:
                    maxPrice = price
                if price < minPrice:  # was elif, which could miss an update
                    minPrice = price
            else:
                break
        df_max.iloc[i] = maxPrice - df.iloc[i]['LastPrice']
        df_min.iloc[i] = minPrice - df.iloc[i]['LastPrice']
    return pd.concat([df, df_max, df_min], axis=1)  # join_axes was removed from concat
```

Another option is Dask with `map_overlap(lambda x: ...)`. On billions of rows, though, even Dask, or NumPy with Numba, is slow with any of the above solutions. It would be really useful to have a forward-looking `rolling()` built in.
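For the fixed-interval case, the shift-then-roll idea above can also be sketched with positional shifting. The series, values, and window length here are illustrative, not from the thread:

```python
import pandas as pd

# 1-second fixed-interval series (illustrative values)
s = pd.Series([3, 1, 4, 1, 5],
              index=pd.date_range("2024-01-01", periods=5, freq="s"))

k = 2  # forward-looking window of k samples
# rolling(k) looks backward; shifting the result up by k - 1 positions
# re-aligns each window's aggregate with the window's earliest row.
forward_min = s.rolling(k).min().shift(-(k - 1))
```

The last `k - 1` entries come out NaN because their forward windows are incomplete; this only works when the sampling interval really is fixed.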
Should be resolved by: #28297
Yup, looks like #28297 addresses this.
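For context, the custom-window indexer API that #28297 introduced (available since pandas 1.0) lets `rolling` look forward directly. A minimal sketch with illustrative values:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

s = pd.Series([1, 2, 3, 4])
# Each window starts at the current row and extends 2 rows forward.
indexer = FixedForwardWindowIndexer(window_size=2)
fwd_sum = s.rolling(window=indexer, min_periods=1).sum()
# fwd_sum is [3.0, 5.0, 7.0, 4.0]
```

With `min_periods=1`, the final row still produces a value even though its forward window is truncated.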
This issue should not be closed, since #28297 (which seems to address this bug) isn't closed.
A possible solution is to shift the rolling window aggregation:
https://groups.google.com/forum/#!msg/pydata/LLPnzMfDlSg/OpK-mHJzJwYJ
http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis
Applying the rolling function to a reversed Series and then reversing the result could work for all (?) rolling functions; sometimes this can be done directly on the values array. A kwarg would be nice.
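A minimal sketch of that reverse-apply-reverse trick on a plain integer index (illustrative values; it reportedly fails in some cases, e.g. time-based windows need a monotonic index):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
# Reverse, apply an ordinary backward-looking rolling, reverse back:
# position i then aggregates s[i] .. s[i + 2] (up to 3 elements forward).
fwd = s[::-1].rolling(3, min_periods=1).sum()[::-1]
# fwd is [6.0, 9.0, 7.0, 4.0]
```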
(OP says that this doesn't actually work in some cases: http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis/22820689?noredirect=1#comment34813442_22820689... ?)