API: Table-wise rolling / expanding / EWM function application #15095

TomAugspurger · 2017-01-10T01:29:46Z

In #11603 (comment) (the main PR implementing the deferred API for rolling / expanding / ewm), we discussed how to specify table-wise applies. Groupby.apply(f) feeds the entire group (all columns) to f. For backwards compatibility, .rolling(n).apply(f) needed to be column-wise.

#11603 (comment) mentions a possible API like the one I added for .style:

axis=0: apply to each column independently
axis=1: apply to each row independently
axis=None: apply the supplied function to the entire table

So it'd be df.rolling(n).apply(f, axis=None).
Do people like the axis=0 / 1 / None idiom? Is it obvious enough?
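
For comparison, NumPy reductions already use the same three-way axis convention, which may help in judging how obvious the idiom is. A minimal sketch (the array values are made up for illustration and are not from this thread):

```python
import numpy as np

arr = np.array([[1, 5],
                [7, 2],
                [3, 4]])

per_column = arr.max(axis=0)      # one result per column
per_row = arr.max(axis=1)         # one result per row
whole_table = arr.max(axis=None)  # a single result for the entire array

print(per_column)
print(per_row)
print(whole_table)
```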

This is prompted by @josef-pkt's post on the mailing list, which needed a rolling OLS.

An example:

In [2]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: np.random.seed(0)
   ...: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=["A", "B"])
   ...: df
   ...:
Out[2]:
   A  B
0  5  0
1  3  3
2  7  9
3  3  5
4  2  4
5  7  6
6  8  8
7  1  6
8  7  7
9  8  1

For a concrete example, get the table-wise max (this is equivalent to df.rolling(4).max().max(1))

In [10]: df.rolling(4).apply(np.max, axis=None)
Out[10]:
0    NaN
1    NaN
2    NaN
3    9.0
4    9.0
5    9.0
6    8.0
7    8.0
8    8.0
9    8.0
dtype: float64
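
The equivalence mentioned above can be checked with the existing API. The sketch below rebuilds the same frame and computes the table-wise rolling max as a column-wise rolling max followed by a row-wise max; it does not rely on the proposed axis=None keyword.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=["A", "B"])

# Column-wise rolling max, then the max across the columns of each row.
tablewise_max = df.rolling(4).max().max(axis=1)
print(tablewise_max)
```

This shortcut only works because max composes across columns; a general f (such as a regression) has no such decomposition, which is the motivation for a table-wise apply.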

A real example is something like a rolling OLS:

import statsmodels.api as sm
f = lambda x: sm.OLS.from_formula('A ~ B', data=x).fit()  # wrong, but w/e

df.rolling(5).apply(f, axis=None)
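
Since the call above depends on the proposed keyword, here is a rough sketch of the same idea with an explicit loop over windows. It swaps statsmodels for np.linalg.lstsq to stay dependency-light, returns only the slope, and uses made-up data; rolling_ols_slope is a hypothetical helper, not a pandas or statsmodels function.

```python
import numpy as np
import pandas as pd

def rolling_ols_slope(frame, window, y="A", x="B"):
    """Slope of y ~ 1 + x over each full trailing window (hypothetical helper)."""
    out = pd.Series(np.nan, index=frame.index, dtype=float)
    for end in range(window, len(frame) + 1):
        chunk = frame.iloc[end - window:end]
        # Design matrix: an intercept column plus the single regressor.
        X = np.column_stack([np.ones(window), chunk[x].to_numpy(dtype=float)])
        coef, *_ = np.linalg.lstsq(X, chunk[y].to_numpy(dtype=float), rcond=None)
        out.iloc[end - 1] = coef[1]  # coef[0] is the intercept
    return out

# Illustrative data where A = 2 * B + 1 exactly.
df = pd.DataFrame({"A": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
                   "B": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
slopes = rolling_ols_slope(df, window=3)
print(slopes)
```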


jreback · 2017-01-10T12:31:49Z

can u put up a simple example with the various options exercised? (e.g. simulate the output)

TomAugspurger · 2017-01-12T23:01:28Z

Updated with an example.

I also changed the suggested API: Before I had

df.rolling(n, axis=None).apply(f)

But really it should be

df.rolling(n).apply(f, axis=None).

The .rolling(axis=.) parameter controls the direction for rolling. The .rolling(...).apply(f, axis=.) parameter controls the axis for function application.

jreback · 2017-01-13T14:23:34Z

@TomAugspurger correct me if I am wrong, but what you really want is for .apply to be passed one of 2 cases.

a single column (now)
the entire table (option)

?
The other functions are only univariate so this doesn't matter.

but apply is pretty generic so we don't know what the user wants (but the original implementation was a single column)

TomAugspurger · 2017-01-13T22:45:42Z

You're correct.

This should make things clear

In [9]: def f(x):
   ...:     print(x)
   ...:     return 0

In [8]: df = pd.DataFrame(np.arange(9).reshape(3, 3))

In [14]: df
Out[14]:
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8

Currently, and the default in the future, this prints out

In [10]: df.rolling(2).apply(f)
[ 0.  3.]
[ 3.  6.]
[ 1.  4.]
[ 4.  7.]
[ 2.  5.]
[ 5.  8.]

With the new implementation and axis=None, the printed output would be

In [10]: df.rolling(2).apply(f, axis=None)
[[0 1 2]   # first window; 2x3 array
 [3 4 5]]
[[3 4 5]   # second window; 2x3 array
 [6 7 8]]
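
The windows that a table-wise f would receive can be produced today by slicing the frame directly. A minimal sketch, assuming only full windows are wanted (no min_periods handling); table_windows is a hypothetical helper, not part of pandas:

```python
import numpy as np
import pandas as pd

def table_windows(frame, window):
    """Yield each full trailing window of `frame` as a 2-D ndarray."""
    for end in range(window, len(frame) + 1):
        yield frame.iloc[end - window:end].to_numpy()

df = pd.DataFrame(np.arange(9).reshape(3, 3))
windows = list(table_windows(df, 2))
for w in windows:
    print(w)
```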

jreback · 2017-01-14T02:08:28Z

@TomAugspurger I know you used axis=None this way in .style, but I personally find this a bit confusing.

I think it's better to follow our current model, IOW

receive a DataFrame df.rolling(...).apply(...)
receive a Series df.rolling(...).column.apply(...)

is very natural. This would be an API change, though even now I think we pass a ndarray.

another possibility is to have return_type = 'frame', 'series', 'ndarray' (with a default of None, so that we can make this change easier).

dbivolaru · 2017-03-25T18:15:27Z

I ran into a similar issue with a rolling function that uses OLS internally and needs to return more than one column (e.g. the confidence interval).

Would the test cases also cover df.groupby(level=...)['column'].rolling(...).apply(...), and is there a workaround for pre-0.20 versions that would avoid calculating the OLS twice, i.e. once for each returned column?

Regarding the API, I think it should look like this:

def f(narray):
    res = sm.OLS(narray, ...).fit()
    m_min, m_max = res.conf_int(0.05)[0]
    return m_min, m_max

# Single column
df.groupby(level=...)['column'].rolling(...).apply(lambda x: f(x))

def g(exogen, endogen):
    # sm.OLS takes the dependent variable (endog) first, then the regressors (exog)
    res = sm.OLS(endogen, exogen).fit()
    m_min, m_max = res.conf_int(0.05)[0]
    return m_min, m_max

# Multiple columns
df.groupby(level=...).rolling(...).apply(lambda x: g(x['exogen'], x['endogen']))
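
One possible workaround for returning several values per window without fitting twice is to loop over the windows and collect one row of results per window. This is only a sketch: rolling_table_apply is a hypothetical helper, and spread is a cheap stand-in for the OLS fit so that the example has no statsmodels dependency.

```python
import numpy as np
import pandas as pd

def rolling_table_apply(frame, window, func, columns):
    """Call func once per full window; func returns one value per name in `columns`."""
    rows = {}
    for end in range(window, len(frame) + 1):
        chunk = frame.iloc[end - window:end]
        rows[frame.index[end - 1]] = func(chunk)
    result = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
    # Rows without a full window become NaN, mirroring rolling's default.
    return result.reindex(frame.index)

def spread(chunk):
    # Stand-in for an expensive fit that yields two numbers at once.
    diff = chunk["endogen"] - chunk["exogen"]
    return diff.min(), diff.max()

df = pd.DataFrame({"exogen": [1.0, 2.0, 3.0, 4.0],
                   "endogen": [2.0, 2.0, 5.0, 3.0]})
out = rolling_table_apply(df, 2, spread, columns=["m_min", "m_max"])
print(out)
```

The same helper could be called on each group of a groupby, at the cost of running a Python-level loop per window.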

jreback · 2017-03-25T18:50:15Z

@dbivolaru

Would the test cases also cover df.groupby(level=...)['column'].rolling(...).apply(...), and is there a workaround for pre-0.20 versions that would avoid calculating the OLS twice, i.e. once for each returned column?

This is just an idea. You are welcome to submit a patch for this.

makmanalp · 2018-11-05T17:02:16Z

I think it's better to follow our current model, IOW
receive a DataFrame df.rolling(...).apply(...)
receive a Series df.rolling(...).column.apply(...)
is very natural. This would be an API change, though even now I think we pass a ndarray.

I definitely agree with this - it fits well with everything else.

So is the idea here that because apply() currently works column-wise and not dataframe-wise on dataframe.rolling.apply(), we're kinda locked in now and don't want to break backwards compat, and we need a new API? Or are we just waiting for a patch and an opportune moment to release?

TomAugspurger · 2018-11-05T17:12:00Z

So is the idea here that because apply() currently works column-wise and not dataframe-wise on dataframe.rolling.apply(), we're kinda locked in now and don't want to break backwards compat, and we need a new API?

That's my opinion. We could maybe do this with a deprecation cycle with keywords.

mroeschke · 2020-10-19T01:08:38Z

2 thoughts here:

I'm not sure if we should stuff this feature in the axis keyword; I think we should add a new parameter as I can see this being a possibility (from Tom's example). Maybe a how=None|'table' argument for None=1D, table=2D

# roll tablewise along rows
In [10]: df.rolling(2).apply(f, axis=0, how='table')
[[0 1 2]   # first window; 2x3 array
 [3 4 5]]
[[3 4 5]   # second window; 2x3 array
 [6 7 8]]

# roll tablewise along columns
In [10]: df.rolling(2).apply(f, axis=1, how='table')
[[0 1]   # first window; 3x2 array
 [3 4]
 [6 7]]
[[1 2]   # second window; 3x2 array
 [4 5]
 [7 8]]
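
The two window orientations above can be reproduced with plain NumPy slicing. This sketch only illustrates which block each call would hand to f; it says nothing about how pandas would implement the feature.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
window = 2

# Rolling along rows (axis=0): each window spans all columns.
row_windows = [arr[i:i + window, :] for i in range(arr.shape[0] - window + 1)]

# Rolling along columns (axis=1): each window spans all rows.
col_windows = [arr[:, j:j + window] for j in range(arr.shape[1] - window + 1)]

print(row_windows[0].shape, col_windows[0].shape)
```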

Implementation-wise, these might be some potential hurdles and complexities to consider:

Currently all windowing aggregations are calculated blockwise. This feature would probably need a dedicated code path that does the calculations over the rows/columns (easier if we eventually remove the block manager)
Currently, data types other than float or int are dropped. There's a consistency argument for doing the same in table-wise windowing, but that may render table-wise rolling less useful if data is dropped.

mroeschke · 2020-10-30T19:31:31Z

A proposal for the implementation would be:

Add a new keyword method='table'|'column' in the rolling/ewm/expanding method to specify whether we are rolling over a column or the entire object
Requires the engine='numba' keyword to be set in the aggregation function (otherwise, the existing Cython aggregation functions need an overhaul)
Table-wise rolling requires a single float dtype
(Mostly important for apply) the output of table-wise rolling will need to be 1 x number of columns for axis=0 and number of rows x 1 for axis=1

e.g.

df.rolling(2, method='table').apply(f, axis=1, engine='numba')

TomAugspurger added the API Design label Jan 10, 2017

TomAugspurger added this to the 0.20.0 milestone Jan 10, 2017

TomAugspurger mentioned this issue Jan 20, 2017

rolling window function with multiple arguments by group #15178

Closed

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

TomAugspurger mentioned this issue Feb 26, 2018

Multicolumn .expanding() #19885

Closed

leeong05 mentioned this issue May 2, 2018

API/ENH: master issue for pd.rolling_apply #8659

Closed

14 tasks

jreback mentioned this issue May 2, 2018

API：How can I apply a function on rolling DataFrames #20919

Closed

icexelloss mentioned this issue Oct 23, 2019

ENH: Pandas backend doesn't handle udf with two parameters correctly with trailing_window ibis-project/ibis#1998

Closed

jreback mentioned this issue Apr 21, 2020

ENH: Enable rolling.apply on custom function that requires multiple columns of data frame #33695

Closed

DiegoAlbertoTorres mentioned this issue Sep 2, 2020

TRACKER: milestones twosigma/pandas#44

Open

32 tasks

TomAugspurger mentioned this issue Sep 4, 2020

BUG: Unexpected behaviour of rolling with apply on DataFrame #34965

Closed

3 tasks

jreback added the Window rolling, ewma, expanding label Nov 25, 2020

mroeschke mentioned this issue Dec 11, 2020

ENH: Add method argument to rolling constructor to allow table-wise rolling #38417

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.3 Dec 18, 2020

jreback closed this as completed in #38417 Dec 27, 2020

mroeschke mentioned this issue Jan 2, 2021

Feature Request: axis argument in np.nan[sum | mean | std | var | max | min | median] numba/numba#6610

Closed
