ENH: Enable rolling.apply on custom function that requires multiple columns of data frame #33695

Closed
bbkaran opened this issue Apr 21, 2020 · 1 comment

bbkaran commented Apr 21, 2020

Is your feature request related to a problem?

Yes.

I wish I could use pandas to:

create an additional column, df['newc'], by calling rolling.apply on df['cond'] with a custom function that needs two columns of df (here 'cond' and 'temp').

Describe the solution you'd like

I prefer the solution to look something like:

df['newc'] = df['cond'].rolling(4).apply(T_correction, args=(df['temp'].rolling(4),))

where

>>> df.head()
                       temp   cond
ts
2018-06-01 00:00:00  51.908  27.83
2018-06-01 00:05:00  52.144  27.83
2018-06-01 00:10:00  51.880  27.83
2018-06-01 00:15:00  52.001  27.83
2018-06-01 00:20:00  51.835  27.83

import numpy as np
import pandas as pd
import statsmodels.api as sm

def T_correction(df, d):
    # df: one rolling window of 'cond'; d: the matching window of 'temp'
    df = pd.DataFrame(data = df)
    df.columns = ['cond']
    df['temp'] = d
    X = df.drop(['cond'], axis = 1)    # X features: temp

    X = sm.add_constant(X)             # add intercept
    lmodel = sm.OLS(df.cond, X)        # model cond = a + b*temp
    results = lmodel.fit()             # fit by ordinary least squares
    Op = results.predict(X)            # derive 'cond' as explained by temp
    Tc1 = df.cond - Op                 # remove the linear influence

    #---conditional correction --------------------------------------
    # keep the raw 'cond' where temp is more than 0.5 std above the window mean
    Tc = np.where(df.temp > (np.mean(df.temp) + 0.5*np.std(df.temp)), df.cond, Tc1)
    return Tc[-1]     # returning the last value
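
Until rolling.apply can receive several columns, the behaviour requested above can be approximated with the existing API. The sketch below is one possible workaround, not part of the original request: with raw=False, rolling.apply hands each window to the function as a Series that keeps its index labels, and those labels can be used to look up the matching 'temp' rows. The names t_correction_window and rolling_two_columns are hypothetical, np.polyfit stands in for the statsmodels OLS fit so the example needs only NumPy and pandas, and the index of df is assumed to be unique.

```python
import numpy as np
import pandas as pd


def t_correction_window(cond, temp):
    """Correct one window and return the value for its last row.

    Assumption: np.polyfit(deg=1) replaces sm.OLS here; both fit
    cond = a + b*temp by ordinary least squares.
    """
    cond = np.asarray(cond, dtype=float)
    temp = np.asarray(temp, dtype=float)
    slope, intercept = np.polyfit(temp, cond, deg=1)
    residual = cond - (intercept + slope * temp)      # remove the linear influence
    threshold = temp.mean() + 0.5 * temp.std()        # population std, as np.std gives
    corrected = np.where(temp > threshold, cond, residual)
    return corrected[-1]


def rolling_two_columns(df, window):
    """Rolling apply over 'cond' that also sees the matching 'temp' rows."""

    def per_window(cond_window):
        # raw=False passes a Series, so its index identifies the window rows.
        temp_window = df.loc[cond_window.index, "temp"]
        return t_correction_window(cond_window, temp_window)

    return df["cond"].rolling(window).apply(per_window, raw=False)


# Usage with the five rows shown in df.head() above.
index = pd.date_range("2018-06-01 00:00:00", periods=5, freq="5min", name="ts")
df = pd.DataFrame(
    {"temp": [51.908, 52.144, 51.880, 52.001, 51.835], "cond": [27.83] * 5},
    index=index,
)
df["newc"] = rolling_two_columns(df, 4)
```

The label lookup is done once per window in Python, so this is unlikely to be much faster than a hand-written loop; its main benefit is that the call keeps the shape of an ordinary rolling.apply.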

I understand it may be related to the issue:

raise NotImplementedError('See issue #11704 {url}'.format(url=url))
NotImplementedError: See issue #11704 https://github.com/pandas-dev/pandas/issues/11704

This may not yet have been important enough for someone to look into, or there may be other ways to achieve it. I am not experienced enough with the pandas codebase to start looking into it myself, but I am happy to try if some guidance is available.

API breaking implications

No changes in the way we call the API

Describe alternatives you've considered

I ended up using an explicit loop over the windows, which is much slower.
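
The loop itself is not shown in the issue, so the following is only a guess at its shape, paired with a vectorized alternative for comparison. Both function names are hypothetical. The loop fits each window with np.polyfit (a stand-in for the statsmodels fit above), while the vectorized version uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later) and the closed-form simple-regression slope and intercept. It assumes 'temp' is not constant within any window, since the slope is undefined in that case.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_correction_loop(df, window):
    """Baseline: one least-squares fit per window in a Python loop."""
    cond = df["cond"].to_numpy(dtype=float)
    temp = df["temp"].to_numpy(dtype=float)
    out = np.full(len(df), np.nan)
    for end in range(window, len(df) + 1):
        c, t = cond[end - window:end], temp[end - window:end]
        slope, intercept = np.polyfit(t, c, deg=1)
        residual_last = c[-1] - (intercept + slope * t[-1])
        threshold = t.mean() + 0.5 * t.std()
        out[end - 1] = c[-1] if t[-1] > threshold else residual_last
    return pd.Series(out, index=df.index)


def rolling_correction_vectorized(df, window):
    """Same result without a Python loop, using strided window views."""
    cond = df["cond"].to_numpy(dtype=float)
    temp = df["temp"].to_numpy(dtype=float)
    out = np.full(len(df), np.nan)
    if len(df) < window:
        return pd.Series(out, index=df.index)
    c = sliding_window_view(cond, window)    # shape (n - window + 1, window)
    t = sliding_window_view(temp, window)
    t_mean, c_mean = t.mean(axis=1), c.mean(axis=1)
    t_dev = t - t_mean[:, None]
    # Closed-form OLS for cond = a + b*temp within each window.
    slope = (t_dev * (c - c_mean[:, None])).sum(axis=1) / (t_dev ** 2).sum(axis=1)
    intercept = c_mean - slope * t_mean
    residual_last = c[:, -1] - (intercept + slope * t[:, -1])
    threshold = t_mean + 0.5 * t.std(axis=1)
    out[window - 1:] = np.where(t[:, -1] > threshold, c[:, -1], residual_last)
    return pd.Series(out, index=df.index)
```

Only the last element of each window is needed, so the vectorized version never materializes the full corrected window; it computes the fit parameters per window and evaluates them at the final row.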

bbkaran added the Enhancement and Needs Triage labels Apr 21, 2020

jreback (Contributor) commented Apr 21, 2020

duplicate of #15095

jreback closed this as completed Apr 21, 2020
bashtage removed the Needs Triage label Aug 21, 2020