Rolling.apply documentation not clear #25656

Closed
js3711 opened this issue Mar 11, 2019 · 3 comments · Fixed by #25712
Labels: Error Reporting (incorrect or improved errors from pandas), Usage Question

Comments

@js3711

js3711 commented Mar 11, 2019

The current documentation for rolling.apply states:

func : function
Must produce a single value from an ndarray input if raw=True or a Series if raw=False.

This makes it sound like a series consisting of multiple values can be returned by the apply func. This is not true. The following error results:

Pandas v 0.24.1

import numpy as np
import pandas as pd
example = pd.Series(np.random.rand(10000))
roll = example.rolling(100)
samples = roll.apply(lambda x: pd.Series([1]), raw=True)
roll.apply(lambda x: pd.Series([1, 2, 3, 4]), raw=False)  # raises the TypeError below

~/anaconda3/envs/pandas/core/series.py in wrapper(self)
     91             return converter(self.iloc[0])
     92         raise TypeError("cannot convert the series to "
---> 93                         "{0}".format(str(converter)))
     94 
     95     wrapper.__name__ = "__{name}__".format(name=converter.__name__)
TypeError: cannot convert the series to <class 'float'>
@WillAyd
Member

WillAyd commented Mar 11, 2019

The func argument is documented as follows:

Must produce a single value from an ndarray input if raw=True or a Series if raw=False.

So the problem is that your expression doesn't reduce to a scalar.

What distinction are you trying to make with Series here? This will fail if you replace with a NumPy array as well (though admittedly with a different error message)
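
To illustrate the point about reducing to a scalar, here is a minimal sketch (the data and lambdas are made up for illustration, not taken from the issue). With raw=True the function receives each window as an ndarray, with raw=False as a Series, and in both cases it returns one number per window:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# raw=True: func receives each window as a NumPy ndarray and returns one number.
range_raw = s.rolling(3).apply(lambda x: x.max() - x.min(), raw=True)

# raw=False: func receives each window as a Series and still returns one number.
range_series = s.rolling(3).apply(lambda x: x.iloc[-1] - x.iloc[0], raw=False)

print(range_raw.tolist())     # first two entries are NaN (incomplete windows)
print(range_series.tolist())
```

Either way, the result is a Series of the same length as the input, with one scalar per window position.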

WillAyd added the Usage Question and Error Reporting labels Mar 11, 2019
@js3711
Author

js3711 commented Mar 13, 2019

Yup, that makes it clearer. It initially read as though multiple values could be returned through the Series.

@lycanthropes

lycanthropes commented Mar 16, 2019

> The func argument is documented as follows:
>
> Must produce a single value from an ndarray input if raw=True or a Series if raw=False.
>
> So the problem is that your expression doesn't reduce to a scalar.
>
> What distinction are you trying to make with Series here? This will fail if you replace with a NumPy array as well (though admittedly with a different error message)

I want to do something like this:
df.groupby('stock_code').rolling(window=12).apply(wavg,'column1',weight_variable)
That is, apply a given weight variable to a rolling column (here df.column1). But I cannot write a working wavg function. My wavg function is:
def wavg(group, avg_name):
    d = group[avg_name]
    w = pd.Series(data=[0,0,0.5,0,0,0.5,0,0,0.5,0,0,1])
    d = d.reset_index(drop=True)
    try:
        return (d * w).sum() / w.sum()
    except ZeroDivisionError:
        return np.nan

But when I run df.groupby('stock_code').rolling(window=12).apply(wavg,'column1',weight_variable), it always raises a TypeError: wavg() missing 1 required positional argument: 'avg_name'
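
One likely reading of this error is that the extra positional arguments are not forwarded to wavg as written: Rolling.apply takes extra arguments for func through its args and kwargs parameters. In addition, func receives a single column's window (an ndarray or a Series), not a DataFrame group, so group[avg_name] would not work even if the argument arrived. The following is a sketch under those assumptions, not a confirmed fix from the thread. The column is selected before rolling, the weights are passed through args, and the sample DataFrame is hypothetical:

```python
import numpy as np
import pandas as pd

# Weights copied from the wavg function above: one weight per row of the window.
WEIGHTS = np.array([0, 0, 0.5, 0, 0, 0.5, 0, 0, 0.5, 0, 0, 1])

def wavg(window, weights):
    """Weighted average of one rolling window (an ndarray when raw=True)."""
    total = weights.sum()
    if total == 0:
        return np.nan
    return float(np.dot(window, weights) / total)

# Hypothetical data: two stocks with 12 observations each.
df = pd.DataFrame({
    "stock_code": ["A"] * 12 + ["B"] * 12,
    "column1": np.arange(24, dtype=float),
})

result = (
    df.groupby("stock_code")["column1"]   # pick the column before rolling
      .rolling(window=12)
      .apply(wavg, raw=True, args=(WEIGHTS,))  # extra arguments go through args
)
print(result.dropna())
```

The weights array must have the same length as the window. Rows whose window is incomplete come back as NaN.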

3 participants