Skip to content

BUG: Positional Arguments Passed as a Keyword Argument in Custom Rolling Aggregation #34605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
daskol opened this issue Jun 5, 2020 · 6 comments · Fixed by #34705
Closed
3 tasks done
Assignees
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Window rolling, ewma, expanding
Milestone

Comments

@daskol
Copy link

daskol commented Jun 5, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Problem description

According to docstring of Rolling.apply arguments args and kwargs allow to pass to a custom function positional and keyword arguments enumerated in kwargs. However, Rolling.apply passes both of them as keyword arguments. In other word, it passes keyword args consisting of "positional" arguments. This is definitely buggy behaviour.

The reason is that a closure function is built in a wrong way. The issue states for cython backend as well as numba backend. The possible should looks like the following.

diff --git a/pandas/core/window/rolling.py b/pandas/core/window/rolling.py
index a01a753e8..32360848a 100644
--- a/pandas/core/window/rolling.py
+++ b/pandas/core/window/rolling.py
@@ -1313,11 +1313,11 @@ class _Rolling_and_Expanding(_Rolling):
 
         window_func = partial(
             self._get_cython_func_type("roll_generic"),
-            args=args,
-            kwargs=kwargs,
+            *args,
             raw=raw,
             offset=offset,
             func=func,
+            **kwargs,
         )
 
         def apply_func(values, begin, end, min_periods, raw=raw):
@daskol daskol added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 5, 2020
@mroeschke
Copy link
Member

mroeschke commented Jun 5, 2020

Could you give an example of the buggy behavior? The API doesn't have a traditional *args signature, but it looks like the args are being passed to the function:

In [9]: def f(x, *args):
   ...:     return sum(args) + sum(x)
   ...:

In [10]: pd.Series(range(5))
Out[10]:
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [12]: pd.Series(range(5)).rolling(1).apply(f, args=(1,))
Out[12]:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

@simonjayhawkins simonjayhawkins added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member Bug labels Jun 6, 2020
@daskol
Copy link
Author

daskol commented Jun 8, 2020

Sorry, I was mistaken in the previous investigation. The issue is more intricate than it seems at a glance.

def foo(*args, **kwargs):
    assert len(args) == 2, 'Too a few pos argumnets.'
    return float('nan')


df = pd.DataFrame(data={'X': range(5)},
                  index=[0, 0, 1, 1, 1])

df.rolling(1) \
    .apply(foo, raw=True, args=(42,))  # ok

df.groupby(level=0) \
    .rolling(1) \
    .apply(foo, raw=True, args=(42,))  # failed

The second .apply() is applied to an instance of pandas.core.window.rolling.RollingGroupby but pandas.core.window.rolling.Rolling. The major difference of these rolling-objects is that pandas.core.window.rolling.RollingGroupby has another method resolution order due to pandas.core.window.common.WindowGroupByMixin object.

So, this snippet elucidates where buggy behavour is from. Method _apply() invokes either common agg functions (like sum() or mean()) or original not curried custom agg function. In the example above we have func arg is _Rolling_and_Expanding._generate_cython_apply_func.<locals>.apply_func and name arg is just foo.

class WindowGroupByMixin(GroupByMixin):
     def _apply(
        self,
        func: Callable,
        ....
        name: Optional[str] = None,
        ....
    ):
        def f(x, name=name, *args):
            x = self._shallow_copy(x)

            if isinstance(name, str):
                return getattr(x, name)(*args, **kwargs)

            return x.apply(name, *args, **kwargs)

@mroeschke
Copy link
Member

It appears this is fixed on master now, and will be fixed in the pandas 1.1 release. I think #34052 inadvertently fixed this. It'd be good to add a test to confirm.

In [4]: df.groupby(level=0) \
   ...:     .rolling(1) \
   ...:     .apply(foo, raw=True, args=(42,))
Out[4]:
      X
0 0 NaN
  0 NaN
1 1 NaN
  1 NaN
  1 NaN

In [5]: pd.__version__
Out[5]: '1.1.0.dev0+1800.g43a463d22'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Needs Info Clarification about behavior needed to assess issue labels Jun 9, 2020
@gglanzani
Copy link
Contributor

take

@gglanzani
Copy link
Contributor

Hi, I'm preparing a tutorial on pytest for PyData Amsterdam 2020.

The tutorial — which is part of a longer serie — should prepare them to start contributing on pandas.

I thought this would be a good issue to illustrate how people can get started with pytest. I plan to work these days until the 17/06 (day of the tutorial).

I should be able to create a pull request for this, so I'm not taking it for the sake of the tutorial.

@daskol
Copy link
Author

daskol commented Jun 10, 2020

The tutorial — which is part of a longer serie — should prepare them to start contributing on pandas.

I thought this would be a good issue to illustrate how people can get started with pytest. I plan to work these days until the 17/06 (day of the tutorial).

@gglanzani This is great! Thanks for helping and good luck with your talk.

@jreback jreback added this to the 1.1 milestone Jun 10, 2020
@jreback jreback added the Window rolling, ewma, expanding label Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants