Skip to content

Unable to pass additional arguments to resample().apply() #22261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 22, 2018

Conversation

discort
Copy link
Contributor

@discort discort commented Aug 9, 2018

There is flake8 warning ^ but it is not related with my changes.

@discort
Copy link
Contributor Author

discort commented Aug 9, 2018

I considered this comment at first before started the implementation. The reason why I didn't do like that is a compatibility with Resampler.apply docs.
Docs says us that positional args will be pass into applied function.

@@ -852,7 +854,7 @@ def __init__(self, obj, *args, **kwargs):
self._groupby.grouper.mutated = True
self.groupby = copy.copy(parent.groupby)

def _apply(self, f, **kwargs):
def _apply(self, f, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*args is unused now, which seems incorrect. Shouldn't it be passed through?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger
I added it for a compatibility and TestResamplerGrouper.test_apply failed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that the proper behavior is happening now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger

Could you please clarify which test did you mean?

Anyway, at this moment, _GroupByMixin and Resampler have different arguments in _groupby_and_aggregate, so I added args for a compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be hit by df.groupby(...).resample.

In [1]: import pandas as pd

In [2]: def f(x, arg):
   ...:     assert arg == 0
   ...:     return arg
   ...:
   ...:

In [3]: df = pd.DataFrame({"A": 1, "B": 2}, index=pd.date_range('2017', periods=10))

In [4]: df.groupby("A").resample("D").agg(f, 0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
    774             try:
--> 775                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
    776             except Exception:

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_agg_general(self, func, *args, **kwargs)
    839         if len(output) == 0:
--> 840             return self._python_apply_general(f)
    841

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):

~/sandbox/pandas/pandas/core/groupby/groupby.py in <lambda>(x)
    827         func = self._is_builtin_func(func)
--> 828         f = lambda x: func(x, *args, **kwargs)
    829

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
    325                 # Check if the function is reducing or not.
--> 326                 result = grouped._aggregate_item_by_item(how, *args, **kwargs)
    327             else:

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
    279             if not len(result_columns) and errors is not None:
--> 280                 raise errors
    281

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
    264                 result[item] = self._try_cast(
--> 265                     colg.aggregate(func, *args, **kwargs), data)
    266             except ValueError:

~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
    776             except Exception:
--> 777                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
    778

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_named(self, func, *args, **kwargs)
    904             group.name = name
--> 905             output = func(group, *args, **kwargs)
    906             if isinstance(output, (Series, Index, np.ndarray)):

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    683             try:
--> 684                 result = self._python_apply_general(f)
    685             except Exception:

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    683             try:
--> 684                 result = self._python_apply_general(f)
    685             except Exception:

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):

~/sandbox/pandas/pandas/core/resample.py in func(x)
    869
--> 870             return x.apply(f, **kwargs)
    871

~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    246                                                  *args,
--> 247                                                  **kwargs)
    248

~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
    332             # try to evaluate
--> 333             result = grouped.apply(how, *args, **kwargs)
    334

~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    695                 with _group_selection_context(self):
--> 696                     return self._python_apply_general(f)
    697

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
    325                 # Check if the function is reducing or not.
--> 326                 result = grouped._aggregate_item_by_item(how, *args, **kwargs)
    327             else:

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
    279             if not len(result_columns) and errors is not None:
--> 280                 raise errors
    281

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
    264                 result[item] = self._try_cast(
--> 265                     colg.aggregate(func, *args, **kwargs), data)
    266             except ValueError:

~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
    776             except Exception:
--> 777                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
    778

~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_named(self, func, *args, **kwargs)
    904             group.name = name
--> 905             output = func(group, *args, **kwargs)
    906             if isinstance(output, (Series, Index, np.ndarray)):

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    683             try:
--> 684                 result = self._python_apply_general(f)
    685             except Exception:

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):

TypeError: f() missing 1 required positional argument: 'arg'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-9cc8965c7cbe> in <module>()
----> 1 df.groupby("A").resample("D").agg(f, 0)

~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    245                                                  grouper,
    246                                                  *args,
--> 247                                                  **kwargs)
    248
    249         result = self._apply_loffset(result)

~/sandbox/pandas/pandas/core/resample.py in _apply(self, f, *args, **kwargs)
    870             return x.apply(f, **kwargs)
    871
--> 872         result = self._groupby.apply(func)
    873         return self._wrap_result(result)
    874

~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    694
    695                 with _group_selection_context(self):
--> 696                     return self._python_apply_general(f)
    697
    698         return result

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    700     def _python_apply_general(self, f):
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703
    704         return self._wrap_applied_output(

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    196             # group might be modified
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):
    200                 mutated = True

~/sandbox/pandas/pandas/core/resample.py in func(x)
    868                 return getattr(x, f)(**kwargs)
    869
--> 870             return x.apply(f, **kwargs)
    871
    872         result = self._groupby.apply(func)

~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    245                                                  grouper,
    246                                                  *args,
--> 247                                                  **kwargs)
    248
    249         result = self._apply_loffset(result)

~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
    331             # we have a non-reducing function
    332             # try to evaluate
--> 333             result = grouped.apply(how, *args, **kwargs)
    334
    335         result = self._apply_loffset(result)

~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    694
    695                 with _group_selection_context(self):
--> 696                     return self._python_apply_general(f)
    697
    698         return result

~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    700     def _python_apply_general(self, f):
    701         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702                                                    self.axis)
    703
    704         return self._wrap_applied_output(

~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    196             # group might be modified
    197             group_axes = _get_axes(group)
--> 198             res = f(group)
    199             if not _is_indexed_like(res, group_axes):
    200                 mutated = True

TypeError: f() missing 1 required positional argument: 'arg'

So IIUC, we want to be passing through *args here.

@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self):
res = df['timestamp'].resample('2D').first()
tm.assert_series_equal(res, exp)

def test_resample_apply_with_additional_args(self):
def f(data, add_arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better test would be to return add_arg, to ensure that the proper value is passed through.

@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self):
res = df['timestamp'].resample('2D').first()
tm.assert_series_equal(res, exp)

def test_resample_apply_with_additional_args(self):
def f(data, add_arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment referencing the github issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this issue number as a comment

@@ -239,7 +239,9 @@ def aggregate(self, arg, *args, **kwargs):
self._set_binner()
result, how = self._aggregate(arg, *args, **kwargs)
if result is None:
grouper = kwargs.get('grouper', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the user passes a function like def f(arr, grouper)? Does this raise? If so, can we provide a good error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct. Is it okay to pass None like?:

self._groupby_and_aggregate(arg, None, *args, **kwargs)

@codecov
Copy link

codecov bot commented Aug 9, 2018

Codecov Report

Merging #22261 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22261      +/-   ##
==========================================
- Coverage   92.05%   92.05%   -0.01%     
==========================================
  Files         169      169              
  Lines       50733    50735       +2     
==========================================
+ Hits        46702    46703       +1     
- Misses       4031     4032       +1
Flag Coverage Δ
#multiple 90.46% <100%> (-0.01%) ⬇️
#single 42.24% <16.66%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.09% <100%> (+0.01%) ⬆️
pandas/util/_depr_module.py 65.11% <0%> (-2.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f11d1a...547d2a5. Read the comment docs.

@gfyoung gfyoung added Bug Groupby Resample resample method labels Aug 9, 2018
@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs):
result, how = self._aggregate(arg, *args, **kwargs)
if result is None:
result = self._groupby_and_aggregate(arg,
None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why you need this None argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung

Using test scenario, multiplier arg will be passed to _groupby_and_aggregate as grouper arg if we omit None. So it's better to pass None explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you then pass grouper as a kwarg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need to rename this arg itself to avoid a user pass ing grouper= (unlikely though)

Copy link
Contributor Author

@discort discort Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you then pass grouper as a kwarg

I can't.

TypeError: _groupby_and_aggregate() got multiple values for argument 'grouper'

The workaround for this will be something like that:

grouper = None
result = self._groupby_and_aggregate(arg,
                                     grouper,
                                     *args,
                                     **kwargs)

we may need to rename this arg itself to avoid a user pass ing grouper= (unlikely though)

At this moment, there may be TypeError if a user pass grouper or how args to Resampler.apply or even Series.apply:

TypeError: _groupby_and_aggregate() got multiple values for argument 'how'

I can tackle it within another issue.

@jreback

return np.mean(data) * add_arg

multiplier = 10
result = self.series.resample('D').apply(f, multiplier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test both as an arg and as a kwargs

@@ -34,6 +34,7 @@ Bug Fixes
**Groupby/Resample/Rolling**

- Bug in :meth:`DataFrame.resample` when resampling ``NaT`` in ``TimeDeltaIndex`` (:issue:`13223`).
- Bug in :meth:`Resampler.apply` when passing an additional argument to applied func (:issue:`14615`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.24.0

@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs):
result, how = self._aggregate(arg, *args, **kwargs)
if result is None:
result = self._groupby_and_aggregate(arg,
None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you then pass grouper as a kwarg

@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs):
result, how = self._aggregate(arg, *args, **kwargs)
if result is None:
result = self._groupby_and_aggregate(arg,
None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need to rename this arg itself to avoid a user pass ing grouper= (unlikely though)

@@ -650,7 +650,7 @@ Groupby/Resample/Rolling
``SeriesGroupBy`` when the grouping variable only contains NaNs and numpy version < 1.13 (:issue:`21956`).
- Multiple bugs in :func:`pandas.core.Rolling.min` with ``closed='left'` and a
datetime-like index leading to incorrect results and also segfault. (:issue:`21704`)
-
- Bug in :meth:`Resampler.apply` when passing an additional argument to applied func (:issue:`14615`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify that this is postiional args

result = self._groupby_and_aggregate(arg,
grouper,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would pass both how and grouper as kwargs, but you can only do that in PY3, ok
add ad how=arg and pass how to make this more clear then (positionally as the first arg, just where arg is now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and actually rename arg to func, which is what it is called elsewhere

@discort
Copy link
Contributor Author

discort commented Aug 11, 2018

@jreback

@discort
Copy link
Contributor Author

discort commented Aug 15, 2018

@TomAugspurger
@gfyoung

@gfyoung
Copy link
Member

gfyoung commented Aug 15, 2018

Ping @jreback @TomAugspurger : Could you have a look?

@pep8speaks
Copy link

pep8speaks commented Aug 20, 2018

Hello @discort! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 20, 2018 at 18:45 Hours UTC

@discort
Copy link
Contributor Author

discort commented Aug 20, 2018

@TomAugspurger
Fixed within 66550b1

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, ping on green.

@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self):
res = df['timestamp'].resample('2D').first()
tm.assert_series_equal(res, exp)

def test_resample_apply_with_additional_args(self):
def f(data, add_arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this issue number as a comment

@jreback jreback added this to the 0.24.0 milestone Aug 20, 2018
@discort
Copy link
Contributor Author

discort commented Aug 22, 2018

@jreback

@TomAugspurger TomAugspurger merged commit 27e4141 into pandas-dev:master Aug 22, 2018
@TomAugspurger
Copy link
Contributor

Thanks!

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to pass additional arguments to resample().apply()
5 participants