-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Unable to pass additional arguments to resample().apply() #22261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I considered this comment at first before started the implementation. The reason why I didn't do like that is a compatibility with Resampler.apply docs. |
pandas/core/resample.py
Outdated
@@ -852,7 +854,7 @@ def __init__(self, obj, *args, **kwargs): | |||
self._groupby.grouper.mutated = True | |||
self.groupby = copy.copy(parent.groupby) | |||
|
|||
def _apply(self, f, **kwargs): | |||
def _apply(self, f, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*args
is unused now, which seems incorrect. Shouldn't it be passed through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger
I added it for a compatibility and TestResamplerGrouper.test_apply
failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that the proper behavior is happening now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify which test did you mean?
Anyway, at this moment, _GroupByMixin
and Resampler
have different arguments in _groupby_and_aggregate
, so I added args
for a compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be hit by df.groupby(...).resample
.
In [1]: import pandas as pd
In [2]: def f(x, arg):
...: assert arg == 0
...: return arg
...:
...:
In [3]: df = pd.DataFrame({"A": 1, "B": 2}, index=pd.date_range('2017', periods=10))
In [4]: df.groupby("A").resample("D").agg(f, 0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
774 try:
--> 775 return self._python_agg_general(func_or_funcs, *args, **kwargs)
776 except Exception:
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_agg_general(self, func, *args, **kwargs)
839 if len(output) == 0:
--> 840 return self._python_apply_general(f)
841
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
~/sandbox/pandas/pandas/core/groupby/groupby.py in <lambda>(x)
827 func = self._is_builtin_func(func)
--> 828 f = lambda x: func(x, *args, **kwargs)
829
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
325 # Check if the function is reducing or not.
--> 326 result = grouped._aggregate_item_by_item(how, *args, **kwargs)
327 else:
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
279 if not len(result_columns) and errors is not None:
--> 280 raise errors
281
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
264 result[item] = self._try_cast(
--> 265 colg.aggregate(func, *args, **kwargs), data)
266 except ValueError:
~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
776 except Exception:
--> 777 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
778
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_named(self, func, *args, **kwargs)
904 group.name = name
--> 905 output = func(group, *args, **kwargs)
906 if isinstance(output, (Series, Index, np.ndarray)):
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
683 try:
--> 684 result = self._python_apply_general(f)
685 except Exception:
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
683 try:
--> 684 result = self._python_apply_general(f)
685 except Exception:
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
~/sandbox/pandas/pandas/core/resample.py in func(x)
869
--> 870 return x.apply(f, **kwargs)
871
~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
246 *args,
--> 247 **kwargs)
248
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
332 # try to evaluate
--> 333 result = grouped.apply(how, *args, **kwargs)
334
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
695 with _group_selection_context(self):
--> 696 return self._python_apply_general(f)
697
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
325 # Check if the function is reducing or not.
--> 326 result = grouped._aggregate_item_by_item(how, *args, **kwargs)
327 else:
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
279 if not len(result_columns) and errors is not None:
--> 280 raise errors
281
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_item_by_item(self, func, *args, **kwargs)
264 result[item] = self._try_cast(
--> 265 colg.aggregate(func, *args, **kwargs), data)
266 except ValueError:
~/sandbox/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
776 except Exception:
--> 777 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
778
~/sandbox/pandas/pandas/core/groupby/generic.py in _aggregate_named(self, func, *args, **kwargs)
904 group.name = name
--> 905 output = func(group, *args, **kwargs)
906 if isinstance(output, (Series, Index, np.ndarray)):
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
683 try:
--> 684 result = self._python_apply_general(f)
685 except Exception:
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
TypeError: f() missing 1 required positional argument: 'arg'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-4-9cc8965c7cbe> in <module>()
----> 1 df.groupby("A").resample("D").agg(f, 0)
~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
245 grouper,
246 *args,
--> 247 **kwargs)
248
249 result = self._apply_loffset(result)
~/sandbox/pandas/pandas/core/resample.py in _apply(self, f, *args, **kwargs)
870 return x.apply(f, **kwargs)
871
--> 872 result = self._groupby.apply(func)
873 return self._wrap_result(result)
874
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
694
695 with _group_selection_context(self):
--> 696 return self._python_apply_general(f)
697
698 return result
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
700 def _python_apply_general(self, f):
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
704 return self._wrap_applied_output(
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
196 # group might be modified
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
200 mutated = True
~/sandbox/pandas/pandas/core/resample.py in func(x)
868 return getattr(x, f)(**kwargs)
869
--> 870 return x.apply(f, **kwargs)
871
872 result = self._groupby.apply(func)
~/sandbox/pandas/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
245 grouper,
246 *args,
--> 247 **kwargs)
248
249 result = self._apply_loffset(result)
~/sandbox/pandas/pandas/core/resample.py in _groupby_and_aggregate(self, how, grouper, *args, **kwargs)
331 # we have a non-reducing function
332 # try to evaluate
--> 333 result = grouped.apply(how, *args, **kwargs)
334
335 result = self._apply_loffset(result)
~/sandbox/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
694
695 with _group_selection_context(self):
--> 696 return self._python_apply_general(f)
697
698 return result
~/sandbox/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
700 def _python_apply_general(self, f):
701 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 702 self.axis)
703
704 return self._wrap_applied_output(
~/sandbox/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
196 # group might be modified
197 group_axes = _get_axes(group)
--> 198 res = f(group)
199 if not _is_indexed_like(res, group_axes):
200 mutated = True
TypeError: f() missing 1 required positional argument: 'arg'
So IIUC, we want to be passing through *args
here.
pandas/tests/test_resample.py
Outdated
@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self): | |||
res = df['timestamp'].resample('2D').first() | |||
tm.assert_series_equal(res, exp) | |||
|
|||
def test_resample_apply_with_additional_args(self): | |||
def f(data, add_arg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better test would be to return add_arg
, to ensure that the proper value is passed through.
pandas/tests/test_resample.py
Outdated
@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self): | |||
res = df['timestamp'].resample('2D').first() | |||
tm.assert_series_equal(res, exp) | |||
|
|||
def test_resample_apply_with_additional_args(self): | |||
def f(data, add_arg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment referencing the github issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this issue number as a comment
pandas/core/resample.py
Outdated
@@ -239,7 +239,9 @@ def aggregate(self, arg, *args, **kwargs): | |||
self._set_binner() | |||
result, how = self._aggregate(arg, *args, **kwargs) | |||
if result is None: | |||
grouper = kwargs.get('grouper', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the user passes a function like def f(arr, grouper)
? Does this raise? If so, can we provide a good error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. Is it okay to pass None
like?:
self._groupby_and_aggregate(arg, None, *args, **kwargs)
Codecov Report
@@ Coverage Diff @@
## master #22261 +/- ##
==========================================
- Coverage 92.05% 92.05% -0.01%
==========================================
Files 169 169
Lines 50733 50735 +2
==========================================
+ Hits 46702 46703 +1
- Misses 4031 4032 +1
Continue to review full report at Codecov.
|
pandas/core/resample.py
Outdated
@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs): | |||
result, how = self._aggregate(arg, *args, **kwargs) | |||
if result is None: | |||
result = self._groupby_and_aggregate(arg, | |||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why you need this None
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using test scenario, multiplier
arg will be passed to _groupby_and_aggregate
as grouper
arg if we omit None
. So it's better to pass None
explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you then pass grouper as a kwarg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need to rename this arg itself to avoid a user pass ing grouper=
(unlikely though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you then pass grouper as a kwarg
I can't.
TypeError: _groupby_and_aggregate() got multiple values for argument 'grouper'
The workaround for this will be something like that:
grouper = None
result = self._groupby_and_aggregate(arg,
grouper,
*args,
**kwargs)
we may need to rename this arg itself to avoid a user pass ing grouper= (unlikely though)
At this moment, there may be TypeError
if a user pass grouper
or how
args to Resampler.apply
or even Series.apply
:
TypeError: _groupby_and_aggregate() got multiple values for argument 'how'
I can tackle it within another issue.
return np.mean(data) * add_arg | ||
|
||
multiplier = 10 | ||
result = self.series.resample('D').apply(f, multiplier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test both as an arg and as a kwargs
doc/source/whatsnew/v0.23.5.txt
Outdated
@@ -34,6 +34,7 @@ Bug Fixes | |||
**Groupby/Resample/Rolling** | |||
|
|||
- Bug in :meth:`DataFrame.resample` when resampling ``NaT`` in ``TimeDeltaIndex`` (:issue:`13223`). | |||
- Bug in :meth:`Resampler.apply` when passing an additional argument to applied func (:issue:`14615`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.24.0
pandas/core/resample.py
Outdated
@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs): | |||
result, how = self._aggregate(arg, *args, **kwargs) | |||
if result is None: | |||
result = self._groupby_and_aggregate(arg, | |||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you then pass grouper as a kwarg
pandas/core/resample.py
Outdated
@@ -240,6 +240,7 @@ def aggregate(self, arg, *args, **kwargs): | |||
result, how = self._aggregate(arg, *args, **kwargs) | |||
if result is None: | |||
result = self._groupby_and_aggregate(arg, | |||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need to rename this arg itself to avoid a user pass ing grouper=
(unlikely though)
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -650,7 +650,7 @@ Groupby/Resample/Rolling | |||
``SeriesGroupBy`` when the grouping variable only contains NaNs and numpy version < 1.13 (:issue:`21956`). | |||
- Multiple bugs in :func:`pandas.core.Rolling.min` with ``closed='left'` and a | |||
datetime-like index leading to incorrect results and also segfault. (:issue:`21704`) | |||
- | |||
- Bug in :meth:`Resampler.apply` when passing an additional argument to applied func (:issue:`14615`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify that this is postiional args
pandas/core/resample.py
Outdated
result = self._groupby_and_aggregate(arg, | ||
grouper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would pass both how
and grouper
as kwargs, but you can only do that in PY3, ok
add ad how=arg
and pass how
to make this more clear then (positionally as the first arg, just where arg is now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and actually rename arg
to func
, which is what it is called elsewhere
Ping @jreback @TomAugspurger : Could you have a look? |
Hello @discort! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 20, 2018 at 18:45 Hours UTC |
@TomAugspurger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, ping on green.
pandas/tests/test_resample.py
Outdated
@@ -2165,6 +2165,14 @@ def test_resample_datetime_values(self): | |||
res = df['timestamp'].resample('2D').first() | |||
tm.assert_series_equal(res, exp) | |||
|
|||
def test_resample_apply_with_additional_args(self): | |||
def f(data, add_arg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this issue number as a comment
Thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff
There is flake8 warning ^ but it is not related with my changes.