-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
resample().apply not returning multiple columns like groupby(pd.Timegrouper()).apply #17950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17950 +/- ##
==========================================
- Coverage 91.24% 91.23% -0.01%
==========================================
Files 163 163
Lines 50173 50175 +2
==========================================
- Hits 45778 45776 -2
- Misses 4395 4399 +4
Continue to review full report at Codecov.
|
pandas/tests/test_resample.py
Outdated
@@ -3103,6 +3103,20 @@ def f(x): | |||
result = g.apply(f) | |||
assert_frame_equal(result, expected) | |||
|
|||
def test_apply_with_mutated_index(self): | |||
index = pd.date_range('1-1-2015', '12-31-15', freq='D') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the github issue number here as a comment? See the test below for an example.
pandas/core/resample.py
Outdated
@@ -405,6 +405,15 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs): | |||
result = self._apply_loffset(result) | |||
return self._wrap_result(result) | |||
|
|||
def _try_aggregate(self, grouped, how, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring.
pandas/core/resample.py
Outdated
grouped : GroupBy object | ||
how : string / cython mapped function | ||
""" | ||
if not compat.callable(how) or isinstance(grouped, PanelGroupBy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, add the PanelGroupbyCase above (IOW it can only be a panel groupby in the except clause of the previous try/except).
you can then put the rest in the try/except, no need to add another function here. add comments indicating what is going on.
pandas/core/resample.py
Outdated
try: | ||
grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis) | ||
except TypeError: | ||
|
||
# panel grouper | ||
grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis) | ||
|
||
try: | ||
result = grouped.aggregate(how, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you remove this (and remove the if statement for result is None
on 399) does this stil work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failed: TestDatetimeIndex.test_resample_panel_numpy
E AssertionError: DataFrame.iloc[:, 0] are different
E
E DataFrame.iloc[:, 0] values are different (100.0 %)
E [left]: [nan, nan, nan, nan, nan, nan]
E [right]: [-0.0517234265172, -0.475611549875, -0.32227637063, -0.443460761802, -0.0929628389341, -0.10035826564]
However applying these changes make tests to pass
--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
try:
- result = grouped.aggregate(how, *args, **kwargs)
+ if compat.callable(how) and not isinstance(grouped, PanelGroupBy):
+ # Check if the function is reducing or not.
+ result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+ else:
+ result = grouped.aggregate(how, *args, **kwargs)
except Exception:
Let me know, could I apply these chages or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's go with your changes here. just trying to make code simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excuse me, to be more precise it should be:
--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
try:
- result = grouped.aggregate(how, *args, **kwargs)
+ if isinstance(obj, ABCDataFrame) and compat.callable(how):
+ # Check if the function is reducing or not.
+ result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+ else:
+ result = grouped.aggregate(how, *args, **kwargs)
because core.groupby.SeriesGroupBy
obj doesn't contain _aggregate_item_by_item
method
pandas/core/resample.py
Outdated
*args, **kwargs) | ||
else: | ||
result = grouped.aggregate(how, *args, **kwargs) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you catch a more specific exception here, maybe TypeError and/or ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It catches a lot of exceptions, including base Exception
as well. I found even
self = <pandas.core.groupby.DataFrameGroupBy object at 0x7f7e5a3ce048>, how = 'mean', alt = None, numeric_only = True
def _cython_agg_blocks(self, how, alt=None, numeric_only=True):
# TODO: the actual managing of mgr_locs is a PITA
# here, it should happen via BlockManager.combine
data, agg_axis = self._get_data_to_aggregate()
if numeric_only:
data = data.get_numeric_data(copy=False)
new_blocks = []
new_items = []
deleted_items = []
for block in data.blocks:
locs = block.mgr_locs.as_array
try:
result, _ = self.grouper.aggregate(
block.values, how, axis=agg_axis)
except NotImplementedError:
# generally if we have numeric_only=False
# and non-applicable functions
# try to python agg
if alt is None:
# we cannot perform the operation
# in an alternate way, exclude the block
deleted_items.append(locs)
continue
# call our grouper again with only this block
obj = self.obj[data.items[locs]]
s = groupby(obj, self.grouper)
result = s.aggregate(lambda x: alt(x, axis=self.axis))
newb = result._data.blocks[0]
finally:
# see if we can cast the block back to the original dtype
> result = block._try_coerce_and_cast_result(result)
E UnboundLocalError: local variable 'result' referenced before assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that last is quite tricky, this entire routine is actually also wrapped in try/except at a higher level to basically do a .apply
(rather than try cython) if things fail. many edge cases here.
you can try o debug that unboundedlocal error, that doesn't seem right (prob result is not defined, and it should raise a sensible error rather than try to coerce a block if all else fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I do not touch except Exception:
line? It seems like a separate issue.
I've found next exceptions there: IndexError
, UnboundLocalError
, AssertionError
, pandas.core.base.DataError
.
expected = df.groupby(pd.Grouper(freq='M')).apply(f) | ||
|
||
result = df.resample('M').apply(f) | ||
assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test with a Series as well. (as you now have a separate case for that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Let me know if the test is okay or not.
70a8c81
to
5bc7f9f
Compare
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -1023,6 +1023,7 @@ Groupby/Resample/Rolling | |||
- Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`) | |||
- Bug in ``DataFrame.groupby`` where spurious warning is raised when ``Grouper`` object is used to override ambiguous column name (:issue:`17383`) | |||
- Bug in ``TimeGrouper`` differs when passes as a list and as a scalar (:issue:`17530`) | |||
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to 0.21.1 bug fix section
thanks @discort very nice! (and responsive too!) |
(cherry picked from commit bdeadb9)
(cherry picked from commit bdeadb9)
git diff upstream/master -u -- "*.py" | flake8 --diff