resample().apply not returning multiple columns like groupby(pd.Timegrouper()).apply #17950

discort · 2017-10-23T09:36:08Z

closes resample().apply not returning multiple columns like groupby(pd.Timegrouper()).apply #15169
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
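
For reference, a minimal sketch of the behaviour this PR targets, modelled on the test added in the PR. The weekly frequency, the 28-day index and the constant-returning callable `f` are illustrative assumptions, not taken from issue #15169.

```python
import pandas as pd

# Daily data covering four full weeks (Monday 2015-01-05 to Sunday 2015-02-01).
index = pd.date_range("2015-01-05", periods=28, freq="D")
df = pd.DataFrame({"col1": range(28)}, index=index)


def f(group):
    # Hypothetical callable: ignores the group and returns two labelled
    # values, so each group should contribute one row with columns a and b.
    return pd.Series([1, 2], index=["a", "b"])


# Reference behaviour: grouping by a time Grouper and applying f.
expected = df.groupby(pd.Grouper(freq="W")).apply(f)

# Behaviour under discussion: resample(...).apply(f) should match it.
result = df.resample("W").apply(f)

pd.testing.assert_frame_equal(result, expected)
print(result)
```

If the two code paths agree, the comparison passes silently and the printed frame has one row per week.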

codecov · 2017-10-23T13:10:46Z

Codecov Report

Merging #17950 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17950      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.01%     
==========================================
  Files         163      163              
  Lines       50173    50175       +2     
==========================================
- Hits        45778    45776       -2     
- Misses       4395     4399       +4

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `89.04% <100%> (+0.01%)` | ⬆️ |
| #single | `40.28% <0%> (-0.06%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/resample.py | `96.16% <100%> (+0.01%)` | ⬆️ |
| pandas/io/gbq.py | `25% <0%> (-58.34%)` | ⬇️ |
| pandas/core/frame.py | `97.75% <0%> (-0.1%)` | ⬇️ |
| pandas/io/msgpack/_version.py | `44.65% <0%> (+1.9%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8137209...ada6b45.

discort · 2017-10-23T13:16:42Z

@jreback

jschendel · 2017-10-23T14:06:18Z

pandas/tests/test_resample.py

@@ -3103,6 +3103,20 @@ def f(x):
        result = g.apply(f)
        assert_frame_equal(result, expected)

+    def test_apply_with_mutated_index(self):
+        index = pd.date_range('1-1-2015', '12-31-15', freq='D')


Can you add the github issue number here as a comment? See the test below for an example.

gfyoung · 2017-10-23T18:44:06Z

pandas/core/resample.py

@@ -405,6 +405,15 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs):
        result = self._apply_loffset(result)
        return self._wrap_result(result)

+    def _try_aggregate(self, grouped, how, *args, **kwargs):


Add docstring.

jreback · 2017-10-24T10:16:57Z

pandas/core/resample.py

+        grouped : GroupBy object
+        how : string / cython mapped function
+        """
+        if not compat.callable(how) or isinstance(grouped, PanelGroupBy):


Instead of this, handle the PanelGroupBy case above (in other words, it can only be a panel groupby in the except clause of the previous try/except).

You can then put the rest in the try/except; there is no need to add another function here. Add comments indicating what is going on.

discort · 2017-10-24T19:55:49Z

@jreback

jreback · 2017-10-25T12:23:45Z

pandas/core/resample.py

        try:
            grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis)
        except TypeError:

            # panel grouper
            grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
-
-        try:
            result = grouped.aggregate(how, *args, **kwargs)


If you remove this (and remove the if statement for result is None on line 399), does this still work?

@jreback

This test fails: TestDatetimeIndex.test_resample_panel_numpy

E   AssertionError: DataFrame.iloc[:, 0] are different
E
E   DataFrame.iloc[:, 0] values are different (100.0 %)
E   [left]:  [nan, nan, nan, nan, nan, nan]
E   [right]: [-0.0517234265172, -0.475611549875, -0.32227637063, -0.443460761802, -0.0929628389341, -0.10035826564]

However, applying these changes makes the tests pass:

--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
             grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
         try:
-            result = grouped.aggregate(how, *args, **kwargs)
+            if compat.callable(how) and not isinstance(grouped, PanelGroupBy):
+                # Check if the function is reducing or not.
+                result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+            else:
+                result = grouped.aggregate(how, *args, **kwargs)
         except Exception:

Let me know whether I can apply these changes.

yeah let's go with your changes here. just trying to make code simpler.

Sorry, to be more precise it should be:

--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
             grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
         try:
-            result = grouped.aggregate(how, *args, **kwargs)
+            if isinstance(obj, ABCDataFrame) and compat.callable(how):
+                # Check if the function is reducing or not.
+                result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+            else:
+                result = grouped.aggregate(how, *args, **kwargs)

because a core.groupby.SeriesGroupBy object doesn't have an _aggregate_item_by_item method.
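
The difference between the two proposed conditions can be sketched with stand-in classes. Everything below is hypothetical: `SeriesGroupByStandIn`, `FrameGroupByStandIn` and `dispatch` are illustrative names, not pandas code. The only fact taken from the thread is that the Series groupby object lacks `_aggregate_item_by_item`, so the item-by-item path is chosen only for a DataFrame input with a callable `how`.

```python
class SeriesGroupByStandIn:
    """Hypothetical stand-in: only offers aggregate()."""

    def aggregate(self, how):
        name = how.__name__ if callable(how) else how
        return ("aggregate", name)


class FrameGroupByStandIn(SeriesGroupByStandIn):
    """Hypothetical stand-in: also offers the item-by-item path."""

    def _aggregate_item_by_item(self, how):
        return ("item_by_item", how.__name__)


def dispatch(grouped, obj_is_frame, how):
    # Use the item-by-item path only for a frame input and a callable.
    # String names such as "mean" and Series inputs go through aggregate(),
    # which avoids calling a method the Series stand-in does not have.
    if obj_is_frame and callable(how):
        return grouped._aggregate_item_by_item(how)
    return grouped.aggregate(how)


frame_callable = dispatch(FrameGroupByStandIn(), True, max)
series_callable = dispatch(SeriesGroupByStandIn(), False, max)
frame_string = dispatch(FrameGroupByStandIn(), True, "mean")
```

A check based only on "not a panel groupby" would send the Series stand-in down the item-by-item branch and fail with an AttributeError, which is the problem the refined condition avoids.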

jreback · 2017-10-25T12:24:02Z

pandas/core/resample.py

+                                                             *args, **kwargs)
+                else:
+                    result = grouped.aggregate(how, *args, **kwargs)
+            except Exception:


can you catch a more specific exception here, maybe TypeError and/or ValueError

@jreback

It catches a lot of exceptions, including the base Exception. I even found this:

self = <pandas.core.groupby.DataFrameGroupBy object at 0x7f7e5a3ce048>, how = 'mean', alt = None, numeric_only = True

    def _cython_agg_blocks(self, how, alt=None, numeric_only=True):
        # TODO: the actual managing of mgr_locs is a PITA
        # here, it should happen via BlockManager.combine

        data, agg_axis = self._get_data_to_aggregate()

        if numeric_only:
            data = data.get_numeric_data(copy=False)

        new_blocks = []
        new_items = []
        deleted_items = []
        for block in data.blocks:

            locs = block.mgr_locs.as_array
            try:
                result, _ = self.grouper.aggregate(
                    block.values, how, axis=agg_axis)
            except NotImplementedError:
                # generally if we have numeric_only=False
                # and non-applicable functions
                # try to python agg

                if alt is None:
                    # we cannot perform the operation
                    # in an alternate way, exclude the block
                    deleted_items.append(locs)
                    continue

                # call our grouper again with only this block
                obj = self.obj[data.items[locs]]
                s = groupby(obj, self.grouper)
                result = s.aggregate(lambda x: alt(x, axis=self.axis))
                newb = result._data.blocks[0]

            finally:

                # see if we can cast the block back to the original dtype
>               result = block._try_coerce_and_cast_result(result)
E               UnboundLocalError: local variable 'result' referenced before assignment

That last one is quite tricky; this entire routine is actually also wrapped in a try/except at a higher level to basically do an .apply (rather than try cython) if things fail. There are many edge cases here.

You can try to debug that UnboundLocalError; it doesn't seem right (probably result is not defined, and it should raise a sensible error rather than try to coerce a block if all else fails).
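
The failure mode in the quoted traceback can be reproduced in isolation. The sketch below is not pandas code: `aggregate_block` and `raises_index_error` are hypothetical names, and it assumes the original exception was something other than NotImplementedError, so the except clause never bound `result` before the finally clause read it.

```python
def aggregate_block(values, aggregate):
    """Hypothetical stand-in for the try/except/finally in the loop body."""
    try:
        result = aggregate(values)
    except NotImplementedError:
        result = None
    finally:
        # This runs even when aggregate() raised an exception that the
        # except clause does not handle. In that case `result` was never
        # assigned, so reading it raises UnboundLocalError and hides the
        # original error behind it.
        result = [result]
    return result


def raises_index_error(values):
    raise IndexError("original failure")


caught = None
try:
    aggregate_block([1, 2], raises_index_error)
except UnboundLocalError as exc:
    caught = exc
```

The original IndexError is still reachable through `caught.__context__`, which is one way to find out what actually went wrong underneath the UnboundLocalError.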

@jreback

May I leave the except Exception: line untouched? It seems like a separate issue.

I've found the following exceptions there: IndexError, UnboundLocalError, AssertionError, pandas.core.base.DataError.

discort · 2017-10-25T15:34:56Z

@jreback

jreback · 2017-10-26T01:49:00Z

pandas/tests/test_resample.py

+        expected = df.groupby(pd.Grouper(freq='M')).apply(f)
+
+        result = df.resample('M').apply(f)
+        assert_frame_equal(result, expected)


can you add a test with a Series as well. (as you now have a separate case for that).

@jreback

Added. Let me know if the test is okay or not.

jreback · 2017-10-27T10:29:09Z

doc/source/whatsnew/v0.21.0.txt

@@ -1023,6 +1023,7 @@ Groupby/Resample/Rolling
 - Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`)
 - Bug in ``DataFrame.groupby`` where spurious warning is raised when ``Grouper`` object is used to override ambiguous column name (:issue:`17383`)
 - Bug in ``TimeGrouper`` differs when passes as a list and as a scalar (:issue:`17530`)
+- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)


can you move to 0.21.1 bug fix section

discort · 2017-10-27T18:58:16Z

@jreback

jreback · 2017-10-27T20:32:30Z

thanks @discort very nice! (and responsive too!)

discort closed this Oct 23, 2017

discort reopened this Oct 23, 2017

discort force-pushed the fix_15169 branch from 287a984 to 0c1184f Compare October 23, 2017 12:13

jschendel reviewed Oct 23, 2017

discort force-pushed the fix_15169 branch from 0c1184f to 6b1c26c Compare October 23, 2017 14:15

gfyoung added the Bug, Resample, and Datetime labels Oct 23, 2017

gfyoung reviewed Oct 23, 2017

discort force-pushed the fix_15169 branch from 6b1c26c to f90d3e7 Compare October 24, 2017 07:49

jreback requested changes Oct 24, 2017

discort force-pushed the fix_15169 branch from 926ae91 to 6a90b79 Compare October 24, 2017 16:39

jreback requested changes Oct 25, 2017

discort force-pushed the fix_15169 branch from 6a90b79 to 4d07649 Compare October 25, 2017 14:03

jreback reviewed Oct 26, 2017

discort force-pushed the fix_15169 branch 2 times, most recently from 70a8c81 to 5bc7f9f Compare October 26, 2017 07:29

jreback requested changes Oct 27, 2017

Added applying of multiple columns to resample

ada6b45

discort force-pushed the fix_15169 branch from 5bc7f9f to ada6b45 Compare October 27, 2017 15:50

jreback added this to the 0.21.1 milestone Oct 27, 2017

jreback added the Needs Backport label Oct 27, 2017

jreback approved these changes Oct 27, 2017

jreback merged commit bdeadb9 into pandas-dev:master Oct 27, 2017

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017

Added applying of multiple columns to resample (pandas-dev#17950)

a32a877

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Added applying of multiple columns to resample (pandas-dev#17950)

dc05077

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

Added applying of multiple columns to resample (pandas-dev#17950)

6a4567e

(cherry picked from commit bdeadb9)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

Added applying of multiple columns to resample (#17950)

af01122

(cherry picked from commit bdeadb9)

TomAugspurger removed the Needs Backport label Dec 11, 2017
