Skip to content

resample().apply not returning multiple columns like groupby(pd.Timegrouper()).apply #17950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 27, 2017

Conversation

discort
Copy link
Contributor

@discort discort commented Oct 23, 2017

@codecov
Copy link

codecov bot commented Oct 23, 2017

Codecov Report

Merging #17950 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17950      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.01%     
==========================================
  Files         163      163              
  Lines       50173    50175       +2     
==========================================
- Hits        45778    45776       -2     
- Misses       4395     4399       +4
Flag Coverage Δ
#multiple 89.04% <100%> (+0.01%) ⬆️
#single 40.28% <0%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.16% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/io/msgpack/_version.py 44.65% <0%> (+1.9%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8137209...ada6b45. Read the comment docs.

@discort
Copy link
Contributor Author

discort commented Oct 23, 2017

@jreback

@@ -3103,6 +3103,20 @@ def f(x):
result = g.apply(f)
assert_frame_equal(result, expected)

def test_apply_with_mutated_index(self):
index = pd.date_range('1-1-2015', '12-31-15', freq='D')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the github issue number here as a comment? See the test below for an example.

@gfyoung gfyoung added Bug Resample resample method Datetime Datetime data dtype labels Oct 23, 2017
@@ -405,6 +405,15 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs):
result = self._apply_loffset(result)
return self._wrap_result(result)

def _try_aggregate(self, grouped, how, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring.

grouped : GroupBy object
how : string / cython mapped function
"""
if not compat.callable(how) or isinstance(grouped, PanelGroupBy):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, add the PanelGroupbyCase above (IOW it can only be a panel groupby in the except clause of the previous try/except).

you can then put the rest in the try/except, no need to add another function here. add comments indicating what is going on.

@discort
Copy link
Contributor Author

discort commented Oct 24, 2017

@jreback

try:
grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis)
except TypeError:

# panel grouper
grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)

try:
result = grouped.aggregate(how, *args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you remove this (and remove the if statement for result is None on 399) does this stil work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback

This test is failed: TestDatetimeIndex.test_resample_panel_numpy

E       AssertionError: DataFrame.iloc[:, 0] are different
E       
E       DataFrame.iloc[:, 0] values are different (100.0 %)
E       [left]:  [nan, nan, nan, nan, nan, nan]
E       [right]: [-0.0517234265172, -0.475611549875, -0.32227637063, -0.443460761802, -0.0929628389341, -0.10035826564]

However applying these changes make tests to pass

--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
             grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
 
         try:
-            result = grouped.aggregate(how, *args, **kwargs)
+            if compat.callable(how) and not isinstance(grouped, PanelGroupBy):
+                # Check if the function is reducing or not.
+                result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+            else:
+                result = grouped.aggregate(how, *args, **kwargs)
         except Exception:

Let me know, could I apply these chages or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's go with your changes here. just trying to make code simpler.

Copy link
Contributor Author

@discort discort Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse me, to be more precise it should be:

--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -395,7 +395,11 @@ class Resampler(_GroupBy):
             grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis)
 
         try:
-            result = grouped.aggregate(how, *args, **kwargs)
+            if isinstance(obj, ABCDataFrame) and compat.callable(how):
+                # Check if the function is reducing or not.
+                result = grouped._aggregate_item_by_item(how, *args, **kwargs)
+            else:
+                result = grouped.aggregate(how, *args, **kwargs)

because core.groupby.SeriesGroupBy obj doesn't contain _aggregate_item_by_item method

*args, **kwargs)
else:
result = grouped.aggregate(how, *args, **kwargs)
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you catch a more specific exception here, maybe TypeError and/or ValueError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback

It catches a lot of exceptions, including base Exception as well. I found even

self = <pandas.core.groupby.DataFrameGroupBy object at 0x7f7e5a3ce048>, how = 'mean', alt = None, numeric_only = True

    def _cython_agg_blocks(self, how, alt=None, numeric_only=True):
        # TODO: the actual managing of mgr_locs is a PITA
        # here, it should happen via BlockManager.combine
    
        data, agg_axis = self._get_data_to_aggregate()
    
        if numeric_only:
            data = data.get_numeric_data(copy=False)
    
        new_blocks = []
        new_items = []
        deleted_items = []
        for block in data.blocks:
    
            locs = block.mgr_locs.as_array
            try:
                result, _ = self.grouper.aggregate(
                    block.values, how, axis=agg_axis)
            except NotImplementedError:
                # generally if we have numeric_only=False
                # and non-applicable functions
                # try to python agg
    
                if alt is None:
                    # we cannot perform the operation
                    # in an alternate way, exclude the block
                    deleted_items.append(locs)
                    continue
    
                # call our grouper again with only this block
                obj = self.obj[data.items[locs]]
                s = groupby(obj, self.grouper)
                result = s.aggregate(lambda x: alt(x, axis=self.axis))
                newb = result._data.blocks[0]
    
            finally:
    
                # see if we can cast the block back to the original dtype
>               result = block._try_coerce_and_cast_result(result)
E               UnboundLocalError: local variable 'result' referenced before assignment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that last is quite tricky, this entire routine is actually also wrapped in try/except at a higher level to basically do a .apply (rather than try cython) if things fail. many edge cases here.

you can try o debug that unboundedlocal error, that doesn't seem right (prob result is not defined, and it should raise a sensible error rather than try to coerce a block if all else fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback

May I do not touch except Exception: line? It seems like a separate issue.

I've found next exceptions there: IndexError, UnboundLocalError, AssertionError, pandas.core.base.DataError.

@discort
Copy link
Contributor Author

discort commented Oct 25, 2017

@jreback

expected = df.groupby(pd.Grouper(freq='M')).apply(f)

result = df.resample('M').apply(f)
assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test with a Series as well. (as you now have a separate case for that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback

Added. Let me know if the test is okay or not.

@discort discort force-pushed the fix_15169 branch 2 times, most recently from 70a8c81 to 5bc7f9f Compare October 26, 2017 07:29
@@ -1023,6 +1023,7 @@ Groupby/Resample/Rolling
- Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`)
- Bug in ``DataFrame.groupby`` where spurious warning is raised when ``Grouper`` object is used to override ambiguous column name (:issue:`17383`)
- Bug in ``TimeGrouper`` differs when passes as a list and as a scalar (:issue:`17530`)
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 0.21.1 bug fix section

@discort
Copy link
Contributor Author

discort commented Oct 27, 2017

@jreback

@jreback jreback added this to the 0.21.1 milestone Oct 27, 2017
@jreback jreback merged commit bdeadb9 into pandas-dev:master Oct 27, 2017
@jreback
Copy link
Contributor

jreback commented Oct 27, 2017

thanks @discort very nice! (and responsive too!)

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

resample().apply not returning multiple columns like groupby(pd.Timegrouper()).apply
5 participants