Skip to content

BUG: groupby with resample using on parameter errors when selecting column to apply function #19433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

discort
Copy link
Contributor

@discort discort commented Jan 28, 2018

@discort discort force-pushed the fix_17813 branch 3 times, most recently from e4ca463 to 836139e Compare January 29, 2018 14:36
@codecov
Copy link

codecov bot commented Jan 29, 2018

Codecov Report

Merging #19433 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19433      +/-   ##
==========================================
- Coverage   91.62%   91.59%   -0.03%     
==========================================
  Files         150      150              
  Lines       48714    48720       +6     
==========================================
- Hits        44633    44625       -8     
- Misses       4081     4095      +14
Flag Coverage Δ
#multiple 89.96% <100%> (-0.03%) ⬇️
#single 41.75% <22.22%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.17% <100%> (+0.02%) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️
pandas/util/testing.py 83.64% <0%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de39a15...2f25d40. Read the comment docs.

@discort
Copy link
Contributor Author

discort commented Jan 29, 2018

@jreback

@jreback
Copy link
Contributor

jreback commented Jan 30, 2018

try with something like this. your patch is just working around the issue and not fixing it at the source. its pretty deep code, but try here.

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index 4c03f3bfa..6a2cd69eb 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -466,11 +466,15 @@ class Grouper(object):
                 "The Grouper cannot specify both a key and a level!")
 
         # the key must be a valid info item
-        if self.key is not None and not isinstance(obj, Series):
+        if self.key is not None:
             key = self.key
-            if key not in obj._info_axis:
-                raise KeyError("The grouper name {0} is not found".format(key))
-            ax = Index(obj[key], name=key)
+
+            if getattr(self.grouper, 'name', None) == key:
+                ax = self.grouper
+            else:
+                if key not in obj._info_axis:
+                    raise KeyError("The grouper name {0} is not found".format(key))
+                ax = Index(obj[key], name=key)
 
         else:
             ax = obj._get_axis(self.axis)
diff --git a/pandas/core/resample.py b/pandas/core/resample.py
index 03b522abc..43391061d 100644
--- a/pandas/core/resample.py
+++ b/pandas/core/resample.py
@@ -743,11 +743,6 @@ class _GroupByMixin(GroupByMixin):
 
             return x.apply(f, **kwargs)
 
-        # Patch the index in case we're applying after __getitem__. GH 17813
-        if self.groupby.key:
-            key = self.groupby.key
-            self._groupby.obj.index = Index(self.groupby.obj[key], key=key)
-
         result = self._groupby.apply(func)
         return self._wrap_result(result)
 

@discort
Copy link
Contributor Author

discort commented Jan 30, 2018

I applied your diff. The result, in this case, would be:

In [11]: result
Out[11]:
id  date
a   2016-01-01    2
    2016-01-03    0
    2016-01-05    0
b   2016-01-01    2
    2016-01-03    1
    2016-01-05    0
Name: data, dtype: int64

instead of:

In [10]: exp
Out[10]:
id  date
a   2016-01-01    2
b   2016-01-03    2
    2016-01-05    1
Name: data, dtype: int64

But I got your idea. The main challenge is that self.grouper is overriding here. Could you take a look at newest solution? @jreback

@discort discort force-pushed the fix_17813 branch 2 times, most recently from 838c058 to b4dbb59 Compare February 4, 2018 06:56
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change looks good. some comments. ping on green.

@@ -548,7 +548,7 @@ Groupby/Resample/Rolling
- Bug in :func:`DataFrame.resample` which silently ignored unsupported (or mistyped) options for ``label``, ``closed`` and ``convention`` (:issue:`19303`)
- Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
- Bug in ``transform`` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
-
- Bug in `DataFrame.groupby` with resample using `on` parameter when selecting column to apply function (:issue:`17813`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing the on= kwarg, and subsequently using .apply() (not the double backticks on both)

if key not in obj._info_axis:
raise KeyError("The grouper name {0} is not found".format(key))
ax = Index(obj[key], name=key)
if getattr(self.grouper, 'name', None) == key and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here about what this is doing (IOW the on is already defined).

use ABCSeries rather than Series

@@ -252,6 +252,15 @@ def test_pipe(self):
result = r.pipe(lambda x: x.max() - x.mean())
tm.assert_frame_equal(result, expected)

def test_groupby_resample_on_api_with_getitem(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this a bit lower down in the file (immediately after the other getitem tests)

@discort
Copy link
Contributor Author

discort commented Feb 5, 2018

@jreback

@jreback jreback closed this in 5b58a20 Feb 5, 2018
@jreback jreback added this to the 0.23.0 milestone Feb 5, 2018
@jreback
Copy link
Contributor

jreback commented Feb 5, 2018

thanks @discort

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
…olumn to apply function

closes pandas-dev#17813

Author: discort <[email protected]>

Closes pandas-dev#19433 from discort/fix_17813 and squashes the following commits:

2f25d40 [discort] Fixed bug in df.resample using 'on' parameter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: groupby with resample using on parameter errors when selecting column to apply function
2 participants