Skip to content

BUG: df.groupby().resample()[[cols]] without key columns raise KeyError #47605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 11, 2022

Conversation

GYHHAHA
Copy link
Contributor

@GYHHAHA GYHHAHA commented Jul 6, 2022

@pep8speaks
Copy link

pep8speaks commented Jul 6, 2022

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-06 05:46:26 UTC

@mroeschke mroeschke added Groupby Resample resample method labels Jul 6, 2022
@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Jul 6, 2022

The reason why the current result only returns one column is _cython_agg_general blocks the non-numeric dtype when calling the aggregate from resample.

result = obj.groupby(self.grouper, axis=self.axis).aggregate(how, **kwargs)

result = self._cython_agg_general(

I think we should allow the aggregation on datetime-like columns when corresponding key(s) is/are included, but this will change the current groupby behavior (and I'm not sure which one is desired). Also this may be related to #47177 , not clear about the logic.

>>> df = pd.DataFrame(
    data={
        "date": pd.date_range(start="2016-01-01", periods=8),
        "group": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)
>>> df.groupby("group")[["date", "group"]].mean()
# original
       group
group
0        0.0
1        1.0
# changed
                     date group
group
0     2016-01-02 12:00:00   0.0
1     2016-01-06 12:00:00   0.0

cc @mroeschke @rhshadrach

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Jul 6, 2022

I find a simple solution is keep the resample exclusions empty:

self.exclusions = frozenset([self.groupby.key])

But this fails the following test:
def test_resample_groupby_agg_object_dtype_all_nan(consolidate):

I personally think the test is not reasonable since we do want to get all the valid aggreation column results, thus the expect should have shape of (6, 4) instead of (6, 3). And as mentioned in the above pr, the default behavior will be changed in the future. A little confused. Discussion needed.

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Jul 6, 2022

Since now we have DEPR the numeric_only on agg for resample and groupby, we should also DEPR the agg on groupby.resample agg function, which is currently not. And also the numeric_only keyword is still not work.

>>>df=pd.DataFrame({"a":[0,0,1,1], "b":pd.date_range("20200101", "20200104")})
>>>df.groupby("a").resample("D", on="b")[["a", "b"]].mean()
# should DEPR but not
                a
a b
0 2020-01-01  0.0
  2020-01-02  0.0
1 2020-01-03  1.0
  2020-01-04  1.0
>>>df.groupby("a").resample("D", on="b")[["a", "b"]].mean(numeric_only=False)
# should include "b" column agg result but not
                a
a b
0 2020-01-01  0.0
  2020-01-02  0.0
1 2020-01-03  1.0
  2020-01-04  1.0

@jreback
Copy link
Contributor

jreback commented Jul 8, 2022

@GYHHAHA #47605 (comment) is fine to do but another PR

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Jul 9, 2022

For the current BUG, I think the fix is enough. I can not add the test for ["val", "date"] for this one since the result will be influenced by latter DEPR pr. @jreback

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current fix looks reasonable cc @rhshadrach

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke added this to the 1.5 milestone Jul 11, 2022
@mroeschke mroeschke merged commit b03389e into pandas-dev:main Jul 11, 2022
@mroeschke
Copy link
Member

Thanks @GYHHAHA

@GYHHAHA GYHHAHA deleted the patch-1 branch July 12, 2022 02:00
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
…or (pandas-dev#47605)

* Update resample.py

* Update v1.5.0.rst

* Update test_resampler_grouper.py

* delete blank

* Update test_resampler_grouper.py

* Update v1.5.0.rst

* Update resample.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: df.groupby().resample()[[cols]] without key columns raise KeyError
5 participants