-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/groupby/groupby.py
Outdated
@@ -2707,6 +2707,8 @@ def resample(self, rule, *args, **kwargs): | |||
""" | |||
from pandas.core.resample import get_resampler_for_grouping | |||
|
|||
self.as_index = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the state of the groupby object, meaning results will change if the users performs subsequent operations. In addition, I think groupby(...).resample(...) should behave differently if as_index
is True or False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @rhshadrach, really appreciate the feedback - the more the better please (especially any tips when it comes to working with repo are very welcomed)
I've updated the logic.
Now:
- depending on if we use as_index = False vs as_index = True, we'll get different behaviour
- the groupby's as_index won't get overwritten
- All the other cases provided in bugs descriptions work fine (didn't include them tho, since some of the cases included were throwing FeatureWarning)
I'm not sure if this is an optimal solution, and not sure about variable name, but hope this is good enough : )
Thanks !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I was wrong above, as_index
shouldn't have an impact here. I think your prior change was in the right direction now, but as I mentioned it is changing state. In Resampler.aggregate
, we can safely do this as:
with com.temp_setattr(self._groupby, 'as_index', True):
result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated
result = ( | ||
DataFrame( | ||
{"a": np.repeat([0, 1, 2, 3, 4], 10), "b": range(50, 100)}, | ||
index=date_range(start=Timestamp.now(), freq="1min", periods=50), | ||
) | ||
.groupby("a", as_index=False) | ||
.resample("2min") | ||
.agg({"b": "min"}) | ||
.reset_index(drop=True) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a test, don't modify result (especially a lossy modification). It is okay to modify expected after a computation to get it to match result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated
Hey @rhshadrach, can I ask for a second review ? |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Apologies @szczekulskij - this fell off my radar. Are you interested in continuing? |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.