Skip to content

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

szczekulskij
Copy link

@szczekulskij szczekulskij commented Apr 20, 2023

@szczekulskij szczekulskij changed the title (in progress - give me 1 hr to update) BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() Apr 21, 2023
@@ -2707,6 +2707,8 @@ def resample(self, rule, *args, **kwargs):
"""
from pandas.core.resample import get_resampler_for_grouping

self.as_index = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the state of the groupby object, meaning results will change if the users performs subsequent operations. In addition, I think groupby(...).resample(...) should behave differently if as_index is True or False.

Copy link
Author

@szczekulskij szczekulskij Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @rhshadrach, really appreciate the feedback - the more the better please (especially any tips when it comes to working with repo are very welcomed)
I've updated the logic.

Now:

  • depending on if we use as_index = False vs as_index = True, we'll get different behaviour
  • the groupby's as_index won't get overwritten
  • All the other cases provided in bugs descriptions work fine (didn't include them tho, since some of the cases included were throwing FeatureWarning)

I'm not sure if this is an optimal solution, and not sure about variable name, but hope this is good enough : )
Thanks !

Copy link
Member

@rhshadrach rhshadrach Apr 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I was wrong above, as_index shouldn't have an impact here. I think your prior change was in the right direction now, but as I mentioned it is changing state. In Resampler.aggregate, we can safely do this as:

with com.temp_setattr(self._groupby, 'as_index', True):
    result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated

Comment on lines 661 to 685
result = (
DataFrame(
{"a": np.repeat([0, 1, 2, 3, 4], 10), "b": range(50, 100)},
index=date_range(start=Timestamp.now(), freq="1min", periods=50),
)
.groupby("a", as_index=False)
.resample("2min")
.agg({"b": "min"})
.reset_index(drop=True)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a test, don't modify result (especially a lossy modification). It is okay to modify expected after a computation to get it to match result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated

@szczekulskij
Copy link
Author

Hey @rhshadrach, can I ask for a second review ?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 16, 2023
@rhshadrach
Copy link
Member

Apologies @szczekulskij - this fell off my radar. Are you interested in continuing?

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants