Skip to content

REF: Refactor Grouping class to use delegates/polymorphism #46203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NickCrews opened this issue Mar 2, 2022 · 4 comments
Closed

REF: Refactor Grouping class to use delegates/polymorphism #46203

NickCrews opened this issue Mar 2, 2022 · 4 comments

Comments

@NickCrews
Copy link
Contributor

Hi there! Thanks for the work you're doing.

I started trying to write a PR to fix #36327, and that got me into the groupby code. Specifically, the Grouping class in core.groupby.grouper. I was thinking that this class could be ripe for a refactor.

Currently, this class handles lots of different cases, for example if self.grouping_vector is a categorical, or a Grouper object, or a general array-like. Also it deals with multi-indexes. With one class handling all these cases, the methods are littered with if-elses to decide the right code path.

Possible solution: Define an interface for a Grouping, and then each of these different cases implement them differently (eg a CategoricalGrouping, MultiIndexGrouping, etc) Then, instead of calling Grouping.__init__() to make your grouping, you create them with a factory method. I think this would greatly simplify reasoning about how a Grouping works.

This seemed likely that someone has already thought of this, so I wanted to check. I haven't tried implementing it yet, so I might be missing something that makes this a bad idea. If people think this seems like a good idea then I could write a PR for it. Tagging @jbrockmendel since it looks like they have done a lot of refactoring in there already. Thank you!

@jbrockmendel
Copy link
Member

IIRC the goal I had with refactoring was making Groupings less stateful, only partially successful. I'd be happy to see that baton get picked up.

Skeptical about subclasses/factories, but will keep an open mind.

@rhshadrach
Copy link
Member

One thing to be mindful of is needlessly creating multiple paths. A bug that gets fixed in one may not be fixed in the other, making them harder to squash.

NickCrews added a commit to NickCrews/pandas that referenced this issue Mar 3, 2022
NickCrews added a commit to NickCrews/pandas that referenced this issue Mar 3, 2022
NickCrews added a commit to NickCrews/pandas that referenced this issue Mar 3, 2022
NickCrews added a commit to NickCrews/pandas that referenced this issue Mar 3, 2022
NickCrews added a commit to NickCrews/pandas that referenced this issue Mar 3, 2022
@NickCrews
Copy link
Contributor Author

OK, I have something at #46207 that would love a review! It's not too drastic.

@NickCrews
Copy link
Contributor Author

Closing for now, I still think there is potential here, I just got stumped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants