-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF: Refactor Grouping
class to use delegates/polymorphism
#46203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
IIRC the goal I had with refactoring was making Groupings less stateful, only partially successful. I'd be happy to see that baton get picked up. Skeptical about subclasses/factories, but will keep an open mind. |
One thing to be mindful of is needlessly creating multiple paths. A bug that gets fixed in one may not be fixed in the other, making them harder to squash. |
OK, I have something at #46207 that would love a review! It's not too drastic. |
Closing for now, I still think there is potential here, I just got stumped. |
Hi there! Thanks for the work you're doing.
I started trying to write a PR to fix #36327, and that got me into the groupby code. Specifically, the
Grouping
class in core.groupby.grouper. I was thinking that this class could be ripe for a refactor.Currently, this class handles lots of different cases, for example if
self.grouping_vector
is a categorical, or aGrouper
object, or a general array-like. Also it deals with multi-indexes. With one class handling all these cases, the methods are littered with if-elses to decide the right code path.Possible solution: Define an interface for a Grouping, and then each of these different cases implement them differently (eg a CategoricalGrouping, MultiIndexGrouping, etc) Then, instead of calling
Grouping.__init__()
to make your grouping, you create them with a factory method. I think this would greatly simplify reasoning about how a Grouping works.This seemed likely that someone has already thought of this, so I wanted to check. I haven't tried implementing it yet, so I might be missing something that makes this a bad idea. If people think this seems like a good idea then I could write a PR for it. Tagging @jbrockmendel since it looks like they have done a lot of refactoring in there already. Thank you!
The text was updated successfully, but these errors were encountered: