-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: implement groupby.ops.WrappedCythonFunc #40733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this
pandas/core/groupby/ops.py
Outdated
|
||
codes, _, _ = self.group_info | ||
out_shape = WrappedCythonOp.get_output_shape(how, kind, ngroups, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would go one step further and use an instance of WrappedCythonOp here, then you can pass whatever you need into the construct and just ask the instance for properties (if you want to make this simple), you can use a dataclass for example.
I find this pattern better than using a class like this (meaning the class is just a collection of methods here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated + green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, some suggestions for possible followups
Dispatch logic for functions defined in _libs.groupby | ||
""" | ||
|
||
def __init__(self, kind: str, how: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob should be after the class variables / methods, but nbd
|
||
codes, _, _ = self.group_info | ||
out_shape = cy_op.get_output_shape(ngroups, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future, would just have a single method here where you pass ngroup
, values
, may be able to determine is_numeric
inside the WrappedCythonOps (instead of here), and just return the items you need, though you can take it a step further and have a WrappedAggregateOp
and WrappedTransformOp
, something like
op = WrappedCythonOps(kind, how, values.....) # other args
# op is WrappedTransform / WrappedAggregate
result = op.get_result()
@jbrockmendel can you hold off a bit with such refactoring changes to groupby/ops.py until #40672 (comment) is resolved? |
Sure. The two things I have queued up in this area are 1) a refactor aimed specifically at ameliorating the linked concern and 2) a bugfix where we currently cast int64s to float64s in a lossy way. When I post them ill put them in "draft" mode and cc you on them. |
I think this is much closer to "the correct" abstraction here (i find the layered abstractions with similar names in core.groupby hard to follow). Contains \approx 0 groupby logic (takes ngroups as an argument in some methods) and is only about how to wrap/call/unwrap the libgroupby functions.
There are a few more parts of _cython_operation I'd like to move, but I'm holding off on that pending some non-refactor work that touches those pieces.
No specific plans, but I'm hopeful that something like this can be re-used to de-duplicate logic in nanops and maybe algorithms.
This uses classmethods/staticmethods (i like the explicit statelessness) and passes how+kind as arguments, but an alternative would be to instantiate with how+kind passed to
__init__
.The only potentially-logical change is that when defining
out_dtype
, this returns "float64" instead of "float". I haven't checked if that makes a difference on 32bit builds, but after weeks of dealing with np.intp much prefer the explicitness.