Skip to content

REF: implement groupby.ops.WrappedCythonFunc #40733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 2, 2021

Conversation

jbrockmendel
Copy link
Member

I think this is much closer to "the correct" abstraction here (i find the layered abstractions with similar names in core.groupby hard to follow). Contains \approx 0 groupby logic (takes ngroups as an argument in some methods) and is only about how to wrap/call/unwrap the libgroupby functions.

There are a few more parts of _cython_operation I'd like to move, but I'm holding off on that pending some non-refactor work that touches those pieces.

No specific plans, but I'm hopeful that something like this can be re-used to de-duplicate logic in nanops and maybe algorithms.

This uses classmethods/staticmethods (i like the explicit statelessness) and passes how+kind as arguments, but an alternative would be to instantiate with how+kind passed to __init__.

The only potentially-logical change is that when defining out_dtype, this returns "float64" instead of "float". I haven't checked if that makes a difference on 32bit builds, but after weeks of dealing with np.intp much prefer the explicitness.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this


codes, _, _ = self.group_info
out_shape = WrappedCythonOp.get_output_shape(how, kind, ngroups, values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would go one step further and use an instance of WrappedCythonOp here, then you can pass whatever you need into the construct and just ask the instance for properties (if you want to make this simple), you can use a dataclass for example.

I find this pattern better than using a class like this (meaning the class is just a collection of methods here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated + green

@jreback jreback added Groupby Refactor Internal refactoring of code labels Apr 1, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, some suggestions for possible followups

Dispatch logic for functions defined in _libs.groupby
"""

def __init__(self, kind: str, how: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob should be after the class variables / methods, but nbd


codes, _, _ = self.group_info
out_shape = cy_op.get_output_shape(ngroups, values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future, would just have a single method here where you pass ngroup, values, may be able to determine is_numeric inside the WrappedCythonOps (instead of here), and just return the items you need, though you can take it a step further and have a WrappedAggregateOp and WrappedTransformOp, something like

op = WrappedCythonOps(kind, how, values.....) # other args
# op is WrappedTransform / WrappedAggregate
result = op.get_result()

@jreback jreback added this to the 1.3 milestone Apr 2, 2021
@jreback jreback merged commit e69df38 into pandas-dev:master Apr 2, 2021
@jbrockmendel jbrockmendel deleted the ref-gb-cyfunc branch April 2, 2021 02:22
@jorisvandenbossche
Copy link
Member

@jbrockmendel can you hold off a bit with such refactoring changes to groupby/ops.py until #40672 (comment) is resolved?

@jbrockmendel
Copy link
Member Author

can you hold off a bit with such refactoring

Sure. The two things I have queued up in this area are 1) a refactor aimed specifically at ameliorating the linked concern and 2) a bugfix where we currently cast int64s to float64s in a lossy way. When I post them ill put them in "draft" mode and cc you on them.

vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants