REF: Simplify _cython_functions lookup #29246
Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-10-30 23:16:54 UTC
values,
labels,
func,
is_numeric,
Was is_numeric just not used at all?
AFAICT
pandas/core/groupby/ops.py
Outdated
):

    comp_ids, _, ngroups = self.group_info
    if values.ndim > 2:
        # punting for now
        raise NotImplementedError("number of dimensions is currently limited to 2")
    elif transform_func is libgroupby.group_rank:
May or may not be viable, but is there a way to make transform_func a partial somewhere upstream to just freeze the one-off arguments that rank would need rather than special-casing here?
I think for that we'd need to do a partial on all the other cases, or just add the ngroups kwarg to group_rank signature.
On the margin I'd like to have fewer partials/lambdas floating around the groupby code
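To make the two options in this thread concrete, here is a minimal sketch contrasting the special-case branch with a partial frozen upstream. Everything in it is hypothetical: toy_cumsum, toy_rank, transform_special_cased and transform_generic are pure-Python stand-ins invented for illustration and do not mirror the real libgroupby signatures.

```python
from functools import partial


# Hypothetical stand-ins for Cython kernels; NOT the real pandas signatures.
def toy_cumsum(out, values, labels):
    """Running sum within each group, written into out."""
    totals = {}
    for i, (v, lab) in enumerate(zip(values, labels)):
        totals[lab] = totals.get(lab, 0) + v
        out[i] = totals[lab]


def toy_rank(out, values, labels, ngroups):
    """Rank within each group; needs ngroups to allocate per-group buckets."""
    buckets = [[] for _ in range(ngroups)]
    for i, lab in enumerate(labels):
        buckets[lab].append(i)
    for idxs in buckets:
        ordered = sorted(idxs, key=lambda j: values[j])
        for rank, i in enumerate(ordered, start=1):
            out[i] = rank


def transform_special_cased(func, values, labels, ngroups):
    """Option A: the caller branches on the one kernel that differs."""
    out = [0] * len(values)
    if func is toy_rank:
        func(out, values, labels, ngroups)
    else:
        func(out, values, labels)
    return out


def transform_generic(func, values, labels):
    """Option B: the caller is branch-free; extras are frozen upstream."""
    out = [0] * len(values)
    func(out, values, labels)
    return out


values = [3, 1, 2, 5, 4]
labels = [0, 0, 1, 1, 0]
ngroups = 2

ranked_a = transform_special_cased(toy_rank, values, labels, ngroups)
ranked_b = transform_generic(partial(toy_rank, ngroups=ngroups), values, labels)
summed = transform_generic(toy_cumsum, values, labels)
```

Both routes give the same result. The trade-off is where the one-off knowledge lives: in the dispatcher (option A) or in a partial created wherever the function is looked up (option B).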
can we just add ngroups to the rank signatures and then this would just work?
can you update for this (and for n_th above as well) so we don't have differing signatures
I'm ambivalent about this. Having these two extra checks here is non-pretty, but changing the signature in the cython func means we have unused args/kwargs there, which is a code smell. @WillAyd thoughts?
I agree with Jeff and think the unused args would be preferable, at least to try and make these as generic as possible. Could also add a check within the function bodies that they are unused.
I've been guilty of this in the past myself but I think adding special cases in methods like this for one-off function applications is more difficult to track over time
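A rough sketch of the "unused args plus a check" idea, under stated assumptions: the kernels below are hypothetical pure-Python stand-ins (not the real Cython functions), and the shared signature (out, values, labels, ngroups, min_count=-1) is chosen only for illustration. Each kernel accepts the full argument list and rejects a non-default value for anything it does not support, so an ignored argument cannot silently change meaning later.

```python
# Hypothetical kernels sharing one signature; illustrative stand-ins only,
# not the real libgroupby functions.
def toy_cumsum(out, values, labels, ngroups, min_count=-1):
    # ngroups and min_count are accepted purely for signature parity.
    if min_count != -1:
        raise ValueError("toy_cumsum does not support min_count")
    totals = {}
    for i, (v, lab) in enumerate(zip(values, labels)):
        totals[lab] = totals.get(lab, 0) + v
        out[i] = totals[lab]


def toy_rank(out, values, labels, ngroups, min_count=-1):
    # min_count is unused here; guard against callers relying on it.
    if min_count != -1:
        raise ValueError("toy_rank does not support min_count")
    buckets = [[] for _ in range(ngroups)]
    for i, lab in enumerate(labels):
        buckets[lab].append(i)
    for idxs in buckets:
        ordered = sorted(idxs, key=lambda j: values[j])
        for rank, i in enumerate(ordered, start=1):
            out[i] = rank


def transform(func, values, labels, ngroups, **kwargs):
    """Branch-free dispatcher: every kernel takes the same arguments."""
    out = [0] * len(values)
    func(out, values, labels, ngroups, **kwargs)
    return out


values = [3, 1, 2, 5, 4]
labels = [0, 0, 1, 1, 0]
ngroups = 2

ranked = transform(toy_rank, values, labels, ngroups)
summed = transform(toy_cumsum, values, labels, ngroups)
```

With this shape, the dispatcher no longer needs to know which kernel it is calling. The cost is the extra parameters on kernels that ignore them, which is the code smell raised earlier in the thread.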
I've been outvoted, will change. In the meantime I'll draw your attention to #29294, which should hopefully fix the CI
so this is easy to do for group_rank, but would mean a small behavior change for group_nth (which currently ignores min_count). do we want that?
ok let's fix rank for now and discuss nth? (we can also fix the signature and just ignore the arg for now).
So I don't know off the top of my head why this design started; I know I personally added the My hesitation with this change is that it only seems to solve for
That makes sense. For
can we just add ngroups to the rank signatures and then this would just work?
That would work, yes
Updated per comments. group_rank gets a new unused parameter, and group_nth we're punting on for the time being
thanks @jbrockmendel
The remaining two non-trivial cases are ("aggregate", "first") and ("aggregate", "median"). The "first" case I think is straightforward. The "median" case I'm reluctant to change because it isn't clear to me why it is a dict at all (cc @WillAyd if you have any ideas)