REF: Simplify _cython_functions lookup #29246

Merged: 8 commits, Oct 31, 2019

jbrockmendel
Member

The remaining two non-trivial cases are ("aggregate", "first") and ("aggregate", "median"). I think the "first" case is straightforward. I'm reluctant to change the "median" case because it isn't clear to me why it is a dict at all (cc @WillAyd if you have any ideas).

@pep8speaks

pep8speaks commented Oct 27, 2019

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-30 23:16:54 UTC

values,
labels,
func,
is_numeric,
Member

Was is_numeric just not used at all?

Member Author

AFAICT

    ):

        comp_ids, _, ngroups = self.group_info
        if values.ndim > 2:
            # punting for now
            raise NotImplementedError("number of dimensions is currently limited to 2")
        elif transform_func is libgroupby.group_rank:
Member

May or may not be viable but is there a way to make transform_func a partial somewhere upstream to just freeze the one-off arguments that rank would need rather than special-casing here?

Member Author

I think for that we'd need to do a partial on all the other cases, or just add the ngroups kwarg to group_rank signature.

On the margin I'd like to have fewer partials/lambdas floating around the groupby code
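For concreteness, the two options weighed in this thread might be sketched roughly as follows. Everything here is a hypothetical pure-Python stand-in, not the real `libgroupby` kernels: `cumsum_kernel`, `rank_kernel`, `rank_kernel_uniform`, `transform`, and `transform_prebound` are invented names, and their parameters are assumptions chosen only to show the dispatch problem.

```python
from functools import partial


# Hypothetical stand-ins for Cython kernels. These are NOT the real
# pandas signatures, only enough structure to illustrate the trade-off.
def cumsum_kernel(out, values, labels, ngroups):
    """Per-group running sum; uses ngroups to size its accumulator."""
    acc = [0] * ngroups
    for i, (val, lab) in enumerate(zip(values, labels)):
        acc[lab] += val
        out[i] = acc[lab]


def rank_kernel(out, values, labels, ascending=True):
    """Per-group ordinal rank; has no use for ngroups."""
    members_by_group = {}
    for i, (val, lab) in enumerate(zip(values, labels)):
        members_by_group.setdefault(lab, []).append((val, i))
    for members in members_by_group.values():
        members.sort(reverse=not ascending)
        for rank, (_, i) in enumerate(members, start=1):
            out[i] = rank


# Option A: pre-bind ngroups with partial, which the caller must do for
# every kernel that needs it, so the dispatcher never passes ngroups.
def transform_prebound(bound_kernel, values, labels, **kwargs):
    out = [0] * len(values)
    bound_kernel(out, values, labels, **kwargs)
    return out


# Option B: give the rank kernel an unused ngroups parameter so every
# kernel shares one call shape and the dispatcher has no special case.
def rank_kernel_uniform(out, values, labels, ngroups, ascending=True):
    # ngroups is accepted only for signature parity and is ignored.
    rank_kernel(out, values, labels, ascending=ascending)


def transform(kernel, values, labels, ngroups, **kwargs):
    out = [0] * len(values)
    kernel(out, values, labels, ngroups, **kwargs)
    return out
```

With option A, a call looks like `transform_prebound(partial(cumsum_kernel, ngroups=2), values, labels)`; with option B it is `transform(rank_kernel_uniform, values, labels, 2)`. Option B costs one unused parameter but keeps partials out of the calling code.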

Contributor

can we just add ngroups to the rank signatures and then this would just work?

Contributor

can you update for this (and for n_th above as well) so we don't have differing signatures

Member Author

I'm ambivalent about this. Having these two extra checks here is non-pretty, but changing the signature in the cython func means we have unused args/kwargs there, which is a code smell. @WillAyd thoughts?

Member

I agree with Jeff and think the unused args would be preferable, at least to try and make these as generic as possible. Could also add a check within the function bodies that they are unused.

I've been guilty of this in the past myself but I think adding special cases in methods like this for one-off function applications is more difficult to track over time
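A minimal sketch of the "check within the function body" idea, assuming a sentinel default of `-1` marks the argument as unset. `first_valid_kernel` and its parameters are hypothetical and written in plain Python; they are not the signature of any real `libgroupby` function.

```python
def first_valid_kernel(out, values, labels, min_count=-1):
    """Hypothetical kernel: write the first non-None value of each group.

    min_count is accepted only so the signature matches other kernels.
    """
    # Guard: fail loudly instead of silently ignoring a meaningful value.
    if min_count != -1:
        raise ValueError("'min_count' is not supported here; leave it at -1")

    filled = set()
    for val, lab in zip(values, labels):
        if lab not in filled and val is not None:
            out[lab] = val
            filled.add(lab)
```

A guard like this keeps the generic call shape while making any accidental reliance on the unused argument visible as an error.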

Member Author

I've been outvoted, will change. In the meantime I'll draw your attention to #29294, which should hopefully fix the CI

Member Author

so this is easy to do for group_rank, but would mean a small behavior change for group_nth (which currently ignores min_count). do we want that?

Contributor

ok let's fix rank for now and discuss nth? (we can also fix the signature and just ignore the arg for now).

@WillAyd
Member

WillAyd commented Oct 28, 2019

So I don't know off the top of my head why this design started; I know I personally added the rank dict following in the footsteps of first and median

My hesitation with this change is that it only seems to solve for rank and leaves the other two in their current state. I could be convinced otherwise but would rather see them addressed altogether if we are going to make a change

@jbrockmendel
Member Author

My hesitation with this change is that it only seems to solve for rank and leaves the other two in their current state. I could be convinced otherwise but would rather see them addressed altogether if we are going to make a change

That makes sense. first is easy to "fix", just need to define default values for rank in group_nth (though AFAICT we never call group_nth with anything but those default values, so we may want to just hard-code it and make group_nth into group_first)

For median the tests pass if we change {"name": "group_median"} to just "group_median", but I expect it's a dict for some reason (ideas @jreback?)
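To illustrate the dict-versus-string question, here is a rough sketch of a name-based lookup that resolves both entry shapes through `getattr`. The table layout, `fake_lib`, and both lookup helpers are assumptions made up for this example; what the dict form was originally meant to carry is not known from this thread, so the sketch only shows that a dict holding nothing but a name resolves to the same function as a bare string.

```python
from types import SimpleNamespace


def group_add():
    return "add"


def group_median():
    return "median"


# Hypothetical stand-in for a compiled module exposing kernels by name.
fake_lib = SimpleNamespace(group_add=group_add, group_median=group_median)

# Mixed table: most entries are plain names, one is wrapped in a dict.
mixed_table = {
    "aggregate": {"add": "group_add", "median": {"name": "group_median"}},
}

# Flattened table: every entry is a plain name.
flat_table = {
    "aggregate": {"add": "group_add", "median": "group_median"},
}


def lookup_mixed(kind, how):
    """Resolve an entry that may be either a name or a {"name": ...} dict."""
    entry = mixed_table[kind][how]
    name = entry["name"] if isinstance(entry, dict) else entry
    return getattr(fake_lib, name)


def lookup_flat(kind, how):
    """Resolve an entry that is always a plain name."""
    return getattr(fake_lib, flat_table[kind][how])
```

Under these assumptions the flat table needs no `isinstance` branch, which is the kind of simplification the PR title describes.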

@jbrockmendel
Member Author

can we just add ngroups to the rank signatures and then this would just work?

That would work, yes

@jbrockmendel
Member Author

Updated per comments. group_rank gets a new unused parameter, and group_nth we're punting on for the time being

jreback added this to the 1.0 milestone Oct 31, 2019
jreback merged commit 3598a5e into pandas-dev:master Oct 31, 2019
@jreback
Contributor

jreback commented Oct 31, 2019

thanks @jbrockmendel

jbrockmendel deleted the cygetattr branch October 31, 2019 17:13
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
4 participants