REF: Simplify _cython_functions lookup #29246

jbrockmendel · 2019-10-27T18:05:54Z

The remaining two non-trivial cases are ("aggregate", "first") and ("aggregate", "median"). I think the "first" case is straightforward. I'm reluctant to change the "median" case because it isn't clear to me why it is a dict at all (cc @WillAyd if you have any ideas).


pep8speaks · 2019-10-27T18:05:57Z

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-30 23:16:54 UTC

WillAyd · 2019-10-28T20:57:59Z

pandas/core/groupby/ops.py

-                values,
-                labels,
-                func,
-                is_numeric,


Was is_numeric just not used at all?

WillAyd · 2019-10-28T21:00:25Z

pandas/core/groupby/ops.py

    ):

        comp_ids, _, ngroups = self.group_info
        if values.ndim > 2:
            # punting for now
            raise NotImplementedError("number of dimensions is currently limited to 2")
+        elif transform_func is libgroupby.group_rank:


May or may not be viable but is there a way to make transform_func a partial somewhere upstream to just freeze the one-off arguments that rank would need rather than special-casing here?

I think for that we'd need to do a partial on all the other cases, or just add the ngroups kwarg to group_rank signature.

On the margin I'd like to have fewer partials/lambdas floating around the groupby code

can we just add ngroups to the rank signatures and then this would just work?

can you update for this (and for n_th above as well) so we don't have differing signatures

I'm ambivalent about this. Having these two extra checks here is non-pretty, but changing the signature in the cython func means we have unused args/kwargs there, which is a code smell. @WillAyd thoughts?

I agree with Jeff and think the unused args would be preferable, at least to try and make these as generic as possible. Could also add a check within the function bodies that they are unused.

I've been guilty of this in the past myself but I think adding special cases in methods like this for one-off function applications is more difficult to track over time

I've been outvoted, will change. In the meantime I'll draw your attention to #29294, which should hopefully fix the CI

so this is easy to do for group_rank, but would mean a small behavior change for group_nth (which currently ignores min_count). do we want that?

ok let's fix rank for now and discuss nth? (we can also fix the signature and just ignore the arg for now).
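For readers following the thread, here is a minimal sketch of the option that won out: give every transform kernel the same signature, even if one of them ignores an argument, so the caller needs no special case. The two kernels and `cython_transform` below are simplified pure-Python stand-ins written for illustration. They are not the actual libgroupby functions, and the ordinal ranking is only a placeholder for whatever the real `group_rank` computes.

```python
import numpy as np


def group_cumsum(out, values, labels, ngroups):
    """Hypothetical kernel that genuinely needs ngroups (one accumulator per group)."""
    acc = np.zeros(ngroups, dtype=out.dtype)
    for i, lab in enumerate(labels):
        acc[lab] += values[i]
        out[i] = acc[lab]


def group_rank(out, values, labels, ngroups=None):
    """Hypothetical rank kernel; ngroups is accepted only for signature parity."""
    # ngroups is deliberately unused here (the "dummy arg" discussed above).
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        order = np.argsort(values[idx], kind="stable")
        out[idx[order]] = np.arange(1, len(idx) + 1)


def cython_transform(func, values, labels, ngroups):
    """Caller with no per-function branching: every kernel gets the same arguments."""
    out = np.empty(len(values), dtype="float64")
    func(out, values, labels, ngroups)
    return out


values = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
labels = np.array([0, 0, 1, 1, 0])
ranks = cython_transform(group_rank, values, labels, ngroups=2)
sums = cython_transform(group_cumsum, values, labels, ngroups=2)
```

The cost is the one raised in the thread: `group_rank` carries a parameter it never reads. The benefit is that the dispatching method stays generic.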

WillAyd · 2019-10-28T21:13:28Z

So I don't know off the top of my head why this design started; I know I personally added the rank dict following in the footsteps of first and median

My hesitation with this change is that it only seems to solve for rank and leaves the other two in their current state. I could be convinced otherwise but would rather see them addressed all together if we are going to make a change

jbrockmendel · 2019-10-28T22:00:48Z

My hesitation with this change is that it only seems to solve for rank and leaves the other two in their current state. I could be convinced otherwise but would rather see them addressed all together if we are going to make a change

That makes sense. first is easy to "fix", just need to define default values for rank in group_nth (though AFAICT we never call group_nth with anything but those default values, so we may want to just hard-code it and make group_nth into group_first)

For median the tests pass if we change {"name": "group_median"} to just "group_median", but I expect it's a dict for some reason (ideas @jreback?)
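A rough sketch of what the comment above describes: once `group_nth` has default values and the median entry is a plain string, every table entry can be resolved with a single `getattr`. The `libgroupby` namespace and both functions here are hypothetical pure-Python stand-ins, and the table shows only the two entries under discussion. The real lookup in pandas/core/groupby/ops.py may be organised differently.

```python
from types import SimpleNamespace


def group_nth(values, rank=1, min_count=-1):
    """Hypothetical stand-in: return the rank-th (1-based) value.

    With these defaults, "first" no longer needs a wrapper to supply them.
    min_count is ignored in this sketch.
    """
    return values[rank - 1]


def group_median(values):
    """Hypothetical stand-in for a median kernel."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


# Stand-in for the compiled module; an assumption made for illustration.
libgroupby = SimpleNamespace(group_nth=group_nth, group_median=group_median)

# Every entry is a plain function name; no dict-valued special cases.
_cython_functions = {
    "aggregate": {"first": "group_nth", "median": "group_median"},
}


def get_func(kind, how):
    """Resolve a (kind, how) pair to a callable via a single getattr."""
    name = _cython_functions[kind][how]
    return getattr(libgroupby, name)


first = get_func("aggregate", "first")([7, 8, 9])
median = get_func("aggregate", "median")([4, 1, 3, 2])
```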

jreback · 2019-10-29T16:11:15Z

pandas/core/groupby/ops.py

    ):

        comp_ids, _, ngroups = self.group_info
        if values.ndim > 2:
            # punting for now
            raise NotImplementedError("number of dimensions is currently limited to 2")
+        elif transform_func is libgroupby.group_rank:


can we just add ngroups to the rank signatures and then this would just work?

jbrockmendel · 2019-10-29T16:21:50Z

can we just add ngroups to the rank signatures and then this would just work?

That would work, yes


jbrockmendel · 2019-10-31T00:18:46Z

Updated per comments. group_rank gets a new unused parameter, and group_nth we're punting on for the time being

jreback · 2019-10-31T17:10:34Z

thanks @jbrockmendel

jbrockmendel added 3 commits October 25, 2019 19:28

WIP: libgroupby getattr pattern

3dfef1a

Merge branch 'master' of https://github.com/pandas-dev/pandas into cy…

98ad40c

…getattr

revert median

cd67a41

blackify

2e0e76a

WillAyd reviewed Oct 28, 2019


WillAyd added Clean Groupby labels Oct 28, 2019

jreback requested changes Oct 29, 2019


jbrockmendel added 4 commits October 29, 2019 18:40

Merge branch 'master' of https://github.com/pandas-dev/pandas into cy…

877eed1

…getattr

REF: remove libgroupby dict cases

ce253d8

add dummy arg for group_rank

eed591d

Merge branch 'master' of https://github.com/pandas-dev/pandas into cy…

091c0e8

…getattr

jreback added this to the 1.0 milestone Oct 31, 2019

jreback approved these changes Oct 31, 2019


jreback merged commit 3598a5e into pandas-dev:master Oct 31, 2019

jbrockmendel deleted the cygetattr branch October 31, 2019 17:13

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019

REF: Simplify _cython_functions lookup (pandas-dev#29246)

975ce8d

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: Simplify _cython_functions lookup (pandas-dev#29246)

15179a6

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: Simplify _cython_functions lookup (pandas-dev#29246)

9ea9fde
