PERF: cythonize groupby-rank #15779

Closed
jreback opened this issue Mar 22, 2017 · 2 comments · Fixed by #19481
Labels: Groupby; Numeric Operations (Arithmetic, Comparison, and Logical operations); Performance (Memory or execution speed performance)


jreback (Contributor) commented Mar 22, 2017

`groupby(...).rank()` currently dispatches to each group individually. It would be better to have a combined group_rank routine that handles all groups at once. It is a fair bit of code and ideally would share some of it with the actual rank algos.

In [7]: ngroups = 1000

In [8]: N = 100000

In [9]: np.random.seed(1234)

In [10]: df = DataFrame({'key': np.random.randint(0, ngroups, size=N), 'value': np.arange(N)})

In [11]: %timeit df.groupby('key').rank()
1 loop, best of 3: 392 ms per loop

# comparison with group_shift_indexer, a transforming operator
In [13]: %timeit df.groupby('key').shift()
100 loops, best of 3: 3.15 ms per loop
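
To illustrate what a combined routine could look like, here is a minimal NumPy sketch that ranks every group in one pass instead of calling rank once per group. It is not the pandas implementation: the name `group_rank_average` is hypothetical, it only covers the "average" tie method, and it assumes `values` has no missing entries and `labels` are non-negative integer group codes.

```python
import numpy as np


def group_rank_average(values, labels):
    """Average rank of each value within its group (hypothetical helper).

    Assumes no NaNs in `values` and non-negative integer `labels`.
    """
    values = np.asarray(values)
    labels = np.asarray(labels)
    n = len(values)
    if n == 0:
        return np.empty(0, dtype=np.float64)

    # Sort once by (label, value); the last key passed to lexsort is the primary key.
    order = np.lexsort((values, labels))
    sorted_labels = labels[order]
    sorted_values = values[order]

    # True at the first sorted position of each group.
    new_group = np.empty(n, dtype=bool)
    new_group[0] = True
    new_group[1:] = sorted_labels[1:] != sorted_labels[:-1]

    # True at the first sorted position of each run of tied values within a group.
    new_run = new_group.copy()
    new_run[1:] |= sorted_values[1:] != sorted_values[:-1]

    pos = np.arange(n)
    # Sorted position at which the current group starts.
    group_start = np.maximum.accumulate(np.where(new_group, pos, 0))

    # First and last sorted position of the tie run each element belongs to.
    run_start_idx = np.flatnonzero(new_run)
    run_id = np.cumsum(new_run) - 1
    run_first = run_start_idx[run_id]
    run_last = np.append(run_start_idx[1:], n)[run_id] - 1

    # 1-based rank within the group, averaged over the tie run.
    avg_rank = (run_first + run_last) / 2.0 - group_start + 1.0

    # Scatter the ranks back to the original row order.
    out = np.empty(n, dtype=np.float64)
    out[order] = avg_rank
    return out
```

For the frame in the benchmark, the result of `group_rank_average(df['value'].values, df['key'].values)` would be expected to match `df.groupby('key')['value'].rank()`, since both use average ranks within each key. The point of the sketch is only that a single sort over (key, value) replaces the per-group dispatch.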
jreback added the Difficulty Intermediate, Groupby, Numeric Operations, and Performance labels Mar 22, 2017
jreback modified the milestones: Next Major Release, Next Minor Release Mar 22, 2017
jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017

WillAyd (Member) commented Jan 25, 2018

I can take a look at this. Any tips on which methods to explore? I was thinking of adding a rank method to the GroupBy class, similar to the others, and was looking at the rank method in algos.

It wasn't immediately clear to me what the best way to knit that all together is, so I figured I'd get your thoughts if you have any.

jreback (Contributor, Author) commented Jan 26, 2018

Yeah, you can make a separate rank routine that takes a group indexer; e.g. copy something like group_last and integrate it with rank.
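
A rough sketch of that shape, written as a plain Python loop of the kind a Cython kernel would compile: the routine takes the values plus an integer group indexer and fills a preallocated `out` array in place. The name `group_rank_min`, the `(out, values, labels)` argument order, and the choice of the "min" tie method are assumptions for illustration and are not taken from group_last or from the eventual fix. Rows whose label is negative are treated as belonging to no group and get NaN.

```python
import numpy as np


def group_rank_min(out, values, labels):
    """Write the 1-based 'min' rank of each value within its group into `out`.

    Hypothetical kernel: `labels` holds integer group codes, with negative
    codes meaning "no group"; `out` must be a float array of the same length.
    """
    # One sort by (label, value) so each group is visited contiguously.
    order = np.lexsort((values, labels))

    prev_label = -1     # valid labels are >= 0, so this never matches a group
    prev_value = None
    seen = 0            # rows visited so far in the current group
    rank = 0            # rank shared by the current run of tied values

    for idx in order:
        lab = labels[idx]
        if lab < 0:
            out[idx] = np.nan
            continue
        val = values[idx]
        if lab != prev_label:
            # Entering a new group: restart the counters.
            seen = 0
            rank = 0
        seen += 1
        if seen == 1 or val != prev_value:
            # New distinct value: its rank is its position within the group.
            rank = seen
        out[idx] = rank
        prev_label = lab
        prev_value = val


values = np.array([10.0, 20.0, 10.0, 30.0, 20.0, 20.0, 5.0])
labels = np.array([0, 0, 1, 0, 1, 1, -1])
out = np.empty(len(values), dtype=np.float64)
group_rank_min(out, values, labels)
```

Because the loop only carries a few scalars between rows, it would translate fairly directly to typed Cython, and the other tie methods would differ only in what gets written for each run of equal values.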
