PERF: lib.generate_slices #42097

Merged
merged 1 commit into from
Jun 18, 2021

mzeitlin11
Member

import numpy as np
import pandas._libs.lib as lib

np.random.seed(0)
inds = np.random.randint(0, 100, 1000000)
ngroups = inds.max() + 1

%timeit lib.generate_slices(inds, ngroups)

Master:
1.86 ms ± 44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This PR:
859 µs ± 127 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Adding the Cython decorators makes a big difference here: a bit over a 2x speedup on this benchmark (1.86 ms down to 859 µs).
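
For readers unfamiliar with the function being timed, here is a rough pure-Python sketch of the kind of loop involved. It is reconstructed from the fragment quoted in the review thread and is not the Cython source: the name `generate_slices_sketch`, the use of plain lists, and the handling of negative labels are assumptions made for illustration. It also assumes the labels are sorted so that each group occupies one contiguous run.

```python
def generate_slices_sketch(labels, ngroups):
    """Return (starts, ends) so that labels[starts[g]:ends[g]] is the run of group g.

    Hypothetical helper, not the pandas implementation. Assumes sorted labels,
    with any negative labels (entries in no group) coming first.
    """
    n = len(labels)
    starts = [0] * ngroups
    ends = [0] * ngroups

    start = 0        # index where the current run begins
    group_size = 0   # length of the current run so far
    for i in range(n):
        lab = labels[i]
        if lab < 0:
            # Ungrouped entry: move the run start past it.
            start += 1
        else:
            group_size += 1
            # A run ends at the last element or when the next label differs.
            if i == n - 1 or lab != labels[i + 1]:
                starts[lab] = start
                ends[lab] = start + group_size
                start += group_size
                group_size = 0
    return starts, ends


labels = [0, 0, 1, 1, 1, 2]
starts, ends = generate_slices_sketch(labels, 3)
print(starts, ends)  # [0, 2, 5] [2, 5, 6]
```

A loop like this indexes into arrays on every iteration, which is the sort of code where Cython's per-access checks can add up.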

mzeitlin11 added the Performance label (Memory or execution speed performance) on Jun 18, 2021
jreback added this to the 1.3 milestone on Jun 18, 2021
jreback merged commit 648eb40 into pandas-dev:master on Jun 18, 2021
@jreback
Contributor

jreback commented Jun 18, 2021

@meeseeksdev backport 1.3.x

@jreback
Contributor

jreback commented Jun 18, 2021

thanks @mzeitlin11

@lumberbot-app

lumberbot-app bot commented Jun 18, 2021

Something went wrong ... Please have a look at my logs.

start += group_size
group_size = 0

return np.asarray(starts), np.asarray(ends)
Member

FYI there is a tiny perf bump from using starts.base instead of np.asarray(starts) in these cases.
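
As a loose illustration of why reaching for the underlying object can be cheaper than wrapping a view again, here is a plain-Python analogy. It does not use Cython typed memoryviews: it uses the built-in `memoryview`, whose `.obj` attribute plays a role comparable to `.base` in the suggestion above. The variable names are hypothetical and no timing is claimed.

```python
from array import array

# Backing buffer, standing in for the array allocated inside the function.
backing = array("q", [0, 2, 5])

# A view over that buffer, standing in for the typed memoryview `starts`.
view = memoryview(backing)

# `.obj` hands back the original object itself: nothing is copied,
# and no new wrapper object has to be constructed around the view.
recovered = view.obj
print(recovered is backing)  # True
```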

Member Author

Ah thanks, good to know

mzeitlin11 deleted the generate_slices branch on June 18, 2021 04:49
simonjayhawkins pushed a commit that referenced this pull request Jun 18, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021