PERF: postpone imports in Index constructor #31423

Merged

Conversation

jorisvandenbossche
Member

xref #30790. This PR improves the following case mentioned in that issue:

import numpy as np
import pandas as pd
arr_str = np.array(["foo", "bar", "baz"], dtype=object)
%timeit pd.Index(arr_str)

Only importing the index classes when we actually need them gives a 15-20% speedup.

Last week, when profiling the Index constructor, I noticed that those imports take quite a bit of time (the change was introduced after 0.25, in #28141). I didn't get any further with the profiling (this alone doesn't explain the full slowdown), but it seemed like an easy fix worth making now anyway.
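As a rough illustration of the pattern (not the actual pandas diff), the sketch below compares a function that runs all of its function-level imports up front with one that imports only inside the branch that needs the class. The function names are hypothetical, and the stdlib classes are stand-ins for the index classes, chosen so the example runs without pandas.

```python
import timeit


def build_eager(kind):
    # All imports execute on every call, even if no branch uses them.
    from collections import OrderedDict
    from decimal import Decimal
    from fractions import Fraction

    if kind == "decimal":
        return Decimal(1)
    if kind == "fraction":
        return Fraction(1)
    if kind == "ordered":
        return OrderedDict()
    return object()


def build_lazy(kind):
    # Each import is postponed until its branch is actually taken.
    if kind == "decimal":
        from decimal import Decimal

        return Decimal(1)
    if kind == "fraction":
        from fractions import Fraction

        return Fraction(1)
    if kind == "ordered":
        from collections import OrderedDict

        return OrderedDict()
    return object()


# Time the common path that needs none of the imported classes.
eager_time = min(timeit.repeat(lambda: build_eager("other"), number=20_000, repeat=3))
lazy_time = min(timeit.repeat(lambda: build_lazy("other"), number=20_000, repeat=3))
print(f"eager imports: {eager_time:.4f}s, lazy imports: {lazy_time:.4f}s")
```

The absolute numbers depend on the machine and Python version, so treat them only as an indication of the relative cost.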

@jorisvandenbossche added the Performance label (Memory or execution speed performance) on Jan 29, 2020
@jorisvandenbossche added this to the 1.0.0 milestone on Jan 29, 2020
@jorisvandenbossche
Member Author

cc @simonjayhawkins

Contributor

@TomAugspurger left a comment

Is this slowdown persistent within a session (is import caching not helping)?

@jorisvandenbossche
Member Author

%timeit already runs it several times, so that should be OK (and repeating the measurement multiple times confirms it).
I have to say I was also surprised. I would have thought this to be cheaper, but apparently getting those from the import cache also takes time.
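A minimal way to see this outside pandas, assuming a stdlib module as a stand-in: even when the module is already in `sys.modules`, a `from ... import ...` statement inside a function still has to go through the import machinery and fetch the attribute on every call, whereas a module-level reference is a plain lookup. The two function names below are hypothetical.

```python
import decimal  # make sure the module is already in the import cache
import sys
import timeit


def with_cached_import():
    # Re-executes the import statement on every call (a cache hit each time).
    from decimal import Decimal

    return Decimal


def with_global_lookup():
    # Uses the module object that was bound once at module import time.
    return decimal.Decimal


assert "decimal" in sys.modules  # the import above is served from the cache

t_import = min(timeit.repeat(with_cached_import, number=100_000, repeat=3))
t_global = min(timeit.repeat(with_global_lookup, number=100_000, repeat=3))
print(f"cached import: {t_import:.4f}s, global lookup: {t_global:.4f}s")
```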

@jbrockmendel
Member

LGTM

> but apparently getting those from the import cache also takes time ..

This is surprising and a good catch. I'd be curious whether there's any perf difference between

from pandas.core.indexes.numeric import Float64Index
from .numeric import Float64Index
from pandas import Float64Index
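One way to check this without touching pandas is a throwaway package that mimics the layout. The sketch below is an assumption-laden stand-in: `demo_pkg` and its modules are hypothetical, only the directory structure mirrors `pandas.core.indexes`, and the results may not carry over to the real package.

```python
import importlib
import sys
import tempfile
import textwrap
import timeit
from pathlib import Path

# Build a temporary package: demo_pkg/core/indexes/{base,numeric}.py
root = Path(tempfile.mkdtemp())
indexes = root / "demo_pkg" / "core" / "indexes"
indexes.mkdir(parents=True)
(root / "demo_pkg" / "core" / "__init__.py").write_text("")
(indexes / "__init__.py").write_text("")
(indexes / "numeric.py").write_text("class Float64Index:\n    pass\n")
# The top-level package re-exports the class, like `pandas.Float64Index` did.
(root / "demo_pkg" / "__init__.py").write_text(
    "from demo_pkg.core.indexes.numeric import Float64Index\n"
)
(indexes / "base.py").write_text(textwrap.dedent("""\
    def absolute():
        from demo_pkg.core.indexes.numeric import Float64Index
        return Float64Index

    def relative():
        from .numeric import Float64Index
        return Float64Index

    def top_level():
        from demo_pkg import Float64Index
        return Float64Index
"""))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
base = importlib.import_module("demo_pkg.core.indexes.base")

# Time each import form; every call after the first is a cache hit.
timings = {
    name: min(timeit.repeat(getattr(base, name), number=20_000, repeat=3))
    for name in ("absolute", "relative", "top_level")
}
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f}s")
```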

Labels
Performance Memory or execution speed performance
4 participants