-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: interpolate_1d returns function to apply columnwise #34728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: interpolate_1d returns function to apply columnwise #34728
Conversation
pandas/core/missing.py
Outdated
order=order, | ||
**kwargs, | ||
) | ||
def func(yvalues: np.ndarray) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a module level function and give it a nice name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc-string as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it needs to be a closure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? can you pass any needed arguments, this makes it really hard to grok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code in this function is applied columnwise, the perf gains will come from doing some things once instead of 100x for the asv.
I prefer this functional approach to pre-validation as it keeps related code together. There is some more cleaning which could move more into the outer function once it's called less often
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but having this an outer function with passing args should not have any impact on performance. It simply a more understandable approach. ok with merging then refactoring to be simpler later though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its basically the processing of the index which only needs to be done once. However, with the bulk of the time on the preserve_nans set logic, there is no sig perf gains yet, hence draft for now.
originally grouped the validation see #34628. So can do it that way if preferred.
i.e. method, xvalues = missing.clean_interp_method(method, index, **kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk sure, i think you have 1 PR pending that we should merge before this i think. but lmk what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll close this to clear the queue for now. Some deeper changes now planned so will reopen once complete. |
This doesn't provide a significant improvement for the existing asv due to the bulk of the time creating python sets which is in the function applied columnwise. see #34727
even without #34727 this could provide perf improvements for other index types ( and for unsorted indexes with creating a column-wise function for the numpy call.
will look at adding asvs for these cases.