PERF: fix slow s.loc[[0]] #9127
@shoyer Maybe also add the exact same tests for
@shoyer Looks good.
Fixes GH9126
@jorisvandenbossche Good idea! I just amended the commit to include those tests. I'll merge this once Travis gives the OK.
@shoyer First merge! Congrats!
@shoyer When the original post comes from StackOverflow or the mailing list, I usually post an update there, something like: "this is now fixed and will be in the xxx release".
Done! Thanks for the reminder. (In reply to jreback, Mon, Dec 22, 2014 at 7:33 PM.)
Thank you! I've also looked into an earlier issue #6683, and it looks like it still stands. Is it basically true that
I see...
I am completely puzzled why this analysis matters in the slightest. If you are doing a small number of lookups, a difference of a few microseconds makes no practical difference. If you are doing a large number of lookups, then this is completely the wrong approach: simply look them all up at once. If for some reason you really want to look up individual values fast, just drop down to numpy.
Using an indexer to lookup
The point is that pandas aims to get the lookup correct first. Feel free to use whatever indexer you want.
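One way to act on this advice is sketched below, assuming a recent pandas (`Series.to_numpy` needs 0.24 or later) and a unique index (`Index.get_indexer` requires one). The data and the `wanted` labels are made up for illustration. It contrasts one `.loc` call per label, a single batched `.loc` call, and translating labels to integer positions once with `get_indexer` before indexing the raw NumPy array.

```python
import numpy as np
import pandas as pd

# Toy Series standing in for a "large" dataset; sizes are arbitrary.
s = pd.Series(np.arange(100_000) * 2.0, index=np.arange(100_000))
wanted = [3, 17, 42, 99_999]

# Slow pattern: one .loc call per label, paying pandas' per-call overhead each time.
one_at_a_time = [s.loc[label] for label in wanted]

# Batched pattern: a single .loc call with the whole list of labels.
batched = s.loc[wanted]

# Dropping down to NumPy: map labels to integer positions once,
# then index the underlying array directly.
positions = s.index.get_indexer(wanted)  # -1 marks a label that is not in the index
if (positions < 0).any():
    raise KeyError("some labels are not in the index")
values = s.to_numpy()[positions]
```

The explicit check on negative positions matters: `get_indexer` signals a missing label with `-1` instead of raising, and NumPy would treat `-1` as "last element", silently returning the wrong value.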
Thank you, jreback. Now I have a very clear idea. The advice about dropping down to the numpy level is also very helpful; I had not thought of that. My system responds to real-time calls, and when it starts processing a call, it selects some data from a large DataFrame. So yes, I will definitely consider switching to the numpy level after enough testing. For other purposes, I'll switch back to
Fixes #9126
Whee!
I also wrote this fix for IntervalIndex (#8707); this change pulls it out separately. I believe it changes the complexity of the lookup check from O(n*m) to O(n+m), for an index of length n and a key of length m.
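As a rough illustration of that complexity claim (a generic sketch, not the code pandas actually uses), the two hypothetical helpers below check whether every label in a key appears in an index. The first scans the whole index for each key label, O(n*m). The second builds a hash set of the index once and then does average constant-time membership tests, O(n+m) expected. The set-based version assumes all labels are hashable.

```python
def all_in_index_naive(index, key):
    """Hypothetical helper: O(n*m) check that every label in `key` is in `index`.

    For each of the m key labels, scan the length-n index linearly.
    """
    return all(any(label == k for label in index) for k in key)


def all_in_index_hashed(index, key):
    """Hypothetical helper: O(n+m) expected check for the same condition.

    Building the set costs O(n); each of the m membership tests is O(1)
    on average. Assumes all labels are hashable.
    """
    index_set = set(index)
    return all(k in index_set for k in key)


# Both helpers give the same answers; they differ only in how the work scales.
print(all_in_index_naive([0, 1, 2], [0, 2]))   # True
print(all_in_index_hashed([0, 1, 2], [0, 5]))  # False
```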