PERF: Possible performance regression for indexing from 0.12 to 0.13.1 #6882
Comments
Are those numbers microseconds? There was some microsecond-level overhead added in 0.13.1, which I've seen and tried to address, but it was agreed that shaving off several dozen (hundred?) additional function calls might not be worth it, because that overhead didn't scale with container size. For example, on my 3.3GHz i3 one microsecond is about 6 function calls:

```
In [1]: def foo(x): return x

In [2]: timeit foo(1)
10000000 loops, best of 3: 149 ns per loop
```

FTR, there's a pull request with a lot of big-container indexing benchmarks, most likely including the ones shown here. I remember it showing some unexpected slowdowns for datetime indices, but I haven't looked at them yet.
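As a rough illustration of the point that this overhead is constant rather than proportional to container size, a minimal sketch (not from the original benchmarks; the sizes and the use of .iat are assumptions):

```
# Sketch: per-call scalar-indexing overhead should be roughly the
# same for a tiny Series and a huge one, because the overhead does
# not scale with container size.
import timeit
import numpy as np
import pandas as pd

small = pd.Series(np.random.randn(10))
large = pd.Series(np.random.randn(10000000))

for name, s in [("small", small), ("large", large)]:
    per_call = timeit.timeit(lambda: s.iat[0], number=100000) / 100000
    print("%s: %.2f us per lookup" % (name, per_call * 1e6))
```

Both lookups should land in the same microsecond range regardless of length.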
Hey @immerrr, thanks for the quick reply! Yes, these numbers are in microseconds. The reason I noticed is that I have a program that updates a large number of pre-allocated time series and data frames, date after date, so the increase in time was noticeable. I understand this is not a huge issue, but I thought I'd bring it up since I saw no mention of it elsewhere. For my application it did lead to a significant time increase (~1.7 times slower).
@pmorissette you need to make sure that you are vectorizing.
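For instance, a minimal sketch of what vectorizing such updates can look like (the index, sizes, and update pattern here are assumptions):

```
# Sketch: replace a per-timestamp Python loop with a single
# array-level assignment.
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=100000, freq="min")
s = pd.Series(0.0, index=idx)
new_vals = np.random.randn(len(idx))

# Slow: pays the indexing overhead once per timestamp.
# for t, v in zip(idx, new_vals):
#     s[t] = v

# Fast: one vectorized write for all timestamps.
s[:] = new_vals
```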
Hey @jreback yeah vectorizing would indeed be the way to go but for my application this is not possible. The values I am updating are only known at time t and I must loop through all the dates one at a time. It is convenient to have the data in a pandas TimeSeries for my application, but perhaps a quicker data storage solution could work and I could create a TimeSeries on demand when necessary. Some testing will be in order. Also, I will look at iat/at to see if I can get a speed improvement. Thanks for the help!
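One possible shape for that idea, as a minimal sketch (compute_value_at is a hypothetical stand-in for the application's per-date computation):

```
# Sketch: write into a pre-allocated NumPy buffer during the
# date-by-date loop, and only wrap it in a Series when needed.
import numpy as np
import pandas as pd

def compute_value_at(t):
    # hypothetical stand-in for the value only known at time t
    return float(t.minute)

dates = pd.date_range("2013-01-01", periods=100000, freq="min")
buf = np.empty(len(dates))

for i, t in enumerate(dates):
    # plain ndarray writes avoid pandas' per-call indexing machinery
    buf[i] = compute_value_at(t)

series = pd.Series(buf, index=dates)  # constructed on demand
```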
I just ran my benchmark using .iat and .at and they too are slower in 0.13.1 vs 0.12. These two methods are also slower than basic bracket indexing. Again, these are microseconds: not a big deal individually, but it adds up in my use case.
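For reference, such a comparison can be run in an IPython session along these lines (a sketch; the sizes and lookup position are assumptions, since the original benchmark code isn't shown):

```
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=1000, freq="min")
s = pd.Series(np.random.randn(len(idx)), index=idx)
t = idx[500]

%timeit s[t]         # basic bracket indexing
%timeit s.at[t]      # label-based scalar accessor
%timeit s.iat[500]   # positional scalar accessor
%timeit s.loc[t]     # label-based general indexer
%timeit s.iloc[500]  # positional general indexer
```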
You realize that substantial changes took place in 0.13; if these microseconds matter to you, that is worth keeping in mind.
@jreback sounds good - just wanted to bring it up since I didn't see this issue mentioned elsewhere. Pandas is great and I appreciate all the hard work that goes into this library. Thanks again.
@pmorissette, sometimes
@immerrr ok cool, I'll take a look!
My point before is that iat/at are faster than iloc/loc; they are all probably a bit slower than 0.12. We normally don't optimize to microseconds; if that actually matters, you are generally going about the problem in the wrong way.
@jreback ok understood. Thanks for the heads up.
Original issue

Hey all,
Just upgraded my pandas version from 0.12 to 0.13.1 and noticed a significant performance regression for indexing operations (get, set, and windowing).
Here is my test setup code:
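A minimal sketch of a setup that exercises those operations, with assumed names and sizes, together with the kind of %timeit calls involved (the exact calls are assumptions; the measured numbers are not reproduced here):

```
import numpy as np
import pandas as pd

# Assumed setup: a datetime-indexed Series and DataFrame.
dates = pd.date_range("2013-01-01", periods=10000, freq="min")
ts = pd.Series(np.random.randn(len(dates)), index=dates)
df = pd.DataFrame(np.random.randn(len(dates), 5), index=dates)
d = dates[5000]

%timeit ts[d]                          # get
%timeit ts[d] = 0.0                    # set
%timeit ts["2013-01-03":"2013-01-04"]  # windowing: slice between dates
```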
Here is a table showing the results of IPython's %timeit function.
I did not see up-to-date data on http://pandas.pydata.org/pandas-docs/vbench/vb_indexing.html - am I looking at the right benchmark data? Most charts end in June 2012.
Can someone confirm this slowdown?
I am using numpy 1.8.1 by the way - let me know if you need any other version numbers.
Thanks in advance!