PERF: fix slow s.loc[[0]] #9127


Merged
merged 1 commit into pandas-dev:master on Dec 23, 2014

Conversation

shoyer
Member

@shoyer shoyer commented Dec 22, 2014

Fixes #9126

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
series_loc_list_like                         |   0.2840 | 125.0146 |   0.0023 |
series_loc_array                             |   0.9947 | 125.5813 |   0.0079 |
series_loc_scalar                            |   0.0430 |   0.0424 |   1.0150 |
series_loc_slice                             |   0.0647 |   0.0606 |   1.0668 |
-------------------------------------------------------------------------------

Whee!

I also wrote this fix for IntervalIndex (#8707); this change pulls it out separately. I believe it changes the complexity of the lookup check from O(n*m) for length n index and length m key to O(n+m).
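The complexity claim can be illustrated with a toy sketch (this is not the actual pandas internals, just the idea): the slow path effectively asked "is each of the m keys in the n-element index?" with a linear scan per key, while the fix hashes the index once and probes each key.

```python
# Hypothetical illustration of the O(n*m) -> O(n+m) change; function names
# are invented for this sketch and do not appear in pandas.

def contains_slow(index_values, keys):
    # O(n*m): one full linear scan of the index per key
    return [any(k == v for v in index_values) for k in keys]

def contains_fast(index_values, keys):
    # O(n+m): build a hash table once (O(n)), then probe each key (O(m))
    table = set(index_values)
    return [k in table for k in keys]

idx = range(100_000)
keys = [0, 5, 99_999, -1]
assert contains_slow(idx, keys) == contains_fast(idx, keys) == [True, True, True, False]
```

Both functions agree on every input; only the asymptotic cost differs, which matches the benchmark table above where the list-like lookups dropped from ~125 ms to well under 1 ms.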

@shoyer shoyer added the Performance Memory or execution speed performance label Dec 22, 2014
@shoyer shoyer added this to the 0.16.0 milestone Dec 22, 2014
@jorisvandenbossche
Member

@shoyer Maybe at once add the exact same tests for [], .ix[] and iloc[] ?
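The PR's actual test code is not shown in this thread, but the cross-indexer check being suggested might look something like the sketch below (using only indexers that still exist in modern pandas; `.ix` has since been removed):

```python
import numpy as np
import pandas as pd

# Sketch, not the PR's real tests: a list-like lookup through each indexer
# should return identical results, so a fast path added to one indexer must
# not change its behavior relative to the others.
s = pd.Series(np.arange(5.0))

by_loc = s.loc[[0]]
by_getitem = s[[0]]
by_iloc = s.iloc[[0]]

pd.testing.assert_series_equal(by_loc, by_getitem)
pd.testing.assert_series_equal(by_loc, by_iloc)
```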

@jreback
Contributor

jreback commented Dec 22, 2014

@shoyer looks good
needs a release note and it's good to go

@shoyer
Member Author

shoyer commented Dec 22, 2014

@jorisvandenbossche Good idea! Just amended in those tests.

I'll merge this once Travis gives the OK.

shoyer added a commit that referenced this pull request Dec 23, 2014
@shoyer shoyer merged commit ab20769 into pandas-dev:master Dec 23, 2014
@shoyer shoyer deleted the fix-slow-loc branch December 23, 2014 01:24
@jreback
Contributor

jreback commented Dec 23, 2014

@shoyer first merge! congrats

@jreback
Contributor

jreback commented Dec 23, 2014

@shoyer when the original post is from SO/ml then I usually post an update (in that location), something like: hey, this is now fixed and will be in the xxx release

@shoyer
Member Author

shoyer commented Dec 23, 2014

Done! Thanks for the reminder

On Mon, Dec 22, 2014 at 7:33 PM, jreback [email protected] wrote:

@shoyer when the original post is from SO/ml then I usually post an update - hey this is now fixed and will be in the xxx release


@sergeny

sergeny commented Dec 24, 2014

Thank you! I've also looked into an earlier issue #6683, and it looks like it still stands.

Is it basically true that .ix always outperforms .loc, .iloc, .at, and .iat, contrary to all of the documentation, and the only reason for not using .ix is better type or bounds checking?

a=pd.DataFrame(data=np.zeros([10000,6]))
In [80]: %timeit a.ix[2343,4]
100000 loops, best of 3: 4.32 µs per loop

In [81]: %timeit a.at[2343,4]
100000 loops, best of 3: 6.53 µs per loop
# different semantics, but it is slower than both .ix and .at. Why ?!
In [82]: %timeit a.iat[2343,4]
100000 loops, best of 3: 7.96 µs per loop

In [87]: %timeit a.ix[xrange(1000,2000),4]
1000 loops, best of 3: 306 µs per loop
In [88]: %timeit a.loc[xrange(1000,2000),4]
1000 loops, best of 3: 998 µs per loop
In [89]: %timeit a.iloc[xrange(1000,2000),4]
1000 loops, best of 3: 538 µs per loop

@sergeny

sergeny commented Dec 24, 2014

I see... .iat gets faster than .ix eventually, on a large dataset; and .iloc does get faster, but only on random sets of indices (which is what matters), not on ordered, contiguous sequences. Presumably, there is a special case in .ix to optimize that.
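For the ordered, contiguous case there is a simpler spelling anyway: a slice goes through pandas' slice fast path rather than list-like fancy indexing, and selects exactly the same rows. A quick equivalence check (assumed example, not from the timings above):

```python
import numpy as np
import pandas as pd

# Assumed example: a contiguous run of positions expressed two ways.
# The slice form avoids the list-like lookup path entirely.
df = pd.DataFrame(np.zeros([10000, 6]))

by_list = df.iloc[list(range(1000, 2000)), 4]
by_slice = df.iloc[1000:2000, 4]

pd.testing.assert_series_equal(by_list, by_slice)
```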

@jreback
Contributor

jreback commented Dec 24, 2014

@sergeny

I am completely puzzled why this analysis matters in the slightest. If you are doing a small number of indexing operations, then the microseconds of difference make no difference. If you are doing a large number of lookups, then this is completely the wrong approach. Simply look them up at the same time.

If for some reason you really want to look up individual values fast, just drop down to numpy.

In [4]: %timeit df.ix[2343,4]
100000 loops, best of 3: 4.27 us per loop

In [5]: %timeit df[4].values[2343]
100000 loops, best of 3: 3.19 us per loop

In [6]: x = df[4]

In [7]: x = df[4].values

In [8]: %timeit x[2343]
10000000 loops, best of 3: 87 ns per loop

Using an indexer to look up many values at once

In [9]: indexer = np.random.randint(0,len(df),size=1000)

In [10]: %timeit df.loc[indexer,4]
1000 loops, best of 3: 1.34 ms per loop

In [11]: %timeit df.ix[indexer,4]
1000 loops, best of 3: 1.27 ms per loop

In [12]: %timeit df.iloc[indexer,4]
1000 loops, best of 3: 484 us per loop

In [13]: %timeit x[indexer]
1000000 loops, best of 3: 1.74 us per loop

The point is that pandas wants to have a correct lookup first. .ix has some odd and surprising edge cases and is not strict enough for most folks because of its fallback integer indexing. The decision was made to supplant it with a suite of indexers that try hard to be very strictly correct in their behavior. This includes a variety of bounds and type checking.
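That strictness is easy to demonstrate (an assumed example, not from this thread; note that .ix was deprecated in pandas 0.20 and later removed, so the sketch uses only .loc/.iloc):

```python
import pandas as pd

# Illustration of the bounds and type checking mentioned above:
# .loc refuses a missing label, .iloc refuses an out-of-bounds position,
# rather than silently falling back to the other interpretation.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

try:
    s.loc['z']      # label does not exist -> KeyError
except KeyError:
    missing_label_raises = True

try:
    s.iloc[99]      # position out of bounds -> IndexError
except IndexError:
    out_of_bounds_raises = True

assert missing_label_raises and out_of_bounds_raises
assert s.loc['b'] == s.iloc[1] == 20
```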

Feel free to use whatever indexer you want.

@sergeny

sergeny commented Dec 24, 2014

Thank you, jreback. Now I have a very clear idea.

The advice about going down to numpy level is also very helpful; I did not think about that. My system is replying to real-time calls, and when it starts to process the call, it selects some data from a large dataframe. So yes, I will definitely consider switching to the numpy level after enough testing.

For other purposes, I'll switch back to .loc after a new release that fixes #9127.

Successfully merging this pull request may close these issues.

Why is DataFrame.loc[[1]] 1,800x slower than df.ix[[1]] and 3,500x slower than df.loc[1]?