CLN/PERF: remove ndarray.take and platform int conversions #13924


Closed
wants to merge 6 commits into from

Conversation

chris-b1
Contributor

@chris-b1 chris-b1 commented Aug 6, 2016

I was looking into #13745 (GIL on merge), and there's some low hanging fruit in the non-parallel case.

  • a couple python any calls
  • unnecessary (?) platform int checks / conversions. I'm testing on Windows - I'm assuming on
    Linux 64 this doesn't really make a difference.
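As an illustration of the first bullet (a sketch, not the actual pandas code path): the built-in `any()` iterates a numpy array element by element, boxing each value into a Python object, while `ndarray.any()` loops in C.

```python
import numpy as np

mask = np.zeros(1000000, dtype=bool)

slow = any(mask)    # per-element Python iteration, boxes every value
fast = mask.any()   # single vectorized C loop, typically orders of magnitude faster
```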
In [1]: right = pd.DataFrame({'key': range(10), 'val': range(10)})
   ...: left = pd.DataFrame({'key': range(1, 11) * 100000})  # Python 2; on Python 3: list(range(1, 11)) * 100000

# master

In [2]: %timeit left.merge(right, how='inner')
10 loops, best of 3: 104 ms per loop

In [3]: %timeit left.merge(right, how='left')
10 loops, best of 3: 141 ms per loop

In [4]: %timeit left.merge(right, how='outer')
10 loops, best of 3: 125 ms per loop

# PR

In [2]: %timeit left.merge(right, how='inner')
10 loops, best of 3: 79.8 ms per loop

In [3]: %timeit left.merge(right, how='left')
10 loops, best of 3: 110 ms per loop

In [4]: %timeit left.merge(right, how='outer')
10 loops, best of 3: 100 ms per loop

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

More broadly, is there ever a reason to coerce indexers back to a platform int? e.g. here

Even ignoring the conversion cost, it seems like take is faster with the int64s. (this is Windows 64, python 2.7)

In [1]: np.random.seed(123)

In [2]: a = np.random.randn(1000000)

In [3]: idx = np.random.randint(0, 1000000, size=1000000).astype('int64')

In [4]: idx_plat = idx.astype(np.int_)

In [5]: %timeit a.take(idx)
100 loops, best of 3: 13.9 ms per loop

In [6]: %timeit a.take(idx_plat)
100 loops, best of 3: 17.5 ms per loop
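The platform distinction behind this benchmark: `np.intp` is the pointer-sized signed integer (the dtype indexing ultimately wants), while `np.int_` has historically been the C `long`, which is only 32 bits on 64-bit Windows. A quick way to see this (a sketch, not from the PR):

```python
import struct
import numpy as np

# np.intp matches the platform's pointer size on every build:
# 64 bits on any 64-bit Python, including 64-bit Windows.
pointer_bits = struct.calcsize("P") * 8
intp_bits = np.dtype(np.intp).itemsize * 8

# np.int_ historically mapped to the C long, which stays 32-bit on
# 64-bit Windows -- hence the extra cast being benchmarked above.
```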

@sinhrks sinhrks added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 6, 2016
@codecov-io

codecov-io commented Aug 6, 2016

Current coverage is 85.30% (diff: 100%)

Merging #13924 into master will decrease coverage by <.01%

@@             master     #13924   diff @@
==========================================
  Files           139        139          
  Lines         50143      50138     -5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42777      42768     -9   
- Misses         7366       7370     +4   
  Partials          0          0          

Powered by Codecov. Last update 7e15923...2fce1de

@jreback
Contributor

jreback commented Aug 6, 2016

there was some discussion of this quite a while ago.
it's possible, but you still have to work around the problem on Windows,
where take and searchsorted need int32,
but we almost always have int64 indexers.

so you could cast as appropriate in only certain situations (vs all of the ensure stuff that we do)

I think we have enough tests that you could try things and see what breaks
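One way to read "cast as appropriate in only certain situations" is a helper that skips the copy in the common case (a hypothetical sketch, not pandas' actual `_ensure_platform_int`):

```python
import numpy as np

def ensure_indexer(arr):
    # Hypothetical helper: cast only when the dtype differs from what
    # ndarray.take wants (np.intp), so the common int64-on-64-bit path
    # passes through with no copy at all.
    if arr.dtype == np.intp:
        return arr
    return arr.astype(np.intp)

already_ok = np.arange(3, dtype=np.intp)   # no copy made
needs_cast = np.arange(3, dtype=np.int8)   # one necessary cast
```

On Linux 64 an int64 indexer passes through untouched; only 32-bit builds (and pre-change Win64) pay for the downcast.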

@jreback
Contributor

jreback commented Aug 6, 2016

pls rebase as just merged: #13925

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

Ah, so numpy on Windows does need an int32 indexer, but only when running 32-bit Python. So I can't just throw away these checks, but I think I can instead make it so int64s are treated like a "platform int" on Win 64 (even though they aren't really).
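A sketch of the bitness check this implies (`sys.maxsize` distinguishes 32- vs 64-bit Python regardless of OS):

```python
import sys
import numpy as np

# 32-bit Python builds have a 32-bit np.intp, so compiled take paths
# there genuinely need int32 indexers; on any 64-bit build, including
# 64-bit Windows, np.intp is 64-bit and int64 indexers need no cast.
is_64bit = sys.maxsize > 2**32
intp_is_64bit = np.dtype(np.intp).itemsize == 8
```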

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

xref #3033 - platform int stuff is a can of worms

@jreback
Contributor

jreback commented Aug 6, 2016

right that was the issue, yes it is a can-o-worms

@chris-b1
Contributor Author

chris-b1 commented Aug 8, 2016

I guess I'm re-purposing this to now be about platform ints, closing #3033. Assuming we actually want to do this, I'd still need to adjust a bunch of tests and root out any remaining corner cases, but here's essentially what I've done / think I'm proposing:

  1. use algos.take_nd instead of ndarray.take throughout
  2. internally, keep all indexers as int64s as much as possible
  3. externally, and at the few remaining numpy interop spots, cast indexers to np.intp (previously np.int_). This would technically be an API change for 64-bit Windows, where those aren't the same.

This generally should help performance on Windows 64 by avoiding unneeded casts and using a better indexer dtype. Elsewhere I'm hoping for neutral to slightly better (fewer checks).
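Point 3 at the numpy boundary, sketched (a hypothetical helper name, not from the PR; the intp cast is a no-op except on 32-bit builds and pre-change Win64):

```python
import numpy as np

def to_numpy_indexer(indexer):
    # At numpy interop points, hand numpy an np.intp indexer rather than
    # np.int_ -- on 64-bit Windows intp is int64 while int_ is int32,
    # which is the API change mentioned above.
    return np.asarray(indexer, dtype=np.intp)

values = np.array([10.0, 20.0, 30.0])
result = values.take(to_numpy_indexer([2, 0]))
```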

@chris-b1 chris-b1 changed the title PERF: merge optimization CLN/PERF: remove ndarray.take and platform int conversions Aug 8, 2016
- sorter = values.argsort()
- ordered = values.take(sorter)
+ sorter = _ensure_int64(values.argsort())
+ ordered = take_nd(values, sorter, allow_fill=False)
Contributor


I would do as many ensures inside take_nd as you can - so we can just call it with anything and it will work

Contributor Author


agree - take_nd already does this check - so I didn't need this

@chris-b1
Contributor Author

chris-b1 commented Aug 9, 2016

I think I went down the wrong path here - there are places where we were implicitly relying on numpy's bounds-checking, and I worry about introducing segfaults with the unchecked take_nd - the tests probably caught most everything, but it's hard to be sure.

So instead, I think I may just redefine platform_int to np.intp (which solves the Windows perf problem) and back out a lot of the other changes.
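The bounds-checking concern, illustrated (a sketch, not pandas code): `ndarray.take` validates indices by default, which an unchecked cython take would not.

```python
import numpy as np

a = np.arange(5.0)
out_of_bounds = np.array([7], dtype=np.intp)

# ndarray.take bounds-checks by default and raises IndexError; a cython
# take compiled with boundscheck(False) would instead read arbitrary
# memory -- the segfault risk described in the comment above.
try:
    a.take(out_of_bounds)
    raised = False
except IndexError:
    raised = True
```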

@chris-b1
Contributor Author

Closed in favor of #13972

@chris-b1 chris-b1 closed this Aug 15, 2016
@jreback jreback added this to the 0.19.0 milestone Aug 17, 2016