CLN/PERF: remove ndarray.take and platform int conversions #13924


Closed
wants to merge 6 commits into from

Conversation

chris-b1
Contributor

@chris-b1 chris-b1 commented Aug 6, 2016

I was looking into #13745 (GIL on merge), and there's some low hanging fruit in the non-parallel case.

  • a couple python any calls
  • unnecessary (?) platform int checks / conversions. I'm testing on Windows - I'm assuming on
    Linux 64 this doesn't really make a difference.
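As an illustration of the first bullet (a sketch, not the actual pandas code path): the built-in `any()` iterates a numpy array element by element, boxing each value into a Python object, while `ndarray.any()` loops in C.

```python
import numpy as np

mask = np.zeros(1000000, dtype=bool)

slow = any(mask)    # per-element Python iteration, boxes every value
fast = mask.any()   # single vectorized C loop, typically orders of magnitude faster
```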
In [1]: right = pd.DataFrame({'key': range(10), 'val': range(10)})
   ...: left = pd.DataFrame({'key': range(1, 11) * 100000})  # Python 2; on Python 3: list(range(1, 11)) * 100000

# master

In [2]: %timeit left.merge(right, how='inner')
10 loops, best of 3: 104 ms per loop

In [3]: %timeit left.merge(right, how='left')
10 loops, best of 3: 141 ms per loop

In [4]: %timeit left.merge(right, how='outer')
10 loops, best of 3: 125 ms per loop

# PR

In [2]: %timeit left.merge(right, how='inner')
10 loops, best of 3: 79.8 ms per loop

In [3]: %timeit left.merge(right, how='left')
10 loops, best of 3: 110 ms per loop

In [4]: %timeit left.merge(right, how='outer')
10 loops, best of 3: 100 ms per loop

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

More broadly, is there ever a reason to coerce indexers back to a platform int? e.g. here

Even ignoring the conversion cost, it seems like take is faster with the int64s. (this is Windows 64, python 2.7)

In [1]: np.random.seed(123)

In [2]: a = np.random.randn(1000000)

In [3]: idx = np.random.randint(0, 1000000, size=1000000).astype('int64')

In [4]: idx_plat = idx.astype(np.int_)

In [5]: %timeit a.take(idx)
100 loops, best of 3: 13.9 ms per loop

In [6]: %timeit a.take(idx_plat)
100 loops, best of 3: 17.5 ms per loop
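The platform distinction behind this benchmark: `np.intp` is the pointer-sized signed integer (the dtype indexing ultimately wants), while `np.int_` has historically been the C `long`, which is only 32 bits on 64-bit Windows. A quick way to see this (a sketch, not from the PR):

```python
import struct
import numpy as np

# np.intp matches the platform's pointer size on every build:
# 64 bits on any 64-bit Python, including 64-bit Windows.
pointer_bits = struct.calcsize("P") * 8
intp_bits = np.dtype(np.intp).itemsize * 8

# np.int_ historically mapped to the C long, which stays 32-bit on
# 64-bit Windows -- hence the extra cast being benchmarked above.
```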

@sinhrks sinhrks added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 6, 2016
@codecov-io

codecov-io commented Aug 6, 2016

Current coverage is 85.30% (diff: 100%)

Merging #13924 into master will decrease coverage by <.01%

@@             master     #13924   diff @@
==========================================
  Files           139        139          
  Lines         50143      50138     -5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42777      42768     -9   
- Misses         7366       7370     +4   
  Partials          0          0          

Powered by Codecov. Last update 7e15923...2fce1de

@jreback
Contributor

jreback commented Aug 6, 2016

there was some discussion of this quite a while ago.
it's possible, but you still have to work around the problem on Windows,
where take and searchsorted need int32,
but we almost always have int64 indexers.

so you could cast as appropriate in only certain situations (vs all of the ensure stuff that we do)

I think we have enough tests that you could try things and see what breaks
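One way to read "cast as appropriate in only certain situations" is a helper that skips the copy in the common case (a hypothetical sketch, not pandas' actual `_ensure_platform_int`):

```python
import numpy as np

def ensure_indexer(arr):
    # Hypothetical helper: cast only when the dtype differs from what
    # ndarray.take wants (np.intp), so the common int64-on-64-bit path
    # passes through with no copy at all.
    if arr.dtype == np.intp:
        return arr
    return arr.astype(np.intp)

already_ok = np.arange(3, dtype=np.intp)   # no copy made
needs_cast = np.arange(3, dtype=np.int8)   # one necessary cast
```

On Linux 64 an int64 indexer passes through untouched; only 32-bit builds (and pre-change Win64) pay for the downcast.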

@jreback
Contributor

jreback commented Aug 6, 2016

pls rebase as just merged: #13925

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

Ah, so numpy on Windows does need an int32 indexer, but only when running 32-bit Python. So I can't just throw away these checks, but I think I can instead make it so int64s are treated like a "platform int" on Win 64 (even though they aren't really).
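A sketch of the bitness check this implies (`sys.maxsize` distinguishes 32- vs 64-bit Python regardless of OS):

```python
import sys
import numpy as np

# 32-bit Python builds have a 32-bit np.intp, so compiled take paths
# there genuinely need int32 indexers; on any 64-bit build, including
# 64-bit Windows, np.intp is 64-bit and int64 indexers need no cast.
is_64bit = sys.maxsize > 2**32
intp_is_64bit = np.dtype(np.intp).itemsize == 8
```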

@chris-b1
Contributor Author

chris-b1 commented Aug 6, 2016

xref #3033 - platform int stuff is a can of worms

@jreback
Contributor

jreback commented Aug 6, 2016

right that was the issue, yes it is a can-o-worms

@chris-b1
Contributor Author

chris-b1 commented Aug 8, 2016

I guess I'm re-purposing this to now be about platform ints, closing #3033. Assuming we actually want to do this, I'd still need to adjust a bunch of tests and root out any remaining corner cases, but here's essentially what I've done / think I'm proposing:

  1. use algos.take_nd instead of ndarray.take throughout
  2. internally, keep all indexers as int64s as much as possible
  3. externally, and at the few remaining numpy interop spots, cast indexers to np.intp (previously np.int_). This would technically be an API change for 64-bit Windows, where those aren't the same.

This generally should help performance on Windows 64 by avoiding unneeded casts and using a better indexer dtype. Elsewhere I'm hoping for neutral to slightly better (fewer checks).
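Point 3 at the numpy boundary, sketched (a hypothetical helper name, not from the PR; the intp cast is a no-op except on 32-bit builds and pre-change Win64):

```python
import numpy as np

def to_numpy_indexer(indexer):
    # At numpy interop points, hand numpy an np.intp indexer rather than
    # np.int_ -- on 64-bit Windows intp is int64 while int_ is int32,
    # which is the API change mentioned above.
    return np.asarray(indexer, dtype=np.intp)

values = np.array([10.0, 20.0, 30.0])
result = values.take(to_numpy_indexer([2, 0]))
```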

@chris-b1 chris-b1 changed the title PERF: merge optimization CLN/PERF: remove ndarray.take and platform int conversions Aug 8, 2016
- sorter = values.argsort()
- ordered = values.take(sorter)
+ sorter = _ensure_int64(values.argsort())
+ ordered = take_nd(values, sorter, allow_fill=False)
Contributor


I would do as many ensures inside take_nd as you can - so we can just call it with anything and it will work

Contributor Author


agree - take_nd already does this check - so I didn't need this

@chris-b1
Contributor Author

chris-b1 commented Aug 9, 2016

I think I went down the wrong path here - there are places where we were implicitly relying on numpy's bounds-checking, and I worry about introducing segfaults with the unchecked take_nd - the tests probably caught most everything, but it's hard to be sure.

So instead, I think I may just redefine platform_int to np.intp (which solves the Windows perf problem) and back out a lot of the other changes.
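The bounds-checking concern, illustrated (a sketch, not pandas code): `ndarray.take` validates indices by default, which an unchecked cython take would not.

```python
import numpy as np

a = np.arange(5.0)
out_of_bounds = np.array([7], dtype=np.intp)

# ndarray.take bounds-checks by default and raises IndexError; a cython
# take compiled with boundscheck(False) would instead read arbitrary
# memory -- the segfault risk described in the comment above.
try:
    a.take(out_of_bounds)
    raised = False
except IndexError:
    raised = True
```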

@chris-b1
Contributor Author

Closed in favor of #13972

@chris-b1 chris-b1 closed this Aug 15, 2016
@jreback jreback added this to the 0.19.0 milestone Aug 17, 2016