
PERF: upgrade khash lib to 0.2.8 #8547


Closed
wants to merge 10 commits into from

Conversation

immerrr
Contributor

@immerrr immerrr commented Oct 13, 2014

This should close #8524.

The idea is that quadratic probing with offsets (i**2 + i)/2 is faster than double hashing and can be shown to traverse all buckets when nbuckets == 2**N.
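
As a quick sanity check of that coverage claim, here is a small standalone program (not part of this branch; the bucket count and start position are arbitrary) verifying that the triangular-number offsets 0, 1, 3, 6, 10, ... visit every bucket exactly once when the table size is a power of two:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    const uint32_t nbuckets = 1u << 10;      /* 2**N buckets */
    const uint32_t mask = nbuckets - 1;
    char *seen = calloc(nbuckets, 1);
    uint32_t pos = 12345 & mask;             /* arbitrary start bucket */
    uint32_t visited = 0;

    for (uint32_t step = 0; step < nbuckets; step++) {
        if (!seen[pos]) { seen[pos] = 1; visited++; }
        pos = (pos + step + 1) & mask;       /* add 1, 2, 3, ... => offsets 1, 3, 6, 10, ... */
    }
    /* prints "visited 1024 of 1024 buckets" */
    printf("visited %u of %u buckets\n", visited, nbuckets);
    free(seen);
    return 0;
}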

This branch compiles and passes tests, but I haven't done any benchmarks yet.

TODO

(probably in other issues)

  • see if the "flag compression" hack can be removed; it complicates syncing
    with upstream and I'm not sure it gave any noticeable performance increase
  • request upstream klib maintainers to add a way to override "kh_inline",
    e.g. by wrapping its definition with #ifndef kh_inline, so that the
    PANDAS_INLINE flag can be moved out of khash.h and specified in
    khash_python.h with
#define kh_inline PANDAS_INLINE
#include "khash.h"
  • I don't like that pandas makes the khint64_t definition signed. Upstream
    it is unsigned, which -- I agree -- is inconsistent, but bit-shifting
    operations on signed integers have a) implementation-defined or undefined
    behaviour according to the standard and b) different semantics, which might
    cause performance degradation with some hash functions:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
  int64_t foo = -1;
  uint64_t ufoo = (uint64_t)foo;  /* same bit pattern, reinterpreted as unsigned */
  /* arithmetic shift keeps the sign bit, logical shift does not */
  printf("foo: %" PRIx64 "; foo>>1: %" PRIx64 "\n",
         (uint64_t)foo, (uint64_t)(foo >> 1));
  printf("ufoo: %" PRIx64 "; ufoo>>1: %" PRIx64 "\n", ufoo, ufoo >> 1);
  return 0;
}
$ ./a.out 
foo: ffffffffffffffff; foo>>1: ffffffffffffffff
ufoo: ffffffffffffffff; ufoo>>1: 7fffffffffffffff
  • memory allocation functions kmalloc/krealloc/kfree should be overridden to
    use the PyMem interface so that allocations show up in tracemalloc output.
    PyMem_Calloc is only provided from 3.5 onwards, so it may require more work
    than just a macro rename (see the sketch after this list).
  • the khash-0.2.8 API introduces -1 return values for operations that have
    failed, most likely because of memory allocation errors. These should be
    handled and reported as MemoryError to the interpreter.

@jreback jreback added the Performance Memory or execution speed performance label Oct 13, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 13, 2014
@immerrr immerrr force-pushed the upgrade-khash-lib branch 2 times, most recently from 9b3faa3 to 9745cc8 on October 13, 2014 12:54
@immerrr
Contributor Author

immerrr commented Oct 23, 2014

tl;dr let's adopt xxhash for string hashing. cc-ing performance guys: @jreback, @bjonen, what do you think? Are there any other hash function uses that might benefit from xxhash?

As reported in #8524, the new probing algorithm introduces a corner case: with the x31 hash function, strings that share a common prefix and differ only in the last few characters fall into neighbouring buckets, which causes many more collisions because the probe step sequence is 1, 3, 6, 10, ... This causes serious performance degradation, as the benchmarks below show.
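
To illustrate the clustering outside of pandas (this snippet is not from the branch): klib's x31 string hash is essentially h = h*31 + c over the bytes, so strings that differ only in the last character produce consecutive hash values and land in consecutive buckets once masked.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* x31 string hash, as in klib's kh_str_hash_func */
static uint32_t x31_hash(const char *s)
{
    uint32_t h = (uint32_t)*s;
    if (h)
        for (++s; *s; ++s)
            h = (h << 5) - h + (uint32_t)*s;
    return h;
}

int main(void)
{
    const uint32_t mask = (1u << 16) - 1;   /* 2**16 buckets */
    char buf[128];
    memset(buf, ' ', 100);                  /* 100-char shared prefix */
    buf[101] = '\0';

    /* keys differing only in the final byte map to neighbouring buckets,
     * so the early probe steps (1, 3, 6, ...) keep hitting occupied slots */
    for (char c = 'a'; c <= 'e'; ++c) {
        buf[100] = c;
        printf("'...%c' -> bucket %u\n", c, x31_hash(buf) & mask);
    }
    return 0;
}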

double hashing, hash=x31

In [9]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
10000 loops, best of 3: 157 µs per loop


In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ....: 
1000 loops, best of 3: 537 µs per loop

In [11]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ....: 
1000 loops, best of 3: 1.87 ms per loop

quadratic probing, hash=x31

In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
10000 loops, best of 3: 150 µs per loop

In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
100 loops, best of 3: 2.51 ms per loop

In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
100 loops, best of 3: 5.17 ms per loop

quadratic probing, hash=sbox

I've tried using sbox hash as described here:

In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
10000 loops, best of 3: 159 µs per loop

In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
1000 loops, best of 3: 409 µs per loop

In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
1000 loops, best of 3: 1.6 ms per loop

It beats the old version in all tested scenarios, but I was unable to find any licensing information, so bringing it in may be a legal grey area.

quadratic probing, hash=xxhash

Then I went looking for comparative benchmarks involving sbox and found this one mentioning xxHash. I pulled it in, and it performed great on longer strings but was somewhat slower on the shortest ones:

In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
1000 loops, best of 3: 211 µs per loop

In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
1000 loops, best of 3: 505 µs per loop

In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
   ...: 
1000 loops, best of 3: 867 µs per loop

@jreback
Contributor

jreback commented Oct 23, 2014

@immerrr would it be possible to do something like this (have to weigh code complexity / benefit here)

sample 3 values from the array, get lengths
if avg length is small:
   use existing hasher
else:
   use new one

(obviously this could only be used INTERNALLY in a single function as the hashes are different)

@immerrr
Contributor Author

immerrr commented Oct 23, 2014

This would require keeping two versions of khash; I strongly object to that.

It should be a lot easier to precompute xxhash values for 0- and 1-length strings and look them up from a table.
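
Something along these lines, purely as a sketch (kh_xxhash_str is a placeholder name for whatever xxhash-based string hasher gets used):

#include <stdint.h>

/* placeholder declaration for the xxhash-based string hasher */
uint32_t kh_xxhash_str(const char *s);

/* precomputed hashes for the empty string and all 256 one-byte strings */
static uint32_t short_str_hash[256];

static void init_short_str_hash(void)
{
    char buf[2] = {0, 0};
    short_str_hash[0] = kh_xxhash_str("");
    for (int c = 1; c < 256; ++c) {
        buf[0] = (char)c;
        short_str_hash[c] = kh_xxhash_str(buf);
    }
}

static inline uint32_t str_hash(const char *s)
{
    if (s[0] == '\0')
        return short_str_hash[0];
    if (s[1] == '\0')
        return short_str_hash[(unsigned char)s[0]];
    return kh_xxhash_str(s);   /* longer strings take the normal path */
}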

@immerrr
Contributor Author

immerrr commented Oct 24, 2014

Huh, so it turns out I was benchmarking the wrong thing :)

hashtable.StringHashTable is not used anywhere; pd.factorize only uses the int64, float64 and generic object tables. I wonder if we could use the string table for factorization anywhere at all (without incurring a type inference penalty).

The strbox hash table, on the other hand, is used, but at first sight I couldn't say what purpose it serves there or how to put together a representative benchmark for it.

@jreback jreback modified the milestones: 0.15.2, 0.16.0 Nov 26, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jreback
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

@jreback jreback closed this May 9, 2015
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Klib upgrade (factorizing performance increase)
2 participants