-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: upgrade khash lib to 0.2.8 #8547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b3faa3
to
9745cc8
Compare
tl;dr let's adopt As reported in #8524, new probing algorithm brings in a corner case: strings that have common prefix and differ in last few characters with double hashing, hash=x31In [9]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
10000 loops, best of 3: 157 µs per loop
In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
....:
1000 loops, best of 3: 537 µs per loop
In [11]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
....:
1000 loops, best of 3: 1.87 ms per loop quadratic probing, hash=x31In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
..ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
10000 loops, best of 3: 150 µs per loop
In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
100 loops, best of 3: 2.51 ms per loop
In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
100 loops, best of 3: 5.17 ms per loop quadratic probing, hash=sboxI've tried using sbox hash as described here: In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
10000 loops, best of 3: 159 µs per loop
In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
1000 loops, best of 3: 409 µs per loop
In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
1000 loops, best of 3: 1.6 ms per loop It beats old version in all tested scenarios, but I was unable to find any licensing information, so bringing it in may be a grey-ish legal area quadratic probing, hash=xxhash.Then I went to see comparative benchmarks involving sbox and found this one mentioning In [1]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(1, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
1000 loops, best of 3: 211 µs per loop
In [2]: %%timeit np.random.seed(0); s = pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
1000 loops, best of 3: 505 µs per loop
In [3]: %%timeit np.random.seed(0); s = ' ' * 100 + pd.util.testing.rands_array(2, 10000)
ht = pd.hashtable.StringHashTable(len(s)); ht.factorize(s)
...:
1000 loops, best of 3: 867 µs per loop |
@immerrr would it be possible to do something like this (have to weight code complexitiy/ benefit here)
(obviously this could only be used INTERNALLY in a single function as the hashes are different) |
This would require keeping two versions of khash, I strongly object against it. It should be a lot easier to precompute xxhash values for 0-1 length strings and look them up from a table. |
Huh, so it turns out I was benchmarking wrong stuff :)
strbox hash table OTOH is used, but at first sight I couldn't say what purpose does it serve there and how should I put together a representative benchmark for it. |
* khash_python.h: make pyobject_cmp static to avoid public symbol clashes * khash_python.h: add include-guard
kh_del is broken after removing isdel flag, ensure it's never called by deleting it altogether.
969a5bb
to
275ef41
Compare
closing pls reopen if/when updated |
This should close #8524.
The idea is that quadratic probing
(i**2 + i)/2
is faster than double hashing and can be shown to traverse all elements ifnbuckets == 2**N
.This branch compiles and passes tests, but I haven't done any benchmarks yet.
TODO
(probably in other issues)
I'm not sure if there was any noticeable performance increase
e.g. by wrapping its definition with
#ifndef kh_inline
, so thatPANDAS_INLINE
flag can be moved away from khash.h and specified in`khash_python.h" with
khint64_t
definition signed. Inupstream it is unsigned which -- I agree -- is inconsistent, but bit shifting
operations on signed integers have a) undefined behaviour according to the
standard and b) different semantics which might cause performance degradation
on some hash functions:
kmalloc/krealloc/kfree
should be overridden touse PyMem interface to be shown in tracemalloc output.
PyMem_Calloc
isonly provided for 3.5, so it may require more work than just defining a macro
rename.
-1
return values for operations that have failed,most likely because of memory allocation errors. these should be handled and
reported as
MemoryError
s to the interpreter