[PERF] taking upper 32bit of PyObject_Hash into account #39592
asv_bench gives:
As always when changing a hash function, it is a mixed bag: in some cases the old hash function was better, in others the new one. However, one can see that the O(n^2) running time is now avoided:
are these degenerate?
After thinking about it, I have opted for a slightly different hash reduction from 64bit->32bit. It has the advantage that it fixes the O(n^2) issue while making only minimal changes to the behavior in other cases, thus preserving the original performance in some of the corner cases you have highlighted. The timings are now
ok @realead looks good!
can you add a whatsnew note (alt ok to just add this issue onto one of the previous ones for hashing). merge master and ping on greenish
@jreback green
thanks @realead
Until now, the upper 32 bits of PyObject_Hash have not been taken into account at all. Because the hash function of many built-in objects is very simple (e.g. for integers), many "normal" series would have all hashes equal to 0, which leads to O(n^2) running times.
Thus, for 64-bit builds, we need to mix the upper 32 bits into the resulting 32-bit hash.
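
For illustration, here is a minimal sketch of the problem and of one possible reduction. It is not the code from this PR: the helper names `truncate_to_32` and `fold_to_32` are hypothetical, the XOR fold is just one simple way to mix in the upper bits, and the inputs are plain 64-bit values standing in for what PyObject_Hash would return on a 64-bit build.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def truncate_to_32(h64: int) -> int:
    """Old-style reduction (assumed): keep only the lower 32 bits."""
    return h64 & MASK32


def fold_to_32(h64: int) -> int:
    """Illustrative reduction: XOR the upper 32 bits into the lower 32 bits."""
    h64 &= MASK64  # interpret a possibly negative hash as unsigned 64-bit
    return (h64 ^ (h64 >> 32)) & MASK32


# Stand-ins for 64-bit hashes that differ only in their upper 32 bits,
# e.g. integers that are multiples of 2**32.
values = [k << 32 for k in range(1, 1001)]

truncated = {truncate_to_32(v) for v in values}
folded = {fold_to_32(v) for v in values}

print(len(truncated))  # every value lands in the same bucket
print(len(folded))     # the values are spread out again
```

With plain truncation all 1000 values reduce to the same 32-bit hash, so every insertion has to walk through the earlier entries, which is where the O(n^2) behavior comes from. With the fold, a hash whose upper 32 bits are zero is left unchanged, which is one way to keep the behavior of the other cases close to what it was before.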