Steal the algorithm used to combine hashes from tupleobject.c #15227
Here's an initial pass at stealing https://github.com/python-git/python/blob/master/Objects/tupleobject.c#L290 for the combining. I am not 100% sure that the problem is my (rather crude) combiner; it may instead be the exact way we're using the bit mixer in hash_array. I'm still thinking it through, but I suspect we might be maintaining undesirable linearity. May I ask how you encountered these collisions?
If the basic approach looks sound, I can add some comments around some of the lazy iterator wackiness.
arrays = itertools.chain([first], arrays)

mult = np.zeros_like(first) + np.uint64(1000003)
out = np.zeros_like(first) + np.uint64(0x345678L)
The `L` suffix does not work in Python 3 (remove it and it's OK).
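For readers following along, here is a minimal sketch of the combining scheme being borrowed, vectorized over NumPy arrays. It follows the classic CPython tuple hash loop (the one used before Python 3.8) and is not the code from this PR; the function name `combine_hash_arrays` is hypothetical, and the inputs are assumed to be equal-length arrays of per-column hashes.

```python
import numpy as np


def combine_hash_arrays(arrays):
    """Combine per-column uint64 hash arrays element-wise.

    Hypothetical helper: mirrors the classic CPython tuple hash loop
    (pre-3.8), applied to whole arrays instead of single hash values.
    """
    # Assumption: every array has the same shape and holds non-negative
    # integer hashes that fit in uint64.
    arrays = [np.asarray(a, dtype=np.uint64) for a in arrays]
    n = len(arrays)

    mult = np.uint64(1000003)
    # Python 3 has no `L` suffix, so the seed is a plain integer literal.
    out = np.full(arrays[0].shape, 0x345678, dtype=np.uint64)

    for i, a in enumerate(arrays):
        remaining = n - i - 1
        out ^= a      # mix in this column's hashes
        out *= mult   # uint64 array arithmetic wraps modulo 2**64
        # The multiplier changes per position, so element order matters.
        mult += np.uint64(82520 + remaining + remaining)

    out += np.uint64(97531)
    return out
```

Because the multiplier depends on the position, swapping two columns generally changes the result, which is the property a plain XOR or sum of hashes lacks.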
@mikegraham I found the collisions by hashing
which is basically a Cartesian product of 1000 x 1000. Nothing special really, just a test case I am using.
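A test case of this shape can be reproduced roughly as follows. This is only a sketch: the helper names `cartesian_pairs` and `count_collisions` are hypothetical, and the XOR combiner is a deliberately weak stand-in used to show how a symmetric, linear combiner collides on a product grid. It is not the combiner that was in use here.

```python
import numpy as np


def cartesian_pairs(n):
    """Return two uint64 columns enumerating every (i, j) with 0 <= i, j < n."""
    values = np.arange(n, dtype=np.uint64)
    left = np.repeat(values, n)   # 0,0,...,0,1,1,...
    right = np.tile(values, n)    # 0,1,...,n-1,0,1,...
    return left, right


def count_collisions(hashes):
    """Number of rows whose hash duplicates an earlier row's hash."""
    return hashes.size - np.unique(hashes).size


left, right = cartesian_pairs(1000)

# Stand-in combiner (assumption, for illustration only): XOR is symmetric,
# so (i, j) and (j, i) always collide, and every (i, i) maps to 0.
xor_hashes = left ^ right
print(count_collisions(xor_hashes))
```

All one million rows are distinct pairs, so any nonzero count here comes from the combiner itself.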
Closing in favor of #15224. Thanks @mikegraham.
closes #15227

Author: Jeff Reback <[email protected]>
Author: Mike Graham <mikegraham2gmail.com>

Closes #15224 from jreback/mi_hash2 and squashes the following commits:

8b1d3f9 [Jeff Reback] not correctly hashing categorical in a MI
48a2402 [Jeff Reback] support for mixed type arrays
58f682d [Jeff Reback] memory optimization
0c13df7 [Mike Graham] Steal the algorithm used to combine hashes from tupleobject.c
e8dd607 [Jeff Reback] add hash_tuples
44e9c7d [Mike Graham] wipSteal the algorithm used to combine hashes from tupleobject.c
e507c4a [Jeff Reback] ENH: support MultiIndex and tuple hashing
closes #15224