COMPAT: different orderings in value_counts on 32-bit platforms #11227
Comments
cc @behzadnouri
@jreback what does it return on 32-bit linux?
`pd.Series([2, 1, 1], index=[5., np.nan, 10.3])` the orderings of the
this might be fixed by consistent hashing cc @realead
@jreback I think in general, we cannot guarantee the same hash for different platforms, e.g. for Python-objects:
The above holds for 64-bit. On 32-bit the hash will be something different (as the 64-bit result above cannot be stored in 32 bits). Because the order of key-count-pairs provided by
However, for float64, we have the same hash function for 32 and 64 bit (at least at the moment). I guess back then, Python's hash was used for doubles and thus the hashes were different between 32 and 64 bit (see the example above), which explains the different order.
So the example should be fine now, but... I must confess, I changed this test some time ago: 4cfa97a#diff-f2bbc83024c5767a6b4afec26ad8efa194d6dd8f140276a22bcf6b5e7bd37102L1197
From my point of view, this is not a bug in the first place: hashes can differ between platforms (for whatever reason), so the order is arbitrary. A way to make the order non-arbitrary would be e.g. to enforce insertion order in the result of
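As a rough illustration of the platform dependence of Python's built-in hash discussed above, here is a minimal sketch using only the standard library. It relies on the documented behaviour that CPython reduces numeric hashes modulo `sys.hash_info.modulus`, which differs between 32-bit and 64-bit builds. The helper name `int_hash_wraps` is made up for this example, and none of this is pandas' own hashtable code.

```python
import sys

# Width in bits of hash values on this interpreter build
# (64 on typical 64-bit builds, 32 on 32-bit builds).
hash_width = sys.hash_info.width

# Prime modulus used for numeric hashes: 2**61 - 1 on 64-bit builds,
# 2**31 - 1 on 32-bit builds.
modulus = sys.hash_info.modulus


def int_hash_wraps(offset: int) -> bool:
    """Hypothetical helper: check that hash(modulus + offset) wraps to offset.

    Valid for 0 <= offset < modulus. Because the modulus differs between
    platforms, the same large integer hashes to different values on
    32-bit and 64-bit builds.
    """
    return hash(modulus + offset) == offset


if __name__ == "__main__":
    print("hash width:", hash_width)
    print("modulus:", modulus)
    print("wraps around:", int_hash_wraps(5))
```

Any container whose iteration order follows such hash values can therefore iterate in a different order on a different platform.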
OK, I agree. Hashtables are reproducible, but only on a given platform. Would you be able to add the test above for 32-bit (with the different ordering) so we can close this issue?
Once #39009 is merged, this issue would no longer exist, because the order would depend on the original ordering and not hash-functions (and thus would be independent of the platform).
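To show why ordering by first appearance is platform-independent, here is a small pure-Python sketch. It is not the implementation from #39009; the function name `count_in_first_seen_order` is hypothetical, and it only relies on the fact that Python dicts (3.7+) iterate in insertion order regardless of the hash values of their keys.

```python
def count_in_first_seen_order(values):
    """Count occurrences, reporting keys in order of first appearance.

    Illustrative only: dict iteration follows insertion order, so the
    result does not depend on how the keys hash on a given platform.
    """
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return list(counts.items())


if __name__ == "__main__":
    # The key order follows the input, not the hash values.
    print(count_in_first_seen_order([5.0, 10.3, 5.0, 1.0]))
```

With this approach the result is determined by the input sequence alone, so a test asserting on it would not need a separate 32-bit variant.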
This occurs on 32-bit Linux: a slightly different ordering is returned from the hashtable. My only guess is that it is because the indexing is `Py_ssize_t`, and this is hashed and has differing values. So the test should be slightly different for those platforms. See the test skipping here: d6c7a3a
Not a big deal, but here's the question: should we guarantee these types of orderings, IOW, use an `int64` instead of `Py_ssize_t` for indexing (on all platforms)?