Skip to content

Fix tasty hash #4865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 7, 2018
Merged

Fix tasty hash #4865

merged 10 commits into from
Aug 7, 2018

Conversation

nicolasstucki
Copy link
Contributor

@nicolasstucki nicolasstucki commented Jul 29, 2018

There were two bugs

  • The hash of the nameBuffer was not computed
  • The computed hash was 32-bit instead of 64-bit

The combination of the two bugs made the 128-bit UUID have 0s in the first 96 bits.

Previously we where computing the hash for nameBuffer
on an empty array.
@nicolasstucki nicolasstucki requested a review from smarter July 29, 2018 14:14
@smarter
Copy link
Member

smarter commented Jul 29, 2018

So how come this didn't cause any test to fail so far (except the scripted test I guess)? And can we add a test for this?

@nicolasstucki
Copy link
Contributor Author

nicolasstucki commented Jul 29, 2018

This has nothing to do with the scripted tests. I just noted that there were too many zeros in the hash.

@nicolasstucki
Copy link
Contributor Author

nicolasstucki commented Jul 29, 2018

Xoring something with 0 does not change the value so I don't see the point of the if

True, I'm trying to figure out why this implementation pjwHash is only returning Ints

Originaly it was a C `unsigned char`
Previously the constants where set to generate 32-bit ints.
val d = data(i) & 0xFFL // Interpret byte as unsigned byte
h = (h << 8) + d
val high = h & 0xFF00000000000000L
if (high != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the if, it's not necessary and branches should be avoided in tight loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nicolasstucki nicolasstucki added this to the 0.10 Tech Preview milestone Jul 30, 2018
@nicolasstucki
Copy link
Contributor Author

@smarter don't forget to finish the review on this PR

@Blaisorblade
Copy link
Contributor

Blaisorblade commented Aug 6, 2018

The review on the algorithm can be separated to offload @smarter.
Can we test the input on reference data against some reference implementation?

I've taken two difference reference C implementations, ported them to 64 bits following Wikipedia and its DrDobbs source, and gotten 0x34545c16020230 as hash for
"abcdefghijklmnopqrstuvwxyz1234567890". With the code (I'll upload it), we can test more input data. Let's.

EDIT: not sure we should use this hash, since the original argument didn't work as well as hoped:

The context for using Elf hashing instead of CityHash was because the implementation is trivial and it is simpler to be sure that it is correctly implemented.

EDIT 2: the "reference implementation" I dug out is https://github.com/Blaisorblade/general-purpose-hash-functions

@@ -0,0 +1,25 @@
import dotty.tools.dotc.core.tasty.TastyHash.pjwHash64

object Test {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write a junit test instead, they're faster to run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Found by comparing testcases with the C version, and by then comparing the
source code; both C versions use an unsigned right shift (or rather a right
shift on unsigned integers).

* Background:

  Right-shifting a signed number copies the sign bit into all the new "empty
  bits"; C uses the integer sign, while Java (and Scala) use a separate
  "unsigned right shift" operator.
@Blaisorblade
Copy link
Contributor

FWIW I fixed a hash bug after @smarter's review — but the hashing details can be reviewed by non-Tasty experts.

@nicolasstucki
Copy link
Contributor Author

@Blaisorblade fix LGTM

Copy link
Contributor

@Blaisorblade Blaisorblade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolasstucki Fix LGTM, pushed a nitpick, reassigning to you 😉

@nicolasstucki nicolasstucki merged commit c42e325 into scala:master Aug 7, 2018
@Blaisorblade Blaisorblade deleted the fix-tasty-hash branch August 7, 2018 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants