Skip to content

Fix #2185: Add bytecode idempotency checks. #2272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 5, 2017

Conversation

nicolasstucki
Copy link
Contributor

Add bytecodeIdemporency to compilation tests. This tests compiles any compilation test twice and compares the resulting bytecodes (currently dotty-compiler and tests/pos).

@@ -14,7 +14,7 @@ import Decorators._
class TastyPickler {

private val sections = new mutable.ArrayBuffer[(NameRef, TastyBuffer)]
val uuid = UUID.randomUUID()
private val uuid = UUID.fromString("3cee1b79-c03a-4125-b337-d067b5cb3a94") // TODO: use a hash of the tasty tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to merge this change as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarkDimius, you talked about making the hash for it. Did you have something specific in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarkDimius I changes the tasty UUID to a hash of the nameBuffer and sections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea looks good to me.

val blacklisted = Set(
"pos/Map/scala/collection/immutable/Map",
"pos/Map/scala/collection/immutable/AbstractMap",
"pos/t1203a/NodeSeq"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a blacklist? If these files fail the test, shoudn't issues be filed for them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will. I just haven't had time to do it yet.

Copy link
Contributor Author

@nicolasstucki nicolasstucki Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added them to #2274.

compileFilesInDir("../tests/pos", opt)
}
def idempotency2() = {
compileList("dotty1", compilerSources ++ backendSources ++ backendJvmSources, opt) +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to combine this with a bootstrap test: Compile the compiler, then compile the compiler with the compiled compiler, and check equality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be interesting, but I would add it as a second step in another PR.

Copy link
Contributor

@DarkDimius DarkDimius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nicolasstucki nicolasstucki force-pushed the idempotency-checks branch 2 times, most recently from 5b1dd94 to 6e3b580 Compare April 19, 2017 15:14
* @see [[http://code.google.com/p/smhasher]]
*/
object MurmurLongHash3 extends MurmurLongHash3 {
final val arraySeed = 0x3c074a61
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are seeds ints?

* objects.
*
* This algorithm is designed to generate well-distributed non-cryptographic
* hashes. It is designed to hash data in 32 bit chunks (ints).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 now, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


import java.lang.Long.{ rotateLeft => rotl }

private[util] class MurmurLongHash3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this should be contributed upstream to scala-library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. But I need to reimplement the algorithm from the source. @DarkDimius showed me some issues with my simple port. I will try to make it compatible MurmurHash3 to be able to just add it inside in the future.

@DarkDimius
Copy link
Contributor

As discussed in person, using murmur3 to emmit 64-bit hashes is not evidenced to be a good hash.
Similarly, using 128bit mumur3 and taking part of is isn't good iether(murmu3 is not cryptographic hash).

I propose we don't use murmur family for this one, and instead use FarmHash / CityHash, which have been open-sourced by google and are used to hash entries in maps(our use-case).

@smarter
Copy link
Member

smarter commented Apr 26, 2017

As discussed in person, using murmur3 to emmit 64-bit hashes is not evidenced to be a good hash.

Interesting, why?

@nicolasstucki nicolasstucki force-pushed the idempotency-checks branch 2 times, most recently from 37ae02d to 9f57381 Compare April 27, 2017 12:06
@nicolasstucki
Copy link
Contributor Author

Updated hashing

@@ -0,0 +1,186 @@
package dotty.tools.dotc.util

/* Ported from https://github.com/google/cityhash/blob/master/src/city.cc */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should preserve the copyright/license header of the original file, and you should update AUTHORS.md to note this.


private final def fetch64(idx: Int)(implicit data: Array[Byte]): Long = {
var x: Long = data(idx)
x = data(idx + 1) | (x << 8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Careful with sign extension (because Byte is signed). If I recall correctly, you need to add& 0xff everywhere you convert from a Byte to an Int to avoid that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add some unit tests for this class, to make sure it behaves like the reference implementation.

@nicolasstucki
Copy link
Contributor Author

@DarkDimius the hash function was updated.

@DarkDimius
Copy link
Contributor

As discussed in person, using murmur3 to emmit 64-bit hashes is not evidenced to be a good hash.

Interesting, why?

While real cryptographic hashes are expected to remain "good" if you take a subsection of it, you don't have this guarantee for non-cryptograhic hashes in general.
Murmur3 family is not cryptographic and does not include 64-bit hash. Using 128-bit hash and only taking a part of it may be a weak hash and I didn't quickly find an evidence to be proven otherwise.

That's why I've proposed FarmHash / CityHash that have native 64-bit variants.

@nicolasstucki, I've lost context, we did we decide to use Elf hashing instead of CityHash?

@nicolasstucki
Copy link
Contributor Author

The context for using Elf hashing instead of CityHash was because the implementation is trivial and it is simpler to be sure that it is correctly implemented. But, if we want to use CityHash we can just revert 426a33a.

@felixmulder
Copy link
Contributor

@DarkDimius - is this good to go now?

@DarkDimius DarkDimius merged commit f28c984 into scala:master May 5, 2017
@allanrenucci allanrenucci deleted the idempotency-checks branch December 14, 2017 19:23
@Blaisorblade Blaisorblade mentioned this pull request Aug 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants