Skip to content

Use util Collections more widely #9680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Sep 3, 2020

Conversation

odersky
Copy link
Contributor

@odersky odersky commented Aug 29, 2020

We now have high performance mutable set and map implementations in dotc.util. Generalize them and use them more widely.

@odersky odersky force-pushed the optimize-collections branch 3 times, most recently from 33eabb4 to 733f812 Compare August 30, 2020 10:51
@odersky
Copy link
Contributor Author

odersky commented Aug 30, 2020

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680/ to see the changes.

Benchmarks is based on merging with master (4c99388)

@odersky
Copy link
Contributor Author

odersky commented Aug 30, 2020

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

performance test failed:

Please check http://lamppc37.epfl.ch:8000/pull-9680-08-30-18.51.out for more information

@odersky
Copy link
Contributor Author

odersky commented Aug 30, 2020

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680/ to see the changes.

Benchmarks is based on merging with master (75c8176)

@odersky
Copy link
Contributor Author

odersky commented Sep 1, 2020

This PR constitutes some further steps where we use exclusively our own hash maps and hash sets in dotc.util. These use open hashing with linear scan which tends to be faster than closed hashing and also consumes less memory. The main disadvantages of this open hashing scheme are:

  • performance deteriorates faster if the hash function is not good
  • no Linked... versions (at least for now)

@odersky odersky force-pushed the optimize-collections branch from 5232d5a to 0f31e85 Compare September 1, 2020 13:14
@liufengyun
Copy link
Contributor

retest performance please

@dottybot
Copy link
Member

dottybot commented Sep 2, 2020

performance test scheduled: 15 job(s) in queue, 1 running.

@@ -1564,14 +1564,14 @@ object SymDenotations {
initPrivateWithin: Symbol)
extends SymDenotation(symbol, maybeOwner, name, initFlags, initInfo, initPrivateWithin) {

import util.HashTable
import util.IdentityHashMap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

util.IdentityHashMap and java.util.IdentityHashMap are too easy to confuse IMHO. Can we use a different name like EqHashMap?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, if you happen to write your package declaration as package dotty.tools.dotc.core, then import util.IdentityHashMap will compile but import the Java map!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's somewhat intentional. There's no reason anymore to use a java.util.IdentityHashMap. The dotc.util map is uniformly better in both performance and types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, if you happen to write your package declaration as package dotty.tools.dotc.core, then import util.IdentityHashMap will compile but import the Java map!

Ah I see what you mean now. Yes, we should do something about that.

@dottybot
Copy link
Member

dottybot commented Sep 2, 2020

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680-1/ to see the changes.

Benchmarks is based on merging with master (0d22c74)

@odersky
Copy link
Contributor Author

odersky commented Sep 2, 2020

test performance please

@dottybot
Copy link
Member

dottybot commented Sep 2, 2020

performance test scheduled: 21 job(s) in queue, 1 running.

@dottybot
Copy link
Member

dottybot commented Sep 2, 2020

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680-1/ to see the changes.

Benchmarks is based on merging with master (95a6b44)

@odersky odersky force-pushed the optimize-collections branch from ac5f608 to 54d5466 Compare September 2, 2020 16:43
Also, give it a more standard map interface.
Also, fix removals in small tables
Breakout specialized version for `eq` and `equals`. The duplicate
some methods for efficiency.
The reason for using a linear map instead of a mutable.Map here is that
most derived instances are very small.
It is already `empty` in SimpleIdentitySet. That way
we use the same convention as in the stdlib.

# Conflicts:
#	compiler/src/dotty/tools/dotc/util/LinearIdentityMap.scala
 - Increase load factor to 0.5 -- the miss rate seems to be still OK
(around one miss per hit or better) while re-sizing costs and go down.

 - Don't lose lowest hashCode bit.
Reduce synchronized region to updates.
Per-run cache of what is returned for a staticRef
Make lookup return a Value | Null result instead, which makes it a bit safer to use.
To support this well, forward port a casting method from the explicit-nulls branch.
It's now an alias of util.IdentityHashMap[Symbol, _]
@odersky odersky force-pushed the optimize-collections branch from 54d5466 to e6597f3 Compare September 2, 2020 16:59
@odersky odersky force-pushed the optimize-collections branch from e6597f3 to 785a31b Compare September 3, 2020 08:42
I overlooked a backport of a fix from HashMap to HashSet.
@liufengyun
Copy link
Contributor

test performance please

@dottybot
Copy link
Member

dottybot commented Sep 3, 2020

performance test scheduled: 15 job(s) in queue, 1 running.

@dottybot
Copy link
Member

dottybot commented Sep 3, 2020

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680-1/ to see the changes.

Benchmarks is based on merging with master (0f757c5)

@odersky
Copy link
Contributor Author

odersky commented Sep 3, 2020

test performance please

@dottybot
Copy link
Member

dottybot commented Sep 3, 2020

performance test scheduled: 16 job(s) in queue, 1 running.

@dottybot
Copy link
Member

dottybot commented Sep 3, 2020

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9680-1/ to see the changes.

Benchmarks is based on merging with master (cdfc76e)

Copy link
Contributor

@liufengyun liufengyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The speedup is small, but at least there is no regression.

extension [T](x: T | Null):

/** Assert `x` is non null and strip `Null` from type */
inline def nn: T =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: for readability

Suggested change
inline def nn: T =
inline def notNull: T =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current proposal is to use .nn when explicit nulls are in. The theory is that that's something you'll have to do fairly often, so it's good to have a non-obtrusive name.

* Flow-typing under explicit nulls will automatically insert many necessary
* occurrences of uncheckedNN.
*/
inline def uncheckedNN: T = x.asInstanceOf[T]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: for readability

Suggested change
inline def uncheckedNN: T = x.asInstanceOf[T]
inline def uncheckedNotNull: T = x.asInstanceOf[T]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave it as is, since I believe that will be the name of the (compiler-generated) method when explicit-nulls is enabled.

* and as a HashSet for larger sizes.
*/
opaque type LinearSet[Elem >: Null <: AnyRef] =
immutable.Set[Elem] | HashSet[Elem]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: LinearSet is not a great name in the context of data structures, linear associates closer to complexity theory than type theory :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a good other name?

@odersky odersky merged commit b3f908d into scala:master Sep 3, 2020
@odersky odersky deleted the optimize-collections branch September 3, 2020 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants