Skip to content

Some performance related changes #16566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jan 20, 2023
Merged

Some performance related changes #16566

merged 28 commits into from
Jan 20, 2023

Conversation

odersky
Copy link
Contributor

@odersky odersky commented Dec 21, 2022

A series of optimizations to reduce allocation rates and speed up compilation. I think in the end we got 5-10% depending on benchmark. The optimizations were driven by taking close looks at allocation and cpu profiles generated by async-profiler.

@odersky
Copy link
Contributor Author

odersky commented Dec 22, 2022

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@mbovel
Copy link
Member

mbovel commented Dec 22, 2022

5 benchmarks failed in the last performance test. It's not clear to me why. I am testing the benchmarks bot on an other PR to see if the failures are specific to this PR or not. Logs are available here: https://dotty-bench.epfl.ch/logs/pull-16566-12-22-10.14.out.

@odersky odersky force-pushed the perf1 branch 3 times, most recently from 4f6eb18 to 0f8d801 Compare December 22, 2022 22:10
@odersky
Copy link
Contributor Author

odersky commented Dec 23, 2022

The idempotency tests pass locally, but fail on the CI. Who can help figure out what goes wrong there?

@mbovel
Copy link
Member

mbovel commented Dec 23, 2022

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (42c361c)

@odersky odersky force-pushed the perf1 branch 5 times, most recently from d94a62b to bdfb08f Compare December 25, 2022 11:25
@odersky
Copy link
Contributor Author

odersky commented Dec 26, 2022

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

@odersky
Copy link
Contributor Author

odersky commented Dec 27, 2022

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

@odersky
Copy link
Contributor Author

odersky commented Dec 27, 2022

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

@odersky
Copy link
Contributor Author

odersky commented Jan 5, 2023

I am going to rebuild this bit by bit.

@odersky
Copy link
Contributor Author

odersky commented Jan 5, 2023

test performance please

@dottybot
Copy link
Member

dottybot commented Jan 5, 2023

performance test scheduled: 1 job(s) in queue, 0 running.

 - Avoid boxing overheads in NameBuffer and TreeBuffer
 - Avoid repeated re-allocations in updateMapWithDeltas
Also have a context pool for committable typer states, which gets used in
isFullyDefined.
 - Don't create additional threads if ParallelPickling = false
 - Reorganize pickling to use common scratch data between different pickles
   to conserve space
 - Cache selectorNames
 - Avoid redundant expensive string computation in Definitions.FunType
We could also make other required methods lazy vals instead of defs. The `newArray`
method showed up in the profiles that's why it was changed first.
Creating diagnostics should be cheap, whereas reportiong them can be expensive.
The reason is that often diagnsotics are created nd then later discarded in normal
backtracking during Typer.

But the way it was set up, every diagnostic computed a stack trace, which is quite expensive.
Regex compilation is expensive, so we should re-use the matcher over
multiple replaceAll calls in:

 - StdNames.str.sanitize
 - Text's lengthWithoutAnsi
On demand, fill the array with zeroes instead of creating a fresh one.
This can save some array allocations.
`parentSyms` maps all parent types. We don't need that if we just want to
work on the superclass.
There are many LookaheadScanner objects, and most don't need a CharBuffer
for literals or comments at all.
Profiles showed that it accounted for a significant percentage of vtable_stub time.
isType also made up for a significant part of vtable stubs. We now compute it when
a Symbol is created and keep it around as a field.
Make it cheaper to compute whether a Period is Nowhere, and also make the
symbol and denot computations on NamedType as small as possible.
It's not that large, is only used twice, and inlining it saves
two argument closures per call.
The new version performs better also for long lists of trees.
The `ensuring` seems to be expensive. Omiting it does not seem
to cause a problem since a denotation that's valid nowhere would
certainly produce other errors when accessed.
Uses just one thread for the rest of pickling. One thread is sufficient since
there is not that much to do and we have time until the backend finishes.

We might want to partially revise that decision when we support pipelined
computation. In that case producing tasty early could be a win. But even in
that case we might want to fine-tune the number of worker threads instead
of relying on some executor. Adding more workers is easy in the new design.
This reverts soke changes from commit d4a5515.

Use again a LinkedHashMap instead of a combination of EqHashMap and ArrayBuffer
@odersky odersky merged commit 865aa63 into scala:main Jan 20, 2023
@odersky odersky deleted the perf1 branch January 20, 2023 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants