-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Some performance related changes #16566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
5 benchmarks failed in the last performance test. It's not clear to me why. I am testing the benchmarks bot on an other PR to see if the failures are specific to this PR or not. Logs are available here: https://dotty-bench.epfl.ch/logs/pull-16566-12-22-10.14.out. |
4f6eb18
to
0f8d801
Compare
The idempotency tests pass locally, but fail on the CI. Who can help figure out what goes wrong there? |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/16566/ to see the changes. Benchmarks is based on merging with main (42c361c) |
d94a62b
to
bdfb08f
Compare
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/16566/ to see the changes. Benchmarks is based on merging with main (6f5bb34) |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/16566/ to see the changes. Benchmarks is based on merging with main (6f5bb34) |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/16566/ to see the changes. Benchmarks is based on merging with main (6f5bb34) |
I am going to rebuild this bit by bit. |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
- Avoid boxing overheads in NameBuffer and TreeBuffer - Avoid repeated re-allocations in updateMapWithDeltas
Also have a context pool for committable typer states, which gets used in isFullyDefined.
- Don't create additional threads if ParallelPickling = false - Reorganize pickling to use common scratch data between different pickles to conserve space
- Cache selectorNames - Avoid redundant expensive string computation in Definitions.FunType
We could also make other required methods lazy vals instead of defs. The `newArray` method showed up in the profiles that's why it was changed first.
Creating diagnostics should be cheap, whereas reportiong them can be expensive. The reason is that often diagnsotics are created nd then later discarded in normal backtracking during Typer. But the way it was set up, every diagnostic computed a stack trace, which is quite expensive.
Regex compilation is expensive, so we should re-use the matcher over multiple replaceAll calls in: - StdNames.str.sanitize - Text's lengthWithoutAnsi
On demand, fill the array with zeroes instead of creating a fresh one. This can save some array allocations.
`parentSyms` maps all parent types. We don't need that if we just want to work on the superclass.
There are many LookaheadScanner objects, and most don't need a CharBuffer for literals or comments at all.
Profiles showed that it accounted for a significant percentage of vtable_stub time.
isType also made up for a significant part of vtable stubs. We now compute it when a Symbol is created and keep it around as a field.
Make it cheaper to compute whether a Period is Nowhere, and also make the symbol and denot computations on NamedType as small as possible.
It's not that large, is only used twice, and inlining it saves two argument closures per call.
The new version performs better also for long lists of trees.
The `ensuring` seems to be expensive. Omiting it does not seem to cause a problem since a denotation that's valid nowhere would certainly produce other errors when accessed.
Uses just one thread for the rest of pickling. One thread is sufficient since there is not that much to do and we have time until the backend finishes. We might want to partially revise that decision when we support pipelined computation. In that case producing tasty early could be a win. But even in that case we might want to fine-tune the number of worker threads instead of relying on some executor. Adding more workers is easy in the new design.
This reverts soke changes from commit d4a5515. Use again a LinkedHashMap instead of a combination of EqHashMap and ArrayBuffer
A series of optimizations to reduce allocation rates and speed up compilation. I think in the end we got 5-10% depending on benchmark. The optimizations were driven by taking close looks at allocation and cpu profiles generated by async-profiler.