Some performance related changes #16566

odersky · 2022-12-21T10:57:23Z

A series of optimizations to reduce allocation rates and speed up compilation. I think in the end we got a 5-10% speedup, depending on the benchmark. The optimizations were driven by close inspection of allocation and CPU profiles generated by async-profiler.

odersky · 2022-12-22T09:13:28Z

test performance please

dottybot · 2022-12-22T09:14:03Z

performance test scheduled: 1 job(s) in queue, 0 running.

mbovel · 2022-12-22T16:40:50Z

5 benchmarks failed in the last performance test. It's not clear to me why. I am testing the benchmarks bot on another PR to see whether the failures are specific to this PR. Logs are available here: https://dotty-bench.epfl.ch/logs/pull-16566-12-22-10.14.out.

odersky · 2022-12-23T12:10:09Z

The idempotency tests pass locally, but fail on the CI. Who can help figure out what goes wrong there?

mbovel · 2022-12-23T23:13:10Z

test performance please

dottybot · 2022-12-23T23:14:56Z

performance test scheduled: 1 job(s) in queue, 0 running.

dottybot · 2022-12-24T00:21:29Z

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (42c361c)

odersky · 2022-12-26T18:36:34Z

test performance please

dottybot · 2022-12-26T18:37:28Z

performance test scheduled: 1 job(s) in queue, 0 running.

dottybot · 2022-12-26T19:42:34Z

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

odersky · 2022-12-27T08:35:52Z

test performance please

dottybot · 2022-12-27T08:37:17Z

performance test scheduled: 1 job(s) in queue, 0 running.

dottybot · 2022-12-27T09:41:45Z

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

odersky · 2022-12-27T11:07:32Z

test performance please

dottybot · 2022-12-27T11:07:37Z

performance test scheduled: 1 job(s) in queue, 0 running.

dottybot · 2022-12-27T12:12:35Z

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/16566/ to see the changes.

Benchmarks is based on merging with main (6f5bb34)

odersky · 2023-01-05T13:18:38Z

I am going to rebuild this bit by bit.

odersky · 2023-01-05T13:18:46Z

test performance please

dottybot · 2023-01-05T13:19:40Z

performance test scheduled: 1 job(s) in queue, 0 running.

Creating diagnostics should be cheap, whereas reporting them can be expensive. The reason is that diagnostics are often created and then later discarded in normal backtracking during Typer. But the way it was set up, every diagnostic computed a stack trace, which is quite expensive.
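To illustrate why creation cost matters here, the following is a minimal sketch of one common JVM technique for making exception-like objects cheap to create: overriding `fillInStackTrace` so that no stack walk happens at construction. It assumes a diagnostic type that extends `Exception`; `DiagnosticSketch` and `CheapDiagnostic` are hypothetical names, and this is not necessarily how the commit itself addresses the problem.

```scala
object DiagnosticSketch {
  // Hypothetical class: an exception-based diagnostic that skips stack trace capture.
  class CheapDiagnostic(msg: String) extends Exception(msg) {
    // The Throwable constructor calls fillInStackTrace, which walks the stack.
    // Returning `this` without calling super skips that work entirely.
    override def fillInStackTrace(): Throwable = this
  }

  // Backtracking-style use: many diagnostics are created, most are thrown away.
  def createAndDiscard(n: Int): Int = {
    var created = 0
    while (created < n) {
      new CheapDiagnostic("candidate error " + created) // discarded immediately
      created += 1
    }
    created
  }
}
```

With this override the object carries only its message, so discarding it during backtracking costs little more than a plain allocation.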

Uses just one thread for the rest of pickling. One thread is sufficient since there is not that much to do and we have time until the backend finishes. We might want to partially revise that decision when we support pipelined computation. In that case producing tasty early could be a win. But even in that case we might want to fine-tune the number of worker threads instead of relying on some executor. Adding more workers is easy in the new design.

odersky force-pushed the perf1 branch from 25ca122 to 3656091 Compare December 22, 2022 15:20

odersky force-pushed the perf1 branch 3 times, most recently from 4f6eb18 to 0f8d801 Compare December 22, 2022 22:10

odersky force-pushed the perf1 branch from 96712d7 to ba98f3d Compare December 23, 2022 13:39

odersky force-pushed the perf1 branch 5 times, most recently from d94a62b to bdfb08f Compare December 25, 2022 11:25

odersky force-pushed the perf1 branch from 4ba226e to e79d1cd Compare January 5, 2023 13:17

odersky added 27 commits January 20, 2023 12:06

Reduce allocations for pickling

7e0c815

- Avoid boxing overheads in NameBuffer and TreeBuffer
- Avoid repeated re-allocations in updateMapWithDeltas
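As a rough illustration of the boxing point, here is a sketch of a growable buffer backed by a primitive `Array[Int]`. A generic collection such as `ArrayBuffer[Int]` stores boxed `java.lang.Integer` objects, while this one stores raw ints and grows by doubling. `IntBufferSketch` is a hypothetical name and is not the actual NameBuffer or TreeBuffer code.

```scala
import java.util.Arrays

// Hypothetical specialized buffer: elements are stored unboxed in an Array[Int].
final class IntBufferSketch(initialCapacity: Int) {
  private var elems = new Array[Int](math.max(initialCapacity, 1))
  private var len = 0

  def append(x: Int): Unit = {
    // Double the backing array when full, so growth is amortized.
    if (len == elems.length) elems = Arrays.copyOf(elems, len * 2)
    elems(len) = x
    len += 1
  }

  def apply(i: Int): Int = {
    require(i >= 0 && i < len, "index out of range: " + i)
    elems(i)
  }

  def length: Int = len
}
```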

Reduce context creations for value class related ops

cb40386

Refactor context pools

2fbb891

Also have a context pool for committable typer states, which gets used in isFullyDefined.

Avoid creation of Type lists when assigning types to Apply nodes

3d21427

Pickling reorganizations

47f5b02

- Don't create additional threads if ParallelPickling = false
- Reorganize pickling to use common scratch data between different pickles to conserve space

Reduce string computations

8dee679

- Cache selectorNames
- Avoid redundant expensive string computation in Definitions.FunType

Avoid recomputing hot requiredMethods

ac95795

We could also make other required methods lazy vals instead of defs. The `newArray` method showed up in the profiles, which is why it was changed first.
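The difference between the two forms can be sketched as follows. A `def` repeats its lookup on every access, while a `lazy val` performs it once and caches the result. `RequiredMethodsSketch` and `expensiveLookup` are hypothetical stand-ins, and the counter exists only to make the effect observable.

```scala
object RequiredMethodsSketch {
  // Counts how often the (pretend) expensive lookup runs.
  var lookups = 0

  // Hypothetical stand-in for resolving a required method by name.
  private def expensiveLookup(name: String): String = {
    lookups += 1
    "method:" + name
  }

  // As a def: the lookup is repeated on every access.
  def newArrayAsDef: String = expensiveLookup("newArray")

  // As a lazy val: the lookup runs on first access only.
  lazy val newArrayAsLazyVal: String = expensiveLookup("newArray")
}
```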

Reuse regex matcher in replaceAll calls

f8e7f78

Regex compilation is expensive, so we should re-use the matcher over multiple replaceAll calls in:

- StdNames.str.sanitize
- Text's lengthWithoutAnsi
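A minimal sketch of the idea, using only `java.util.regex`: compile the pattern once, keep one `Matcher`, and point it at each new input with `reset`. The object name, the method name `visibleLength`, and the simplified ANSI pattern are assumptions for illustration, not the compiler's code.

```scala
import java.util.regex.{Matcher, Pattern}

object AnsiStripSketch {
  // Compiled once. The pattern is a simplified assumption covering codes like ESC[31m.
  private val ansiPattern: Pattern = Pattern.compile("\\u001b\\[\\d+m")

  // One matcher reused across calls. Matcher is not thread-safe,
  // so this sketch assumes single-threaded use.
  private val ansiMatcher: Matcher = ansiPattern.matcher("")

  // Length of `s` once ANSI colour codes are removed.
  def visibleLength(s: String): Int =
    ansiMatcher.reset(s).replaceAll("").length
}
```

By contrast, `String.replaceAll(regex, replacement)` compiles the regex on every call, which is the cost this approach avoids.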

Add specialized versions of tasty.Util.dble for common array element types

e0e703a

Allow to reuse table of a util.{MutableHashSet,MutableHashMap}

c9a4670

On demand, fill the array with zeroes instead of creating a fresh one. This can save some array allocations.
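The trade-off can be sketched with a small table that either clears its backing array in place or allocates a fresh one. `ReusableTableSketch` and its `reuseTable` flag are hypothetical and only loosely modelled on the description above. The sketch uses an `Int` array and has no growth logic.

```scala
import java.util.Arrays

// Hypothetical fixed-capacity table illustrating "clear in place" vs "reallocate".
final class ReusableTableSketch(capacity: Int) {
  private var table = new Array[Int](capacity)
  private var used = 0

  def add(x: Int): Unit = {
    table(used) = x
    used += 1
  }

  def size: Int = used

  def slot(i: Int): Int = table(i)

  // With reuseTable = true the existing array is zeroed and kept,
  // which avoids allocating a new array of the same size.
  def clear(reuseTable: Boolean): Unit = {
    if (reuseTable) Arrays.fill(table, 0)
    else table = new Array[Int](capacity)
    used = 0
  }
}
```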

Avoid some Some wrappers when accessing maps

e459eda

Avoid unnecessary uses of parentSyms

10359a5

`parentSyms` maps all parent types. We don't need that if we just want to work on the superclass.

Avoid some boxings of vars

03555e8

Avoid creating large CharBuffers in LookaheadScanners

4e7ab60

There are many LookaheadScanner objects, and most don't need a CharBuffer for literals or comments at all.
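One way to avoid paying for a buffer that is rarely used is to allocate it on first write. The sketch below uses a hypothetical `ScannerSketch` class with a `StringBuilder` standing in for the scanner's character buffer. It shows the general pattern, not the actual LookaheadScanner change.

```scala
// Hypothetical scanner fragment: the literal buffer is created only when needed.
final class ScannerSketch {
  // null until the first character is written, so scanners that never
  // see a literal or comment never allocate a buffer.
  private var litBuf: java.lang.StringBuilder = null

  def putChar(c: Char): Unit = {
    if (litBuf == null) litBuf = new java.lang.StringBuilder
    litBuf.append(c)
  }

  def literal: String = if (litBuf == null) "" else litBuf.toString

  def bufferAllocated: Boolean = litBuf != null
}
```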

Avoid expensive settings lookup in setDenot

50c1595

Make validFor monomorphic

9f13e56

Profiles showed that `validFor` accounted for a significant percentage of vtable_stub time.

Cache isType in SymDenotations

84c4fe0

`isType` also accounted for a significant part of the time spent in vtable stubs. We now compute it when a Symbol is created and keep it around as a field.
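The pattern can be sketched like this: a virtual call on an abstract type is resolved once at construction time, and the result is stored in a field so that later reads need no dispatch. The `NameSketch` hierarchy and `SymSketch` class are hypothetical and much simpler than the compiler's Symbol and SymDenotation classes.

```scala
// Hypothetical name hierarchy with a virtual isTypeName method.
abstract class NameSketch {
  def isTypeName: Boolean
}
final class TermNameSketch(val text: String) extends NameSketch {
  def isTypeName: Boolean = false
}
final class TypeNameSketch(val text: String) extends NameSketch {
  def isTypeName: Boolean = true
}

// Hypothetical symbol: the virtual call happens once, in the constructor.
final class SymSketch(val name: NameSketch) {
  // Later reads of isType are plain field loads with no virtual dispatch.
  val isType: Boolean = name.isTypeName
}
```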

Streamline some hot computations

b28faf9

Make it cheaper to compute whether a Period is Nowhere, and also make the symbol and denot computations on NamedType as small as possible.

Inline rollbackGadtUnless in GadtConstraint

f992f7a

It's not that large, is only used twice, and inlining it saves two argument closures per call.
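The closure cost can be sketched as follows. A helper that takes a by-name argument makes each call site allocate a function object, while writing the body out by hand at the call site does not. The semantics shown (run an operation and restore saved state unless it succeeds) are an assumption inferred from the method's name, and every identifier in `GadtSketch` is hypothetical.

```scala
object GadtSketch {
  // Hypothetical mutable state standing in for a constraint.
  var constraint: List[String] = Nil

  // Helper form: `op` is a by-name parameter, so each call allocates a closure.
  def rollbackUnless(op: => Boolean): Boolean = {
    val saved = constraint
    val ok = op
    if (!ok) constraint = saved
    ok
  }

  def addViaHelper(c: String, accept: Boolean): Boolean =
    rollbackUnless { constraint = c :: constraint; accept }

  // Manually inlined form: same behaviour, no closure allocation.
  def addInlined(c: String, accept: Boolean): Boolean = {
    val saved = constraint
    constraint = c :: constraint
    if (!accept) constraint = saved
    accept
  }
}
```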

Replace tpd.mapInline by flattenMapConserve

c920896

The new version performs better also for long lists of trees.

Optimize period equality tests

bc8df3e

Drop expensive escapeToNext

b64b73c

The `ensuring` seems to be expensive. Omitting it does not seem to cause a problem since a denotation that's valid nowhere would certainly produce other errors when accessed.

Drop redundant catch and re-throw

78f1be0

Revert some changes in NameBuffer

586c459

This reverts some changes from commit d4a5515. Use again a LinkedHashMap instead of a combination of EqHashMap and ArrayBuffer.

Make lazy vals threadunsafe.

f2caf05

odersky force-pushed the perf1 branch from 7364f53 to f2caf05 Compare January 20, 2023 11:07

odersky merged commit 865aa63 into scala:main Jan 20, 2023

odersky deleted the perf1 branch January 20, 2023 16:10
