QuoteUnpicker.unpickleExpr and QuoteUnpicker.unpickleType are slow #12228


Closed · deusaquilus opened this issue Apr 26, 2021 · 23 comments · Fixed by #12242

@deusaquilus
Contributor

deusaquilus commented Apr 26, 2021

Compiler version

3.0.0-RC3

I am seeing bottlenecks in the ~700ms range from unpickleExpr and unpickleType (blue and light blue in the chart below) for the Proto-Quill parser and expr-accumulation (for a simple query with 20 parameters). I can't accommodate large nested queries with this kind of latency.

Are there any tricks I can use to optimize unpickling?
Are there any compiler args I can use to make unpickling faster or to disable it for certain parts of the code?

Also, if you want more detailed information in the flame graph, let me know which packages you'd like me to instrument and filter for (since I can't instrument the entire Dotty).

[flame graph screenshots]

@nicolasstucki
Contributor

Is it possible to get a wider flame graph or an interactive one? It is impossible to see the method names.

Can you create a small self-contained example that shows the same issue?

@nicolasstucki
Contributor

It would also be quite useful to see the timings of the code within unpickleTerm: https://github.com/lampepfl/dotty/blob/9632f6b68ea726cf2b219f55995e0bd80abd66c1/compiler/src/dotty/tools/dotc/quoted/PickledQuotes.scala#L56-L63

We should know what the proportions of unpickle, spliceTypes, and spliceTerms are for this method.

@deusaquilus
Contributor Author

I'll try to get more detailed information about what is happening inside unpickleExpr/unpickleType.
It's hard to make a self-contained example because this flame graph covers essentially the entire ProtoQuill quotation subsystem. I could publish the ProtoQuill library to Maven Central, along with a small code sample that uses it, if that helps.

@deusaquilus
Contributor Author

Also, do you have JProfiler? If you do, I can send you the saved JProfile session that you can interactively explore.
If not, you can get a trial version. They also offer academic licenses.

@deusaquilus
Contributor Author

deusaquilus commented Apr 26, 2021

Here is a bit more info. I have colorized unpickle (pink), unpickleTerm (red), and spliceTerms (purple). I didn't find spliceTypes anywhere in the chart. It seems QuoteUnpickler.unpickle is the one that takes the most time.
Color legend is below.

[colorized flame graph screenshots with color legend]

I can send you the profiling session separately if you'd like.

@deusaquilus
Contributor Author

deusaquilus commented Apr 26, 2021

Here is even more info with additional colorizations. You can see DottyUnpickler.tree (light green) slowing things down in many places, along with RefTree.isTerm() and NamedType.denot() (both in different shades of light blue).

[flame graph screenshots]

@deusaquilus
Contributor Author

Also, it is worth noting that TreeAccumulator.foldOverTree appears in many places (in yellow). It might also be worth optimizing.

[flame graph screenshots]

@deusaquilus
Contributor Author

Hopefully, this provides some additional useful information. Please let me know if you need something else.

@deusaquilus
Contributor Author

Here's one more chart with the info about "hot-spots" in the dotty.tools.dotc call stack.

[hot-spots screenshot]

Here is the call-tree of one instance where I use unpickleTerm.

[call-tree screenshot]

Here's the flame-graph of that unpickleTerm call.

[flame graph screenshot]

Hope this helps.

@odersky
Contributor

odersky commented Apr 27, 2021

It seems to read a very large number of names. We should analyze which names are serialized when we pickle quotes and whether we can cut the number somehow.

@nicolasstucki
Contributor

One thing we should be able to do is cache the result of PickleQuotes.unpickle.
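Such a cache could, in principle, be as simple as memoizing the unpickled result keyed by the pickled payload. A minimal sketch of the idea (purely illustrative; UnpickleCache and its API are hypothetical, not Dotty's actual implementation):

```scala
import scala.collection.mutable

// Hypothetical sketch: memoize an expensive computation (e.g. unpickling)
// keyed by its input, so repeated requests reuse the first result.
final class UnpickleCache[K, V] {
  private val cache = mutable.HashMap.empty[K, V]

  // Returns the cached value for `key`, computing and storing it on first use.
  def getOrCompute(key: K)(compute: => V): V =
    cache.getOrElseUpdate(key, compute)
}
```

With something like this, a second expansion of the same quote would skip the expensive unpickling step entirely.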

@nicolasstucki
Contributor

@deusaquilus we will need a small self-contained example that is representative of the kind of code you have and that exhibits the latency issue. This will be used in our tests and benchmarks to ensure we properly fix the issue and do not regress in the future.

@deusaquilus
Contributor Author

deusaquilus commented Apr 27, 2021

One thing we should be able to do is cache the result of PickleQuotes.unpickle.

Yes! I think that would be really good.

In order to make a self-contained example, I need to implement at least a Quill AST and an Unlifter, and maybe also a Quoter and a Parser. It would basically be a mini-Quill implementation, so it will have to be more of a "medium-sized" self-contained example.
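For a sense of scale, the core of such a mini-Quill would be a small tree of case classes plus the machinery to lift/unlift them through quotes. A rough, purely hypothetical sketch of what the AST portion might look like (illustrative names, not Quill's actual types):

```scala
// Hypothetical mini-Quill AST sketch (illustrative only, not Quill's real AST).
sealed trait Ast
final case class Entity(name: String) extends Ast                         // a table/collection
final case class Ident(name: String) extends Ast                          // a bound variable
final case class Property(of: Ast, name: String) extends Ast              // field access
final case class Filter(query: Ast, alias: String, body: Ast) extends Ast // .filter(alias => body)
final case class BinaryOp(lhs: Ast, op: String, rhs: Ast) extends Ast     // e.g. ==, &&
final case class Const(value: Any) extends Ast                            // a lifted constant
```

An Unlifter would then walk a quoted expression and rebuild one of these trees at compile time, which is exactly the part that exercises unpickling.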

@nicolasstucki
Contributor

Otherwise, do you know which quote is taking the longest time to unpickle in your profile?

@nicolasstucki
Contributor

After doing some tests, I found the following results for unpickling '{ println($e + "abc") } where e = Expr("abc").
These are quick-and-dirty benchmarks using some System.nanoTime calls in unpickleTerm. Nevertheless, they show the potential improvement.

Warm compiler

unpickle> 15.41μs
spliceTypes> 0.041μs
spliceTerms> 2.464μs

Warm compiler (cached unpickle)

unpickle> 0.047μs
spliceTypes> 0.029μs
spliceTerms> 1.508μs

Cold compiler

// first unpickling
unpickle> 3888.469μs
spliceTypes> 3.571μs
spliceTerms> 1831.784μs

// second unpickling
unpickle> 473.927μs
spliceTypes> 0.401μs
spliceTerms> 97.896μs

// third unpickling
unpickle> 310.548μs
spliceTypes> 0.25μs
spliceTerms> 74.405μs

Cold compiler (cached unpickle)

// first unpickling
unpickle> 3576.882μs
spliceTypes> 3.56μs
spliceTerms> 1736.942μs

// second unpickling
unpickle> 11.535μs
spliceTypes> 0.545μs
spliceTerms> 133.605μs

// third unpickling
unpickle> 2.536μs
spliceTypes> 0.27μs
spliceTerms> 81.706μs
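The quick-and-dirty timing described above can be approximated with a small helper; this is a sketch of the general technique, not the actual instrumentation used in unpickleTerm:

```scala
// Hypothetical sketch: time a block with System.nanoTime and report the
// elapsed time in microseconds, in the same "label> N μs" style as above.
def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  val micros = (System.nanoTime() - start) / 1000.0
  println(f"$label%s> $micros%.3fμs")
  result
}
```

Each phase would then be wrapped individually, e.g. something like `timed("unpickle") { ... }`, so the proportions of unpickle, spliceTypes, and spliceTerms can be compared.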

@nicolasstucki
Contributor

There is still the question of the granularity of the cache:

  • Per macro expansion?
  • Per compilation unit?
  • Per run?

Should we have a maximum size for the cache? Keys should be cheap, as they contain strings that already exist in the bytecode. On the other hand, the cached trees might cause some memory pressure.
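If a maximum size were desired, one conventional option is an LRU cache; here is a minimal sketch on top of java.util.LinkedHashMap (hypothetical, not Dotty's actual design):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical sketch: a size-bounded LRU cache. Passing accessOrder = true
// makes LinkedHashMap move entries to the back on access; removeEldestEntry
// then evicts the least recently used entry once maxSize is exceeded.
final class BoundedCache[K, V](maxSize: Int) {
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxSize
  }

  def getOrCompute(key: K)(compute: => V): V = {
    val cached = underlying.get(key)
    if (cached != null) cached
    else { val v = compute; underlying.put(key, v); v }
  }

  def currentSize: Int = underlying.size()
}
```

A per-run cache with such a bound would trade a little recomputation for a predictable memory ceiling.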

@deusaquilus
Contributor Author

deusaquilus commented Apr 27, 2021

The quotation that this was benchmarked from was the following:

val result = testContext.run(liftQuery(insertValues).foreach(e => insert(e)))

This was from PostgresAsyncEncodingSpec.scala which is in quill-jasync-postgres. I can create a branch where this is the only thing under quill-sql if that helps.

I think the granularity of the cache should be at least per macro expansion, if not higher; in Quill, the results of the LiftMacro and the QuoteMacro are frequently reused. Any narrower granularity would, I think, be useless for Quill's use case.

In terms of memory, I think there should be no limit by default; memory is fairly cheap these days. Maybe there could be a compiler argument to set a limit if needed, which might be useful for builds in CI environments where resources are more constrained.

@nicolasstucki
Contributor

@deusaquilus I think I can now create a benchmark from scratch. It will be simpler to tune and diagnose.

@deusaquilus
Contributor Author

deusaquilus commented Apr 28, 2021

@nicolasstucki Please have a look at this. I have created a miniature Quill implementation of ~270 lines of code that represents the majority of Quill's use case:
https://github.com/deusaquilus/miniquill/blob/master/src/main/scala/org/stuff/MiniQuill.scala

I have also created a file that represents some large entities and join clauses characteristic of the kinds of things I see in the corporate world (or at least as much as possible with such a small implementation). This macro takes ~1 second to compile on a warm JVM and 5+ seconds on a cold JVM. It spends the vast majority of that time inside unpickleTerm.
https://github.com/deusaquilus/miniquill/blob/master/src/main/scala/org/stuff/UseMiniQuillLarge.scala

[flame graph screenshot]

I also have a smaller query that is not a good benchmarking example, but it is a good reference for what the most basic kind of query looks like:
https://github.com/deusaquilus/miniquill/blob/master/src/main/scala/org/stuff/UseMiniQuill.scala

Please have a look at MiniQuill.scala and UseMiniQuillLarge.scala and use them in your benchmarking tests. I think they are a good example of how not only Quill but many other macro-based frameworks will use macros.

@deusaquilus
Contributor Author

Also, please keep in mind that the queries in the MiniQuillLarge.scala file's Queries object can either be in the same file or spread across multiple files. Both situations occur in Quill codebases.

@deusaquilus
Contributor Author

deusaquilus commented Apr 28, 2021

Also, feel free to play around with the number of parameters in the LargeModel object if that helps. If this is a benchmark you want to run hundreds of times, it might be useful to decrease them. That will, however, most likely make the unpickler look like a smaller part of the latency.

@nicolasstucki nicolasstucki linked a pull request Apr 28, 2021 that will close this issue
@nicolasstucki
Contributor

Speed improvements after #12242 can be seen here https://dotty-bench2.epfl.ch/12242/

@deusaquilus
Contributor Author

deusaquilus commented May 3, 2021

This is awesome! In my MiniQuill example, the unpickle cache brings it down from 13ms to 6.5ms.

Before:
[screenshot: before]

After:
[screenshot: after]
