Proposal: Decouple Dotty macros from inlining #5122

Closed
allanrenucci opened this issue Sep 19, 2018 · 16 comments

@allanrenucci
Contributor

Currently macros in Dotty rely on inlining. A call to a macro is first inlined at the call site, then the macro is evaluated. However, inlining must preserve the semantics of the program, and to do so it performs tree transformations that are hard to predict.

Macro authors now need to understand what happens during inlining, massage the inline definition to get the trees they expect, and reverse some of the transformations performed by the inliner (e.g. deal with Inline trees).

Let's look at an example that illustrates my point. We would like to implement a macro that can inspect its receiver and its arguments.

import scala.quoted._

class Foo {
  rewrite def myMacro(x: Int): Int = ~Foo.macroImpl('(this), '(x))
}

object Foo {
  def macroImpl(foo: Expr[Foo], x: Expr[Int]): Expr[Int] = ??? // do something
}

And a simple use case:

def foo() = new Foo
def bar() = 1
foo().myMacro(bar())

Here is what we get after inlining:

val x: Int = this.bar()
val Foo_this: Foo = this.foo()
~Foo.macroImpl(Expr(Foo_this), Expr(x))

The inliner lifted out the receiver and the argument of the macro call and only gives us references to them. To work around this issue, one needs to massage the macro definition a lot in order to get the expected trees:

import scala.quoted._

class Foo

class FooOps(foo: => Foo) {
  rewrite def myMacro(x: => Int) = ~Foo.macroImpl('(foo), '(x))
}

object Foo {
  // now implicit conversion must be in scope to call the macro
  implicit rewrite def FooOps(foo: => Foo): FooOps = new FooOps(foo)

  def macroImpl(foo: Expr[Foo], x: Expr[Int]): Expr[Int] = x
}

Note: This can possibly improve with extension methods

And here is what we get after inlining:

// This can possibly be dead code eliminated if `FooOps` and `foo` are proven pure.
// More massaging can help: FooOps extends AnyVal
val FooOps_this: FooOps =
  new FooOps(this.foo())

Foo.macroImpl(
  Expr(this.foo()),
  Expr(this.bar())
)

Our macro can now inspect the trees of its receiver and its arguments. This is a lot of ceremony for something that is straightforward in scalac and, I believe, a common use case of macros.

I propose to decouple inlining from macros and reuse the semantics we have in scalac:

For a call receiver.myMacro(args), if myMacro is a macro, the compiler will expand that application by invoking the corresponding macro implementation method, with the abstract-syntax trees of the argument expressions receiver and args as arguments.

One could imagine the syntax being something like (this is not a proposal about the syntax):

class Foo {
  macro def myMacro(x: Int): Int = ~Foo.macroImpl('(this), '(x))
}
object Foo {
  def macroImpl(foo: Expr[Foo], x: Expr[Int]): Expr[Int] = ??? // do something
}

@xeno-by, @liufengyun, @olafurpg, @nicolasstucki I would like to hear your opinions and know whether there are any concerns or drawbacks about going back to the Scala 2 semantics. I also discussed this with @sjrd and @OlivierBlanvillain.

@nicolasstucki
Contributor

The whole point of having inline and a top-level ~ is to decouple the macro expansion from the code inlining. What you seem to propose is to recouple the two.

The current scheme also ensures the correct handling of the language call semantics. No need to have special handling of macros.

Another important difference is that a macro in Dotty is just the top-level splice. The definition itself is not what should be referred to as the macro; it is just another normal function.

The arguments that the macro receives should always retain the semantics of by-value or by-name parameters. This ensures that, when spliced, the generated code will not break the semantics.
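
For readers less familiar with the distinction, here is a minimal sketch in plain Scala (no macros involved) of the call semantics being referred to. The names `CallSemantics`, `effect`, `twiceByValue` and `twiceByName` are made up for this illustration and are not part of Dotty.

```scala
object CallSemantics {
  // Records every evaluation of an effectful argument.
  val log = scala.collection.mutable.ListBuffer.empty[String]

  // Hypothetical effectful expression: logs its name, then returns a value.
  def effect(name: String, value: Int): Int = { log += name; value }

  // By-value: the argument is evaluated exactly once, before the body runs.
  def twiceByValue(x: Int): Int = x + x

  // By-name: the argument is re-evaluated at every use inside the body.
  def twiceByName(x: => Int): Int = x + x

  def main(args: Array[String]): Unit = {
    log.clear()
    val a = twiceByValue(effect("arg", 21))
    println(s"by-value: result=$a, evaluations=${log.size}") // result=42, evaluations=1

    log.clear()
    val b = twiceByName(effect("arg", 21))
    println(s"by-name:  result=$b, evaluations=${log.size}") // result=42, evaluations=2
  }
}
```

Lifting a by-value argument into a val before expansion keeps the "evaluated once, up front" behaviour no matter how often the expansion mentions the argument.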

If the parameter is by-name and it is used only once, the original tree should be available. There are still some bugs there.

To inspect a lifted argument, there is also the possibility of looking at the RHS of the lifted val, which will be possible after #4968 is merged.

We are also still missing an optimizer that will remove the useless bindings after macro expansion. Previously we relied on the local optimizer, which we do not have anymore. We can probably reuse the optimizer that is used when inlining, running it after the expansion to clean up the useless vals.

@OlivierBlanvillain
Contributor

The whole point of having inline and a top-level ~ is to decouple the macro expansion from the code inlining.

As a macro author why would I care about this decoupling? It seems like an implementation detail of the compiler rather than a feature. Indeed, as a macro author, I always want my macros to produce new trees at call site, at compile time.

@allanrenucci
Contributor Author

The whole point of having inline and a top-level ~ is to decouple the macro expansion from the code inlining. What you seem to propose is to recouple the two.

In my proposal, there is no such thing as inlining. There is no reason you would need to inline anything before a call to a macro.

The current scheme also ensures the correct handling of the language call semantics. No need to have special handling of macros.

I propose to change the semantics for macros. So if the semantics of a macro is not the language call semantics, I don't think we can say it breaks the semantics.

Another important difference is that a macro in Dotty is just the top-level splice. The definition itself is not what should be referred to as the macro; it is just another normal function.

Sure. Is there anything that justifies this difference? I propose to change this. The macro is not a "normal" function in my proposal.

The arguments that the macro receives should always retain the semantics of by-value or by-name parameters. This ensures that, when spliced, the generated code will not break the semantics.

Same as above. It depends on how you define the semantics of a macro.

If the parameter is by name and it is used only once the original tree should be available. There are still some bugs there. To inspect a lifted argument there is also the possibility of looking at the RHS of the lifted val. Which will be possible after #4968 is merged.

This seems like a lot of added complexity for the macro writer for something that should be straightforward. You shouldn't have to look up a synthetic definition generated by the inliner to recover the tree you want to inspect.

We are also still missing an optimizer that will remove the useless bindings after macro expansion. Previously we relied on the local optimizer, which we do not have anymore. We can probably reuse the optimizer that is used when inlining, running it after the expansion to clean up the useless vals.

You are still missing the ability to elide a tree that cannot be removed by the inliner (for example, if there is a side effect).

Overall I think it is very fragile for macros to take as input a tree produced by the inliner, even more so if this inliner starts performing optimisations.

@nicolasstucki
Contributor

The current scheme also ensures the correct handling of the language call semantics. No need to have special handling of macros.

I propose to change the semantics for macros. So if the semantics of a macro is not the language call semantics, I don't think we can say it breaks the semantics.

Having special semantics for macros is a language overhead that is not required. Having the same semantics simplifies the language.

Another important difference is that a macro in Dotty is just the top-level splice. The definition itself is not what should be referred to as the macro; it is just another normal function.

Sure. Is there anything that justifies this difference? I propose to change this. The macro is not a "normal" function in my proposal.

The idea is to not make the language more complex for no reason.

If the parameter is by name and it is used only once the original tree should be available. There are still some bugs there. To inspect a lifted argument there is also the possibility of looking at the RHS of the lifted val. Which will be possible after #4968 is merged.

This seems like a lot of added complexity for the macro writer for something that should be straightforward. You shouldn't have to look up a synthetic definition generated by the inliner to recover the tree you want to inspect.

This could be provided by a function in the library.

We are also still missing an optimizer that will remove the useless bindings after macro expansion. Previously we relied on the local optimizer, which we do not have anymore. We can probably reuse the optimizer that is used when inlining, running it after the expansion to clean up the useless vals.

You are still missing the ability to elide a tree that cannot be removed by the inliner (for example, if there is a side effect).

Side effects should never be dropped. The language would be quite inconsistent if we did that.

Overall I think it is very fragile for macros to take as input a tree produced by the inliner, even more so if this inliner starts performing optimisations.

It seems that this fragility only comes from the lack of a spec for what happens to the arguments of inline methods, which we will write at some point.

@LPTK
Contributor

LPTK commented Sep 19, 2018

As a macro author, I have to say that getting full trees is also often annoying, because then you have to manually let-bind them to local values or risk duplication and unintended semantic changes. It's all too easy (and I've done it!) to forget that a receiver tree may actually be a complicated and possibly effectful expression, as opposed to a simple stable reference (which it is in the majority of the macro's use cases, so the bug may only be discovered very late).

If the macro is intended to duplicate or suppress effects, then it seems totally reasonable to me that it should specify its parameter as by-name. It's also useful as some basic documentation for users who don't actually want to go and read the implementation of the macro.
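
The duplication hazard can be sketched with a toy expression tree. Everything below (`Tree`, `Effect`, `doubleNaive`, `doubleSafe`, the fixed name `tmp`) is a hypothetical stand-in for illustration, not the Dotty or scalac tree API.

```scala
object DuplicationHazard {
  sealed trait Tree
  final case class Lit(value: Int) extends Tree
  final case class Ref(name: String) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  // An effectful expression: every evaluation bumps the `effects` counter.
  final case class Effect(result: Int) extends Tree
  final case class Let(name: String, rhs: Tree, body: Tree) extends Tree

  var effects = 0

  def eval(tree: Tree, env: Map[String, Int] = Map.empty): Int = tree match {
    case Lit(v)    => v
    case Ref(n)    => env(n)
    case Add(l, r) => eval(l, env) + eval(r, env)
    case Effect(r) =>
      effects += 1
      r
    case Let(n, rhs, body) => eval(body, env + (n -> eval(rhs, env)))
  }

  // Naive "macro": splices the full argument tree twice, duplicating its effects.
  def doubleNaive(arg: Tree): Tree = Add(arg, arg)

  // Careful "macro": let-binds the argument first, then refers to the binding.
  // The fixed name "tmp" ignores hygiene; a real macro would need a fresh name.
  def doubleSafe(arg: Tree): Tree = Let("tmp", arg, Add(Ref("tmp"), Ref("tmp")))

  def main(args: Array[String]): Unit = {
    effects = 0
    val naive = eval(doubleNaive(Effect(21)))
    println(s"naive: value=$naive, effects=$effects") // value=42, effects=2

    effects = 0
    val safe = eval(doubleSafe(Effect(21)))
    println(s"safe:  value=$safe, effects=$effects") // value=42, effects=1
  }
}
```

With a simple stable reference as the argument both versions behave the same, which is why this kind of bug tends to surface late.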

You shouldn't have to lookup a synthetic definition generated by the inliner to recover the tree you want to inspect

I think the problem of easily inspecting a tree's definition to inform the behavior of the macro is an orthogonal one. It would be much better if we could make this seamless; for example, have the extractors used for inspection do the legwork for us behind the scenes, of finding definitions attached to each symbol. This is how many DSL compilers work (like those based on LMS).

This approach would have the huge advantage (over the scalac way) that suddenly you can also see the definition of a value that the user actually let-bound themselves (i.e., when they did not syntactically pass the full tree as an argument).
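
A rough sketch of what such an extractor could look like, again on a toy tree. The `Defs` table, `resolve` and `CallTo` are assumptions made for this example; they do not correspond to any existing Dotty API.

```scala
object SmartExtractor {
  sealed trait Tree
  final case class Lit(value: Int) extends Tree
  final case class Ref(name: String) extends Tree
  final case class Call(fn: String, arg: Tree) extends Tree

  // Hypothetical table mapping each val symbol to its right-hand side.
  type Defs = Map[String, Tree]

  // Follows references through the table until a non-reference tree
  // (or an unknown symbol) is reached. Assumes definitions are acyclic.
  def resolve(tree: Tree, defs: Defs): Tree = tree match {
    case Ref(name) if defs.contains(name) => resolve(defs(name), defs)
    case other                            => other
  }

  // Extractor that matches a call to `fn`, looking through val bindings.
  final class CallTo(fn: String, defs: Defs) {
    def unapply(tree: Tree): Option[Tree] = resolve(tree, defs) match {
      case Call(name, arg) if name == fn => Some(arg)
      case _                             => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Models `val x = bar(42)`, whether written by the user or by the inliner.
    val defs: Defs = Map("x" -> Call("bar", Lit(42)))
    val Bar = new CallTo("bar", defs)

    // The macro sees only `x`, yet the pattern matches the underlying call.
    Ref("x") match {
      case Bar(arg) => println(s"argument of bar: $arg") // argument of bar: Lit(42)
      case _        => println("not a call to bar")
    }
  }
}
```

The macro code pattern-matches as if it had the full tree, and the lookup happens inside the extractor, so lifted and user-written bindings are handled alike.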

@nicolasstucki
Contributor

For reference here is the old idea of inline/meta which also aimed to make Scala 2 macros preserve the call semantics (https://docs.scala-lang.org/sips/inline-meta.html#inlinemeta). In particular:

Before inlining a method application, the compiler first hoists the prefix and by-value arguments of the application into temporary variables. This is done in order to guarantee that applications of inline methods are semantically equivalent to applications of regular methods.
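
The hoisting step described in that passage can be sketched as a small tree rewrite. This is only a model: the tree classes, the `hoist` function and the binding names `prefix` / `argN` are invented here, and the binding order (receiver first, then arguments left to right) follows ordinary Scala evaluation order rather than any claim about what the actual inliner emits.

```scala
object Hoisting {
  sealed trait Tree
  final case class Ref(name: String) extends Tree
  final case class Apply(receiver: Tree, method: String, args: List[Tree]) extends Tree
  final case class ValDef(name: String, rhs: Tree)
  final case class Block(stats: List[ValDef], expr: Tree) extends Tree

  // Binds the prefix and every (by-value) argument to a temporary val,
  // then rewrites the call so that it only mentions those temporaries.
  def hoist(call: Apply): Block = {
    val prefixDef = ValDef("prefix", call.receiver)
    val argDefs = call.args.zipWithIndex.map { case (arg, i) => ValDef("arg" + i, arg) }
    val rewritten = Apply(Ref(prefixDef.name), call.method, argDefs.map(d => Ref(d.name)))
    Block(prefixDef :: argDefs, rewritten)
  }

  def main(args: Array[String]): Unit = {
    // Models `foo().myMacro(bar())` from the example at the top of the issue.
    val call = Apply(
      Apply(Ref("this"), "foo", Nil),
      "myMacro",
      List(Apply(Ref("this"), "bar", Nil))
    )
    println(hoist(call))
  }
}
```

After this rewrite the call site only contains references, which is exactly why a macro that wants the original trees has to look at the right-hand sides of the generated vals.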

@nicolasstucki
Contributor

Thanks for your feedback @LPTK. The inspection of the tree definition will be possible with the changes in #4968, which go a bit further and also allow inspection of definitions in already compiled library code.

@liufengyun
Contributor

Generally, I like the current approach based on inlining, which is more principled and can be reasoned about by programmers.

However, maybe macro arguments should never be lifted to avoid the repetitive => trick, as all macros need by-name arguments. If macro authors are given the possibility to choose whether to use by-name or not, it seems they can better control the intended semantics of macros. That depends on whether there are macros that don't need by-name, i.e. don't inspect argument trees. I think there aren't any, as macros that don't inspect trees can be easily implemented by pure inlining without macros.

I think lifting of prefix is acceptable. The first argument is to preserve semantics, as mentioned by @LPTK. The second argument is that from my experience with macros, very few macros inspect trees of the receiver, the typical ones are interpolator macros. The reason is that class member macros are supposed to work with any prefix, thus most macros don't mess with the receiver.

The Inline tree mentioned by @allanrenucci is a bit of a hassle. It would be nice if it could be avoided. However, I think it's tolerable, as it only affects a tiny category of interpolator macros that deal with the receiver, and there is a known and principled way to deal with that.

@LPTK
Contributor

LPTK commented Sep 20, 2018

@liufengyun

maybe macro arguments should never be lifted to avoid the repetitive => trick, as all macros need by-name arguments

I think few macros will need => for all their arguments, and many will not need => for any of their arguments. Not having => does not preclude inspection; it's just a safer default. For example, I don't see why an interpolator macro would need => at all.

If macro authors are given the possibility to choose whether to use by-name or not, it seems they can better control the intended semantics of macros.

Yes, exactly.

That depends on whether there are macros that don't need by-name, i.e. don't inspect argument trees. I think there aren't any

Again, I think the criterion for using => is not whether the macro wants to inspect its argument or not, since inspection of lifted trees can be made convenient by smart extractors; the criterion is whether the macro wants to transform that argument, potentially changing its meaning and effects, in which case I would rather have it marked explicitly as by-name.

@liufengyun
Contributor

For example, I don't see why an interpolator macro would need => at all.

Thanks for reminding me of this use case, @LPTK. I agree with you: it's better to keep => to be consistent with inlining.

@allanrenucci
Contributor Author

After discussion, the conclusion was that we would stick to the current macro scheme. However some things need to be improved:

  • Macro authors should be able to easily inspect original trees. They shouldn't have to make anything by name to be able to inspect trees.
  • Inlining details shouldn't leak to macro authors. This means that the inliner should "just inline" and not perform optimisations (e.g. constant folding, partial evaluation, dead code elimination), as is currently the case. Not sure to what extent we can hide other details such as inline accessors.
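
To illustrate the second point, here is a toy model of how an optimising step in front of a macro changes what the macro can observe. `constantFold` stands in for an inliner optimisation and `show` for a macro that renders the expression the user wrote; both are hypothetical and only meant as a sketch.

```scala
object LeakyInliner {
  sealed trait Tree
  final case class Lit(value: Int) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree

  // Stand-in for an optimising inliner: folds additions of literals.
  def constantFold(tree: Tree): Tree = tree match {
    case Add(l, r) =>
      (constantFold(l), constantFold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case other => other
  }

  // Stand-in for a macro that prints the source-level shape of its argument.
  def show(tree: Tree): String = tree match {
    case Lit(v)    => v.toString
    case Add(l, r) => s"(${show(l)} + ${show(r)})"
  }

  def main(args: Array[String]): Unit = {
    val written = Add(Lit(1), Lit(2))
    println(show(written))               // (1 + 2)
    println(show(constantFold(written))) // 3
  }
}
```

Both trees evaluate to the same value, but a macro that inspects syntax gets a different answer depending on whether the optimisation ran first.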

@nicolasstucki
Contributor

More concisely:

  • Preserve call semantics by default
  • Provide a simple way to inspect the contents of the original tree of any argument
  • Do not recouple the inlining logic with macros (i.e. have a macro keyword)

@allanrenucci
Contributor Author

Closing in favor of #5132

@LPTK
Contributor

LPTK commented Sep 20, 2018

@allanrenucci I don't know if that's on the table, but can it be made possible to inspect the trees associated with val-bound symbols in general? I.e., not just the original argument trees of those val symbols that were inserted by the inliner.

Conceptually, there should be no difference between the macro user writing foo(bar(42)) and the macro user writing val b = bar(42); foo(b) if foo is a macro that does not take its argument by name.

@nicolasstucki
Contributor

@LPTK in general it will always be possible to inspect the right-hand side of any definition. The argument trees that @allanrenucci refers to are a separate piece of functionality that allows you to ask for the tree that was originally in the call. Effectively it only removes syntactic overhead and puts you back into the Scala 2 mindset. As you noted, we will be able to do much more by being able to inspect trees outside the macro expansion.

@Blaisorblade
Contributor

The Inline tree mentioned by @allanrenucci is a bit of a hassle. It would be nice if it could be avoided. However, I think it's tolerable, as it only affects a tiny category of interpolator macros that deal with the receiver, and there is a known and principled way to deal with that.

I’d rather hide or specify those. Without Inline nodes, tree visitors cannot recover source positions: they say where the containing file changes, so that trees can keep storing only positions. However, I am not sure that tasty-reflect should expose them.


6 participants