Synthesize Representable type class #3663

Closed
wants to merge 32 commits into from

OlivierBlanvillain
Contributor

This PR is the first step towards integrating parts of shapeless's generic programming facilities into Dotty.

A Representable type class is added in package dotty.generic, alongside Sum and Prod types, to implement the equivalent of shapeless.Generic. Representable[A] is synthesized as a fallback to implicit search when A is a case class or a sealed trait. To make that possible, child annotations to sealed classes are added in typer instead of PostTyper.

Plans for future work are as follows:

  • Include metadata in the generic representation (starting from labels)

  • Support higher kinded types via Representable1, Sum1 and Prod1. Early experiments (library based) show that this can be done very elegantly in Dotty by using higher kinded counterparts of Representable, Sum and Prod, as done in GHC-generics.

  • Experiment with offloading more of the type class derivation work to the compiler (doing less in implicit search). The idea would be to require users to write their derivation in a ReprFold type class (similar to shapeless.TypeClass), then implement the actual fold in a compiler-generated Deriving type class:

// Similar to shapeless.TypeClass
trait ReprFold[TC[_]] {
  def imap[A, B](fa: TC[A])(f: A => B)(g: B => A): TC[B] // cats.Invariant
  def unit: TC[Unit]                                     // cats.Cartesian
  def product[A, B](fa: TC[A], fb: TC[B]): TC[(A, B)]    // cats.Cartesian
  def sum[A, B](fa: => TC[A], fb: => TC[B]): TC[Either[A, B]] // ??
}

trait Deriving[A] {
  type Mk[TC[_]] // = implicit (TC[Int], TC[String]) => TC[A]
                 // for A = sealed trait Bar
                 // case class Bis(i: Int)    extends Bar
                 // case class Buz(s: String) extends Bar

  def materialize[TC[_]]
    (implicit g: ReprFold[TC]): Mk[TC] // Compiler generated
}
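
To make the ReprFold idea concrete, here is a minimal sketch of what a user-written instance and a generated fold might look like. It is not part of the PR: the Show type class, the showFold instance and the hand-written materializeBar function are assumptions made for illustration, with materializeBar standing in for what the compiler-generated Deriving[Bar].materialize could return once the implicit function type Mk is applied.

```scala
object ReprFoldSketch {
  // Same shape as the ReprFold trait proposed above.
  trait ReprFold[TC[_]] {
    def imap[A, B](fa: TC[A])(f: A => B)(g: B => A): TC[B]
    def unit: TC[Unit]
    def product[A, B](fa: TC[A], fb: TC[B]): TC[(A, B)]
    def sum[A, B](fa: => TC[A], fb: => TC[B]): TC[Either[A, B]]
  }

  // Hypothetical type class, used only for this illustration.
  trait Show[A] { def show(a: A): String }

  // Hypothetical user-written derivation rules for Show.
  implicit val showFold: ReprFold[Show] = new ReprFold[Show] {
    def imap[A, B](fa: Show[A])(f: A => B)(g: B => A): Show[B] =
      new Show[B] { def show(b: B): String = fa.show(g(b)) }
    def unit: Show[Unit] =
      new Show[Unit] { def show(u: Unit): String = "()" }
    def product[A, B](fa: Show[A], fb: Show[B]): Show[(A, B)] =
      new Show[(A, B)] {
        def show(p: (A, B)): String = fa.show(p._1) + ", " + fb.show(p._2)
      }
    def sum[A, B](fa: => Show[A], fb: => Show[B]): Show[Either[A, B]] =
      new Show[Either[A, B]] {
        def show(e: Either[A, B]): String = e match {
          case Left(a)  => fa.show(a)
          case Right(b) => fb.show(b)
        }
      }
  }

  sealed trait Bar
  case class Bis(i: Int) extends Bar
  case class Buz(s: String) extends Bar

  // Hand-written stand-in for the generated fold over Bar's structure.
  def materializeBar[TC[_]](tcInt: TC[Int], tcString: TC[String])(
      implicit fold: ReprFold[TC]): TC[Bar] = {
    val bis: TC[Bis] = fold.imap[Int, Bis](tcInt)(i => Bis(i))(_.i)
    val buz: TC[Buz] = fold.imap[String, Buz](tcString)(s => Buz(s))(_.s)
    fold.imap[Either[Bis, Buz], Bar](fold.sum(bis, buz)) {
      case Left(b)  => b
      case Right(b) => b
    } {
      case b: Bis => Left(b)
      case b: Buz => Right(b)
    }
  }
}
```

Given Show[Int] and Show[String] instances, materializeBar[Show](showInt, showString) yields a Show[Bar] that dispatches on the subclass and delegates to the field instance. Note that the user only supplies the four ReprFold operations; the shape of the fold is fixed by the generator.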

@milessabin
Contributor

milessabin commented Dec 13, 2017

This looks interesting, but I recommend against committing to a concrete representation type, or if you do, go with Tuple2/Unit and Either/Nothing, at least as an interim measure, or perhaps make this parameterizable.
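
As a rough illustration of the Tuple2/Unit and Either/Nothing suggestion, the sketch below writes by hand the kind of instances a synthesizer could produce with those standard library types as the representation. The Representable trait here is a simplified assumption (a plain Repr type member), not the one in the PR, and User, Bar, Bis and Buz are example types.

```scala
object TupleReprSketch {
  // Simplified, assumed shape of the type class: one abstract representation type.
  trait Representable[A] {
    type Repr
    def to(a: A): Repr
    def from(r: Repr): A
  }

  case class User(name: String, age: Int)

  // Product: right-nested Tuple2 terminated by Unit.
  val userRepr: Representable[User] { type Repr = (String, (Int, Unit)) } =
    new Representable[User] {
      type Repr = (String, (Int, Unit))
      def to(a: User): Repr = (a.name, (a.age, ()))
      def from(r: Repr): User = User(r._1, r._2._1)
    }

  sealed trait Bar
  case class Bis(i: Int) extends Bar
  case class Buz(s: String) extends Bar

  // Sum: right-nested Either terminated by Nothing.
  val barRepr: Representable[Bar] { type Repr = Either[Bis, Either[Buz, Nothing]] } =
    new Representable[Bar] {
      type Repr = Either[Bis, Either[Buz, Nothing]]
      def to(a: Bar): Repr = a match {
        case b: Bis => Left(b)
        case b: Buz => Right(Left(b))
      }
      def from(r: Repr): Bar = r match {
        case Left(b)         => b
        case Right(Left(b))  => b
        case Right(Right(n)) => n // uninhabited: n has type Nothing
      }
    }
}
```

The appeal of this encoding is that no new representation types need to be committed to; the cost is that Tuple2 and Either carry no marker distinguishing "generic representation" from ordinary user data.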

The big issue is with kinding as you've already started to observe with Representable1 etc. I believe that the right approach here is to address kind polymorphism head on.

Also note that shapeless's TypeClass type class is really very limited. One thing that we immediately encountered was a need to be able to thread auxiliary type classes through a derivation. Various attempts were made to generalize TypeClass to support this, but none of them proved satisfactory. The current shapeless model using Lazy (byname in Dotty) and implicit resolution has proved to be a great deal more flexible in practice.

While I support the idea of making functionality of this sort a language intrinsic, I recommend against baking in an implementation which commits to replicating boilerplate at the kind level and limits derivations to such a simple form.

@OlivierBlanvillain
Contributor Author

@milessabin Thanks for your input!

The big issue is with kinding as you've already started to observe with Representable1 etc. I believe that the right approach here is to address kind polymorphism head on.

I want to experiment with the GHC.generics approach where higher kinded sum & product are used for both ground types and HKT, something along these lines:

sealed trait Prod[X]
final case class PCons[H[_], T[t] <: Prod[t], X](head: H[X], tail: T[X]) extends Prod[X]
final case class PNil[X]() extends Prod[X]

sealed trait Sum[X]
sealed trait SCons[H[_], T[t] <: Sum[t], X] extends Sum[X]
final case class SLeft[H[_], T[t] <: Sum[t], X](head: H[X]) extends SCons[H, T, X]
final case class SRight[H[_], T[t] <: Sum[t], X](tail: T[X]) extends SCons[H, T, X]
sealed trait SNil[X] extends Sum[X]

trait Representable[A] {
  type Repr[t] <: Sum[t] | Prod[t]

  def to[T](a: A): Repr[T]
  def from[T](r: Repr[T]): A
}

trait Representable1[A[_]] {
  type Repr[t] <: Sum[t] | Prod[t]

  def to[T](a: A[T]): Repr[T]
  def from[T](r: Repr[T]): A[T]
}

// Syntax for ground types
type &:[H, T[t] <: Prod[t]] = [X] => PCons[[Y] => H, T, X]
type |:[H, T[t] <: Sum[t]] = [X] => SCons[[Y] => H, T, X]

// Syntax for HKT
type :&:[H[_], T[t] <: Prod[t]] = [X] => PCons[H, T, X]
type :|:[H[_], T[t] <: Sum[t]] = [X] => SCons[H, T, X]

type Id[t] = t
type Const[t] = [X] => t
sealed trait Tree[T]
case class Leaf[T](t: T) extends Tree[T]
case class Node[T](l: Tree[T], r: Tree[T]) extends Tree[T]

Representable1[Node] { type Repr = Tree :&: Tree :&: PNil }
Representable1[Leaf] { type Repr = Id :&: PNil }
Representable1[Tree] { type Repr = Leaf :|: Node :|: SNil }

Representable[Node[A]] { type Repr = Tree[A] &: Tree[A] &: PNil }
Representable[Leaf[A]] { type Repr = A &: PNil }
Representable[Tree[A]] { type Repr = Leaf[A] |: Node[A] |: SNil }

Also note that shapeless's TypeClass type class is really very limited. One thing that we immediately encountered was a need to be able to thread auxiliary type classes through a derivation.

Are you referring to cases where additional/external type classes are mixed in during derivation? Is it common in practice?

@milessabin
Contributor

I find something that stops at * -> * a bit unsatisfying ... it'd be much nicer to have a uniform solution for, eg. Tuple2[A, B] etc. I'd also like to experiment with a "no representation type" approach via Church encoding or similar. Anyhow, I'd say it's not a good idea to bake in something so similar to shapeless now that it's clearer what shapeless's limitations in practice are.

Are you referring to cases where additional/external type classes are mixed in during derivation? Is it common in practice?

Yes, very. The most common is to thread Witness/ValueOf through when working with values produced by shapeless's LabelledGeneric to get hold of terms corresponding to the singleton label types, but there are plenty of others. See shapeless's implementation of Scrap Your Boilerplate, or this recently contributed implementation of recursion schemes. Neither of these could be implemented in terms of shapeless's TypeClass.
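
For readers unfamiliar with the pattern, here is a minimal sketch of threading ValueOf through a definition to recover the term behind a singleton label type. It assumes Scala 2.13 or later (literal types and scala.ValueOf from SIP-23); labelOf, Field and describe are hypothetical names, and string literal types are used in place of shapeless's Symbol-based labels.

```scala
object LabelSketch {
  // Hypothetical helper: recovers the runtime string behind the singleton type L.
  def labelOf[L <: String](implicit v: ValueOf[L]): String = v.value

  // Hypothetical field wrapper that records its label only at the type level.
  final case class Field[L <: String, A](value: A)

  // The auxiliary ValueOf[L] instance is what lets the label reach the term level.
  def describe[L <: String, A](f: Field[L, A])(implicit v: ValueOf[L]): String =
    v.value + " = " + f.value
}
```

A derivation that only receives instances of the type class being derived has no place to request ValueOf[L], which is the limitation of a fixed TypeClass-style interface described above.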

@julienrf
Contributor

julienrf commented Dec 22, 2017

Following a discussion I had with @OlivierBlanvillain, I’d like to share a more elaborate example than Show and discuss some points that I think are important to address.

This example is inspired by a library that describes data types and then derives typeclass instances from these descriptions (API doc is here). For simplicity, only record types (case classes) are considered; the case of sum types is similar but essentially uses Either instead of Tuple2.

Consider the following typeclass for serializing/deserializing data into/from JSON documents:

trait Codec[A] {
  def encode(a: A): Json
  def decode(json: Json): Either[ValidationErrors, A]
}

Here is how we would manually define an instance of Codec[User]:

case class User(name: String, age: Int)

object User {
  implicit val codec: Codec[User] =
    Codec.obj2(
      "name" -> Codec.string,
      "age" -> Codec.integer
    ) { case (n, a) => User(n, a) } { user => (user.name, user.age) }
}

It assumes that the following operations are available:

object Codec {
  /** JSON String */
  implicit def string: Codec[String] = …
  /** JSON number */
  implicit def integer: Codec[Int] = …
  /** JSON object with two fields */
  def obj2[A, B, C](
    fieldA: (String, Codec[A]), fieldB: (String, Codec[B])
  )(
    f: (A, B) => C
  )(
    g: C => (A, B)
  ): Codec[C]
}
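
The bodies of these operations are elided. One possible way to fill them in, so that the User example can actually be run, is sketched below. The Json model (JsonString, JsonNumber, JsonObject), the ValidationErrors shape, the error messages and the private field helper are all assumptions made for this illustration, not the library's actual definitions.

```scala
object CodecSketch {
  // Toy JSON model and error type, assumed for this illustration only.
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonNumber(value: Int) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json

  final case class ValidationErrors(messages: List[String])

  trait Codec[A] {
    def encode(a: A): Json
    def decode(json: Json): Either[ValidationErrors, A]
  }

  object Codec {
    implicit val string: Codec[String] = new Codec[String] {
      def encode(a: String): Json = JsonString(a)
      def decode(json: Json): Either[ValidationErrors, String] = json match {
        case JsonString(s) => Right(s)
        case _             => Left(ValidationErrors(List("expected a string")))
      }
    }

    implicit val integer: Codec[Int] = new Codec[Int] {
      def encode(a: Int): Json = JsonNumber(a)
      def decode(json: Json): Either[ValidationErrors, Int] = json match {
        case JsonNumber(n) => Right(n)
        case _             => Left(ValidationErrors(List("expected a number")))
      }
    }

    // Hypothetical helper: decodes one named field of a JSON object.
    private def field[A](json: Json, name: String, codec: Codec[A]): Either[ValidationErrors, A] =
      json match {
        case JsonObject(fields) if fields.contains(name) => codec.decode(fields(name))
        case _ => Left(ValidationErrors(List("missing field: " + name)))
      }

    def obj2[A, B, C](fieldA: (String, Codec[A]), fieldB: (String, Codec[B]))(
        f: (A, B) => C)(g: C => (A, B)): Codec[C] =
      new Codec[C] {
        def encode(c: C): Json = {
          val (a, b) = g(c)
          JsonObject(Map(fieldA._1 -> fieldA._2.encode(a), fieldB._1 -> fieldB._2.encode(b)))
        }
        def decode(json: Json): Either[ValidationErrors, C] =
          (field(json, fieldA._1, fieldA._2), field(json, fieldB._1, fieldB._2)) match {
            case (Right(a), Right(b)) => Right(f(a, b))
            case (Left(e), Right(_))  => Left(e)
            case (Right(_), Left(e))  => Left(e)
            case (Left(e1), Left(e2)) => Left(ValidationErrors(e1.messages ++ e2.messages))
          }
      }
  }

  case class User(name: String, age: Int)

  object User {
    implicit val codec: Codec[User] =
      Codec.obj2("name" -> Codec.string, "age" -> Codec.integer)(
        (n, a) => User(n, a))(user => (user.name, user.age))
  }
}
```

In this version obj2 builds the JSON object in a single step and applies f exactly once on decoding, which is the baseline the derived codecs are compared against.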

Ideally, we would like generically derived instances of Codec to be exactly like User.codec.

However, with the shapeless.Generic approach it does not seem possible because this approach abstracts over the arity of the case classes by using an inductive representation of record fields (with an HList). Consequently, derived instances use several intermediate transformations. Here are the required implicit definitions to make it possible to generically derive typeclass instances:

trait DerivedCodec[A] {
  def codec: Codec[A]
}

object DerivedCodec {
  /** Base rule: derives a codec for a case class with exactly one field */
  implicit def singletonField[L <: Symbol, A](implicit
    fieldLabel: ValueOf[L],
    fieldCodec: Codec[A]
  ): DerivedCodec[FieldType[L, A] :: HNil] = new DerivedCodec[FieldType[L, A] :: HNil] {
    def codec = Codec.obj1(fieldLabel.value.name -> fieldCodec).invmap(a => field[L](a) :: HNil)(_.head)
  }

  /** Induction rule: derives a codec for a case class with n + 1 fields, given a derived codec for a case class with n fields */
  implicit def consField[L <: Symbol, H, T <: HList](implicit
    fieldLabel: ValueOf[L],
    fieldCodec: Codec[H],
    tailDerivedCodec: DerivedCodec[T]
  ): DerivedCodec[FieldType[L, H] :: T] = new DerivedCodec[FieldType[L, H] :: T] {
    def codec =
      Codec.obj1(fieldLabel.value.name -> fieldCodec).zip(tailDerivedCodec.codec)
        .invmap { case (h, t) => field[L](h) :: t } { ht => (ht.head, ht.tail) }
  }

  /** Derives a codec for a case class `A`, given a derived codec for its generic representation `R` */
  implicit def hlistToCaseClass[A, R](implicit
    gen: LabelledGeneric.Aux[A, R],
    derivedCodec: DerivedCodec[R]
  ): DerivedCodec[A] = new DerivedCodec[A] {
    def codec = derivedCodec.codec.invmap(gen.from)(gen.to)
  }
}

This example uses HList, FieldType and LabelledGeneric from shapeless, and ValueOf from SIP-23. It also assumes the following operations on Codec:

trait Codec[A] {
  /** combines `this` codec with `that` codec */
  def zip[B](that: Codec[B]): Codec[(A, B)] = …
  /** transforms this `Codec[A]` into a `Codec[B]` by using a pair of inverse functions */
  def invmap[B](f: A => B)(g: B => A): Codec[B] = …
}

object Codec {
  /** JSON object with one field of type `A` */
  def obj1[A](field: (String, Codec[A])): Codec[A] = …
}
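
To see where the intermediate layers come from, here is a minimal runnable sketch of zip, invmap and obj1 over a toy JSON model. Everything besides the three signatures is an assumption for illustration: the Json case classes, the List[String] error type, the merge strategy in zip and the error messages.

```scala
object ZipInvmapSketch {
  // Toy JSON model and error type, assumed for this illustration only.
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonNumber(value: Int) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json

  type ValidationErrors = List[String]

  trait Codec[A] { self =>
    def encode(a: A): Json
    def decode(json: Json): Either[ValidationErrors, A]

    // Runs both codecs against the same document and merges the encoded objects.
    def zip[B](that: Codec[B]): Codec[(A, B)] = new Codec[(A, B)] {
      def encode(ab: (A, B)): Json =
        (self.encode(ab._1), that.encode(ab._2)) match {
          case (JsonObject(l), JsonObject(r)) => JsonObject(l ++ r)
          case (l, _)                         => l // toy fallback for non-objects
        }
      def decode(json: Json): Either[ValidationErrors, (A, B)] =
        (self.decode(json), that.decode(json)) match {
          case (Right(a), Right(b)) => Right((a, b))
          case (Left(e1), Left(e2)) => Left(e1 ++ e2)
          case (Left(e), Right(_))  => Left(e)
          case (Right(_), Left(e))  => Left(e)
        }
    }

    // Each call wraps the codec in one more layer of function application.
    def invmap[B](f: A => B)(g: B => A): Codec[B] = new Codec[B] {
      def encode(b: B): Json = self.encode(g(b))
      def decode(json: Json): Either[ValidationErrors, B] = self.decode(json).map(f)
    }
  }

  object Codec {
    val string: Codec[String] = new Codec[String] {
      def encode(a: String): Json = JsonString(a)
      def decode(json: Json): Either[ValidationErrors, String] = json match {
        case JsonString(s) => Right(s)
        case _             => Left(List("expected a string"))
      }
    }
    val integer: Codec[Int] = new Codec[Int] {
      def encode(a: Int): Json = JsonNumber(a)
      def decode(json: Json): Either[ValidationErrors, Int] = json match {
        case JsonNumber(n) => Right(n)
        case _             => Left(List("expected a number"))
      }
    }
    def obj1[A](field: (String, Codec[A])): Codec[A] = new Codec[A] {
      def encode(a: A): Json = JsonObject(Map(field._1 -> field._2.encode(a)))
      def decode(json: Json): Either[ValidationErrors, A] = json match {
        case JsonObject(fields) if fields.contains(field._1) => field._2.decode(fields(field._1))
        case _ => Left(List("missing field: " + field._1))
      }
    }
  }

  case class User(name: String, age: Int)

  // Two single-field objects, zipped and then mapped onto the case class.
  val userCodec: Codec[User] =
    Codec.obj1("name" -> Codec.string).zip(Codec.obj1("age" -> Codec.integer))
      .invmap { case (n, a) => User(n, a) } { user => (user.name, user.age) }
}
```

Encoding a User through userCodec allocates two one-field objects, merges them, and passes through a tuple on the way, even though the resulting JSON has just two fields.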

If we derive a Codec[User] using the implicit rules, it produces the following codec:

  hlistToCaseClass(
    <compiler-synthesized>,
    consField(
      'name,
      Codec.string,
      singletonField('age, Codec.integer)
    )
  )

Which, in turn, expands to:

  Codec.obj1('name.name -> Codec.string)
    .zip(Codec.obj1('age.name -> Codec.integer).invmap(a => field['age](a) :: HNil)(_.head))
    .invmap { case (h, t) => field['name](h) :: t } { ht => (ht.head, ht.tail) }
    .invmap { case n :: a :: HNil => User(n, a) } { user => user.name :: user.age :: HNil }

(I removed the intermediate DerivedCodec and just kept Codec)

As we can see, the derived instance uses a lot of intermediate transformations (invmap calls). These transformations are necessary for two reasons:

  1. to convert to/from the HList based generic representation of the case class
  2. to progress between two induction steps

We might be able to inline the HList construction and extraction so that it doesn’t exist at runtime, but I’m wondering how we could get rid of the intermediate transformations caused by the inductive derivation process. The invmap and zip operations are user defined and the compiler has no knowledge of how to rewrite them.

Let’s try to manually rewrite the induction step without using zip (by inlining it) and invmap:

  implicit def consField[L <: Symbol, H, T <: HList](implicit
    fieldLabel: ValueOf[L],
    fieldCodec: Codec[H],
    tailDerivedCodec: DerivedCodec[T]
  ): DerivedCodec[FieldType[L, H] :: T] = new DerivedCodec[FieldType[L, H] :: T] {
    def codec = new Codec[FieldType[L, H] :: T] {
      def encode(ht: FieldType[L, H] :: T): Json =
        Json.obj(fieldLabel.value.name -> fieldCodec.encode(ht.head.value))
          .merge(tailDerivedCodec.codec.encode(ht.tail))
      def decode(json: Json): Either[ValidationErrors, FieldType[L, H] :: T] = {
        val headResult =
          json match {
            case JsonObject(fields) if fields.contains(fieldLabel.value.name) =>
              fieldCodec.decode(fields.get(fieldLabel.value.name))
            case _ => Left(MissingField(fieldLabel.value.name))
          }
        val tailResult = tailDerivedCodec.codec.decode(json)
        (headResult, tailResult) match {
          case (Right(h), Right(t)) => Right(field[L](h) :: t)
          case (Left(e),  Right(_)) => Left(e)
          case (Right(_), Left(e))  => Left(e)
          case (Left(e1), Left(e2)) => Left(e1.concat(e2))
        }
      }
    }
  }

The derived codec would still be less performant than the manually written one (the derived one uses Codec.obj1 twice and merges their behaviors, whereas the manually written one directly uses Codec.obj2).

Note that we even have to expand the code for combining Either[ValidationErrors, H] and Either[ValidationErrors, T] into Either[ValidationErrors, H :: T], just to avoid calling invmap. It seems that this generic representation is the root of our problems.
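
The Either combination itself can at least be factored into a helper that takes the combining function directly, so the induction step builds its result without an intermediate tuple followed by invmap. The sketch below is an assumption-laden illustration: map2 is a hypothetical name, and ValidationErrors is modelled as a list of messages with the concat operation used above.

```scala
object EitherMap2Sketch {
  // Hypothetical error type with the `concat` operation used in the snippet above.
  final case class ValidationErrors(messages: List[String]) {
    def concat(other: ValidationErrors): ValidationErrors =
      ValidationErrors(messages ++ other.messages)
  }

  // Combines two results with `f`, accumulating the errors of both sides.
  def map2[A, B, C](
      ea: Either[ValidationErrors, A],
      eb: Either[ValidationErrors, B]
  )(f: (A, B) => C): Either[ValidationErrors, C] =
    (ea, eb) match {
      case (Right(a), Right(b)) => Right(f(a, b))
      case (Left(e), Right(_))  => Left(e)
      case (Right(_), Left(e))  => Left(e)
      case (Left(e1), Left(e2)) => Left(e1.concat(e2))
    }
}
```

With such a helper, the decode body in consField could end with something like map2(headResult, tailResult)((h, t) => field[L](h) :: t). This removes the duplicated pattern match but does not address the deeper issue: the derivation still proceeds one field at a time.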

@DavidGregory084

/cc @fommil as I think he will be interested in this

@fommil

fommil commented Dec 25, 2017

The alternative to case classes (now mothballed because scalameta macros were abandoned) was written up at https://vovapolu.github.io/scala/stalagmite/perf/2017/09/02/stalagmite-performance.html

I'd love to return to it.

My new approach to typeclass derivation at the point of data definition is at https://gitlab.com/fommil/scalaz-deriving. I'll be writing a chapter in my book soon, plus hopefully giving a talk at lambdaconf.

@OlivierBlanvillain
Contributor Author

Subsumed by #5540
