Skip to content

A crazy idea: provide a reasonable toString for case classes #1341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DarkDimius opened this issue Jun 29, 2016 · 16 comments
Closed

A crazy idea: provide a reasonable toString for case classes #1341

DarkDimius opened this issue Jun 29, 2016 · 16 comments

Comments

@DarkDimius
Copy link
Contributor

DarkDimius commented Jun 29, 2016

When having discussions with people in the community, I saw a misconception that as soon as compiler developers have some idea, it quickly gets implemented and shipped in the next release. At the same time, the misconception suggest that ideas from outsiders never get support and are never implemented.

The source of this misconception is that decisions that originate from compiler developers that do are not supported by majority in compiler team(s) are never publicly presented.
This creates a biased distribution for ideas that are presented to public: most of them are already fleshed out, discussed in private and preliminary decision is frequently made. They have been privately discussed for month if not years in order to cover all the corner-cases of both abstract idea and practical implementation.

This issue exists to show discussion that normally happens in private, to demonstrate that quite a few of ideas that we have are not implemented, as not good enough. This is an example of such a idea, that may simplify everyday life of a particular developer (me) but I believe it's not good enough (yet?) to be implemented.

The proposal

This proposal proposes a best-effort to make toString on case classes be JSON.
Another good option is proposed by @densh in comment below.
I don't have preference over either of two currently, as his version has also some convincing advantages, such as repl compatibility. Additionally, AFAIK this is what Python does.

Introduction:

I've been thinking about it for a year or so already, sometimes leaning towards "it's very easy to implement, and it makes common usage easier, so why not do it", while sometimes leaning towards "I like JSON as much as I like XML".

Recently, we've had a lot of discussions with @felixmulder on how should dottydoc compiler part and web-part communicate. And it seemed that what he intended to implement is comparable in amount of code to implementing inside the compiler.

I'd like to finally have this discussion, mostly to gather opinions of people and see if we should do it.
If we don't do it in Dotty, I guess it's better to never do it. So why not have discussion now?

Motivation

When discussing including serializer in standard library, in most cases this is seen as a bad idea, as there are multiple formats available and we do not want to bear the burden of supporting all of them. In particular there have always been an argument not to include JSON\XML\Protobuf serializer in standard library\compiler as we'll need to maintain it.

Current status

Compiler generates a toString on case classes that calls into

    ScalaRunTime..MODULE$._toString((Product)this);

That is implemented as

    x.productIterator.mkString(x.productPrefix + "(", ",", ")")

Idea

TLDR: replace parenthesis with brackets and include field names. Add a best effort fromString(s): Option[CaseClass] method.

Thinking about it, we already have a toString defined on all classes. So we already have a conversion from every case-class to a String. I propose to make this string try to be JSON.

And make companion object have a fromString method that does best-effort in trying to create the class from a JSON string.

Design guidelines

Several details of how I would prefer it to work:

  • toString generated for case-classes emits a pretty-printed version of JSON, that can be compacted by a library function that strips whitespaces;
  • fromString(s: String): Option[CaseClass] is generated on companions of case classes;
  • fromString does not use reflection, but instead relies on existence of fromString for all the fields;
  • fromString on fields that are not case classes can be user-defiend in a companion of a field class or added through implicit conversion;
  • fromString is provided for some parts of stdlib such as primitives, commonly used classes such as java.util.Date, standard collections. For java-defined classes it is added through implicit conversion;
  • when we'll have ADTs, generate a fromString for it automatically based on the hierarchy;
  • the implemented fromString and toString do not try to provide best performance and fit all requirements. They aren't to replace JSON serializer but to provide an easy to use and convenient default that makes sense(unlike current toString that is impossible to read on nested case classes)

Proposed implementation

(c: CaseClass).toString()

  1. creates a string json entry called _class$ with a fully specified class name of what is being serialized;
  2. calls toString an all fields;
  3. creates a JSON string by computing \n\t${field-name}: ${fieldValue} to every field, concatenates it with field from step 1 and wraps all this in {}.
    1. This allows to create a pretty-printed version using a single StringBuffer that should be comparable to speed as current toString. It may get a bit slower due to having to append whitespaces and field names compared to current implementation.

fromString(s: String): Option[CaseClass]

  1. does not assume presence of whitespaces in the string and does not try to be very efficient. Does not always work if fields have crazy toStrings.
  2. checks if class name is the same, if not returns None
  3. splits the string to delegate parsing of fields to other fromStrings, taking into account structure of JSON. It would need to import the scope of companion class of field type. See example below for illustration
  4. if some of nested fromStrings calls for fields returned None, returns None
  5. Otherwise, constructs Some(caseClass) using provided fields.

Demonstration:

case class TestClass(a: Int, b: Double) {
  def toString = // synthesized by desugar as an untyped tree
    q"{\"_class$$\": \"TestClass\",\n\t\"a\": ${a.toString},\n\t\"b\": ${b.toString}}"
}

case object TestClass$ {
  def fromString(s: String): Option[TestClass] = { 
      val maping: Map[String, String] = callLibraryFunctionThatParsesSingleLevelOfJson
      if (mapping.getOrElse("class$", "") != "TestClass") None
      else {
         val a: Option[Int] = {
           import Int._
           mapping.get("a").flatMap(fromString) // uses scope search to find one
                                 // it is provided by companion of class scala.Int and imported above
         }
         val b: Option[Double] = {
           import Double._
           mapping.get("b").flatMap(fromString) // uses scope search to find one
                                 // it is provided by companion of class scala.Double and imported above
         }
         if (a.isEmpty || b.isEmpty) None
         else Some(TestClass(a.get, b.get))
      }
  }
}

case class TestWrapper(tc: TestClass, date: java.util.Date) {
  def toString =  // synthesized by desugar as an untyped tree
    q"{\"_class$$\": \"TestWrapper\",\n\t\"tc\": ${tc.toString},\n\t\"date\": ${date.toString}}" 
}


case object TestWrapper$ {
  def fromString(s: String): Option[TestWrapper] = { 
      val maping: Map[String, String] = callLibraryFunctionThatParsesSingleLevelOfJson
      if (mapping.getOrElse("class$", "") != "TestWrapper") None
      else {
         val tc: Option[TestClass] = {
           import TestClass._
           mapping.get("tc").flatMap(fromString) // uses scope search to find one
                                 // it is provided by import above
         }
         val date: Option[java.util.Date] = {
           import java.util.Date._
           mapping.get("date").flatMap(fromString) // uses scope search to find one
                                 // it is provided in DottyPrefer\similar place
         }
         if (tc.isEmpty || date.isEmpty) None
         else Some(TestWrapper(tc.get, date.get))
      }
  }
}

Arrays & stuff that has crazy toString

First of all, the provided version of toString does not guarantee that it returns a valid JSON.
And even if it does return a valid one, it provides no guarantee that you can deserialize from it.
Take example of

case class Arr(a: Array[Int], stuff: BadClass)
class BadClass { def toString = "\n"}

I'm fine with this state, as this proposal does not aim to replace serialization frameworks and it does not make current situation with useless toString worse, while it would help in many common cases.

But If we decide that we would like to always generate valid JSON, we can instead of calling a toString in this code:

def toString = // synthesized by desugar as an untyped tree
    q"{\"_class$$\": \"TestClass\",\n\t\"a\": ${a.toString},\n\t\"b\": ${b.toString}}"

call a toJsonString method, that has a low-priority implicit decorator implementation that calls toString and provides correct escaping. Case classes would have a native toJsonString that forwards to toString

Drawbacks

  • toString becomes slower;
  • there may be existing code that relies on current behavior of toString, for example by splitting result on parentheses;
  • users may start relying on toString\fromString instead of using proper serialization;
  • if JSON fades away as XML did, we may get stuck with it;
  • JSON may not be the best format as it even has problems with proper support of Longs in practice;

Alternatives

  • leave as is;
  • simply change ScalaRuntime.toString(Product):String: to include some pretty printing;
  • another good option is proposed by @densh in comment

Real motivation

  1. I hate reading toString of dotty trees). We do have show, but if I need to read the internal raw structure of raw tree I still use Lisp-formatter in my emacs.
  2. DottyDoc needs to emmit JSON. It would be nice to do it without external dependencies.
@DarkDimius
Copy link
Contributor Author

misclicked and submitted too early. Still improving the proposal.

@felixmulder
Copy link
Contributor

Why _class$?

@DarkDimius
Copy link
Contributor Author

DarkDimius commented Jun 29, 2016

  1. So that it does not clash with user-defined field names(hence $)
  2. it is in the beginning so that it's easy to see the class name in IDE(and that's why it's not preceded by \n\t
  3. I've added _ to make sure that if you use language where convention is that compiler generated symbols have _ instead of $ you wouldn't define such a val.

@densh
Copy link

densh commented Jun 29, 2016

Counter-idea: make case classes .toString generate Scala code that creates an instance that is identical to this one.

  1. Fully-qualified call to the constructor
  2. All string sub-parts are properly quoted and escaped
  3. Should usually be copy-pasteable to repl and create identical value
  4. Arrays are printed as Array($elem1, $elem2, ..., $elemN) just like all the other collections

Example

Currently:

scala> case class FooBar(foo: Int, bar: String, arr: Array[Any])
scala> FooBar(42, " \" quoted \" ", Array('symbol))
res0: FooBar = FooBar(42, " quoted " ,[Ljava.lang.Object;@de77232)

This counter-proposal:

scala> case class FooBar(foo: Int, bar: String)
scala> FooBar(42, " \" quoted \" ", Array('symbol))
res0: FooBar = FooBar(42, " \" quoted \" ", Array('symbol))

Related work

Li Haoyi's pprint: http://www.lihaoyi.com/upickle-pprint/pprint/

@lrytz
Copy link
Member

lrytz commented Jun 29, 2016

Why not XML? :trollface: (I think we should not bake JSON into the language)

@DarkDimius
Copy link
Contributor Author

Added sections Introduction, Current Status and Real Motivation.

@DarkDimius DarkDimius changed the title A crazy idea: make case classes .toString generate JSON A crazy idea: provide a reasonable toString for case classes Jun 29, 2016
@DarkDimius
Copy link
Contributor Author

@densh I like your idea. I've updated the issue title&start.
Question: I understand how would your proposal work in untyped language such as Python.
But can you tell me what would you expect to get when you execute this code in your proposal:

class A[T]{}

(new A[String]).toString

@felixmulder
Copy link
Contributor

@densh - with Dmitry's proposal I guess you could just call the fromString method 😃

@DarkDimius
Copy link
Contributor Author

DarkDimius commented Jun 29, 2016

Updates: added sections Drawbacks and Alternatives.
Updated the description as it mistakenly had implicit in some places where it shouldn't. After re-reading it once again I understood that some of them made it seem like the default implementation heavily relies on implicit search. It doesn't.
It would be nice if people could have a second look now.

@adriaanm
Copy link
Contributor

adriaanm commented Jun 29, 2016

I'd rather we invest our energy in coming up with a mechanism for abstracting over the structure of case classes so that we can express the boilerplate the compiler generates for case classes in the library.

Pardon my poor quasi-quote hygiene, but wouldn't it be cool if you could implement toString once like this and have a specialized version in each of HasToString's subclasses?

@deriving trait HasToString {
  def toString: String = inline q""" s"${thisClass.name}(${thisClass.fields.map(q"""${_}""")}.mkString)" """
}

@stuhood
Copy link

stuhood commented Jun 29, 2016

100% agree with @adriaanm.

@DarkDimius
Copy link
Contributor Author

DarkDimius commented Jun 30, 2016

@adriaanm, I agree it would be nice to have a better mechanism for implementing such proposals.
Note, that case classes, we aclually do have a mechanism that is powerful enough to implement what you've suggested: Products.

Actually, the your proposed version of toString can be implemented in the library and this is how it is implemented now, though instead of inheritance it delegates to ScalaRunTime._toString.

Or am I misunderstanding what snipped you proposed does?

@xeno-by
Copy link

xeno-by commented Jun 30, 2016

I suppose what Adriaan wants is the possibility to do this at compile time.

@odersky
Copy link
Contributor

odersky commented Jun 30, 2016

I suppose what Adriaan wants is the possibility to do this at compile time.

... which could be achieved my making productPrefix and productElements
(or some analogues) inline, right? I think that's really all we need.

@DarkDimius
Copy link
Contributor Author

Added an opening to the issue, covering why this (crazy) idea was submitted.
After gathering more related work it should be complete.

@stuhood
Copy link

stuhood commented Jul 22, 2016

Mentioning #1347 here to make sure they are "linked" from github's perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants