Skip to content

Get rid of = _ in variable definitions #11225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
odersky opened this issue Jan 27, 2021 · 48 comments · Fixed by #11231
Closed

Get rid of = _ in variable definitions #11225

odersky opened this issue Jan 27, 2021 · 48 comments · Fixed by #11231

Comments

@odersky
Copy link
Contributor

odersky commented Jan 27, 2021

An obscure use of _ occurs in var definitions:

var x: T = _

It defines a concrete variable x without an initial value, or rather the default initial value that the JVM assigns to object fields. It can only be used in a class or object, not to initialize a local variable. It is essential in situations like this:

new AbstractIterator[A] {
    private[this] var hd: A = _
    private[this] var hdDefined: Boolean = false
    ...
}

Here we cannot initialize hd with a value since its type is parametric, and we do not know a value for A. The logic in AbstractIterator makes sure that hd is read only after it is assigned by checking hdDefined before reading hd.

So the idiom is rare but essential where it is needed. But can we find a less obscure notation that does not add another overload to _?

One possibility would be to define somewhere an abstraction like this:

def defaultValue[T]: T = ...

defaultValue can be expressed in terms of _:

def defaultValue[T]: T = 
  object helper { var x: T = _  }
  helper.x

It can also be a compiler-intrinsic, and then it could replace = _ in variable definitions.

But I am a bit reluctant to do that, for fear that then defaultValue will be also used in situations where it is not appropriate. The big problem with defaultValue is that it returns null for all class types including boxed numeric types. That's very confusing! We do not want to give this a more prominent status than what we have. The situation would get worse with explicit nulls. In that case, we would have:

def defaultValue[T]: T | Null

But then

var x: T = defaultValue

would be ill-typed. It would have to be:

var x: T | Null = defaultValue

This is safe, but it defeats the purpose. What should happen for

var x: T = _

is that the programmer or the init-checker proves that x will be initialized with something else before it is first referenced. That's the intended meaning.

So, maybe a better choice is to re-use compiletime.erasedValue for this?

var x: T = erasedValue

This means there is no initial value at runtime (the value is erased), so there is no initial assignment. That can be allowed for objects and classes since objects are bulk-initialized (but the init checker should check that such variables are not referenced before being assigned to). It could be allowed in the future also for local variables if we can prove that the variable is initialized by assignment before use. Everywhere else erasedValue is illegal, by the current rules.

Admittedly, erasedValue is also obscure. However

  • The whole thing is an obscure use case
  • erasedValue can be looked up, unlike _
  • erasedValue does not add to the confusion relative to the other uses of _.
@Katrix
Copy link
Contributor

Katrix commented Jan 27, 2021

Isn't erasedValue a kind of weird fit for this in terms of looking it up? If I look up erasedValue currently, I get lot's of stuff about inline, match, and macros. As a user then, if I saw that when I was looking up this, I might think I took a wrong turn somewhere.

@sjrd
Copy link
Member

sjrd commented Jan 27, 2021

As long as we're not targeting 3.0 with this change ... because this is not triggered by any feedback coming from the community, and we've already said that further language changes would only be done in response to community feedback. (Or blocking issues, which this is not since it's already an obscure use case, and as is, it works.)

@liufengyun
Copy link
Contributor

the init checker should check that such variables are not referenced before being assigned to

I think it's generally not achievable without complicating things due to aliasing --- it's the same as enforcing a typestate protocol in a language.

On the other hand, if the object can be aliased, the usage of Option[T] might be safer for performance-insensitive applications:

- var x: T = _
+ var x: Option[T] = None

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

@Katrix The documentation of erasedValue would have to be changed to cover the new use case.

On the other hand, if the object can be aliased, the usage of Option[T] might be safer for performance-insensitive applications:

Yes, but the use case = _ is for performance sensitive applications. I believe it is OK to demand verification by hand that the variable is assigned before use.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

Another choice would be to always use null as the initializer. I.e.

new AbstractIterator[A] {
    private[this] var hd: A | Null = null

But then the danger is that we will make use of the initial value since we went through the trouble of defining it:

   if hd == null then // initialize
   else // use hd, with flow typing we know it is not null

Very tempting but wrong, since null might be a legal value of type A. So, I believe _/erasedValue is better since they do not tempt one to test for null.

@liufengyun
Copy link
Contributor

It seems to be at least in Dotty, var x: T = _ are all used in places where we know the concrete type of T. Therefore, such usage can be eliminated with either T | Null or a special sentinel value if T = Int.

In performance-sensitive applications, users can always use a cast to avoid runtime checks.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

The example of AbstractIterator I gave comes straight out of the standard library. There are several others like that.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

In performance-sensitive applications, users can always use a cast to avoid runtime checks.

I am not sure I agree. The point is, we want to communicate "this variable is defined but not initialized". That's a common idiom in all imperative languages. Making up a non-sensical initializer and then casting obscures the logic.

"

@sjrd
Copy link
Member

sjrd commented Jan 27, 2021

I am not sure I agree. The point is, we want to communicate "this variable is defined but not initialized". That's a common idiom in all imperative languages. Making up a non-sensical initializer and then casting obscures the logic.

We already have to do that for local vars. We can do it for fields as well. There's nothing special about fields vs local vars in this. In fact I'm pretty sure I had to invent a non-sensical initializer for local vars more often than I have needed a = _ for fields.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

Maybe compiletime.noValue instead of compiletime.erasedValue? That would avoid the confusion.

@LPTK
Copy link
Contributor

LPTK commented Jan 27, 2021

Unless we can have a check that a variable is always initialized before accessing it (which seems hard to implement), leaving a variable uninitialized is a fundamentally unsafe way of programming, where the programmer basically tells the compiler "trust me on this".

So an unsafe cast seems warranted for this.

var hd: A = null.asInstanceOf[A]

Such a cast is similar in nature to the ones we need in the low-level implementations of things like ArrayBuffer (which needs casts because it's based on an underlying Array[AnyRef]).

Having a special feature like compiletime.noValue would make it look like using uninitialized variables is fine and dandy, legitimate even in non-performance-sensitive code, whereas it can actually lead to type soundness issues!

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

You can't write null.asInstanceOf[A] anymore, thankfully. We really need something that can be used for dropping initialization and nowhere else. So, how about

  var x: T = notInitialized

That's pretty clear, no? And it would get rid of the obscure use of _.

One can argue that notInitialized is unsound and therefore should not be used. I will not be the one to lead the battle to get rid of it in all existing code bases, but if someone else goes ahead, I am sympathetic.

@sideeffffect
Copy link
Contributor

I know I don't bring much value with this, but still: As far as know (not only) in the Scala community, unsafe things are usually named unsafe*, so how about unsafeNotInitialized or unsafeUninitialized?
Maybe we don't need to get rid of this feature altogether, but naming it like this would clearly demarcate that the user is playing with fire here 🔥

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

unsafeNotInitialized would work as well, but i think notInitialized is probably sufficiently clear.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

We already have to do that for local vars. We can do it for fields as well. There's nothing special about fields vs local vars in this.

In fact, there is! We might not have a value for a field like here

class Memo[A](x: => A):
  private var cached: A = notInitialized
  private var known: Boolean = false
  def force = 
    if !known then
      known = true
      cached = x
    cached

But in a local context if we always have a value with which we could initialize a variable. Otherwise, why define the variable at all?
Exception: A local context that exports an instance that accesses the variable in a closure. But then we could put the variable in the class of the returned instance itself.

@sjrd
Copy link
Member

sjrd commented Jan 27, 2021

But in a local context if we always have a value with which we could initialize a variable. Otherwise, why define the variable at all?

def lastSuchThat[A](xs: List[A])(p: A => Boolean): A = {
  var found: Boolean = false
  var result: A = null.asInstanceOf[A] // no initial value here
  var rest = xs
  while (!rest.isEmpty) {
    if (p(rest.head)) {
      found = true
      result = rest.head
    }
    rest = rest.tail
  }
  if (!found)
    throw new NoSuchElementException
  result
}

Of course we could use an Option[A], but only if we don't have the performance-sensitive argument. And if we don't have the performance-sensitive argument, then we also don't have it for fields.

@liufengyun
Copy link
Contributor

liufengyun commented Jan 27, 2021

If the feature var x: T = _ is only to support performance-sensitive libraries where T is generic, then can we have an ad-hoc solution for the ad-hoc usage?

class Memo[A](x: => A):
  @embed
  private var cached: UnsafeOpt[A] = UnsafeOpt.Empty
  def force: A = 
    if cached.isEmpty then
      cached.set(x)
    cached.get

The annotation @embed will inline implementation of Box to the following after typer:

class Memo[A](x: => A):
  private var cached$value: A = _
  private var cached$init: Boolean = false
  def force = 
    if !cached$init then
      cached$init = true
      cached$value = x
    cached$value

Of course, the implementation of @embed has to check that cached does not leak, i.e. it's effectively private[this] and never used directly as arguments to methods or assignments.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

I believe that an erased notInitialized value will be a lot simpler than an added annotation. PR to prove it is forthcoming.

@odersky
Copy link
Contributor Author

odersky commented Jan 27, 2021

@sjrd Yes, that makes sense. In fact a notInitialized could also make sense for local values. It would require flow typing to check, so that's an additional expense. But since we already have flow typing for nulls, it could be considered.

odersky added a commit to dotty-staging/dotty that referenced this issue Jan 27, 2021
odersky added a commit to dotty-staging/dotty that referenced this issue Jan 27, 2021
odersky added a commit to dotty-staging/dotty that referenced this issue Jan 27, 2021
@Facsimiler
Copy link

Apologies for this pedantic comment, but this issue is currently titled: Get rid of _ = in variable definitions

Shouldn't this read: Get rid of = _ in variable definitions instead?

@sjrd sjrd changed the title Get rid of _ = in variable definitions Get rid of = _ in variable definitions Jan 28, 2021
@sjrd
Copy link
Member

sjrd commented Jan 28, 2021

Indeed. Fixed.

@djspiewak
Copy link

So I still need to read through this thread and catch up, but I want to underscore that the current meaning of this construct, which results in constructors which have no assignment bytecode for those slots, is crucial for performance sensitive areas. Simply taking null and casting it does produce the same result, but at the cost of an added instruction. This mistakes the purpose of the construct: it isn't to obtain a default value, it is to obtain a variable which has an undefined value for all intents and purposes and which is half as costly.

@LPTK
Copy link
Contributor

LPTK commented Jan 29, 2021

@djspiewak I guess the cast is removed by the erasure phase. Do you have evidence suggesting that initializing a variable with the default value for that type is slower than leaving it uninitialized? I would imagine the JVM compiles these down to the same machine code.

@Jasper-M
Copy link
Contributor

IIRC in Java, leaving uninitialized and initializing with (what Java specifies as) the default value are synonyms.

@djspiewak
Copy link

The cast is removed but not the assignment. This is particularly relevant if the assignment imposes a memory fence, but even without that, the cost is measurable if you're doing enough unavoidable allocations in a critical section.

@dwijnand
Copy link
Member

So I still need to read through this thread and catch up, but I want to underscore that the current meaning of this construct, which results in constructors [..]

[..] but I want to underscore that the current meaning of this construct [..]

underscore

@nafg
Copy link

nafg commented Jan 31, 2021 via email

@djspiewak
Copy link

Either you force variables not initialized to a non null
to be | Null for safety, or you say that var fields are special.

I mean, var fields are already special.

To be clear, I don't have any problem going through some extra type system pedantry to allow for uninitialized fields, though | Null isn't quite sufficient since it doesn't cover primitives accurately. My fear if we allow notInitialized in other general contexts is that it will be abused quite heavily. null.asInstanceof[A] is already there, yes, but it looks radioactive so people generally don't touch it. When I've used it in code shared with less experienced teams, for example, I'm invariably asked to write a book in comments explaining what's going on. I doubt that notInitialized will provoke the same aversion, and I don't think that's a good thing.

Honestly the ideal syntax here really would be just leaving off the assignment, but of course that conflicts with abstract member denotation. This is part of why I actually like the = _ syntax. Underscore is already a heavily overused token within the language, and people seem to understand that it has different meanings in different contexts. Honestly, even removing this particular construct isn't going to do much to move the needle on that. When people see something like var x: Int = _, they usually understand that it's some syntactic weirdness thing and they either understand it or they make a mental note to come back and figure it out. I've never seen anyone get confused by the fact that this syntax doesn't work in any scope other than the top level of a class/trait/object (though I admit to having tried it in local scopes several times early on in my Scala explorations before I realized that Scala lacks flow typing).

I guess my general conclusion is that this is not really that big of a deal. Changing it to notInitialized (or anything else) isn't really going to materially demystify _, particularly when the construct is so rarely used anyway, and introducing a seemingly-orthogonal mechanism for this functionality (notInitialized or similar) runs the risk of introducing a whole added source of confusion due to the fact that the expression just doesn't make sense in a ton of contexts, but the compiler will either let you use it (amplifying confusion and injecting unsoundness into the language) or disallow it (amplifying other confusion and removing any orthogonality).

Particularly in light of the lateness of the hour -- and it is very late in the game to be having these kinds of conversations! -- my preference would be to just leave it as it is. Changes at this point which remove syntax should really only be considered under extreme circumstances, because otherwise we're never going to be able to, as an ecosystem, land this plane. If status quo is considered an unacceptable outcome, then my second choice would be to lean into the magicness of null.asInstanceOf[A] and just make that the new = _ replacer. Thus, instead of introducing a magical notInitialized, we just force people to use null.asInstanceOf[A] as the assigned expression and give the compiler magic knowledge of this such that it can remove the assignment altogether. This has the advantage of being generally source compatible with Scala 2, though it will result in a performance regression for any code which is cross-compiled back to 2.x, which likely means that we would want to push for additional 2.12.n and 2.13.m versions which backport the magical awareness.

Hopefully it's clear why my preference is to leave it alone. Even a small change here (making null.asInstanceOf[A] just a bit more magical) has rippling consequences which would be very painful at this stage.

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

Note: uninitialized is defined erased, so it cannot be misused as a regular value.

@nafg
Copy link

nafg commented Feb 1, 2021 via email

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

Note: uninitialized is defined erased, so it cannot be misused as a regular value.

What I meant by this is that there's a check that uninitialized cannot be used as a regular value.
The only place it is legal in normal code is as the initializer of a mutable field.

(theoretically you could use it also in typelevel code, as long as no actual bytecode gets
generated from that code)

So, to compare:

We go from a truly obscure use of _ that's impossible to google to a reference
uninitialized that has to be imported from scala.compiletime and that
has a doc comment explaining what it is.

Looks like a clear win to me.

The other reasonable alternative I see is to eliminate = _ or = uninitialized altogether.
and recommend that it's substituted by

var x: T = null.asInstanceOf[T]

I find that worse in many dimensions

  • it abuses null
  • it abuses asInstanceOf
  • it encourages to use the same idiom elsewhere for normal values.

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

Possible generalization for later:

Allow uninitialized also as the initializer for local variables. That would require a flow-analysis
similar to Java to ensure that such variables are assigned before use.

@nafg
Copy link

nafg commented Feb 1, 2021 via email

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

and introducing a seemingly-orthogonal mechanism for this functionality (notInitialized or similar) runs the risk of introducing a whole added source of confusion due to the fact that the expression just doesn't make sense in a ton of contexts, but the compiler will either let you use it (amplifying confusion and injecting unsoundness into the language) or disallow it (amplifying other confusion and removing any orthogonality).

Actually, no. The idea is that the uninitialized value is declared as erased (@compileTimeOnly in Scala-2). So it's not a special case by itself. No confusion is amplified and no orthogonality is lost. The only special action taken by the compiler is to remove
= uninitialized assignments from the code, and that is exactly the same as it was for _ =.

@smarter
Copy link
Member

smarter commented Feb 1, 2021

I already suggested something that's better than both. Call it
nullOrDefault[A] or something like that, make it legal anywhere but treat
it specially in var field initializers.

The problem with this is that in a null-safe world, nullOrDefault[A] should have type A | Nulŀ, but then it cannot be assigned to a field of type A, by contrast uninitialized would let you type it as A even if Null is not a subtype of A. This is fine (or at least not worse than the status-quo) since we cannot avoid having to deal with uninitialized fields on the JVM, and we already have an initialization checker (cf https://dotty.epfl.ch/blog/2020/03/18/23rd-dotty-milestone-release.html).

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

[This is essentially a duplicate of @smarter's comment, so you can just skip it.]

I considered nullOrDefault but discarded it because it would have the wrong type. It would have to be defined like this:

def nullOrDefault[T]: T | Null

That's the wrong type for primitive instances of T. Int | Null erases to AnyRef, not Int.
It's also of dubious value. Normal types don't have default values in any useful sense. So the
danger is that this would be used in areas where it is not appropriate.

@nafg
Copy link

nafg commented Feb 1, 2021 via email

@sjrd
Copy link
Member

sjrd commented Feb 1, 2021

Actually, no. The idea is that the uninitialized value is declared as erased (@compileTimeOnly in Scala-2). So it's not a special case by itself. No confusion is amplified and no orthogonality is lost. The only special action taken by the compiler is to remove
= uninitialized assignments from the code, and that is exactly the same as it was for _ =.

It's not like any other erased stuff, since it remains the only valid erased term that can on the rhs of a non-erased var definition. So it is still a special-case, even considering erased.

In addition, erased didn't make it into 3.0 anyway, so for all intents and purposes, you still have to document and explain it as a special-case, since you can't explain it in terms of erased which is not there.

Therefore, I don't think that the argument that "it's just an erased so not a special-case" holds up.

@LPTK
Copy link
Contributor

LPTK commented Feb 1, 2021

@odersky

  • It does not abuse null or asInstanceOf

I don't think null.asInstanceOf[A] is an abuse of asInstanceOf. Functionally, it's clearly similar to an unsafe cast: it provides a value that's not truly of the documented type, which constitutes a soundness hole – an escape hatch, just like asInstanceOf.

When trying to assess the type safety of a program, one needs to determine where soundness holes are introduced. To do so, we already have to look out for uses of asInstanceOf and @unchecked[Variance]. Now uninitialized would be yet another soundness hole to look out for.

I concede that null.asInstanceOf[A] still can be seen an abuse of null. We could instead make it Any.asInstanceOf[A], where val Any: Any = null would be added to Predef, and would be described in doc as an arbitrary value. This has the advantage of making the intent of the uninitialized-var use case clearer.

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

It's not like any other erased stuff, since it remains the only valid erased term that can on the rhs of a non-erased var definition. So it is still a special-case, even considering erased.

Yes, in the sense that it is removed when used as the initializer of a field. I mentioned that above.

In addition, erased didn't make it into 3.0 anyway, so for all intents and purposes, you still have to document and explain it as a special-case, since you can't explain it in terms of erased which is not there.

But @compileTimeOnly` exists in Scala-2 and is equivalent.

@sjrd
Copy link
Member

sjrd commented Feb 1, 2021

@compileTimeOnly has 0 common point with erased. It just emits an error if any call to an @compileTimeOnly method survives until rechecks. That also applies to calls that are themselves within an @compileTimeOnly member. It's meant for fake methods that should be captured and replaced by a surrounding macro. For example, the .value of sbt.

@nafg
Copy link

nafg commented Feb 1, 2021 via email

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

@nafg It's not legal.

@odersky
Copy link
Contributor Author

odersky commented Feb 1, 2021

@compileTimeOnly has 0 common point with erased. It just emits an error if any call to an @compileTimeOnly method survives until rechecks

And erased just emits an error if any call to a erased method survives until after erasure. I don't think that fine point matters here, as long as you say that uninitialized field assignments are removed before the check.

@nafg
Copy link

nafg commented Feb 1, 2021 via email

@dwijnand
Copy link
Member

dwijnand commented Feb 2, 2021

Either that means changing the meaning of null generally - making it inhabit Nothing rather than Null - or changing its meaning when on the RHS of a var. The latter seems definitely wrong to me, while the former seems too risky of a change.

Also, if I remember correctly from the discussions around -Xcheck-init, there's a difference drawn between = _ and = null where the former means uninitialised and the latter means initialised to null (similarly for 0/0.0/false..) - relevant to the initialisation checking that flag is defined for.

@nafg
Copy link

nafg commented Feb 2, 2021 via email

@dwijnand
Copy link
Member

dwijnand commented Feb 2, 2021

its initial value doesn't follow the same rules as its type claims about other reads and writes.
I don't see why the exception to its type can't be made broader

I don't follow these parts in particular, but I think I agree with you: I can't come up with how = _ is treated differently to -Xcheck-init. Maybe I misremembered. But I still believe my points on not changing null's type.

odersky added a commit to dotty-staging/dotty that referenced this issue Feb 3, 2021
odersky added a commit to dotty-staging/dotty that referenced this issue Feb 4, 2021
michelou pushed a commit to michelou/scala3 that referenced this issue Feb 5, 2021
odersky added a commit to dotty-staging/dotty that referenced this issue Mar 27, 2021
@Kordyjan Kordyjan added this to the 3.0.0 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.