support break and continue in blocks via enums #1619

Closed
nikomatsakis opened this issue Jan 23, 2012 · 14 comments

@nikomatsakis
Contributor

So, I like the idea of supporting break and cont (and, ideally, ret) within blocks but I am not sure how they ought to be implemented. In particular, I don't like the idea of the compiler magically inserting code around block calls to handle non-local returns. Therefore, I am throwing out this half-way scheme to see what people think. It does not allow ret within blocks but it does allow break and cont and involves relatively little magic.

The idea is to create a "well-known" enum in the core library:

enum loop_ctl {
    lc_break,
    lc_cont
}

Within a sugared closure, break and cont would be equivalent to ret lc_break and ret lc_cont. Next, if there is no existing tail expression in the sugared closure and the expected return type is loop_ctl, the tail expression lc_cont is added by default. Finally, we disallow explicit use of ret within sugared closures altogether, because it is potentially confusing and (besides) hurts our current type inference.

The final condition is not needed. We could keep ret with the current meaning of "return from the closure early" and just fix up the type inference algorithm. I am somewhat indifferent but I do think it's mildly confusing, particularly if break and cont start to work.

@nikomatsakis
Contributor Author

I forgot to write how this would be used. Iteration primitives which support break and continue, such as vec::iter, would be written like so:

fn iter<A>(v: [A], blk: fn(A) -> loop_ctl) {
    let i = 0u, n = vec::len(v);
    while i < n {
        alt blk(v[i]) {
            lc_break { ret; }
            lc_cont { /* fallthrough */ }
        }
        i += 1u; // advance to the next element
    }
}
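
For readers more used to present-day Rust, here is a rough sketch of the same idea. It is not the syntax proposed in this issue (which predates Rust 1.0), and the names `LoopCtl` and `take_until` are made up for illustration. The closure spells out by hand what the proposed sugar would generate: `break` becomes an early return of the break value, and the continue value is the tail expression.

```rust
/// Hypothetical modern spelling of the proposed `loop_ctl` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoopCtl {
    Break,
    Cont,
}

/// Breakable iteration: stops as soon as the block asks for it.
fn iter<A>(v: &[A], mut blk: impl FnMut(&A) -> LoopCtl) {
    let mut i = 0;
    while i < v.len() {
        match blk(&v[i]) {
            LoopCtl::Break => return,
            LoopCtl::Cont => {} // fall through to the next element
        }
        i += 1;
    }
}

/// Collects elements until the first one equal to `stop` (illustrative helper).
fn take_until(v: &[i32], stop: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    iter(v, |&x| {
        if x == stop {
            return LoopCtl::Break; // what `break` would desugar to
        }
        seen.push(x);
        LoopCtl::Cont // the tail expression the sugar would add by default
    });
    seen
}

fn main() {
    println!("{:?}", take_until(&[1, 2, 3, 4], 3)); // prints [1, 2]
}
```

The standard library today has a comparable general-purpose type, `std::ops::ControlFlow`, though nothing in this thread refers to it.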

@BrendanEich

In languages with methods or functions and blocks as distinct features, the Tennent's Correspondence Principle way to handle 'ret' would be that it forces return from the nearest enclosing non-block. This is what Smalltalk had, Ruby too, and my http://wiki.ecmascript.org/doku.php?id=strawman:block_lambda_revival proposal for JS does likewise.

Is this out because we want blocks to be fn-like with regard to 'ret'?

/be

@nikomatsakis
Contributor Author

So, TCP is why I said we'd just disallow ret. It's out (at the moment) because compiling it requires either some sort of forced unwinding (slow) or else inserting magic plumbing such that a call to vec::iter(v) {|| ret; } will potentially return from its caller.

What I mean is, if we just added lc_ret to the enum, it's not enough to handle that case in vec::iter(), we'd have to handle the case in the caller to vec::iter() too.
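
To make the point concrete, here is a sketch in present-day Rust of what a `Ret` case would imply. Everything in it is hypothetical (the generic `LoopCtl<R>`, the `Option` returned by `iter`, the helper `first_negative`); it only illustrates that `iter` can stop itself but can merely report the requested return, so each call site needs its own plumbing.

```rust
/// Hypothetical enum with a `Ret` case carrying the caller's return value.
enum LoopCtl<R> {
    Break,
    Cont,
    Ret(R),
}

/// `iter` handles `Ret` by stopping and handing the value back to its caller.
fn iter<A, R>(v: &[A], mut blk: impl FnMut(&A) -> LoopCtl<R>) -> Option<R> {
    for x in v {
        match blk(x) {
            LoopCtl::Break => return None,
            LoopCtl::Cont => {}
            LoopCtl::Ret(r) => return Some(r),
        }
    }
    None
}

/// Returns the first negative element, giving up at the first zero.
fn first_negative(v: &[i32]) -> Option<i32> {
    // This `if let` is the extra plumbing needed at every call site.
    if let Some(r) = iter(v, |&x| {
        if x == 0 {
            LoopCtl::Break
        } else if x < 0 {
            LoopCtl::Ret(Some(x)) // "ret" from first_negative, not just the block
        } else {
            LoopCtl::Cont
        }
    }) {
        return r;
    }
    None
}

fn main() {
    println!("{:?}", first_negative(&[3, -1, 0])); // prints Some(-1)
}
```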

@nikomatsakis
Contributor Author

What's weird is that when I first started using Smalltalk I found it very surprising that ^ returned from the enclosing function and not just the block. But now that I'm used to Smalltalk, I find the opposite very surprising.

@graydon
Contributor

graydon commented Jan 27, 2012

Yeah. We've been round and round on this many times. Definitely supporting TCP for ret adds the largest share of complexity to the call protocol, compiler, etc. I've said before that the fact that we have aliasing env capture makes me think the TCP-identical-ret is non-critical. I still feel that way. We can just ban it. It's not like ret mysteriously changes meaning. It doesn't work in a block. Period. Because it would break TCP and we want to avoid that. So you can write to an option in your environment, break, then check the flag and do an outer ret manually. It's probably faster to do that anyways than the inevitable boxing and switching it'd take to unwind through your iterators properly while passing a polymorphic return value back out.

So .. short story: I'm ok with just trying for break and cont. That's what scheme does anyways :)

But taking a lesson from scheme .. with what you're proposing here, it still feels a bit overdone for the compiler to know about a core loop_ctl type, and to do the transformation when it's present. Considering that type is exactly 2-valued, might it also work to use bool, and transform break to ret false and cont to ret true when a block is bool-valued? I think you get the same effect, and nicer, the iterator can be written as while blk(v[i]) { i += 1; }
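
A sketch of the bool variant in present-day Rust, combined with the workaround described above (write to an option in the environment, break, then do the outer return by hand). The names `each` and `find_even` are hypothetical. Note that a real iterator still needs the bounds check in the loop condition, which the one-line version above leaves out.

```rust
/// Bool-returning block: `true` means keep going, `false` means stop.
fn each<A>(v: &[A], mut blk: impl FnMut(&A) -> bool) {
    let mut i = 0;
    // The bounds check has to be part of the loop condition.
    while i < v.len() && blk(&v[i]) {
        i += 1;
    }
}

/// Finds the first even element without any non-local return.
fn find_even(v: &[i32]) -> Option<i32> {
    let mut found = None; // the "option in your environment"
    each(v, |&x| {
        if x % 2 == 0 {
            found = Some(x);
            return false; // what `break` would turn into
        }
        true // what `cont` (or falling off the end) would turn into
    });
    found // the manual outer return
}

fn main() {
    println!("{:?}", find_even(&[1, 3, 4, 6])); // prints Some(4)
}
```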

@nikomatsakis
Contributor Author

I personally favor having a more intentional type like loop_ctl than a plain boolean, though a newtype'd boolean would be ok.

One reason I would like to add compiler sugar is that I think all of the loops which are now of type fn(A) ought to be breakable (e.g., vec::iter()). If the loop body is inlined this ought to be zero-overhead as well.

@nikomatsakis
Contributor Author

Basically, the core "iterable" type (as defined in the iter module) would change from fn(A) to fn(A) -> loop_ctl. But with the sugar, (almost) no code would have to change. Not 100% sure if this is a good idea.

@graydon
Contributor

graydon commented Feb 2, 2012

I'm fine with that aspect of it (break and cont turn into sugar for returns within blocks). I just think bool is the natural type here: a "keep going" value. I think it actually adds cognitive load to tell users to write loop_ctl for a breakable loop.

@nikomatsakis
Contributor Author

I guess it's a matter of taste. I always found the use of bool to mean keep-going to be kind of overloading. In general I'm not a big fan of scalar types anyway; I prefer to use an enum or something to avoid mixing "units".

But there is one concrete issue with using bool: in that case, the compiler cannot help at all, which means that if we try to use loop_ctl ubiquitously, then every loop will look like:

vec::iter(v) {||
  ...
  cont;
}

which doesn't really strike me as a good idea. Of course you didn't want the compiler to help, which is fine too, but it strikes me that then we will always want to have two iteration primitives: one that always continues and one that may break. We could make this less painful with traits, so that for a given type you need only define the breakable iterator and the "always continue" comes for free, but it still means two iteration methods:

  vec.iter {|| ...}

vs

  vec.iter_brk {||
    ...
    cont;
  }

(Or we could re-use the all method, but I'd rather have iter_brk. Re-using all always seems like a hack, and---besides---it returns a bool, so you would need an explicit semicolon afterwards since you don't actually want this boolean value. But then it does tell you whether the loop exited via break or by falling off the end, which is useful.)
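
The "define only the breakable iterator, get the always-continue one for free" idea could look roughly like this in present-day Rust. This is a sketch under assumptions: the trait `Iterable`, the example type `Countdown`, and the two helper functions are invented for illustration and are not the iter module discussed here.

```rust
/// Hypothetical trait: implementors write only the breakable method.
trait Iterable {
    type Item;

    /// Stops as soon as the block returns `false`.
    fn iter_brk<F: FnMut(Self::Item) -> bool>(&self, blk: F);

    /// Provided for free: wraps the block so that it always continues.
    fn iter<F: FnMut(Self::Item)>(&self, mut blk: F) {
        self.iter_brk(|x| {
            blk(x);
            true
        });
    }
}

/// Example type that yields n, n-1, ..., 1.
struct Countdown(u32);

impl Iterable for Countdown {
    type Item = u32;

    fn iter_brk<F: FnMut(u32) -> bool>(&self, mut blk: F) {
        let mut n = self.0;
        while n > 0 && blk(n) {
            n -= 1;
        }
    }
}

/// Uses the always-continue method: no trailing `cont` needed.
fn collect_all(c: &Countdown) -> Vec<u32> {
    let mut out = Vec::new();
    c.iter(|n| out.push(n));
    out
}

/// Uses the breakable method: the block must say whether to go on.
fn collect_above(c: &Countdown, floor: u32) -> Vec<u32> {
    let mut out = Vec::new();
    c.iter_brk(|n| {
        if n <= floor {
            return false;
        }
        out.push(n);
        true
    });
    out
}

fn main() {
    println!("{:?}", collect_all(&Countdown(3))); // prints [3, 2, 1]
    println!("{:?}", collect_above(&Countdown(5), 2)); // prints [5, 4, 3]
}
```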

Well, there are two or three possibilities. I guess in the end I don't care that much. Maybe I'll draw them up in more detail with some examples and send out an email to see what other people find most appealing, since this is ultimately an ergonomics issue more than anything.

@marijnh
Contributor

marijnh commented Feb 2, 2012

I'm in favour of mild compiler magic here (supporting break and cont, automatic return of cont at the end of the function), activated by looking at the block's return value. It seems like it'd add a lot of convenience (not to mention it'd look better).

@ghost ghost assigned marijnh Feb 28, 2012
@marijnh
Contributor

marijnh commented Mar 1, 2012

@nikomatsakis So judging by your e-mails and irc messages, you seem to be working on this. Should I assign back to you? Or shall I implement my understanding of what you have planned?

@graydon
Contributor

graydon commented Mar 5, 2012

(Significant followup on this in email: https://mail.mozilla.org/pipermail/rust-dev/2012-February/001432.html )

marijnh added a commit that referenced this issue Mar 26, 2012
Also adds proper checking for cont/break being inside a loop.

Closes #1854
Issue #1619
marijnh added a commit that referenced this issue Mar 27, 2012
The last argument of the call must be a block, and the type of this
argument must be a function returning bool. `break` and `cont` are
supported in the body of the block, and return `false` or `true` from
the function. When the end of the function is reached, `true` is
implicitly returned.

    for vec::all([1, 2, 3]) {|elt|
        if elt == 2 { break; }
        log(error, elt);
    }

Issue #1619
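
As a rough present-day analogue of the example in the commit message above, the same loop can be written by hand against the standard `Iterator::all`, which also stops at the first `false`. The function name `log_until_two` and the `Vec` standing in for `log(error, elt)` are assumptions made for illustration; the returned bool shows whether the loop ran to the end or was broken out of.

```rust
/// Hand-desugared version of the `for vec::all(...)` example.
fn log_until_two(v: &[i32]) -> (Vec<i32>, bool) {
    let mut logged = Vec::new();
    let finished = v.iter().all(|&elt| {
        if elt == 2 {
            return false; // `break`
        }
        logged.push(elt); // stands in for log(error, elt)
        true // implicitly returned when the end of the block is reached
    });
    (logged, finished)
}

fn main() {
    println!("{:?}", log_until_two(&[1, 2, 3])); // prints ([1], false)
}
```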
marijnh added a commit that referenced this issue Mar 27, 2012
For use with the new for construct.

Issue #1619
@nikomatsakis
Contributor Author

marijn's commits (listed above) implemented this---something better, as a matter of fact.

@graydon
Contributor

graydon commented Apr 5, 2012

This is done.

marijnh added a commit that referenced this issue Apr 6, 2012
Most could use the each method, but because of the hack used to
disambiguate old- and new-style loops, some had to use vec::each.

(This hack will go away soon.)

Issue #1619
Kobzol pushed a commit to Kobzol/rust that referenced this issue Dec 30, 2024
This replaces link to a removed lint with a link to discussion of lints
of its type.
bors pushed a commit to rust-lang-ci/rust that referenced this issue Jan 2, 2025
This replaces link to a removed lint with a link to discussion of lints
of its type.