Skip to content

Add unstable frontmatter support #137193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

epage
Copy link
Contributor

@epage epage commented Feb 17, 2025

This is an implementation for #136889

This strips the frontmatter, like shebang, because nothing within rustc will be using this. rustfmt folks have said this should be good enough for now as rustfmt should successfully work with this (ie not remove it) even though proper formatting would require AST support. I had considered making this part of lexing of tokens but that led to problems with ambiguous tokens. See also zulip.

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

This includes supporting the syntax in

  • the parser
  • pretty printing
  • rustfmt (nop, just needed a test)

This leaves to the future

  • r-a support (called out in the tracking issue)

@rustbot
Copy link
Collaborator

rustbot commented Feb 17, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@epage epage changed the title Add u Add unstable frontmatter support Feb 17, 2025
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 17, 2025
@rust-log-analyzer

This comment has been minimized.

@ytmimi
Copy link
Contributor

ytmimi commented Feb 18, 2025

Might be worth adding a test case to src/tools/rustfmt/tests/target just to show that rustfmt won't strip the frontmatter.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Feb 18, 2025

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

@fmease fmease added the F-frontmatter `#![feature(frontmatter)]` label Feb 19, 2025
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2025
@epage epage mentioned this pull request Feb 20, 2025
6 tasks
@epage epage force-pushed the frontmatter branch 3 times, most recently from ed5aad9 to 64cecab Compare February 20, 2025 16:03
@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2025
return None;
}
let (fence_pattern, rest) = rest.split_at(fence_end);
let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a frontmatter like ---cargo--- going to be parsed? or ---cargo ---?

It looks like they aren't considered to be valid. Consider making that a bit clearer (maybe here? to say that if there is no newline it will be invalid) and add tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything after the 3+ - is the infostring, including any dashes that appear after something else. We check if its an identifier on the following lines which will reject that. We have a test case for invalid identifiers.

Or is the concern more with

strip_frontmatter("---")

(ie a test case where there is no newline after the opening)

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2025
@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 28, 2025
@bors
Copy link
Collaborator

bors commented Mar 4, 2025

☔ The latest upstream changes (presumably #137927) made this pull request unmergeable. Please resolve the merge conflicts.

@epage
Copy link
Contributor Author

epage commented Mar 27, 2025

That was a T-lang RFC, it should not specify what the compiler does to implement the feature. How it is implemented in the compiler should be left to the compiler team.

My suggestion is fully compatible with it being valid Rust syntax supported by the compiler.

With that direction, we are either

  • Deferring making it supported Rust syntax in the compiler
    • This is us kicking this down the road (keep in mind that Cargo has already been kicking this down the road for two years and this is me getting it added)
    • If we stabilize that way,
      • cargo, cargo fmt, rust-analyzer, user tools, etc would all have to duplicate this to then change again once its supported
      • we are delaying users getting the benefits of rustc support
      • we would be lacking real world experience on the rustc implementation but we'd be stuck with it as stabilized. If we had gone that way from the beginning, we would have likely overlooked having any T-compiler sign off.
  • Not actually making it supported Rust syntax but a container format specific to Cargo
    • The RFC spoke specifically to this and was not what was decided

If you think my suggestion is valid, then we can totally just land an unstable compiler flag that can only be used by cargo which instructs rustc to skip x number of bytes when parsing the file.

As I said, it was a valid option to be considered but we decided to go a different direction in the RFC (referring specifically to the "container format solution", not "defer implementation").

@fee1-dead
Copy link
Member

fee1-dead commented Mar 27, 2025

  1. I'm not sure what functional difference it would make towards unlocking the parallel development which was one of the goals of this PR, for us to ship a support with poor diagnostics that certainly needs to change vs shipping a minimal support for cargo to tell us how much to skip.
  2. I'd have a much higher bar in terms of code quality to make this supported syntax by rustc. I'd personally consider it unacceptable to ship the feature (yes, even as an unstable feature) without any consideration for error reporting.
  3. You did mention that it would have been better if this was from someone who is actually interested in writing code for the compiler. I agree. I'd be willing to implement the parsing part myself if you're okay with us landing the solution for rustc being told how much to skip for now. My suggestions all received considerable pushback and in my opinion it is hard to make progress since you disagree with my opinions. But like oli, feel free to find another compiler reviewer who is willing to approve this code, as I will only block this if I am assigned to it (and I am already not assigned, so you can just disregard what I say completely)
  4. To be fair, stabilization is still contingent on there to be a solid implementation in the compiler. The goal of my suggestion was to eliminate the need for any additional back and forths here and actual push forward since you seemed averse to amending the code as much as I'd wanted it to when I reviewed it.

@epage
Copy link
Contributor Author

epage commented Mar 28, 2025

Pushing the front matter parsing out of rustc is basically the world we've been living in the last two years but instead of telling rustc how to skip the content, Cargo strips it from a .rs file and writing out a new one in a different location.

The parallel development isn't just in Cargo but in rustfmt, rustdoc, r-a, etc.
For overall development, this would help by shrinking the size of the hack in cargo, making it less visible in diagnostics and cargo metadata. On the other hand, this would require going through an MCP, implementing this one-off unstable feature, switching cargo to it, duplicating the parser logic in all relevant tools, getting the rustc_lexer change implemented and merged, and then reverting all of that previous work. I can easily see forgetting about some of these implementations until the stabilization PR when features break when running without a -Z flag.

@oli-obk
Copy link
Contributor

oli-obk commented Mar 28, 2025

That's the part I don't get. There seem to be many conflicting requirements that led to this confusing situation that we're trying to understand and you're trying to explain to us.

  • the tools have to process it anyway.
  • What you are asking for is for rustc to ignore the frontmatter so that the tools don't have to support it right away
  • at some point all tools process it, and the rustc code is never visible because all tools process this front matter and report any errors
  • only cargo is supported
  • cargo can already process it
  • cargo shouldn't tell rustc about the fact that it already did and rustc should redo the work
  • rustc shouldn't have good diagnostics for it because tools always preprocess it
  • not all tools may process it, so rustc should process it at the most basic level

On the other hand, this would require going through an MCP,

Adding a new -Z flag doesn't need an MCP

implementing this one-off unstable feature,

@fee1-dead volunteered above

switching cargo to it,

you already have to do all the work, emitting another -Z flag should not be more work than just removing the new file generation which you were going to do anyway

duplicating the parser logic in all relevant tools,

this was gonna happen anyway as they'll want their own front matter some time in the future. And you can just pass the same flag to them for now instead of implementing parser logic for frontmatter

getting the rustc_lexer change implemented and merged, and then reverting all of that previous work.

The change may be just to add another -C or similar flag during stabilization, not requiring any work to be reverted, but just a bunch of -Z be turned into -C. If we add full parsing for this to rustc, we'll need an RFC for what we want inside of that anyway. It seems generally more powerful to let cargo and other build tools just skip a bunch of bytes at the start of files entirely for their own reasons.

I can easily see forgetting about some of these implementations until the stabilization PR when features break when running without a -Z flag.

We can just keep around the -Z flag along with the future -C flag for a while to ease the transition

@epage
Copy link
Contributor Author

epage commented Mar 31, 2025

That's the part I don't get. There seem to be many conflicting requirements that led to this confusing situation that we're trying to understand and you're trying to explain to us.

There is one definition of "rust file syntax" (being specific to leave off things like rustdoc, etc). A header is included in this syntax that external tools may choose to read or write without having to have knowledge of all of the "rust file syntax". Any tools expected to work with "rust file syntax" (syntax highlighers, syn, treesitter, etc) would recognize this syntax as part of Rust.

The expected path for implementation was that rustc would parse the header. For any tool that doesn't need special support for it, this is all they need.

  • rustfmt's MVP doesn't need anything more. In the future, rustfmt likely will want richer token data from rustc's internal APIs
    • In the future, cargo fmt may need to make edits to the content inside the header. Unsure what they would use for doing this (cargo or rustc's implementation)
  • rustdoc's MVP doesn't need anything more. In the future, rustdoc may want improved diagnostics when these show up in doctests.
  • Unsure what r-a's MVP is but to at least let users to some better results, rustc's ignoring of this is what they need

If rustc won't accept this header, is it actually part of the Rust syntax or is this a Cargo container format around Rust?

If the header is Rust syntax but not supported in rustc, what does that mean for the Rust Reference, the Rust Spec, and third-parties implementing support (e.g. treesitter, syn, etc)?

If this is Rust syntax but not supported in rustc, that means it could be supported in the future. If rustc defers support for this syntax, will T-compiler be able to provide as meaningful feedback before it gets stabilized and locked-in?

If this is Rust syntax, why are we not supporting it every place Rust syntax is expected to work (e.g. in mod.rs files, in main.rs files that also have a Cargo.toml)? One of the workflows considered when developing this feature is that a Python user can run any file within your project (so long as it is "independent enough"). Another workflow that has come up since is being able to document the dependencies needed by examples. While this is blindly injecting the content into examples/*.rs right now, when we get workspace support, they can be both examples and packages. These require cases where Cargo does not know to strip the content for the compiler and would require extra work to support. Thats just speaking to potential Cargo workflows.

So within that context

the tools have to process it anyway.

Yes but for some, that processing is just "use rustc's API".

What you are asking for is for rustc to ignore the frontmatter so that the tools don't have to support it right away

No, this is the first of a linear multi-step process for all of the Rust Project tools to support this, not to defer their implementation.

at some point all tools process it, and the rustc code is never visible because all tools process this front matter and report any errors

In the short to medium term, I expect this header to only be used for Cargo. Cargo will be expected to do any error reporting before rustc gets to it. There are cases where a user could use this within files within a package that has a Cargo.toml. Cargo won't be doing the error reporting then. I'd like to further explore the potential for using this in doctests. If that ever happens, Cargo won't be doing it then either.

rustc shouldn't have good diagnostics for it because tools always preprocess it

I did not say "rustc shouldn't have good diagnostics". What it shouldn't be is a blocker adding unstable support for this feature. I question whether its a blocker for stabilizing an MVP but I recognize that I have shaky ground to stand on for that.

Adding a new -Z flag doesn't need an MCP

Ah, I misunderstood as I've seen others have one.

The change may be just to add another -C or similar flag during stabilization, not requiring any work to be reverted, but just a bunch of -Z be turned into -C.

My comment was written from the perspective of us coming back to take the approach from this PR.

If we add full parsing for this to rustc, we'll need an RFC for what we want inside of that anyway. It seems generally more powerful to let cargo and other build tools just skip a bunch of bytes at the start of files entirely for their own reasons.

We had an RFC and it said that the Rust language doesn't care whats inside of he header.

@fee1-dead
Copy link
Member

fee1-dead commented Apr 1, 2025

Okay, so it looks like I misunderstood the motivation behind this PR. My original thought was that this PR wanted it so that a tool that needs parsing the frontmatter and is already parsing the frontmatter does not have to do the work of either pre-stripping the file to a temporary file or something else. Hence we make rustc support just parsing and then skipping it.

Some feedback, I gave my motivation for and that seemed to be accepted

What I said above about requiring diagnostics is a nice to have if the primary motivation is the one I said above. And when I received pushback, I (begrudgingly) accepted that this PR can land, based on the assumption that the intention for this PR is to make cargo's life better.

However, if the intention from the very beginning is to add syntax support for rustc, or to implement the lang RFC. There's a higher bar for that. And I cannot support this PR being the starting point for properly landing syntax support, i.e. it must have good diagnostics, that's an absolute minimum for me. Syntax support under no contexts should mean rustc should ignore it when the user has done the syntax wrong.

But look, I already offered to implement it, so unless you want to actually write the code that are deemed important by at least two compiler maintainers now, I'm always up for doing the work. I also don't want to make anything look like a hostile takeover of your work. We're way past the time this PR would have landed. It would have landed if diagnostics were actually implemented. It would have landed if my nits were applied and there are no outstanding comments left to be responded to. But now we're just running in circles, putting countless words into this PR where none is needed if the code actually meets the standards of the compiler.

And that is the part that confuses me the most. Feel free to privately or publicly reach out to any compiler team member, or reroll this PR assignment (I don't speak for oli though, but it looks like he felt the diagnostics are important too) if you really believe that this code in this form is good to land, and that reviewers just need to be convinced that this code is already good. I just don't think that strategy is beneficial.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2025

Ok, so wrt code review. I would like it to be implemented in the lexer (rustc_lexer::Cursor::advance_token). It's impossible for --- to occur anywhere in Rust source, so we can basically parse it similar to /*, even if we later (in parsing) reject it anywhere but at the beginning of a file based module (so not in inline modules).

I think that would simplify a few things as you

  • can reuse the lexing functions to process the bytes instead of having to manually handle spans and such
  • get diagnostics about accidentally adding things in front of the frontmatter (which the playground likes to do even for feature gates, which also have to be before everything else)
  • don't need to rewrite it when rustc/rustfmt are supposed to handle this in the future

@epage
Copy link
Contributor Author

epage commented Apr 1, 2025

But look, I already offered to implement it, so unless you want to actually write the code that are deemed important by at least two compiler maintainers now, I'm always up for doing the work.

If you are willing to implement syntax support for frontmatters in rustc, I'm fine with you taking this over. I must have missed that offer. I only remember you offering to implement the byte stripping though I lost track of that through the conversation

Ok, so wrt code review. I would like it to be implemented in the lexer (rustc_lexer::Cursor::advance_token). It's impossible for --- to occur anywhere in Rust source, so we can basically parse it similar to /*, even if we later (in parsing) reject it anywhere but at the beginning of a file based module (so not in inline modules).

I think I might be missing some of what you are intending because it appears you can have --- in Rust source (playground).

What would be the recommended Token break down for this? Rustfmt would eventually need to track open, info string, body, close, and maybe any of the whitespace between any of that. Some options I see:

  • Be like TokenKind::BlockComment and they need to re-parse the content inside (which ends up not too different from this PR)
  • Like TokenKind::Literal, we could have offsets all inside one token though that could get unwieldy and bloat the size of TokenKind which has the potential to negatively impact performance.
  • As multiple tokens, we'd need to implement more of a state machine inside of Cursor to know to specially handle the inside content

I switched to this PR's approach at the recommendation of others when working through these kinds of problems

@wesleywiser
Copy link
Member

wesleywiser commented Apr 1, 2025

There seem to be many conflicting requirements that led to this confusing situation that we're trying to understand and you're trying to explain to us.

This list doesn't seem quite right to me but perhaps I'm missing some of the details here. After reading the RFC, my understanding is that:

  • Rust (the language) now supports a single frontmatter that may optionally appear after the shebang line. This has no effect whatsoever on the compiler and should be entirely ignored except for the purpose of lexing/parsing to find the closing delimiter.
  • Cargo supports reading this frontmatter section and interpreting it.

The RFC says:

As cargo will be the first step in the process to parse this, the responsibility for high quality error messages will largely fall on cargo.

I agree that is true of errors originating from the content of the frontmatter block, but the block itself is a syntactic element of the language now and should be properly treated for the purposes of error reporting just like block comments are. We will want to have proper error reporting for things like unclosed frontmatter sections even if Cargo explicitly told rustc where the file content should start because a user could always place one later in the file and we want to give them a useful error message.

Therefore, I think the core requirement from the compiler side is that the frontend should recognize the frontmatter block and produce errors if the block is not immediately at the start of the file, the file contains more than one block at any position or the block is not closed. Whether that happens in the lexer or the parser, I'm not sure of the best approach.

Since Cargo also needs to locate and parse the frontmatter block, it seems wasteful to implement duplicate error reporting for malformed frontmatter. I would suggest that if Cargo cannot successfully parse the frontmatter block itself (eg it does not find 3 or more - characters immediately after the shebang line that are terminated by the same number of characters), it should just invoke rustc and let rustc handle the error reporting for the malformed block.

I agree that we need proper error reporting for syntactically invalid frontmatter blocks, but I don't personally think that has to be done in this very PR provided one or more bugs are opened to track that work and marked as open issues on the tracking issue and there is no regression in quality of other error messages.

cc @Noratrieb since you also suggested an implementation for this feature on that Zulip thread 🙂

@epage
Copy link
Contributor Author

epage commented Apr 2, 2025

@wesleywiser

We will want to have proper error reporting for things like unclosed frontmatter sections even if Cargo explicitly told rustc where the file content should start because a user could always place one later in the file and we want to give them a useful error message.

Therefore, I think the core requirement from the compiler side is that the frontend should recognize the frontmatter block and produce errors if the block is not immediately at the start of the file, the file contains more than one block at any position or the block is not closed. Whether that happens in the lexer or the parser, I'm not sure of the best approach.

...

I agree that we need proper error reporting for syntactically invalid frontmatter blocks, but I don't personally think that has to be done in this very PR provided one or more bugs are opened to track that work and marked as open issues on the tracking issue and there is no regression in quality of other error messages.

So if I'm understanding, you see reporting of meanigful errors for misplaced, malformed, and multiple frontmatters as a blocker for stabilizing an MVP?

Since Cargo also needs to locate and parse the frontmatter block, it seems wasteful to implement duplicate error reporting for malformed frontmatter. I would suggest that if Cargo cannot successfully parse the frontmatter block itself (eg it does not find 3 or more - characters immediately after the shebang line that are terminated by the same number of characters), it should just invoke rustc and let rustc handle the error reporting for the malformed block.

This approach was initially discussed at #137193 (comment)

@wesleywiser
Copy link
Member

wesleywiser commented Apr 2, 2025

So if I'm understanding, you see reporting of meanigful errors for misplaced, malformed, and multiple frontmatters as a blocker for stabilizing an MVP?

For me, it's a blocker for stabilization of the feature. The helpfulness/friendliness of the errors is certainly debatable and I wouldn't block stabilization on having them be absolutely perfect but the status quo does not reach the bar for me. I would say the errors at least need to describe what element the compiler thinks it saw and some useful suggestions for resolving common issues (eg, the unclosed block case should tell you how to close the frontmatter block).

As I said, I wouldn't consider that to be a blocker for landing an initial PR though as long we have issues tracking the necessary improvements to error messages and the changes do not regress the UX for any other cases.

In case it is helpful, let me provide concrete suggestions for what I think some of the errors should look like before stabilization:

  1. Unclosed block:
---
//~^ ERROR unclosed frontmatter block
//~| HELP use `---` to close the block

#![feature(frontmatter)]

fn main() {
}
  1. Invalid infostring:
---cargo,hello-world
//~^ ERROR invalid infostring identifier
//~| NOTE identifiers may not contain `,` characters

---

#![feature(frontmatter)]

fn main() {
}
  1. Multiple frontmatters
---cargo
---

---buck
//~^ ERROR only a single frontmatter may appear in a Rust file
---

#![feature(frontmatter)]

fn main() {
}

@fee1-dead
Copy link
Member

Just let me know when I can start the work, either after this lands or now :)

@oli-obk
Copy link
Contributor

oli-obk commented Apr 2, 2025

r? @wesleywiser

@rustbot rustbot assigned wesleywiser and unassigned oli-obk Apr 2, 2025
@oli-obk
Copy link
Contributor

oli-obk commented Apr 2, 2025

In case it is helpful, let me provide concrete suggestions for what I think some of the errors should look like before stabilization:

I think we should also detect

#![feature(frontmatter)]
---cargo
//~^ ERROR frontmatter block after items
---

fn main() {
}

@epage
Copy link
Contributor Author

epage commented Apr 2, 2025

Just let me know when I can start the work, either after this lands or now :)

Some circumstances have changed at work such that I will be unavailable to give this any of my attention, even to discuss this, for the next week. If you are able to move anything forward in that time, feel free to do so. Otherwise, the exact order depends on what direction the implementation for this PR needs to go. Either way, I appreciate the help!

@bors
Copy link
Collaborator

bors commented Apr 3, 2025

☔ The latest upstream changes (presumably #139301) made this pull request unmergeable. Please resolve the merge conflicts.

@fee1-dead
Copy link
Member

Update: I got a considerable amount of progress today implementing it in the lexer -> parser level with proper diagnostics (WIP commit). I expect to have a PR ready early next week.

@fmease
Copy link
Member

fmease commented Apr 13, 2025

Whatever PR wins out, please make sure that frontmatter is not observable via <TokenStream as FromStr>::from_str (in proc-macro crates) just like shebangs. That's still something everybody agrees on hopefully (I've been subscribed to this thread since the very beginning and did read all messages but it's been a while)?

Potential UI test (draft):

extern proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn check(_: TokenStream) -> TokenStream {
    assert!("---\n---".parse::<TokenStream>().unwrap().is_empty());
    Default::default()
}
//@ check-pass
//@ proc-macro: makro.rs
//@ edition: 2021

makro::check!();

fn main() {}

fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Apr 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-frontmatter `#![feature(frontmatter)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants