-
Notifications
You must be signed in to change notification settings - Fork 602
RFC Source spans #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC Source spans #393
Conversation
Any update on this? It's very useful. |
/// A byte span within the parsed string | ||
#[derive(Debug, Eq, Clone, Copy)] | ||
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
pub enum Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making this a wrapper for Token
?
Like:
struct WithSpan {
token: Token,
// position
...
}
Then you don't have to modify parser heavily, but wrapping the error with span at top of parser instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a better approach. Since there did not seem to be much interest in this MR. I have started on my own parser to experiment with spans and proper error recovery:
Here I ended up having the lexer return (Token<'a>, Span)
pub fn next_token(&mut self) -> (Token<'a>, Span) {
I have also created a Spanned trait that is implemented for all AST nodes.
I need a parser that can produce spans for tokens and nodes, that can report multiple errors, and return an AST even if it encountered some errors.
I tried to see if I could add it to this crate, but it seems hard without breaking all existing uses of the library, perhaps it can be done somehow.
Sadly I don't have the time to help drive projects such as this. Unless one of the other maintainers wants to do so, PRs like this are not likely to be merged. I am sorry I don't have the time and I am sorry if your time was wasted. As the current state of affairs might not have been clear in the project's README. I have added #416 to try and help |
Thanks again for this PR. I am closing it for now to clean up the PR list. Please reopen if that was a mistake. |
In order to give nice error messages, it would be helpful to have byte spans on Token and Ast Nodes.
This could be used with a library such as
codespan-reporting
to produce more usefull error messages:It is hard to do this in a none invasive way. In this PR I have added a trait called
Spanned
that is implemented forToken
, where thespan()
method returns theSpan
of the token. I have also implementedSpanned
forParseError
such that one can get the span of the primary cause of a parse error.I would like to also produce error messages afterwards by performing custom passes over the AST, in order to do that it would be helpful if the various AST nodes also implements
Spanned
. There are several ways this could be done, but I would proposeto store the spans explicitly on primitive values, and then store a few choice tokens in AST nodes, such as the 'SELECT' or 'CREATE TABLE' tokens. The span for complex AST nodes could then be computed by merging the spans of the children. This has the benefit over storing an explicit span on all notes, that it is easier to produce a modified AST (from a lowering pass or whatnot) without getting the span wrong. I have not done much of this yet as it is a lot of work that I would rather wait with doing, until I know if upstream thinks I am on the right path.
In the suggested implementation the Span struct is allowed to be unset. This is done such that one can produce a modified AST with new nodes without having to invent spans.
Do you want something like this in the code, and if so what form would you like it to have. There are many possible ways of doing it, this is just the one I found the most straightforward.