Skip to content

RFC Source spans #393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

RFC Source spans #393

wants to merge 3 commits into from

Conversation

antialize
Copy link
Contributor

In order to give nice error messages, it would be helpful to have byte spans on Token and Ast Nodes.

This could be used with a library such as codespan-reporting to produce more usefull error messages:

error: sql parser error: Expected end of statement, found: ENGINE
   ┌─ test.sql:34:3
   │
34 │ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
   │   ^^^^^^

It is hard to do this in a none invasive way. In this PR I have added a trait called Spanned that is implemented for Token, where the span() method returns the Span of the token. I have also implemented Spanned for ParseError such that one can get the span of the primary cause of a parse error.

I would like to also produce error messages afterwards by performing custom passes over the AST, in order to do that it would be helpful if the various AST nodes also implements Spanned. There are several ways this could be done, but I would propose
to store the spans explicitly on primitive values, and then store a few choice tokens in AST nodes, such as the 'SELECT' or 'CREATE TABLE' tokens. The span for complex AST nodes could then be computed by merging the spans of the children. This has the benefit over storing an explicit span on all notes, that it is easier to produce a modified AST (from a lowering pass or whatnot) without getting the span wrong. I have not done much of this yet as it is a lot of work that I would rather wait with doing, until I know if upstream thinks I am on the right path.

In the suggested implementation the Span struct is allowed to be unset. This is done such that one can produce a modified AST with new nodes without having to invent spans.

Do you want something like this in the code, and if so what form would you like it to have. There are many possible ways of doing it, this is just the one I found the most straightforward.

@leiysky
Copy link

leiysky commented Jan 26, 2022

Any update on this? It's very useful.

/// A byte span within the parsed string
#[derive(Debug, Eq, Clone, Copy)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Span {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making this a wrapper for Token?

Like:

struct WithSpan {
    token: Token,
    // position
    ...
}

Then you don't have to modify parser heavily, but wrapping the error with span at top of parser instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a better approach. Since there did not seem to be much interest in this MR. I have started on my own parser to experiment with spans and proper error recovery:
Here I ended up having the lexer return (Token<'a>, Span)

pub fn next_token(&mut self) -> (Token<'a>, Span) {

I have also created a Spanned trait that is implemented for all AST nodes.

I need a parser that can produce spans for tokens and nodes, that can report multiple errors, and return an AST even if it encountered some errors.

I tried to see if I could add it to this crate, but it seems hard without breaking all existing uses of the library, perhaps it can be done somehow.

@alamb
Copy link
Contributor

alamb commented Feb 8, 2022

Sadly I don't have the time to help drive projects such as this. Unless one of the other maintainers wants to do so, PRs like this are not likely to be merged. I am sorry I don't have the time and I am sorry if your time was wasted.

As the current state of affairs might not have been clear in the project's README. I have added #416 to try and help

@alamb
Copy link
Contributor

alamb commented Apr 29, 2022

Thanks again for this PR. I am closing it for now to clean up the PR list. Please reopen if that was a mistake.

@alamb alamb closed this Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants