RFC Source spans #393

antialize · 2021-12-26T23:47:42Z

In order to give nice error messages, it would be helpful to have byte spans on Token and Ast Nodes.

This could be used with a library such as codespan-reporting to produce more usefull error messages:

error: sql parser error: Expected end of statement, found: ENGINE
   ┌─ test.sql:34:3
   │
34 │ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
   │   ^^^^^^

It is hard to do this in a none invasive way. In this PR I have added a trait called Spanned that is implemented for Token, where the span() method returns the Span of the token. I have also implemented Spanned for ParseError such that one can get the span of the primary cause of a parse error.

I would like to also produce error messages afterwards by performing custom passes over the AST, in order to do that it would be helpful if the various AST nodes also implements Spanned. There are several ways this could be done, but I would propose
to store the spans explicitly on primitive values, and then store a few choice tokens in AST nodes, such as the 'SELECT' or 'CREATE TABLE' tokens. The span for complex AST nodes could then be computed by merging the spans of the children. This has the benefit over storing an explicit span on all notes, that it is easier to produce a modified AST (from a lowering pass or whatnot) without getting the span wrong. I have not done much of this yet as it is a lot of work that I would rather wait with doing, until I know if upstream thinks I am on the right path.

In the suggested implementation the Span struct is allowed to be unset. This is done such that one can produce a modified AST with new nodes without having to invent spans.

Do you want something like this in the code, and if so what form would you like it to have. There are many possible ways of doing it, this is just the one I found the most straightforward.

leiysky · 2022-01-26T13:50:52Z

Any update on this? It's very useful.

leiysky · 2022-01-26T13:59:58Z

src/span.rs

+/// A byte span within the parsed string
+#[derive(Debug, Eq, Clone, Copy)]
+#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
+pub enum Span {


What about making this a wrapper for Token?

Like:

struct WithSpan { token: Token, // position ... }

Then you don't have to modify parser heavily, but wrapping the error with span at top of parser instead.

This is indeed a better approach. Since there did not seem to be much interest in this MR. I have started on my own parser to experiment with spans and proper error recovery:
Here I ended up having the lexer return (Token<'a>, Span)

pub fn next_token(&mut self) -> (Token<'a>, Span) {

I have also created a Spanned trait that is implemented for all AST nodes.

I need a parser that can produce spans for tokens and nodes, that can report multiple errors, and return an AST even if it encountered some errors.

I tried to see if I could add it to this crate, but it seems hard without breaking all existing uses of the library, perhaps it can be done somehow.

alamb · 2022-02-08T15:30:04Z

Sadly I don't have the time to help drive projects such as this. Unless one of the other maintainers wants to do so, PRs like this are not likely to be merged. I am sorry I don't have the time and I am sorry if your time was wasted.

As the current state of affairs might not have been clear in the project's README. I have added #416 to try and help

alamb · 2022-04-29T18:08:23Z

Thanks again for this PR. I am closing it for now to clean up the PR list. Please reopen if that was a mistake.

antialize force-pushed the spans branch from 44faf1a to e7ec126 Compare December 26, 2021 23:48

antialize added 3 commits December 27, 2021 16:07

Add span struct representing a continious parsed span

ab8bfef

Include spans in tokens

74f84ad

Add spans on parse errors

b732168

antialize force-pushed the spans branch from e7ec126 to b732168 Compare December 27, 2021 15:07

leiysky reviewed Jan 26, 2022

View reviewed changes

alamb mentioned this pull request Feb 8, 2022

Clarify maintenance status on README #416

Merged

alamb closed this Apr 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC Source spans #393

RFC Source spans #393

antialize commented Dec 26, 2021

leiysky commented Jan 26, 2022 •

edited

Loading

leiysky Jan 26, 2022

antialize Jan 26, 2022

alamb commented Feb 8, 2022

alamb commented Apr 29, 2022

RFC Source spans #393

RFC Source spans #393

Conversation

antialize commented Dec 26, 2021

leiysky commented Jan 26, 2022 • edited Loading

leiysky Jan 26, 2022

Choose a reason for hiding this comment

antialize Jan 26, 2022

Choose a reason for hiding this comment

alamb commented Feb 8, 2022

alamb commented Apr 29, 2022

leiysky commented Jan 26, 2022 •

edited

Loading