RFC Source spans #393


Closed
wants to merge 3 commits into from
src/lib.rs: 1 change (1 addition, 0 deletions)
@@ -44,6 +44,7 @@ pub mod ast;
pub mod dialect;
pub mod keywords;
pub mod parser;
pub mod span;
pub mod tokenizer;

#[doc(hidden)]
src/parser.rs: 773 changes (442 additions, 331 deletions)

Large diffs are not rendered by default.

src/span.rs: 142 changes (142 additions, 0 deletions)
@@ -0,0 +1,142 @@
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#[cfg(not(feature = "std"))]
use alloc::{boxed::Box, vec::Vec};

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

/// A byte span within the parsed string
#[derive(Debug, Eq, Clone, Copy)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Span {
    Unset,
    Set { start: usize, end: usize },
}

Review comment:

What about making this a wrapper for Token? Like:

struct WithSpan {
    token: Token,
    // position
    ...
}

Then you don't have to modify the parser heavily; instead, wrap the error with the span at the top of the parser.

Contributor Author:

This is indeed a better approach. Since there did not seem to be much interest in this MR, I have started on my own parser to experiment with spans and proper error recovery:
Here I ended up having the lexer return (Token<'a>, Span):

pub fn next_token(&mut self) -> (Token<'a>, Span) {

I have also created a Spanned trait that is implemented for all AST nodes.

I need a parser that can produce spans for tokens and nodes, report multiple errors, and still return an AST when it encounters errors.

I tried to see whether I could add this to this crate, but it seems hard to do without breaking all existing uses of the library. Perhaps it can be done somehow.
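A minimal sketch of the reviewer's suggestion, combined with the author's follow-up design in which the lexer hands back each token together with its span. Everything here is hypothetical: `Token`, `WithSpan`, `Lexer` and `tokenize` are simplified stand-ins written for illustration, not sqlparser's tokenizer, and the position is a plain byte `Range<usize>` rather than the PR's `Span` enum. The idea it shows is that the parser could keep matching on `Token` unchanged, and only the code that reports an error would need to read `span`.

```rust
use std::ops::Range;

/// Hypothetical, heavily simplified token type (not sqlparser's `Token`).
#[derive(Debug, Clone, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Number(&'a str),
    Comma,
    Other(char),
}

/// The reviewer's idea: leave `Token` untouched and carry the position next to it.
#[derive(Debug, Clone, PartialEq)]
struct WithSpan<'a> {
    token: Token<'a>,
    /// Byte offsets into the source string (end is exclusive).
    span: Range<usize>,
}

/// Hypothetical lexer whose `next_token` yields each token with its byte span.
struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src, pos: 0 }
    }

    /// Returns the next token and its span, or `None` at end of input.
    fn next_token(&mut self) -> Option<WithSpan<'a>> {
        let src: &'a str = self.src;
        let bytes = src.as_bytes();
        // Skip ASCII whitespace; each skipped byte is a whole character.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return None;
        }
        let start = self.pos;
        let token = match bytes[start] {
            b',' => {
                self.pos += 1;
                Token::Comma
            }
            b if b.is_ascii_alphanumeric() || b == b'_' => {
                // Consume a run of ASCII letters, digits and underscores.
                while self.pos < bytes.len()
                    && (bytes[self.pos].is_ascii_alphanumeric() || bytes[self.pos] == b'_')
                {
                    self.pos += 1;
                }
                let text = &src[start..self.pos];
                if b.is_ascii_digit() {
                    Token::Number(text)
                } else {
                    Token::Word(text)
                }
            }
            _ => {
                // Any other character (possibly multi-byte) becomes one token.
                let c = src[start..].chars().next().unwrap();
                self.pos += c.len_utf8();
                Token::Other(c)
            }
        };
        Some(WithSpan { token, span: start..self.pos })
    }
}

/// Collects all tokens; each one still knows where it came from in `src`.
fn tokenize(src: &str) -> Vec<WithSpan<'_>> {
    let mut lexer = Lexer::new(src);
    let mut tokens = Vec::new();
    while let Some(tok) = lexer.next_token() {
        tokens.push(tok);
    }
    tokens
}
```

For example, `tokenize("SELECT a, 42")` yields the word `SELECT` at `0..6`, `a` at `7..8`, the comma at `8..9` and the number `42` at `10..12`, so an error raised at any of those tokens can be annotated with its position in one place.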

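/// All spans compare equal, so a span never affects the equality of a node that contains it
impl PartialEq for Span {
    fn eq(&self, _: &Self) -> bool {
        true
    }
}

/// All spans hash to the same value, which keeps `Hash` consistent with `PartialEq`
impl core::hash::Hash for Span {
    fn hash<H: core::hash::Hasher>(&self, _: &mut H) {}
}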
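Because every `Span` compares equal and contributes nothing to a hash, a node that derives `PartialEq` and `Hash` effectively ignores its span field; presumably this is what lets existing AST comparisons keep working once nodes carry spans. The sketch below repeats the two impls from the diff and adds a hypothetical `Ident` node and `hash_of` helper, both illustrative only and not part of the PR. The empty `hash` still satisfies the rule that equal values must produce equal hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Eq, Clone, Copy)]
enum Span {
    Unset,
    Set { start: usize, end: usize },
}

/// As in the PR: every span compares equal to every other span...
impl PartialEq for Span {
    fn eq(&self, _: &Self) -> bool {
        true
    }
}

/// ...and hashing a span feeds nothing into the hasher.
impl Hash for Span {
    fn hash<H: Hasher>(&self, _: &mut H) {}
}

/// Hypothetical AST node (not sqlparser's `Ident`). Its derived impls
/// compare and hash `value`, while `span` never makes a difference.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Ident {
    value: String,
    span: Span,
}

/// Hypothetical helper: hash any value with the std default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two `Ident { value: "x", .. }` nodes parsed from different positions therefore compare equal and land in the same `HashMap` bucket, while nodes with different `value`s still differ.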

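impl Span {
    /// Creates an unset span (the same as `Span::default()`).
    pub fn new() -> Self {
        Span::Unset
    }

    /// Returns the smallest span that covers both `self` and `item`.
    /// If either side is unset, the other side is returned unchanged.
    pub fn expanded(&self, item: &impl Spanned) -> Span {
        match self {
            Span::Unset => item.span(),
            Span::Set { start: s1, end: e1 } => match item.span() {
                Span::Unset => *self,
                Span::Set { start: s2, end: e2 } => {
                    (usize::min(*s1, s2)..usize::max(*e1, e2)).into()
                }
            },
        }
    }

    /// Grows `self` in place so that it also covers `item`.
    pub fn expand(&mut self, item: &impl Spanned) {
        *self = self.expanded(item);
    }

    /// Start byte offset, or `None` if the span is unset.
    pub fn start(&self) -> Option<usize> {
        match self {
            Span::Unset => None,
            Span::Set { start, .. } => Some(*start),
        }
    }

    /// End byte offset (exclusive), or `None` if the span is unset.
    pub fn end(&self) -> Option<usize> {
        match self {
            Span::Unset => None,
            Span::Set { end, .. } => Some(*end),
        }
    }

    /// The span as a byte range, or `None` if the span is unset.
    pub fn range(&self) -> Option<core::ops::Range<usize>> {
        match self {
            Span::Unset => None,
            Span::Set { start, end } => Some(*start..*end),
        }
    }
}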
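A quick illustration of how `expanded` and `expand` behave: the result is the smallest single range covering both inputs (so it also covers any gap between them), and an unset side is ignored. This is a self-contained sketch that restates the PR's `Span`, `Spanned` and `From<Range<usize>>` pieces. The `match` on a tuple is a restructuring of the PR's nested `match` with the same behavior, `start`/`end` are derived from `range` here for brevity, and `PartialEq` is left off so the checks compare `range()` results instead.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy)]
enum Span {
    Unset,
    Set { start: usize, end: usize },
}

trait Spanned {
    fn span(&self) -> Span;
}

impl Spanned for Span {
    fn span(&self) -> Span {
        *self
    }
}

impl From<Range<usize>> for Span {
    fn from(r: Range<usize>) -> Self {
        Span::Set { start: r.start, end: r.end }
    }
}

impl Span {
    /// Smallest span covering `self` and `item`; an unset side contributes nothing.
    fn expanded(&self, item: &impl Spanned) -> Span {
        match (*self, item.span()) {
            (Span::Unset, other) => other,
            (this, Span::Unset) => this,
            (Span::Set { start: s1, end: e1 }, Span::Set { start: s2, end: e2 }) => {
                (s1.min(s2)..e1.max(e2)).into()
            }
        }
    }

    /// In-place version of `expanded`.
    fn expand(&mut self, item: &impl Spanned) {
        *self = self.expanded(item);
    }

    fn start(&self) -> Option<usize> {
        self.range().map(|r| r.start)
    }

    fn end(&self) -> Option<usize> {
        self.range().map(|r| r.end)
    }

    fn range(&self) -> Option<Range<usize>> {
        match self {
            Span::Unset => None,
            Span::Set { start, end } => Some(*start..*end),
        }
    }
}
```

Starting from `Span::Unset` and expanding over `7..8` and then `0..6` gives `0..8`; expanding `0..2` over `10..12` gives `0..12`, gap included.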

impl Default for Span {
    fn default() -> Self {
        Span::Unset
    }
}

impl core::convert::From<core::ops::Range<usize>> for Span {
    fn from(r: core::ops::Range<usize>) -> Self {
        Self::Set {
            start: r.start,
            end: r.end,
        }
    }
}

/// Returned when an unset `Span` is converted into a range
#[derive(Debug)]
pub struct UnsetSpanError;

impl core::convert::TryInto<core::ops::Range<usize>> for Span {
    type Error = UnsetSpanError;

    fn try_into(self) -> Result<core::ops::Range<usize>, Self::Error> {
        match self {
            Span::Unset => Err(UnsetSpanError),
            Span::Set { start, end } => Ok(start..end),
        }
    }
}
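The conversions between `Span` and `Range<usize>` can be exercised as below. One deliberate difference from the diff: this sketch implements `TryFrom<Span> for Range<usize>` rather than `TryInto` directly. The standard library's blanket impl then provides `span.try_into()` as well, and the `TryInto` documentation recommends implementing `TryFrom` instead. `UnsetSpanError` derives `Debug` and `PartialEq` so that `unwrap()` and `assert_eq!` work on the result.

```rust
use std::convert::{TryFrom, TryInto};
use std::ops::Range;

#[derive(Debug, Clone, Copy)]
enum Span {
    Unset,
    Set { start: usize, end: usize },
}

/// As in the PR, a fresh span is unset.
impl Default for Span {
    fn default() -> Self {
        Span::Unset
    }
}

/// Infallible direction: every range is a valid span.
impl From<Range<usize>> for Span {
    fn from(r: Range<usize>) -> Self {
        Span::Set { start: r.start, end: r.end }
    }
}

/// Error for converting a span that was never set.
#[derive(Debug, PartialEq, Eq)]
struct UnsetSpanError;

/// Fallible direction. Implementing `TryFrom` here makes `span.try_into()`
/// available through the std blanket `TryInto` impl.
impl TryFrom<Span> for Range<usize> {
    type Error = UnsetSpanError;

    fn try_from(span: Span) -> Result<Self, Self::Error> {
        match span {
            Span::Unset => Err(UnsetSpanError),
            Span::Set { start, end } => Ok(start..end),
        }
    }
}
```

So `(3..9).into()` round-trips back to `3..9`, while converting `Span::default()` yields `Err(UnsetSpanError)`.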

/// Implemented by anything that can report the byte span it covers in the source
pub trait Spanned {
    fn span(&self) -> Span;
}

impl Spanned for Span {
    fn span(&self) -> Span {
        *self
    }
}

impl<T: Spanned> Spanned for Option<T> {
    fn span(&self) -> Span {
        match self {
            Some(v) => v.span(),
            None => Default::default(),
        }
    }
}

impl<T: Spanned> Spanned for Vec<T> {
    fn span(&self) -> Span {
        let mut ans = Span::new();
        for v in self {
            ans.expand(v);
        }
        ans
    }
}

impl<T: Spanned> Spanned for Box<T> {
    fn span(&self) -> Span {
        self.as_ref().span()
    }
}
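To show how the `Spanned` impls compose, here is a sketch with two hypothetical AST nodes, `Ident` and `Select` (not sqlparser's real types), plus a hypothetical `ident` constructor. A node's span is built by expanding the span of its first token over its children, and the `Vec`, `Option` and `Box` impls let the children be passed to `expand` directly. The `Box` impl uses `(**self).span()`, which is equivalent to the PR's `self.as_ref().span()`.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy)]
enum Span {
    Unset,
    Set { start: usize, end: usize },
}

impl From<Range<usize>> for Span {
    fn from(r: Range<usize>) -> Self {
        Span::Set { start: r.start, end: r.end }
    }
}

impl Span {
    /// Grow `self` to the smallest span also covering `item`; unset sides are ignored.
    fn expand(&mut self, item: &impl Spanned) {
        *self = match (*self, item.span()) {
            (Span::Unset, other) => other,
            (this, Span::Unset) => this,
            (Span::Set { start: s1, end: e1 }, Span::Set { start: s2, end: e2 }) => {
                (s1.min(s2)..e1.max(e2)).into()
            }
        };
    }

    fn range(&self) -> Option<Range<usize>> {
        match self {
            Span::Unset => None,
            Span::Set { start, end } => Some(*start..*end),
        }
    }
}

trait Spanned {
    fn span(&self) -> Span;
}

// Container impls mirroring the PR.
impl<T: Spanned> Spanned for Option<T> {
    fn span(&self) -> Span {
        match self {
            Some(v) => v.span(),
            None => Span::Unset,
        }
    }
}

impl<T: Spanned> Spanned for Vec<T> {
    fn span(&self) -> Span {
        let mut ans = Span::Unset;
        for v in self {
            ans.expand(v);
        }
        ans
    }
}

impl<T: Spanned> Spanned for Box<T> {
    fn span(&self) -> Span {
        (**self).span()
    }
}

/// Hypothetical leaf node: an identifier with its own span.
struct Ident {
    value: String,
    span: Span,
}

impl Spanned for Ident {
    fn span(&self) -> Span {
        self.span
    }
}

/// Hypothetical `SELECT <projection> [FROM <table>]` node.
struct Select {
    select_token: Span,
    projection: Vec<Ident>,
    from: Option<Box<Ident>>,
}

impl Spanned for Select {
    fn span(&self) -> Span {
        let mut span = self.select_token;
        span.expand(&self.projection); // Vec<Ident>: Spanned
        span.expand(&self.from); // Option<Box<Ident>>: Spanned
        span
    }
}

fn ident(value: &str, range: Range<usize>) -> Ident {
    Ident { value: value.to_string(), span: range.into() }
}
```

For `SELECT a, b FROM t`, with `SELECT` at `0..6`, `a` at `7..8`, `b` at `10..11` and `t` at `17..18`, the node spans `0..18` even though the `FROM` keyword's own span is not stored, and an empty projection list reports an unset span.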