Commit 3c8fd74

Nyrox, iffyio, and alamb authored
Implement Spanned to retrieve source locations on AST nodes (#1435)
Co-authored-by: Ifeanyi Ubah
Co-authored-by: Andrew Lamb
1 parent 0adec33 commit 3c8fd74

18 files changed, +3092 -399 lines changed

README.md

+17
@@ -100,6 +100,23 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
 issues and distinguish different syntaxes in the AST.
 
 
+## WIP: Extracting source locations from AST nodes
+
+This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress and many nodes report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for information on how to contribute missing improvements.
+
+```rust
+use sqlparser::ast::Spanned;
+
+// Parse SQL
+let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();
+
+// The source span can be retrieved with start and end locations
+assert_eq!(ast[0].span(), Span {
+    start: Location::of(1, 1),
+    end: Location::of(1, 16),
+});
+```
+
 ## SQL compliance
 
 SQL was first standardized in 1987, and revisions of the standard have been
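
Note on the README example: the span it asserts, `(1, 1)` to `(1, 16)` for the 15-character statement `SELECT A FROM B`, suggests 1-based columns with an end column one past the last character. Under that reading (an inference from the example, not a documented guarantee), a diagnostics tool could underline a span roughly as sketched below. `Location` and `Span` here are simplified stand-ins written for this sketch, and `underline` is a hypothetical helper, not part of the crate.

```rust
/// Simplified stand-in for the crate's `Location` (1-based line and column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

impl Location {
    pub fn of(line: u64, column: u64) -> Self {
        Location { line, column }
    }
}

/// Simplified stand-in for the crate's `Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: Location,
    pub end: Location,
}

/// Hypothetical helper: print the source line and a row of carets under the span.
/// Assumes a single-line span whose `end` column is exclusive.
pub fn underline(sql: &str, span: Span) -> Option<String> {
    if span.start.line == 0 || span.start.line != span.end.line {
        return None; // empty or multi-line spans are out of scope for this sketch
    }
    let line = sql.lines().nth(span.start.line as usize - 1)?;
    let offset = span.start.column.checked_sub(1)? as usize;
    let width = span.end.column.checked_sub(span.start.column)? as usize;
    Some(format!("{line}\n{}{}", " ".repeat(offset), "^".repeat(width)))
}
```

Calling `underline("SELECT A FROM B", span)` with the span from the README example would put carets under the whole statement; a span from `(1, 15)` to `(1, 16)` would mark only `B`.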

docs/source_spans.md

+52
@@ -0,0 +1,52 @@
+
+## Breaking Changes
+
+These are the current breaking changes introduced by the source spans feature:
+
+#### Added fields for spans (must be added to any existing pattern matches)
+- `Ident` now stores a `Span`
+- `Select`, `With`, `Cte`, `WildcardAdditionalOptions` now store a `TokenWithLocation`
+
+#### Misc.
+- `TokenWithLocation` stores a full `Span`, rather than just a source location. Users relying on `token.location` should use `token.location.start` instead.
+## Source Span Contributing Guidelines
+
+For contributing source spans improvement in addition to the general [contribution guidelines](../README.md#contributing), please make sure to pay attention to the following:
+
+
+### Source Span Design Considerations
+
+- `Ident` always have correct source spans
+- Downstream breaking change impact is to be as minimal as possible
+- To this end, use recursive merging of spans in favor of storing spans on all nodes
+- Any metadata added to compute spans must not change semantics (Eq, Ord, Hash, etc.)
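
Note on "recursive merging of spans": one way to read this guideline is that only leaf nodes store a span, and composite nodes compute theirs by merging the spans of their children. The sketch below illustrates that idea under stated assumptions: `Location`, `Span`, `Ident`, and `CompoundIdent` are simplified stand-ins written for this note, and treating the all-zero span as "empty" is an assumption of the sketch rather than a statement about the crate's `Span::empty()`.

```rust
/// Simplified stand-in for a source position; ordering is by line, then column.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

/// Simplified stand-in for a source span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: Location,
    pub end: Location,
}

impl Span {
    /// Assumption of this sketch: an all-zero span means "no location known".
    pub fn empty() -> Span {
        let zero = Location { line: 0, column: 0 };
        Span { start: zero, end: zero }
    }

    /// Smallest span covering both inputs; an empty span acts as the identity.
    pub fn union(self, other: Span) -> Span {
        if self == Span::empty() {
            return other;
        }
        if other == Span::empty() {
            return self;
        }
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

pub trait Spanned {
    fn span(&self) -> Span;
}

/// Leaf node: the only place where a span is actually stored.
pub struct Ident {
    pub value: String,
    pub span: Span,
}

/// Composite node: no span field, so its shape is unchanged for downstream users.
pub struct CompoundIdent(pub Vec<Ident>);

impl Spanned for Ident {
    fn span(&self) -> Span {
        self.span
    }
}

impl Spanned for CompoundIdent {
    fn span(&self) -> Span {
        // Merge the children's spans instead of storing a span on this node.
        self.0
            .iter()
            .map(|ident| ident.span())
            .fold(Span::empty(), Span::union)
    }
}
```

The trade-off is that a merged span can only be as accurate as the leaves it is built from, which is consistent with the note in the guidelines that missing keyword-token spans are the main source of inaccuracy.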
+
+The primary reason for missing and inaccurate source spans at this time is missing spans of keyword tokens and values in many structures, either due to lack of time or because adding them would break downstream significantly.
+
+When considering adding support for source spans on a type, consider the impact to consumers of that type and whether your change would require a consumer to do non-trivial changes to their code.
+
+Example of a trivial change
+```rust
+match node {
+    ast::Query {
+        field1,
+        field2,
+        location: _, // add a new line to ignored location
+    }
+```
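
Note on the "trivial change" example: the snippet in the document is abbreviated, so here is a compilable sketch of the same idea. The `Query` struct and its field names are hypothetical and exist only for this illustration. It shows the one-line change needed by code that destructures exhaustively, and that code already using the `..` rest pattern needs no change at all.

```rust
/// Hypothetical node used only for this sketch; the field names are made up.
pub struct Query {
    pub body: String,
    pub limit: Option<u64>,
    /// Newly added location field (a plain tuple here for simplicity).
    pub location: (u64, u64),
}

/// Exhaustive destructuring: the consumer adds a single ignored field.
pub fn describe(node: &Query) -> String {
    match node {
        Query {
            body,
            limit,
            location: _, // the only line a consumer has to add
        } => format!("{body} (limit: {limit:?})"),
    }
}

/// Destructuring with `..` keeps compiling unchanged when a field is added.
pub fn body_of(node: &Query) -> &str {
    let Query { body, .. } = node;
    body
}
```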
+
+If adding source spans to a type would require a significant change like wrapping that type or similar, please open an issue to discuss.
+
+### AST Node Equality and Hashes
+
+When adding tokens to AST nodes, make sure to store them using the [AttachedToken](https://docs.rs/sqlparser/latest/sqlparser/ast/helpers/struct.AttachedToken.html) helper to ensure that semantically equivalent AST nodes always compare as equal and hash to the same value. F.e. `select 5` and `SELECT 5` would compare as different `Select` nodes, if the select token was stored directly. f.e.
+
+```rust
+struct Select {
+    select_token: AttachedToken, // only used for spans
+    /// remaining fields
+    field1,
+    field2,
+    ...
+}
+```

src/ast/helpers/attached_token.rs

+82
@@ -0,0 +1,82 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
+use core::fmt::{self, Debug, Formatter};
+use core::hash::{Hash, Hasher};
+
+use crate::tokenizer::{Token, TokenWithLocation};
+
+#[cfg(feature = "serde")]
+use serde::{Deserialize, Serialize};
+
+#[cfg(feature = "visitor")]
+use sqlparser_derive::{Visit, VisitMut};
+
+/// A wrapper type for attaching tokens to AST nodes that should be ignored in comparisons and hashing.
+/// This should be used when a token is not relevant for semantics, but is still needed for
+/// accurate source location tracking.
+#[derive(Clone)]
+#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
+#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
+pub struct AttachedToken(pub TokenWithLocation);
+
+impl AttachedToken {
+    pub fn empty() -> Self {
+        AttachedToken(TokenWithLocation::wrap(Token::EOF))
+    }
+}
+
+// Conditional Implementations
+impl Debug for AttachedToken {
+    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
+        self.0.fmt(f)
+    }
+}
+
+// Blanket Implementations
+impl PartialEq for AttachedToken {
+    fn eq(&self, _: &Self) -> bool {
+        true
+    }
+}
+
+impl Eq for AttachedToken {}
+
+impl PartialOrd for AttachedToken {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Ord for AttachedToken {
+    fn cmp(&self, _: &Self) -> Ordering {
+        Ordering::Equal
+    }
+}
+
+impl Hash for AttachedToken {
+    fn hash<H: Hasher>(&self, _state: &mut H) {
+        // Do nothing
+    }
+}
+
+impl From<TokenWithLocation> for AttachedToken {
+    fn from(value: TokenWithLocation) -> Self {
+        AttachedToken(value)
+    }
+}
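
Note on the always-equal comparisons above: because `eq` always returns `true` and `hash` feeds nothing into the hasher, a node that derives `PartialEq` and `Hash` is compared and hashed on its remaining fields only. The sketch below demonstrates that effect with stand-ins: `Attached` wraps a plain `String` instead of a `TokenWithLocation`, and `Select` and `hash_of` are hypothetical names made up for this illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for `AttachedToken`: holds the raw keyword text for illustration only.
#[derive(Debug, Clone)]
pub struct Attached(pub String);

impl PartialEq for Attached {
    fn eq(&self, _: &Self) -> bool {
        true // never influences the comparison of the containing node
    }
}

impl Eq for Attached {}

impl Hash for Attached {
    fn hash<H: Hasher>(&self, _state: &mut H) {
        // Intentionally contributes nothing to the hash
    }
}

/// Hypothetical node that simply derives its comparisons.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Select {
    pub select_token: Attached,
    pub projection: Vec<String>,
}

/// Hypothetical helper: hash a value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With these definitions, a `Select` built from `select 5` and one built from `SELECT 5` compare equal and hash identically, while changing the projection still makes them differ.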

src/ast/helpers/mod.rs

+1
@@ -14,5 +14,6 @@
 // KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations
 // under the License.
+pub mod attached_token;
 pub mod stmt_create_table;
 pub mod stmt_data_loading;

src/ast/mod.rs

+75-9
@@ -23,16 +23,22 @@ use alloc::{
     string::{String, ToString},
     vec::Vec,
 };
+use helpers::attached_token::AttachedToken;
 
-use core::fmt::{self, Display};
 use core::ops::Deref;
+use core::{
+    fmt::{self, Display},
+    hash,
+};
 
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};
 
 #[cfg(feature = "visitor")]
 use sqlparser_derive::{Visit, VisitMut};
 
+use crate::tokenizer::Span;
+
 pub use self::data_type::{
     ArrayElemTypeDef, CharLengthUnits, CharacterLength, DataType, ExactNumberInfo,
     StructBracketKind, TimezoneInfo,
@@ -87,6 +93,9 @@ mod dml;
 pub mod helpers;
 mod operator;
 mod query;
+mod spans;
+pub use spans::Spanned;
+
 mod trigger;
 mod value;
 
@@ -131,7 +140,7 @@ where
 }
 
 /// An identifier, decomposed into its value or character data and the quote style.
-#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
+#[derive(Debug, Clone, PartialOrd, Ord)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
 pub struct Ident {
@@ -140,17 +149,49 @@ pub struct Ident {
     /// The starting quote if any. Valid quote characters are the single quote,
     /// double quote, backtick, and opening square bracket.
     pub quote_style: Option<char>,
+    /// The span of the identifier in the original SQL string.
+    pub span: Span,
+}
+
+impl PartialEq for Ident {
+    fn eq(&self, other: &Self) -> bool {
+        let Ident {
+            value,
+            quote_style,
+            // exhaustiveness check; we ignore spans in comparisons
+            span: _,
+        } = self;
+
+        value == &other.value && quote_style == &other.quote_style
+    }
 }
 
+impl core::hash::Hash for Ident {
+    fn hash<H: hash::Hasher>(&self, state: &mut H) {
+        let Ident {
+            value,
+            quote_style,
+            // exhaustiveness check; we ignore spans in hashes
+            span: _,
+        } = self;
+
+        value.hash(state);
+        quote_style.hash(state);
+    }
+}
+
+impl Eq for Ident {}
+
 impl Ident {
-    /// Create a new identifier with the given value and no quotes.
+    /// Create a new identifier with the given value and no quotes and an empty span.
     pub fn new<S>(value: S) -> Self
     where
         S: Into<String>,
     {
         Ident {
             value: value.into(),
             quote_style: None,
+            span: Span::empty(),
         }
     }
 
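
Note on the manual `PartialEq` and `Hash` implementations for `Ident`: destructuring `self` without a `..` rest pattern means that adding a field later will fail to compile until the implementation decides how to treat it, which is what the "exhaustiveness check" comments refer to. The sketch below reproduces the pattern with stand-ins so the effect can be checked in isolation. The tuple-based `Span` and the `distinct` helper are assumptions of this sketch, not the crate's definitions.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for the crate's `Span`; (line, column) tuples are an assumption here.
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: (u64, u64),
    pub end: (u64, u64),
}

/// Stand-in mirroring the three fields of `Ident` shown in the diff.
#[derive(Debug, Clone)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
    pub span: Span,
}

impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        // No `..` here: a newly added field would be a compile error until handled.
        let Ident { value, quote_style, span: _ } = self;
        value == &other.value && quote_style == &other.quote_style
    }
}

impl Eq for Ident {}

impl Hash for Ident {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the fields that `eq` compares, so equal values hash equally.
        let Ident { value, quote_style, span: _ } = self;
        value.hash(state);
        quote_style.hash(state);
    }
}

/// Hypothetical helper: count identifiers that differ in value or quoting,
/// regardless of where they appear in the source.
pub fn distinct(idents: &[Ident]) -> usize {
    idents.iter().collect::<HashSet<_>>().len()
}
```

Two occurrences of the unquoted identifier `a` at different positions count as one entry here, while the quoted `"a"` counts separately because `quote_style` still takes part in the comparison.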
@@ -164,6 +205,30 @@ impl Ident {
         Ident {
             value: value.into(),
             quote_style: Some(quote),
+            span: Span::empty(),
+        }
+    }
+
+    pub fn with_span<S>(span: Span, value: S) -> Self
+    where
+        S: Into<String>,
+    {
+        Ident {
+            value: value.into(),
+            quote_style: None,
+            span,
+        }
+    }
+
+    pub fn with_quote_and_span<S>(quote: char, span: Span, value: S) -> Self
+    where
+        S: Into<String>,
+    {
+        assert!(quote == '\'' || quote == '"' || quote == '`' || quote == '[');
+        Ident {
+            value: value.into(),
+            quote_style: Some(quote),
+            span,
         }
     }
 }
@@ -173,6 +238,7 @@ impl From<&str> for Ident {
         Ident {
             value: value.to_string(),
             quote_style: None,
+            span: Span::empty(),
         }
     }
 }
@@ -919,10 +985,10 @@ pub enum Expr {
         /// `<search modifier>`
         opt_search_modifier: Option<SearchModifier>,
     },
-    Wildcard,
+    Wildcard(AttachedToken),
     /// Qualified wildcard, e.g. `alias.*` or `schema.table.*`.
     /// (Same caveats apply to `QualifiedWildcard` as to `Wildcard`.)
-    QualifiedWildcard(ObjectName),
+    QualifiedWildcard(ObjectName, AttachedToken),
     /// Some dialects support an older syntax for outer joins where columns are
     /// marked with the `(+)` operator in the WHERE clause, for example:
     ///
@@ -1211,8 +1277,8 @@ impl fmt::Display for Expr {
             Expr::MapAccess { column, keys } => {
                 write!(f, "{column}{}", display_separated(keys, ""))
             }
-            Expr::Wildcard => f.write_str("*"),
-            Expr::QualifiedWildcard(prefix) => write!(f, "{}.*", prefix),
+            Expr::Wildcard(_) => f.write_str("*"),
+            Expr::QualifiedWildcard(prefix, _) => write!(f, "{}.*", prefix),
             Expr::CompoundIdentifier(s) => write!(f, "{}", display_separated(s, ".")),
             Expr::IsTrue(ast) => write!(f, "{ast} IS TRUE"),
             Expr::IsNotTrue(ast) => write!(f, "{ast} IS NOT TRUE"),
@@ -5432,8 +5498,8 @@ pub enum FunctionArgExpr {
 impl From<Expr> for FunctionArgExpr {
     fn from(wildcard_expr: Expr) -> Self {
         match wildcard_expr {
-            Expr::QualifiedWildcard(prefix) => Self::QualifiedWildcard(prefix),
-            Expr::Wildcard => Self::Wildcard,
+            Expr::QualifiedWildcard(prefix, _) => Self::QualifiedWildcard(prefix),
+            Expr::Wildcard(_) => Self::Wildcard,
             expr => Self::Expr(expr),
         }
     }
