-
Notifications
You must be signed in to change notification settings - Fork 602
Fix tokenization of qualified identifiers with numeric prefix. #1803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
0279883
0eda729
e32c3c8
aab48d4
a47e2ab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1926,6 +1926,106 @@ fn parse_select_with_numeric_prefix_column_name() { | |||||||||||||||
} | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
#[test] | ||||||||||||||||
fn parse_qualified_identifiers_with_numeric_prefix() { | ||||||||||||||||
// Case 1: Qualified column name that starts with digits. | ||||||||||||||||
mysql().verified_stmt("SELECT t.15to29 FROM my_table AS t"); | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could we add a test case for the behavior of multiple accesses e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea, I added such a test. |
||||||||||||||||
match mysql() | ||||||||||||||||
.parse_sql_statements("SELECT t.15to29 FROM my_table AS t") | ||||||||||||||||
.unwrap() | ||||||||||||||||
.pop() | ||||||||||||||||
{ | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
does this format work the same to remove the second parse statement? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that simplifies the test, I was simply ignorant of the return value. Thanks for the suggestion. |
||||||||||||||||
Some(Statement::Query(q)) => match *q.body { | ||||||||||||||||
SetExpr::Select(s) => match s.projection.last() { | ||||||||||||||||
Some(SelectItem::UnnamedExpr(Expr::CompoundIdentifier(parts))) => { | ||||||||||||||||
assert_eq!(&[Ident::new("t"), Ident::new("15to29")], &parts[..]); | ||||||||||||||||
} | ||||||||||||||||
proj => panic!("Unexpected projection: {:?}", proj), | ||||||||||||||||
}, | ||||||||||||||||
body => panic!("Unexpected statement body: {:?}", body), | ||||||||||||||||
}, | ||||||||||||||||
stmt => panic!("Unexpected statement: {:?}", stmt), | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
// Case 2: Qualified column name that starts with digits and on its own represents a number. | ||||||||||||||||
mysql().verified_stmt("SELECT t.15e29 FROM my_table AS t"); | ||||||||||||||||
match mysql() | ||||||||||||||||
.parse_sql_statements("SELECT t.15e29 FROM my_table AS t") | ||||||||||||||||
.unwrap() | ||||||||||||||||
.pop() | ||||||||||||||||
{ | ||||||||||||||||
Some(Statement::Query(q)) => match *q.body { | ||||||||||||||||
SetExpr::Select(s) => match s.projection.last() { | ||||||||||||||||
Some(SelectItem::UnnamedExpr(Expr::CompoundIdentifier(parts))) => { | ||||||||||||||||
assert_eq!(&[Ident::new("t"), Ident::new("15e29")], &parts[..]); | ||||||||||||||||
} | ||||||||||||||||
proj => panic!("Unexpected projection: {:?}", proj), | ||||||||||||||||
}, | ||||||||||||||||
body => panic!("Unexpected statement body: {:?}", body), | ||||||||||||||||
}, | ||||||||||||||||
stmt => panic!("Unexpected statement: {:?}", stmt), | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
// Case 3: Unqualified, the same token is parsed as a number. | ||||||||||||||||
match mysql() | ||||||||||||||||
.parse_sql_statements("SELECT 15e29 FROM my_table") | ||||||||||||||||
.unwrap() | ||||||||||||||||
.pop() | ||||||||||||||||
{ | ||||||||||||||||
Some(Statement::Query(q)) => match *q.body { | ||||||||||||||||
SetExpr::Select(s) => match s.projection.last() { | ||||||||||||||||
Some(SelectItem::UnnamedExpr(Expr::Value(ValueWithSpan { value, .. }))) => { | ||||||||||||||||
assert_eq!(&number("15e29"), value); | ||||||||||||||||
} | ||||||||||||||||
proj => panic!("Unexpected projection: {:?}", proj), | ||||||||||||||||
}, | ||||||||||||||||
body => panic!("Unexpected statement body: {:?}", body), | ||||||||||||||||
}, | ||||||||||||||||
stmt => panic!("Unexpected statement: {:?}", stmt), | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
// Case 4: Quoted simple identifier. | ||||||||||||||||
mysql().verified_stmt("SELECT `15e29` FROM my_table"); | ||||||||||||||||
match mysql() | ||||||||||||||||
.parse_sql_statements("SELECT `15e29` FROM my_table") | ||||||||||||||||
.unwrap() | ||||||||||||||||
.pop() | ||||||||||||||||
{ | ||||||||||||||||
Some(Statement::Query(q)) => match *q.body { | ||||||||||||||||
SetExpr::Select(s) => match s.projection.last() { | ||||||||||||||||
Some(SelectItem::UnnamedExpr(Expr::Identifier(name))) => { | ||||||||||||||||
assert_eq!(&Ident::with_quote('`', "15e29"), name); | ||||||||||||||||
} | ||||||||||||||||
proj => panic!("Unexpected projection: {:?}", proj), | ||||||||||||||||
}, | ||||||||||||||||
body => panic!("Unexpected statement body: {:?}", body), | ||||||||||||||||
}, | ||||||||||||||||
stmt => panic!("Unexpected statement: {:?}", stmt), | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
// Case 5: Quoted compound identifier. | ||||||||||||||||
mysql().verified_stmt("SELECT t.`15e29` FROM my_table"); | ||||||||||||||||
match mysql() | ||||||||||||||||
.parse_sql_statements("SELECT t.`15e29` FROM my_table AS t") | ||||||||||||||||
.unwrap() | ||||||||||||||||
.pop() | ||||||||||||||||
{ | ||||||||||||||||
Some(Statement::Query(q)) => match *q.body { | ||||||||||||||||
SetExpr::Select(s) => match s.projection.last() { | ||||||||||||||||
Some(SelectItem::UnnamedExpr(Expr::CompoundIdentifier(parts))) => { | ||||||||||||||||
assert_eq!( | ||||||||||||||||
&[Ident::new("t"), Ident::with_quote('`', "15e29")], | ||||||||||||||||
&parts[..] | ||||||||||||||||
); | ||||||||||||||||
} | ||||||||||||||||
proj => panic!("Unexpected projection: {:?}", proj), | ||||||||||||||||
}, | ||||||||||||||||
body => panic!("Unexpected statement body: {:?}", body), | ||||||||||||||||
}, | ||||||||||||||||
stmt => panic!("Unexpected statement: {:?}", stmt), | ||||||||||||||||
} | ||||||||||||||||
} | ||||||||||||||||
|
||||||||||||||||
// Don't run with bigdecimal as it fails like this on rust beta: | ||||||||||||||||
// | ||||||||||||||||
// 'parse_select_with_concatenation_of_exp_number_and_numeric_prefix_column' | ||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement instead in the parser as a special for
self.dialect.supports_numeric_prefix()
? somewhat similar to what we do for BigQuery here. In that when its time to combine identifiers into a compound identifier we check if each subsequent identifier part is unquoted and prefixed by.
, and if so we drop the prefix. I imagine that would be done hereThinking that could be a smaller change since the behavior needs a bit of extra context around the tokens which the tokenizer isn't so good at handling cleanly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review and the suggestion. The latest commit implements an alternative approach within
Parser#parse_compound_expr
. However, I have to admit that I personally prefer the previous fix in the tokenizer, for a few reasons:The
Tokenizer
is part of the public API of the crate, and without fixing this in the tokenizer, it will continue to exhibit the following behavior with the Mysql dialect:t.1to2
tokenizes tot
(Word),.1to2
(Word)t.1e2
tokenizes tot
(Word),.1e2
(Number)t.`1to2`
tokenizes tot
(Word),.
(Period),1to2
(Word)t.`1e2`
tokenizes tot
(Word),.
(Period)1e2
(Word)This could be very surprising for users and arguably be considered incorrect (when using the Mysql dialect).
The handling of
Word
andNumber
tokens inparse_compound_expr
is not the most elegant with having to split off the.
and having to create the correct off-by-one spans accordingly.Unqualified identifiers that start with digits are already handled in the tokenizer - and I think have to be (see Support identifiers beginning with digits in MySQL #856). Handling the same problem of misinterpreted tokens for qualified identifiers in the parser seems a bit disconnected and a like a downstream solution to the tokenizer producing incorrect tokens, at least that is how I currently perceive it (see point 1).
Let me know which solution you prefer (with or without the latest commit) or if you have another idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fair, we can revert back to the tokenizer version in that case? to keep the tokenizer behavior non-surprising
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!