-
Notifications
You must be signed in to change notification settings - Fork 603
Improve parsing of JSON accesses on Postgres and Snowflake #1215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 8896826786Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jmhain! the changes look good to me overall - left some comments re parsing setup and potentially reusing the map syntax to express this case
src/parser/mod.rs
Outdated
Token::Colon if path.is_empty() => { | ||
path.push(self.parse_variant_object_key()?); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to instead only call this function when we expect a variant access? That would let us avoid this special case as well as the path.is_empty
checks since we can 'expect' the colon (and potentially the first part) before entering the loop to parse the remaining parts.
i.e. at the caller we would peek_token
and only call this if we see a :
, otherwise we may parse it as a map access
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A variant access can begin with either bracket or "dot" syntax (the colon) corresponding to the two sites where this function is called, so the :
token is not necessarily required initially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking rather that the bracket syntax is identical to the existing map expr so it would be reasonable to call that function instead of this one (hence the peek_token beforehand)
src/ast/mod.rs
Outdated
VariantAccess { | ||
/// The VARIANT being queried. | ||
value: Box<Expr>, | ||
/// The path to the data inside the VARIANT to extract. | ||
path: Vec<VariantPathElem>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering would it be possible to reuse the existing MapAccess
expr variant to represent this scenario?
thinking that e.g with an example like below, the only difference would be an extension to the existing syntax enum
enum MapAccessSyntax {
// ...
/// Access using colon notation. `mymap:mykey`
Colon,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment, I appreciate your review! So I did consider that, but my interpretation of this syntax is that :
is an operator that takes a VARIANT
expression and a JSON path expression as a single syntactic element (VARIANT : JSONPATH
).
Another SQL dialect with the same syntax explicitly says this is the case:
https://docs.databricks.com/en/sql/language-manual/functions/colonsign.html
https://docs.databricks.com/en/sql/language-manual/sql-ref-json-path-expression.html
(If we do keep this, I should actually rename it back to JsonAccess
since I do plan to add Databricks as a dialect here in the future...)
Tangentially, there are also some things I'm not happy with in MapAccess
, particularly MapAccessSyntax::Dot
:
MapAccessSyntax::Dot
overlaps withCompositeAccess
.MapAccessKey
contains anExpr
even when using dot syntax, which can only legally containIdentifier
orDoubleQuotedString
in that scenario. It would make more sense to use anIdent
in that case.
So my broader preference here is:
- Use
CompositeAccess
for all non-jsonpath dot accesses. - Use
MapAccess
for all non-jsonpath bracketed accesses (i.e. eliminateMapAccessSyntax
). - Use
JsonAccess
for all jsonpath accesss.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did consider that, but my interpretation of this syntax is that : is an operator that takes a VARIANT expression and a JSON path expression as a single syntactic element (VARIANT : JSONPATH).
hmm not sure if that matters a lot from a parsing/ast perspective - in that were not required to represent that literally so we have some leeway to be flexible if it lets us simplify the code/representation?
So my broader preference here is:
I think a reasonable goal to aim for could be to reuse the same solution for both the map and the variant expressions - since beyond the leading :
special case, they are identical? I think this probably means merging both MapAccess and JsonAccess based on your proposal? If that seems reasonable then yeah feel free to make changes to that effect if it means revamping the current MapAccess setup.
src/ast/mod.rs
Outdated
write!(f, "{value}")?; | ||
for (i, elem) in path.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the variant remains it might be good to instead impl display on the VariantPathElem
and here use the display_separated
function so that we avoid looping and the same logic an be reused in other references to VariantPathElem
in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with that is that an individual VariantPathElem
needs to know if it's the initial path segment to render as a :
or a .
(and I don't want to add VariantPathElem::Colon
because (a) the two current variants map to the two documented Snowflake syntaxes and (b) it makes more AST states unrepresentable (e.g. if a user manually constructs an AST to render, they have to remember this detail). Another option if we wanted to factor this out would be to wrap the path elements in a VariantPath
type, though that doesn't really seem worth it to me I went ahead and made this change.
Co-authored-by: Ifeanyi Ubah <[email protected]>
Co-authored-by: Ifeanyi Ubah <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this will result in non trivial downstream API churn, I think it is a good improvement and sets the stage elegantly for arbitrary path access
really nice 👏
It just needs to resolve some conflicts and I think it will be good to merge.
BTW I hope to do a release this week #1214
left: Box<Expr>, | ||
operator: JsonOperator, | ||
right: Box<Expr>, | ||
/// The value being queried. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes much more sense to me
select.projection | ||
); | ||
|
||
// dot and bracket notation can be mixed (starting with : case) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very nice (and clever)
@alamb Conflicts resolved! |
🚀 |
Thanks again @jmhain |
Co-authored-by: Ifeanyi Ubah <[email protected]>
The following sql seems fail to pass after this patch: select make_array(1, 2, 3)[1:2], make_array(1.0, 2.0, 3.0)[2:3], make_array('h', 'e', 'l', 'l', 'o')[2:4];
|
Thank you for debugging this @tisonkun -- is this something we should create a PR to fix? |
Sorry about the regression here! I was initially a bit confused since we don't appear to have explicit support or tests for this array slice syntax, but after a quick investigation I can see what happened. This is what
We were parsing the contents of the |
Thank you @jmhain I am likely going to do another sqlparser release the week after next, though I can do a patch release sooner if we can get a fix for this. Unfortuantely I don't have the time to work on a fix myself I did file a ticket to track the issue so it doesn't get lost in this merged PR: #1283 |
This PR fixes a few issues I found in the parsing of JSON accesses on these two dialects.
Value::UnQuotedString
produced byparse_value
is problematic and likely to lead to misparses, so I got rid of it.I also added several tests, most of which were previously failing.