-
Notifications
You must be signed in to change notification settings - Fork 605
Add parse_multipart_identifier function to parser #860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 4969813312
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell this can only be invoked directly (by calling parse_multipart_identifier()
) and not automatically while parsing a query. Is that correct/intended?
Yes that's correct @ankrgyl It was partly inspired by how Spark has a similar method: https://github.com/apache/spark/blob/f8604ad14b24e8c657a0305b4fb8ad7efcb84060/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65-L70 To allow downstream to parse identifiers by themselves (not necessarily as part of a query) |
Ah got it. Personally I'd prefer to integrate it into the main parser so that we increase our overall support for Spark SQL and Postgres. @alamb wdyt? |
@@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> { | |||
Ok(idents) | |||
} | |||
|
|||
/// Parse identifiers of form ident1[.identN]* | |||
pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this function more closely, I wonder how if we can't reuse parse_identifiers
from above?
Or maybe we could add some more documentation (like docstrings / tests) for this function to make it clearer it is designed to parse strings into identifiers 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think parse_identifiers
above is the odd one out, as it accepts any delimiter between words/idents (except equal sign) which means it would parse test + one + two
as a valid sequence of identifiers. This is why I introduced the new parse_multipart_identifier
to be more strict about what constitutes a compound/multipart identifier.
Though I did base parse_multipart_identifier
primarily on Spark SQL/Postgres syntax, so maybe parse_identifiers
is generic to accommodate some other type of syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier
and parse_identifiers
to explain in more detail what they do and how they are different.
I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've update the doc a bit, let me know any further suggestions
@@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> { | |||
Ok(idents) | |||
} | |||
|
|||
/// Parse identifiers of form ident1[.identN]* | |||
pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier
and parse_identifiers
to explain in more detail what they do and how they are different.
I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -4705,6 +4705,92 @@ impl<'a> Parser<'a> { | |||
Ok(idents) | |||
} | |||
|
|||
/// Parse identifiers of form ident1[.identN]* | |||
/// | |||
/// Similar in functionality to [parse_identifiers], with difference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Thank you
* Support identifiers beginning with digits in MySQL (apache#856) * support COPY INTO in snowflake (apache#841) Signed-off-by: Pawel Leszczynski <[email protected]> * Add `dialect_from_str` and improve `Dialect` documentation (apache#848) * Add `dialect_from_str` and improve `Dialect` documentation * cleanup * fix compilation with nostd * Support multiple-table DELETE syntax (apache#855) * Support `DISTINCT ON (...)` (apache#852) * Support "DISTINCT ON (...)" * a test * fix the merge * Test trailing commas (apache#859) * test: add tests for trailing commas * tweaks * Add support for query source in COPY .. TO statement (apache#858) * Add support for query source in COPY .. TO statement * Fix compile error * Fix logical merge conflict (apache#865) * Fix tiny typo in custom_sql_parser.md (apache#864) * Make Expr::Interval its own struct (apache#872) * Make Expr::Interval its own struct * Add test interval display * Fix cargo fmt * Include license file in published crate (apache#871) * Add support for multiple expressions, order by in aggregations (apache#879) * Add support for multiple expressions, order by in aggregations * Fix formatting errors * Resolve linter errors * Add parse_multipart_identifier function to parser (apache#860) * Add parse_multipart_identifier function to parser * Update doc for parse_multipart_identifier * Fix conflict * feat: Add custom operator (apache#868) * feat: Add custom operator From apache#863 - It doesn't parse anything — I'm not sure how to parse ` SELECT 'a' REGEXP '^[a-d]';` with `REGEXP` as the operator... (but fine for my narrow purpose) - If we need tests, where would I add them? * Update src/ast/operator.rs --------- Co-authored-by: Andrew Lamb <[email protected]> * feat: Support MySQL's `DIV` operator (apache#876) * feat: MySQL's DIV operator * fix: do not use `_` prefix for used variable --------- Co-authored-by: Andrew Lamb <[email protected]> * truncate: table as optional keyword (apache#883) Signed-off-by: Maciej Obuchowski <[email protected]> * feat: add DuckDB dialect (apache#878) * feat: add DuckDB dialect * formatting * fix conflict * support // in GenericDialect * add DucDbDialect to all_dialects * add comment from suggestion Co-authored-by: Andrew Lamb <[email protected]> * fix: support // in GenericDialect --------- Co-authored-by: Andrew Lamb <[email protected]> * Add support for first, last aggregate function parsing (apache#882) * Add order by parsing to functions * Fix doc error * minor changes * Named window frames (apache#881) * after over clause, named window can be parsed with window ... as after having clause * Lint errors are fixed * Support for multiple windows * fix lint errors * simplifications * rename function * Rewrite named window search in functional style * Test added and some minor changes * Minor changes on tests and namings, and semantic check is removed --------- Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Fix merge conflict (apache#885) * Update CHANGELOG for `0.34.0` release (apache#884) * chore: Release sqlparser version 0.34.0 * Update criterion requirement from 0.4 to 0.5 in /sqlparser_bench (apache#890) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: Pawel Leszczynski <[email protected]> Signed-off-by: Maciej Obuchowski <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: AviRaboah <[email protected]> Co-authored-by: pawel.leszczynski <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Aljaž Mur Eržen <[email protected]> Co-authored-by: Armin Primadi <[email protected]> Co-authored-by: Okue <[email protected]> Co-authored-by: Andrew Kane <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Jeffrey <[email protected]> Co-authored-by: Maximilian Roos <[email protected]> Co-authored-by: eitsupi <[email protected]> Co-authored-by: Maciej Obuchowski <[email protected]> Co-authored-by: Berkay Şahin <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Closes #805
To be used for parsing identifiers which when could be quoted, can get quite complex (rather than relying on simple split on dot)
Pulled from DataFusion with one change: allow whitespaces between parts in the identifier, as seems this is supported by Spark SQL and Postgres
So following idents would be parsed with the same result: