Add parse_multipart_identifier function to parser #860

Jefffrey · 2023-04-29T07:05:33Z

Closes #805

To be used for parsing identifiers, which can get quite complex when parts may be quoted (rather than relying on a simple split on dots).

Pulled from DataFusion with one change: allow whitespace between the parts of the identifier, as this seems to be supported by Spark SQL and Postgres.

So the following idents would be parsed with the same result:

catalog.database.table

catalog . database . table
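To make the intended behavior concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `split_multipart` is a hypothetical helper that works on raw characters rather than on the parser's token stream, and it assumes double quotes as the only quoting style, with a doubled quote as the escape.

```rust
/// Hypothetical helper (not part of sqlparser): splits a dotted identifier
/// into its parts, honoring double-quoted parts and whitespace around dots.
fn split_multipart(input: &str) -> Result<Vec<String>, String> {
    let mut parts = Vec::new();
    let mut chars = input.trim().chars().peekable();
    loop {
        // Skip whitespace before a part.
        while chars.peek().map_or(false, |c| c.is_whitespace()) {
            chars.next();
        }
        let mut part = String::new();
        if chars.peek() == Some(&'"') {
            chars.next(); // consume the opening quote
            loop {
                match chars.next() {
                    Some('"') => {
                        // A doubled quote is an escaped quote inside the part.
                        if chars.peek() == Some(&'"') {
                            chars.next();
                            part.push('"');
                        } else {
                            break;
                        }
                    }
                    Some(c) => part.push(c),
                    None => return Err("unterminated quoted identifier".to_string()),
                }
            }
        } else {
            // Unquoted part: letters, digits and underscores only.
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    part.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            if part.is_empty() {
                return Err("expected identifier".to_string());
            }
        }
        parts.push(part);
        // Skip whitespace after a part, then require a dot or end of input.
        while chars.peek().map_or(false, |c| c.is_whitespace()) {
            chars.next();
        }
        match chars.next() {
            Some('.') => continue,
            None => return Ok(parts),
            Some(c) => return Err(format!("unexpected character '{}'", c)),
        }
    }
}
```

Under these assumptions both spellings above yield the same three parts, while a quoted part such as `"my.db"` keeps its embedded dot, which is the case a plain split on dots gets wrong.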

coveralls · 2023-04-29T07:09:39Z

Pull Request Test Coverage Report for Build 4969813312

48 of 54 (88.89%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.009%) to 86.181%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/parser.rs	48	54	88.89%

Totals
Change from base Build 4931911477:	0.009%
Covered Lines:	14344
Relevant Lines:	16644

💛 - Coveralls

ankrgyl

From what I can tell this can only be invoked directly (by calling parse_multipart_identifier()) and not automatically while parsing a query. Is that correct/intended?

Jefffrey · 2023-05-02T11:54:00Z

Yes, that's correct @ankrgyl

It was partly inspired by how Spark has a similar method: https://github.com/apache/spark/blob/f8604ad14b24e8c657a0305b4fb8ad7efcb84060/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65-L70

The aim is to allow downstream users to parse identifiers by themselves (not necessarily as part of a query).

ankrgyl · 2023-05-02T15:22:16Z

Ah got it. Personally I'd prefer to integrate it into the main parser so that we increase our overall support for Spark SQL and Postgres. @alamb wdyt?

alamb · 2023-05-02T17:24:45Z

> Ah got it. Personally I'd prefer to integrate it into the main parser so that we increase our overall support for Spark SQL and Postgres. @alamb wdyt?

Having it integrated makes sense to me in theory. Do you have any suggestion about how we might do this specifically @ankrgyl ?

alamb · 2023-05-10T00:48:33Z

src/parser.rs

@@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> {
        Ok(idents)
    }

+    /// Parse identifiers of form ident1[.identN]*
+    pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> {


Looking at this function more closely, I wonder if we can't reuse parse_identifiers from above?

Or maybe we could add some more documentation (like docstrings / tests) for this function to make it clearer it is designed to parse strings into identifiers 🤔

Jefffrey · 2023-05-10

I think parse_identifiers above is the odd one out, as it accepts any delimiter between words/idents (except an equals sign), which means it would parse test + one + two as a valid sequence of identifiers. This is why I introduced the new parse_multipart_identifier to be more strict about what constitutes a compound/multipart identifier.

Though I did base parse_multipart_identifier primarily on Spark SQL/Postgres syntax, so maybe parse_identifiers is generic to accommodate some other type of syntax.
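The difference described here can be sketched with a toy token model. This is an illustration under stated assumptions, not the code in src/parser.rs: `Tok`, `tokenize`, `lenient_identifiers` and `strict_multipart` are hypothetical names, and the lenient function only models the behavior as described in this comment (collect every word, skip any other delimiter, stop at an equals sign).

```rust
/// Toy token type for illustration only (not sqlparser's Token).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Period,
    Eq,
    Other(char),
}

/// Very small tokenizer: words, '.', '=', and everything else as Other.
fn tokenize(input: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut word = String::new();
    for c in input.chars() {
        if c.is_alphanumeric() || c == '_' {
            word.push(c);
            continue;
        }
        if !word.is_empty() {
            toks.push(Tok::Word(std::mem::take(&mut word)));
        }
        match c {
            c if c.is_whitespace() => {} // whitespace is dropped
            '.' => toks.push(Tok::Period),
            '=' => toks.push(Tok::Eq),
            other => toks.push(Tok::Other(other)),
        }
    }
    if !word.is_empty() {
        toks.push(Tok::Word(word));
    }
    toks
}

/// Lenient model: keep every word, ignore other tokens, stop at '='.
fn lenient_identifiers(toks: &[Tok]) -> Vec<String> {
    let mut idents = Vec::new();
    for tok in toks {
        match tok {
            Tok::Word(w) => idents.push(w.clone()),
            Tok::Eq => break,
            _ => {}
        }
    }
    idents
}

/// Strict model: exactly Word (Period Word)*, anything else is an error.
fn strict_multipart(toks: &[Tok]) -> Result<Vec<String>, String> {
    let mut idents = Vec::new();
    let mut expect_word = true;
    for tok in toks {
        match (expect_word, tok) {
            (true, Tok::Word(w)) => {
                idents.push(w.clone());
                expect_word = false;
            }
            (false, Tok::Period) => expect_word = true,
            (_, other) => return Err(format!("unexpected token {:?}", other)),
        }
    }
    if expect_word {
        Err("expected identifier".to_string())
    } else {
        Ok(idents)
    }
}
```

In this model, `test + one + two` passes through the lenient function as three identifiers but is rejected by the strict one, while `catalog . database . table` is accepted by both.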

alamb · 2023-05-12T12:48:44Z

Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier and parse_identifiers to explain in more detail what they do and how they are different.

I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?

Jefffrey · 2023-05-14

I've updated the docs a bit; let me know if you have any further suggestions.

alamb

I think this is a great addition -- thanks again @Jefffrey

I'll plan to merge (and release) this next week unless there are any more comments. cc @ankrgyl

alamb · 2023-05-14T11:20:54Z

src/parser.rs

@@ -4705,6 +4705,92 @@ impl<'a> Parser<'a> {
        Ok(idents)
    }

+    /// Parse identifiers of form ident1[.identN]*
+    ///
+    /// Similar in functionality to [parse_identifiers], with difference


This is great. Thank you

Add parse_multipart_identifier function to parser

7064b0c

ankrgyl reviewed May 1, 2023

alamb reviewed May 10, 2023

alamb approved these changes May 12, 2023

Jefffrey added 3 commits May 14, 2023 11:15

Update doc for parse_multipart_identifier

95bd30f

Merge branch 'main' into parse_multipart_ident_fn

8c0865d

Fix conflict

eb2cc8b

alamb approved these changes May 14, 2023

alamb merged commit 4559d87 into apache:main May 17, 2023

Jefffrey deleted the parse_multipart_ident_fn branch May 17, 2023 20:32
