Skip to content

Add parse_multipart_identifier function to parser #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 17, 2023

Conversation

Jefffrey
Copy link
Contributor

Closes #805

To be used for parsing identifiers which when could be quoted, can get quite complex (rather than relying on simple split on dot)

Pulled from DataFusion with one change: allow whitespaces between parts in the identifier, as seems this is supported by Spark SQL and Postgres

So following idents would be parsed with the same result:

catalog.database.table

catalog . database . table

@coveralls
Copy link

coveralls commented Apr 29, 2023

Pull Request Test Coverage Report for Build 4969813312

  • 48 of 54 (88.89%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.009%) to 86.181%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/parser.rs 48 54 88.89%
Totals Coverage Status
Change from base Build 4931911477: 0.009%
Covered Lines: 14344
Relevant Lines: 16644

💛 - Coveralls

Copy link
Contributor

@ankrgyl ankrgyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this can only be invoked directly (by calling parse_multipart_identifier()) and not automatically while parsing a query. Is that correct/intended?

@Jefffrey
Copy link
Contributor Author

Jefffrey commented May 2, 2023

Yes that's correct @ankrgyl

It was partly inspired by how Spark has a similar method: https://github.com/apache/spark/blob/f8604ad14b24e8c657a0305b4fb8ad7efcb84060/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65-L70

To allow downstream to parse identifiers by themselves (not necessarily as part of a query)

@ankrgyl
Copy link
Contributor

ankrgyl commented May 2, 2023

Ah got it. Personally I'd prefer to integrate it into the main parser so that we increase our overall support for Spark SQL and Postgres. @alamb wdyt?

@alamb
Copy link
Contributor

alamb commented May 2, 2023

Ah got it. Personally I'd prefer to integrate it into the main parser so that we increase our overall support for Spark SQL and Postgres. @alamb wdyt?

Having it integrated makes sense to me in theory. Do you have any suggestion about how we might do this specifically @ankrgyl ?

@@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> {
Ok(idents)
}

/// Parse identifiers of form ident1[.identN]*
pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this function more closely, I wonder how if we can't reuse parse_identifiers from above?

Or maybe we could add some more documentation (like docstrings / tests) for this function to make it clearer it is designed to parse strings into identifiers 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think parse_identifiers above is the odd one out, as it accepts any delimiter between words/idents (except equal sign) which means it would parse test + one + two as a valid sequence of identifiers. This is why I introduced the new parse_multipart_identifier to be more strict about what constitutes a compound/multipart identifier.

Though I did base parse_multipart_identifier primarily on Spark SQL/Postgres syntax, so maybe parse_identifiers is generic to accommodate some other type of syntax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier and parse_identifiers to explain in more detail what they do and how they are different.

I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've update the doc a bit, let me know any further suggestions

@@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> {
Ok(idents)
}

/// Parse identifiers of form ident1[.identN]*
pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier and parse_identifiers to explain in more detail what they do and how they are different.

I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great addition -- thanks again @Jefffrey

I'll plan to merge (and release) this next week unless there are any more comments. cc @ankrgyl

@@ -4705,6 +4705,92 @@ impl<'a> Parser<'a> {
Ok(idents)
}

/// Parse identifiers of form ident1[.identN]*
///
/// Similar in functionality to [parse_identifiers], with difference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thank you

@alamb alamb merged commit 4559d87 into apache:main May 17, 2023
@Jefffrey Jefffrey deleted the parse_multipart_ident_fn branch May 17, 2023 20:32
serprex pushed a commit to serprex/sqlparser-rs that referenced this pull request Nov 6, 2023
* Support identifiers beginning with digits in MySQL (apache#856)

* support COPY INTO in snowflake (apache#841)

Signed-off-by: Pawel Leszczynski <[email protected]>

* Add `dialect_from_str` and improve `Dialect` documentation (apache#848)

* Add `dialect_from_str` and improve `Dialect` documentation

* cleanup

* fix compilation with nostd

* Support multiple-table DELETE syntax (apache#855)

* Support `DISTINCT ON (...)` (apache#852)

* Support "DISTINCT ON (...)"

* a test

* fix the merge

* Test trailing commas (apache#859)

* test: add tests for trailing commas

* tweaks

* Add support for query source in COPY .. TO statement (apache#858)

* Add support for query source in COPY .. TO statement

* Fix compile error

* Fix logical merge conflict (apache#865)

* Fix tiny typo in custom_sql_parser.md (apache#864)

* Make Expr::Interval its own struct (apache#872)

* Make Expr::Interval its own struct

* Add test interval display

* Fix cargo fmt

* Include license file in published crate (apache#871)

* Add support for multiple expressions, order by in aggregations (apache#879)

* Add support for multiple expressions, order by in aggregations

* Fix formatting errors

* Resolve linter errors

* Add parse_multipart_identifier function to parser (apache#860)

* Add parse_multipart_identifier function to parser

* Update doc for parse_multipart_identifier

* Fix conflict

* feat: Add custom operator (apache#868)

* feat: Add custom operator

From apache#863

- It doesn't parse anything — I'm not sure how to parse ` SELECT 'a' REGEXP '^[a-d]';` with `REGEXP` as the operator... (but fine for my narrow purpose)
- If we need tests, where would I add them?

* Update src/ast/operator.rs

---------

Co-authored-by: Andrew Lamb <[email protected]>

* feat: Support MySQL's `DIV` operator (apache#876)

* feat: MySQL's DIV operator

* fix: do not use `_` prefix for used variable

---------

Co-authored-by: Andrew Lamb <[email protected]>

* truncate: table as optional keyword (apache#883)

Signed-off-by: Maciej Obuchowski <[email protected]>

* feat: add DuckDB dialect (apache#878)

* feat: add DuckDB dialect

* formatting

* fix conflict

* support // in GenericDialect

* add DucDbDialect to all_dialects

* add comment from suggestion

Co-authored-by: Andrew Lamb <[email protected]>

* fix: support // in GenericDialect

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Add support for first, last aggregate function parsing (apache#882)

* Add order by parsing to functions

* Fix doc error

* minor changes

* Named window frames (apache#881)

* after over clause, named window can be parsed with window ... as after having clause

* Lint errors are fixed

* Support for multiple windows

* fix lint errors

* simplifications

* rename function

* Rewrite named window search in functional style

* Test added and some minor changes

* Minor changes on tests and namings, and semantic check is removed

---------

Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Fix merge conflict (apache#885)

* Update CHANGELOG for `0.34.0` release (apache#884)

* chore: Release sqlparser version 0.34.0

* Update criterion requirement from 0.4 to 0.5 in /sqlparser_bench (apache#890)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: AviRaboah <[email protected]>
Co-authored-by: pawel.leszczynski <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Aljaž Mur Eržen <[email protected]>
Co-authored-by: Armin Primadi <[email protected]>
Co-authored-by: Okue <[email protected]>
Co-authored-by: Andrew Kane <[email protected]>
Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Jeffrey <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: eitsupi <[email protected]>
Co-authored-by: Maciej Obuchowski <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>
Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse identifiers function
4 participants