Skip to content

BigQuery: Fix column identifier reserved keywords list #1678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 29, 2025

Conversation

iffyio
Copy link
Contributor

@iffyio iffyio commented Jan 22, 2025

Parser currently uses the same RESERVED_FOR_COLUMN_ALIAS keywords list to avoid lookahead when parsing column identifiers. This assumed that all listed keywords were reservered across all dialects which isn't the case.
So that the following valid BigQuery statement previously failed due to OFFSET being flagged as a keyword.

SELECT 1, OFFSET FROM T

This updates the parser to support dialect specific RESERVED_FOR_COLUMN_ALIAS list

@iffyio iffyio force-pushed the bigquery-reserved-columns branch from bff25db to a4d445b Compare January 22, 2025 17:31
/// for the dialect.
/// For example given: `SELECT <col> AS <alias> FROM T`. The returned
/// keywords are not allowed in `<col>` and `<alias>`
fn get_reserved_keywords_for_column_identifier(&self) -> &'static [Keyword] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, a dialect may have a different list of reserved keywords for column identifiers and column aliases, and in some cases, the parser will need to look ahead to make that decision accurately.

Perhaps we should adopt a unified approach to this check along the lines of Dialect::is_select_item_alias?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be nice -- perhaps we can do so in a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds good! I'll take a look at it tomorrow and I'll ping to take another look at the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this now to be similar to is_select_item_alias, taking in a parser, PTAL!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iffyio and @yoavcloud -- this looks good to me.

@@ -213,6 +213,15 @@ fn parse_raw_literal() {
);
}

#[test]
fn parse_big_query_non_reserved_column_alias() {
let sql = r#"SELECT OFFSET, EXPLAIN, ANALYZE, SORT, TOP, VIEW FROM T"#;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

/// for the dialect.
/// For example given: `SELECT <col> AS <alias> FROM T`. The returned
/// keywords are not allowed in `<col>` and `<alias>`
fn get_reserved_keywords_for_column_identifier(&self) -> &'static [Keyword] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be nice -- perhaps we can do so in a follow on PR

Parser currently uses the same `RESERVED_FOR_COLUMN_ALIAS` keywords
list to avoid lookahead when parsing column identifiers. This
assumed that all listed keywords were reservered across all dialects
which isn't the case.
So that the following valid BigQuery statement previously failed
due to `OFFSET` being flagged as a keyword.
```sql
SELECT 1, OFFSET FROM T
```
This updates the parser to support dialect specific
`RESERVED_FOR_COLUMN_ALIAS` list
@iffyio iffyio force-pushed the bigquery-reserved-columns branch from a4d445b to 2b00afd Compare January 28, 2025 09:15
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👨‍🍳 👌

@alamb alamb merged commit 7980c86 into apache:main Jan 29, 2025
9 checks passed
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin added a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
ayman-sigma pushed a commit to sigmacomputing/sqlparser-rs that referenced this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants