Skip to content

Support DuckDB struct syntax and support list of struct syntax #1372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Aug 15, 2024
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@ Cargo.lock
.vscode

*.swp

.DS_store
17 changes: 17 additions & 0 deletions src/parser/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2231,6 +2231,18 @@ impl<'a> Parser<'a> {
))
}

fn parse_duckdb_struct_type_def(&mut self) -> Result<Vec<StructField>, ParserError> {
self.expect_keyword(Keyword::STRUCT)?;
self.expect_token(&Token::LParen)?;
let field_defs = self
.parse_comma_separated(Self::parse_struct_field_def)?
.into_iter()
.map(|(f, _)| f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this looks like a bug for us to silently drop the returned flag since there might be an extraneous >, seems like we would instead do something like this

let field_defs = self.parse_comma_separated(|self| {
    Ok(StructField {
        field_name: Some(self.parse_identifier(false)?), // I assume field name isn't optional for duckdb?
        field_type: self.parse_data_type(),
    })

.collect();
self.expect_token(&Token::RParen)?;
Ok(field_defs)
}

/// Parse a field definition in a [struct] or [tuple].
/// Syntax:
///
Expand Down Expand Up @@ -7229,6 +7241,11 @@ impl<'a> Parser<'a> {
))))
}
}
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => {
self.prev_token();
let field_defs = self.parse_duckdb_struct_type_def()?;
Ok(DataType::Struct(field_defs))
}
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect) => {
self.prev_token();
let (field_defs, _trailing_bracket) =
Expand Down
10 changes: 10 additions & 0 deletions tests/sqlparser_bigquery.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,16 @@ use sqlparser::dialect::{BigQueryDialect, GenericDialect};
use sqlparser::parser::{ParserError, ParserOptions};
use test_utils::*;

#[test]
fn test_struct() {
// nested struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let select = bigquery().parse_sql_statements(sql).unwrap().pop().unwrap();
// TODO: '>>' is incorrect parsed in bigquery syntax
assert_ne!(select.to_string(), canonical);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow if/why this test is needed, since bigquery doesn't support [] syntax for arrays?


#[test]
fn parse_literal_string() {
let sql = concat!(
Expand Down
33 changes: 33 additions & 0 deletions tests/sqlparser_duckdb.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,39 @@ fn duckdb_and_generic() -> TestedDialects {
}
}

#[test]
fn test_struct() {
// basic struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER))"#;
Copy link
Contributor

@iffyio iffyio Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think this is incorrect, no need for the input and output to not match, in this case the output is significantly different syntax from what went into the parser. We can probably add a parens_delimeter: bool or similar flag to the StructField struct so that it knows how to display accordingly?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me. Do you mean adding flag so we have different canonical name for struct<> and struct()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that's what I meant - my bad realised I wrote comma_delimeter but meant to write parens_delimeter instead. Yeah so like if that flag is true display uses struct (...) otherwise struct<...>

let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// struct array
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER)[])"#;
let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// nested struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, s STRUCT(a1 INTEGER, a2 VARCHAR))[])"#;
let select = duckdb().parse_sql_statements(sql).unwrap().pop().unwrap();
assert_eq!(select.to_string(), canonical);

// failing test
let sql_list = vec![
r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER)))"#,
r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER>)"#,
r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#,
r#"CREATE TABLE t1 (s STRUCT v VARCHAR, i INTEGER )"#,
];

for sql in sql_list {
duckdb().parse_sql_statements(sql).unwrap_err();
}
}

#[test]
fn test_select_wildcard_with_exclude() {
let select = duckdb().verified_only_select("SELECT * EXCLUDE (col_a) FROM data");
Expand Down
Loading