Skip to content

Support BigQuery window function null treatment #1239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 30, 2024

Conversation

iffyio
Copy link
Contributor

@iffyio iffyio commented Apr 27, 2024

Syntax differs for BigQuery on positioning of the
IGNORE|RESPECT NULL clause within a window function. This extends the parser to cover that syntax.

https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions#first_value

@coveralls
Copy link

coveralls commented Apr 27, 2024

Pull Request Test Coverage Report for Build 8900668299

Details

  • 78 of 94 (82.98%) changed or added relevant lines in 6 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 89.186%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/dialect/bigquery.rs 1 2 50.0%
src/dialect/generic.rs 1 2 50.0%
src/dialect/mod.rs 1 2 50.0%
tests/sqlparser_common.rs 27 30 90.0%
src/parser/mod.rs 42 46 91.3%
src/ast/mod.rs 6 12 50.0%
Files with Coverage Reduction New Missed Lines %
src/parser/mod.rs 1 92.82%
Totals Coverage Status
Change from base Build 8899623985: -0.02%
Covered Lines: 23901
Relevant Lines: 26799

💛 - Coveralls

@iffyio iffyio force-pushed the bq-null-treatment branch from 30627b0 to e4eeee9 Compare April 27, 2024 07:23
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of the referenced docs don't seem to permit syntax like FIRST_VALUE(x) IGNORE NULLS OVER

Are we sure this is supported by BigQuery?

FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
OVER over_clause

over_clause:
  { named_window | ( [ window_specification ] ) }

window_specification:
  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  ORDER BY expression [ { ASC | DESC }  ] [, ...]
  [ window_frame_clause ]

@iffyio
Copy link
Contributor Author

iffyio commented Apr 30, 2024

Are we sure this is supported by BigQuery?

@alamb yeah so bigquery doesn't support FIRST_VALUE(x) IGNORE NULLS OVER, that one we have support for in the parser today - e.g snowflake uses that variant. This one adds support for the variant linked in the doc, so FIRST_VALUE(x IGNORE NULLS) OVER with similar example. Does that clarify the difference in behavior?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is supported by BigQuery?

@alamb yeah so bigquery doesn't support FIRST_VALUE(x) IGNORE NULLS OVER, that one we have support for in the parser today - e.g snowflake uses that variant. This one adds support for the variant linked in the doc, so FIRST_VALUE(x IGNORE NULLS) OVER with similar example. Does that clarify the difference in behavior?

Ah, I see now. BigQuery / Snowflake, why!!!! LOL

Overall this looks good to me -- I have two small suggestions but we could do them as a follow on PR too if you prefer.

Thank you @iffyio

src/ast/mod.rs Outdated
/// The declaration occurs after the function call.
///
/// ```sql
/// FIRST_VALUE(x IGNORE NULLS) OVER ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a copy paste error?

Suggested change
/// FIRST_VALUE(x IGNORE NULLS) OVER ()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

pub over: Option<WindowType>,
// aggregate functions may specify eg `COUNT(DISTINCT x)`
/// aggregate functions may specify eg `COUNT(DISTINCT x)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for the driveby cleanup

src/ast/mod.rs Outdated
Comment on lines 4869 to 4880
match self.null_treatment {
Some(NullTreatmentType::FunctionArg(null_treatment)) => {
format!(" {null_treatment}")
}
_ => "".to_string(),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid this seeming allocation in "".to_string() by using Cow:

Suggested change
match self.null_treatment {
Some(NullTreatmentType::FunctionArg(null_treatment)) => {
format!(" {null_treatment}")
}
_ => "".to_string(),
}
match self.null_treatment {
Some(NullTreatmentType::FunctionArg(null_treatment)) => {
Cow::Owned(format!(" {null_treatment}"))
}
_ => Cow::Borrowed("")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Syntax differs for BigQuery on positioning of the
`IGNORE|RESPECT NULL` clause within a window function.
This extends the parser to cover that syntax.

https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions#first_value
@iffyio iffyio force-pushed the bq-null-treatment branch from e4eeee9 to cddb87d Compare April 30, 2024 19:42
@alamb alamb merged commit fb20f8c into apache:main Apr 30, 2024
10 checks passed
@alamb
Copy link
Contributor

alamb commented Apr 30, 2024

Thanks @iffyio

@jmhain
Copy link
Contributor

jmhain commented Apr 30, 2024

Sorry for posting this comment post-merge, but I was looking at fixing up our support for aggs that use the WITHIN GROUP syntax, and I realized that Function is going to get a bit unwieldy with additional options over time. I'd like to actually split off a separate FunctionArgList type to encapsulate a lot of these, and that would suggest another way of representing the null treatment inside-vs-outside parens issue this PR addressed:

// options inside the parens
struct FunctionArgList {
    distinct: boolean,
    args: Vec<FunctionArg>,
    limit: ... // for array_agg
    on_overflow: ... // for list_agg
    null_treatment: Option<NullTreatment>,
}

struct Function {
    args: FunctionArgList,

    // options outside the parens
    null_treatment: Option<NullTreatment>,
    within_group: ...
    over: ...
    filter: ...
}

@alamb @iffyio wdyt? If we do this, we should do it before the next release to avoid API churn...

@iffyio
Copy link
Contributor Author

iffyio commented May 1, 2024

That sounds reasonable to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants