Skip to content

Table time travel clause support, add visit_table_factor to Visitor #951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 22, 2023

Conversation

gruuya
Copy link
Contributor

@gruuya gruuya commented Aug 18, 2023

This PR introduces support for temporal querying in BigQuery and MSSQL for now. It does so by extending the TableFactor::Table variant to record and propagate an optional version specification (atm only a timestamp, though we don't validate them).

Related to apache/datafusion#7292: a visitor method is also derived for TableFactor in order to facilitate walking/parsing of table versions used in a given statement, so that adequate datafusion::datasource::TableProvider implementations can be loaded prior to planning/execution.

src/ast/query.rs Outdated
Comment on lines 853 to 862
impl Display for TableVersion {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
TableVersion::Timestamp(timestamp) => {
write!(f, " FOR SYSTEM_TIME AS OF '{timestamp}'")?
}
}
Ok(())
}
}
Copy link
Contributor Author

@gruuya gruuya Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a slight problem—some other systems have completely different syntaxes here but we're missing the info on which particular dialect the table version was parsed for.

For instance, Snowflake uses AT, so if we decided to extend the support to it we wouldn't know which time travel clause to re-construct here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what has happened in the past for such changes is that we would add some new variant of TableVersion that represents At perhaps

pub enum TableVersion {
  AsOf(...)
  At (..)
}
...

Which would be backwards incompatible change, but then basically all changes to the AST are so it seems to be fine

Copy link
Contributor Author

@gruuya gruuya Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I wonder is whether the AsOf variant should be named explicitly as ForSystemTimeAsOf, given that e.g. CockroachDB also has an AS OF clause with a somewhat different syntax. Arguably that dialect is not supported now but if someone decides to implement it it would be a misfortunate conflict.

@alamb alamb changed the title Table time travel clause support Table time travel clause support, add visit_table_factor to Visitor Aug 21, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gruuya -- this looks very nice. I had some naming suggestions, and to change the type from String to Expr, but otherwise I think this is very nearly ready to go

src/ast/query.rs Outdated
Comment on lines 853 to 862
impl Display for TableVersion {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
TableVersion::Timestamp(timestamp) => {
write!(f, " FOR SYSTEM_TIME AS OF '{timestamp}'")?
}
}
Ok(())
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what has happened in the past for such changes is that we would add some new variant of TableVersion that represents At perhaps

pub enum TableVersion {
  AsOf(...)
  At (..)
}
...

Which would be backwards incompatible change, but then basically all changes to the AST are so it seems to be fine

src/ast/query.rs Outdated
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum TableVersion {
Timestamp(String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend you make this an Expr so that it can parse expressions like

FOR SYSTEM_TIME AS OF now() - '5 minutes'

That also seems to be supported by
https://cloud.google.com/bigquery/docs/access-historical-data

  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

Perhaps something like

Suggested change
Timestamp(String),
AsOf(Expr),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at MsSQL and it seems to allow several different phrases like BETWEEN and CONTAINED

https://learn.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table?view=sql-server-ver16

I don't think we need to add support for any of this syntax now in this PR, but your choice of enum to represent this structure I think is prescient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will revise it like that.

Copy link
Contributor Author

@gruuya gruuya Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've renamed AsOf to ForSystemTimeAsOf, and made it wrap a generic Expr instead of a String. Arguably this makes the clause more versatile then is supported by the actual external system, but that's probably for the better.

@@ -661,6 +662,9 @@ pub enum TableFactor {
args: Option<Vec<FunctionArg>>,
/// MSSQL-specific `WITH (...)` hints such as NOLOCK.
with_hints: Vec<Expr>,
/// Optional version qualifier to facilitate table time-travel, as
/// supported by BigQuery and MSSQL.
version: Option<TableVersion>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given it is so specific maybe this could be called system_time: Option<SystemTime> or something? I don't feel strongly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I imagine this field would evolve into something capable of capturing other qualifiers beyond time, e.g. something like explicit version numbers as supported by delta-rs. To that end, I'd vote to keep it more generic than system_time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I could rename it to temporal_qualifier: Option<TemporalQualifier>,? Feels more SQL-y, and at the same time sufficiently generic such that it could also store explicit version number references (besides timestamps and intervals).

@coveralls
Copy link

Pull Request Test Coverage Report for Build 5903878906

  • 100 of 119 (84.03%) changed or added relevant lines in 13 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 87.329%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/query.rs 7 8 87.5%
src/parser/mod.rs 10 12 83.33%
src/ast/visitor.rs 0 16 0.0%
Files with Coverage Reduction New Missed Lines %
src/ast/query.rs 1 85.5%
Totals Coverage Status
Change from base Build 5893081161: -0.02%
Covered Lines: 16183
Relevant Lines: 18531

💛 - Coveralls

@alamb
Copy link
Contributor

alamb commented Aug 21, 2023

It seems like the docs CI failure is real -- specifically https://github.com/sqlparser-rs/sqlparser-rs/actions/runs/5903878906/job/16077037912?pr=951

It is pointing out the lack of a visit_table_factors function, in the same way there is a visit_statements function. Did you mean to add visit_table_factors too?

Also rename the current enum variant to ForSystemTimeAsOf to make it more explicit.
@gruuya
Copy link
Contributor Author

gruuya commented Aug 22, 2023

Did you mean to add visit_table_factors too?

No, removed that line from the docs now, thanks.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gruuya

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants