-
Notifications
You must be signed in to change notification settings - Fork 605
Table time travel clause support, add visit_table_factor
to Visitor
#951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/ast/query.rs
Outdated
impl Display for TableVersion { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
match self { | ||
TableVersion::Timestamp(timestamp) => { | ||
write!(f, " FOR SYSTEM_TIME AS OF '{timestamp}'")? | ||
} | ||
} | ||
Ok(()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a slight problem—some other systems have completely different syntaxes here but we're missing the info on which particular dialect the table version was parsed for.
For instance, Snowflake uses AT
, so if we decided to extend the support to it we wouldn't know which time travel clause to re-construct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what has happened in the past for such changes is that we would add some new variant of TableVersion
that represents At
perhaps
pub enum TableVersion {
AsOf(...)
At (..)
}
...
Which would be backwards incompatible change, but then basically all changes to the AST are so it seems to be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I wonder is whether the AsOf
variant should be named explicitly as ForSystemTimeAsOf
, given that e.g. CockroachDB also has an AS OF
clause with a somewhat different syntax. Arguably that dialect is not supported now but if someone decides to implement it it would be a misfortunate conflict.
visit_table_factor
to Visitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gruuya -- this looks very nice. I had some naming suggestions, and to change the type from String to Expr, but otherwise I think this is very nearly ready to go
src/ast/query.rs
Outdated
impl Display for TableVersion { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
match self { | ||
TableVersion::Timestamp(timestamp) => { | ||
write!(f, " FOR SYSTEM_TIME AS OF '{timestamp}'")? | ||
} | ||
} | ||
Ok(()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what has happened in the past for such changes is that we would add some new variant of TableVersion
that represents At
perhaps
pub enum TableVersion {
AsOf(...)
At (..)
}
...
Which would be backwards incompatible change, but then basically all changes to the AST are so it seems to be fine
src/ast/query.rs
Outdated
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))] | ||
pub enum TableVersion { | ||
Timestamp(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend you make this an Expr
so that it can parse expressions like
FOR SYSTEM_TIME AS OF now() - '5 minutes'
That also seems to be supported by
https://cloud.google.com/bigquery/docs/access-historical-data
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
Perhaps something like
Timestamp(String), | |
AsOf(Expr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at MsSQL and it seems to allow several different phrases like BETWEEN
and CONTAINED
I don't think we need to add support for any of this syntax now in this PR, but your choice of enum
to represent this structure I think is prescient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will revise it like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've renamed AsOf
to ForSystemTimeAsOf
, and made it wrap a generic Expr
instead of a String
. Arguably this makes the clause more versatile then is supported by the actual external system, but that's probably for the better.
@@ -661,6 +662,9 @@ pub enum TableFactor { | |||
args: Option<Vec<FunctionArg>>, | |||
/// MSSQL-specific `WITH (...)` hints such as NOLOCK. | |||
with_hints: Vec<Expr>, | |||
/// Optional version qualifier to facilitate table time-travel, as | |||
/// supported by BigQuery and MSSQL. | |||
version: Option<TableVersion>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given it is so specific maybe this could be called system_time: Option<SystemTime>
or something? I don't feel strongly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I imagine this field would evolve into something capable of capturing other qualifiers beyond time, e.g. something like explicit version numbers as supported by delta-rs. To that end, I'd vote to keep it more generic than system_time
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I could rename it to temporal_qualifier: Option<TemporalQualifier>,
? Feels more SQL-y, and at the same time sufficiently generic such that it could also store explicit version number references (besides timestamps and intervals).
Pull Request Test Coverage Report for Build 5903878906
💛 - Coveralls |
It seems like the docs CI failure is real -- specifically https://github.com/sqlparser-rs/sqlparser-rs/actions/runs/5903878906/job/16077037912?pr=951 It is pointing out the lack of a |
Also rename the current enum variant to ForSystemTimeAsOf to make it more explicit.
No, removed that line from the docs now, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gruuya
This PR introduces support for temporal querying in BigQuery and MSSQL for now. It does so by extending the
TableFactor::Table
variant to record and propagate an optional version specification (atm only a timestamp, though we don't validate them).Related to apache/datafusion#7292: a visitor method is also derived for
TableFactor
in order to facilitate walking/parsing of table versions used in a given statement, so that adequatedatafusion::datasource::TableProvider
implementations can be loaded prior to planning/execution.