Skip to content

Add support for generated virtual columns with expression #1051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 22, 2023

Conversation

takluyver
Copy link
Contributor

This allows the SQLite & MySQL syntax for generated virtual columns, addressing the second part of #1050.

-- These two statements are equivalent:
CREATE TABLE t1 (a INT, b INT GENERATED ALWAYS AS (a * 2) VIRTUAL);
CREATE TABLE t1 (a INT, b INT GENERATED ALWAYS AS (a * 2));

I've done a partial survey of what different databases support for generated columns:

  • SQLite: generate columns from an expression, either VIRTUAL (evaluated on read) or STORED (on write), but must ALWAYS be generated - docs, syntax reference
  • MySQL: same options as SQLite - docs
  • DuckDB: only VIRTUAL supported, as of v0.9.2 - docs.
  • Postgres: columns generated from an expression can only be STORED & ALWAYS (as of v16, the current version), but there are additional options to generate a column from a sequence (AS IDENTITY) - docs, syntax reference
  • MS-SQL has GENERATED AS syntax, but it doesn't seem to overlap with the other DBs above at all - it's something to do with versioning records, with no space for arbitrary expressions - docs. I haven't tried to cover this in this PR.

It looks like the generated-column code before this PR is based on Postgres.

I've expanded the meaning of GeneratedAs::ExpStored, which the docs previously described as Postgres specific, to any column generated from an expressions and stored. GeneratedAs::Always with generation_expr present now represents a VIRTUAL column in SQLite or MySQL. It might be neater to make stored/virtual separate from always/by-default, but that would be an API break.

@tobyhede
Copy link
Contributor

LGTM.
Is there a test for the STORED option?
And it might be worth adding tests for the MySQL dialect. Makes intention explicit and easier to track regressions.

@takluyver
Copy link
Contributor Author

There's one test covering the same GENERATED AS expr STORED syntax in Postgres:

https://github.com/sqlparser-rs/sqlparser-rs/blob/c887c4e54558db3d3940c23300a384b1752cdeaf/tests/sqlparser_postgres.rs#L208-L218

But I can add this for SQLite & MySQL too. 👍

If SQLite & MySQL have the same syntax for this, do you prefer me to copy & paste the test into their respective modules, or set up one test to run with both of them? I'll copy-paste for now.

@coveralls
Copy link

coveralls commented Nov 20, 2023

Pull Request Test Coverage Report for Build 6961152498

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 38 of 39 (97.44%) changed or added relevant lines in 4 files are covered.
  • 1257 unchanged lines in 8 files lost coverage.
  • Overall coverage increased (+0.04%) to 87.753%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/ddl.rs 11 12 91.67%
Files with Coverage Reduction New Missed Lines %
src/dialect/mod.rs 1 90.41%
src/dialect/mssql.rs 2 86.67%
src/dialect/redshift.rs 2 78.57%
tests/sqlparser_mssql.rs 3 97.39%
tests/sqlparser_postgres.rs 26 98.06%
tests/sqlparser_common.rs 123 96.97%
src/ast/mod.rs 345 79.04%
src/parser/mod.rs 755 83.2%
Totals Coverage Status
Change from base Build 6914235970: 0.04%
Covered Lines: 17992
Relevant Lines: 20503

💛 - Coveralls

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @takluyver (and the review @tobyhede ). I left some comments -- let me know what you think

@@ -435,7 +446,6 @@ fn sqlite_with_options(options: ParserOptions) -> TestedDialects {

fn sqlite_and_generic() -> TestedDialects {
TestedDialects {
// we don't have a separate SQLite dialect, so test only the generic dialect for now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice drive by cleanup

#[test]
fn parse_create_table_gencol() {
let sql_default = "CREATE TABLE t1 (a INT, b INT GENERATED ALWAYS AS (a * 2))";
let sql_virt = "CREATE TABLE t1 (a INT, b INT GENERATED ALWAYS AS (a * 2) VIRTUAL)";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding additional support to the AST so you can recover the original SQL --- as implemented now this change will not permit people to know when the original query had VIRTUAL or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtual is the default, so adding the marker doesn't make a logical difference (at least in the databases I found that support this). Elsewhere in the tests, it looks like you don't preserve differences like this - e.g. SHOW COLUMNS vs SHOW FIELDS in the MySQL tests. But if you'd like to preserve this one, I'm happy to do that. 🙂

If so, I would probably introduce an enum called something like GenExpressionQualifier (though that's an ugly name...), which could be None, Virtual or Stored. The Stored option would then be redundant with GeneratedAs::ExpStored.

(For my next PR, I'm planning to make the GENERATED ALWAYS keywords optional - as they are in MySQL and SQLite. Let me know if you want the presence or absence of those to be preserved too.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere in the tests, it looks like you don't preserve differences like this - e.g. SHOW COLUMNS vs SHOW FIELDS in the MySQL tests. But if you'd like to preserve this one, I'm happy to do that. 🙂

Yes, it is true that this crate is not always consistent. However, for newly added code let's try and preserve the ability to round trip the AST (though I see there is no note describing that detail on https://github.com/sqlparser-rs/sqlparser-rs -- I will try and add a note when I have time)

Let me know if you want the presence or absence of those to be preserved too.)

yes, please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the latest commit (6d69a43) preserves the distinction between VIRTUAL, STORED and no qualifier. As I mentioned, that does mean there are two fields which both record if STORED was used. That's not so tidy, but it preserves compatibility for any code that's checking for GeneratedAs::ExpStored.

I've called the new field & enum generation_expr_mode & GeneratedExpressionMode, which are both quite a mouthful. Let me know if you have a better name for this. 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #1052 to document this property

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @takluyver -- looks good to me

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum GeneratedExpressionMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good to me -- I don't have a better name for this

@takluyver
Copy link
Contributor Author

Ooops, I left some debugging bits in the tests. I've removed them now - do you prefer me to squash that into the previous commit and force-push the branch?

@alamb
Copy link
Contributor

alamb commented Nov 22, 2023

Ooops, I left some debugging bits in the tests. I've removed them now - do you prefer me to squash that into the previous commit and force-push the branch?

On merge all the commits will be squashed into a single commit, so I would prefer that you just push new commits to your branch / PR as it makes reviewing the deltas easier

@alamb alamb merged commit 640b939 into apache:main Nov 22, 2023
@alamb
Copy link
Contributor

alamb commented Nov 22, 2023

Thanks again @takluyver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants