Skip to content

move primitive dialect configuration into a struct #1431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

samuelcolvin
Copy link
Contributor

Step towards #1430.

I personally think this should make it much easier to add more flags to dialects. But I can see it might not be to everyone's taste.

So I want confirmation it's a good idea before prcoeeding.

This should:

  • make it less verbose to add more options to dialects
  • make it easier to define dialects that inherit from other dialects.

@coveralls
Copy link

coveralls commented Sep 14, 2024

Pull Request Test Coverage Report for Build 10862628784

Details

  • 381 of 473 (80.55%) changed or added relevant lines in 32 files are covered.
  • 855 unchanged lines in 10 files lost coverage.
  • Overall coverage increased (+0.08%) to 89.411%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/dialect/mod.rs 40 41 97.56%
src/lib.rs 0 1 0.0%
src/tokenizer.rs 63 64 98.44%
src/dialect/ansi.rs 4 6 66.67%
src/dialect/mssql.rs 4 7 57.14%
src/dialect/redshift.rs 6 9 66.67%
src/dialect/hive.rs 4 8 50.0%
src/dialect/mysql.rs 4 8 50.0%
src/dialect/sqlite.rs 4 8 50.0%
tests/sqlparser_custom_dialect.rs 8 12 66.67%
Files with Coverage Reduction New Missed Lines %
tests/sqlparser_redshift.rs 1 97.14%
src/dialect/mod.rs 1 78.03%
src/tokenizer.rs 1 94.24%
tests/sqlparser_duckdb.rs 2 99.57%
tests/sqlparser_bigquery.rs 8 90.32%
src/parser/mod.rs 10 93.31%
tests/sqlparser_snowflake.rs 10 98.06%
tests/sqlparser_databricks.rs 16 89.61%
tests/sqlparser_postgres.rs 213 88.64%
tests/sqlparser_common.rs 593 89.8%
Totals Coverage Status
Change from base Build 10817360923: 0.08%
Covered Lines: 29487
Relevant Lines: 32979

💛 - Coveralls

Copy link
Contributor Author

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall this PR is significantly negative LOC — showing that this approach is at least less verbose.

fn dialect(&self) -> std::any::TypeId {
fn settings(&self) -> DialectSettings {
let s = self.0.settings();
DialectSettings {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is slightly ugly because WrappedDialect only implemented a few of the methods on the dialect.

But this actually demonstrates one of the advtanges of the approach in this PR:

If this test had been written with the new behaviour, this method would become

fn settings(&self) -> DialectSettings {
  let mut s = self.0.settings();
  s.custom_thing = true;
  s
}

And it would stay up-to-date even as new setting were added. Unlike the current behaviour.

I can update this test if you like?

fn is_proper_identifier_inside_quotes(
&self,
chars: std::iter::Peekable<std::str::Chars<'_>>,
chars: Peekable<Chars<'_>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be great if clippy or rustfmt could warn about these unnecessary paths, rustrover warns, hence the drive by change.

Comment on lines 256 to 258
fn flags(&self) -> DialectFlags {
DialectFlags::default()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like creating a new struct on every call (and there are a lot).

I'd rather store the struct on each dialect, then the signature here can before

fn flags(&self) -> &DialectFlags {
  &self.settings
}

The problem then is that every struct needs a fn new() or impl Default, and the signature to create all dialects changes everywhere, but if we're happy with that breaking change, the whole thing would become much cleaner. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@samuelcolvin
Copy link
Contributor Author

Okay having said I would wait for feedback, I couldn't resist moving flags onto the structs.

While I'm aware there's a good chance this might get closed, I think this is the right way to go if we want something of this sort.

@samuelcolvin samuelcolvin marked this pull request as ready for review September 14, 2024 13:26
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelcolvin Thanks for tackling this! The changes look very reasonable to me

cc @alamb also @jmhain I think you had a similar idea for this, in case you have any thoughts

@@ -63,6 +63,152 @@ macro_rules! dialect_of {
};
}

/// Constant settings for a dialect.
///
/// For configuration to be defined here, rather than on the rate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the rate

was this a typo?

/// * It must be a const setting, not value that's dependent on context - that has to be a function
/// * have a default that matches `Default::default()`
#[derive(Debug, Default, Clone)]
pub struct DialectFlags {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to call it e.g. DialectSettings, DialectConstants or similar if we envision this potentially containing more than boolean flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change a name.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @samuelcolvin and @iffyio

In my opinion:

  1. Adding impl Default for the dialects is nice as it is sort of a usability papercut (it would also nice to have a new() function, to follow convention)
  2. In terms of DialectFlags I personally think the improvement is not worth the API churn, though I don't really know how many people have their own Dialect trait implementations 🤔

@samuelcolvin
Copy link
Contributor Author

I personally think the improvement is not worth the API churn

Okay, I'll leave this open for a few days in case anyone else is particularly keen for it, otherwise I'll close this and work on #1430 by adding lots of boolean functions.

@alamb
Copy link
Contributor

alamb commented Sep 19, 2024

Okay, I'll leave this open for a few days in case anyone else is particularly keen for it, otherwise I'll close this and work on #1430 by adding lots of boolean functions.

Maybe we should contemplate creating a parser that parameterized on the dialect (Parser<Dialect> -- I know we have discussed this before and it likely isn't a huge deal)

@jmhain
Copy link
Contributor

jmhain commented Sep 19, 2024

Maybe we should contemplate creating a parser that parameterized on the dialect (Parser -- I know we have discussed this before and it likely isn't a huge deal)

This would cause a lot of code bloat for use cases that require parsing many different dialects. What's the benefit of doing it?

@samuelcolvin
Copy link
Contributor Author

What do you mean by "a lot of code bloat"?

Surely you can create one parser for each dialect? The memory overhead would be tiny.

Avoiding dynamic dispatch would lead to better performance, smaller binaries and (most importantly) more canonical code in the library.

@alamb
Copy link
Contributor

alamb commented Sep 20, 2024

What do you mean by "a lot of code bloat"?

Surely you can create one parser for each dialect? The memory overhead would be tiny.

Avoiding dynamic dispatch would lead to better performance, smaller binaries and (most importantly) more canonical code in the library.

I think the idea is if we made a parser you could end up with multiple copies of the Parser code for each one you instantiated -- like Parser<GenericDialect> and Parser<BigQueryDialect>

I think one workaround would be to make your own DynamicDialect that dispatched dynamically and have Parser<DynamicDialect>

However I think I may have lead us offtopic here. The dynamic dispatch has always kind of annoyed me but I have never had any actual justification to change it. Sorry for starting this discusision

@alamb alamb marked this pull request as draft October 20, 2024 18:13
@alamb
Copy link
Contributor

alamb commented Oct 20, 2024

Converting to a draft as I don't think this PR is waiting on review anymore

Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 20, 2024
@samuelcolvin
Copy link
Contributor Author

Closing as I don't think we're going this way anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants