Add support for IS [NOT] [form] NORMALIZED #1655

Merged: 4 commits merged on Jan 17, 2025

alexander-beedie
Contributor

@alexander-beedie alexander-beedie commented Jan 15, 2025

Adds parsing support for the `IS [NOT] [<form>] NORMALIZED` syntax, which evaluates to a boolean.

Details from the PostgreSQL string function docs:
https://www.postgresql.org/docs/current/functions-string.html

Checks whether the string is in the specified Unicode normalization
form. The optional 'form' keyword specifies the form: NFC (the default),
NFD, NFKC, or NFKD. This expression can only be used when the server
encoding is UTF8. Note that checking for normalization using this
expression is often faster than normalizing possibly already
normalized strings.
  • NFC: Canonical Decomposition, followed by Canonical Composition.
  • NFD: Canonical Decomposition.
  • NFKC: Compatibility Decomposition, followed by Canonical Composition.
  • NFKD: Compatibility Decomposition.
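To make the four forms concrete, here is a small illustration. It does not normalize anything: Rust's standard library has no Unicode normalization, so the values are hard-coded from the Unicode Character Database. The names `Sample` and `SAMPLES` are hypothetical. The point is that canonically equivalent strings can differ code point for code point. A precomposed "é" (U+00E9) is not equal to its NFD form, so `'\u00e9' IS NFD NORMALIZED` would be expected to be false. The "ﬁ" ligature (U+FB01) only changes under the compatibility forms.

```rust
/// Hard-coded reference values (from the Unicode Character Database,
/// NOT computed here) showing two sample strings in each normalization form.
struct Sample {
    input: &'static str,
    nfc: &'static str,
    nfd: &'static str,
    nfkc: &'static str,
    nfkd: &'static str,
}

const SAMPLES: [Sample; 2] = [
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE: its canonical decomposition is
    // U+0065 U+0301, so the NFD/NFKD forms are two code points long.
    Sample {
        input: "\u{e9}",
        nfc: "\u{e9}",
        nfd: "e\u{301}",
        nfkc: "\u{e9}",
        nfkd: "e\u{301}",
    },
    // U+FB01 LATIN SMALL LIGATURE FI: no canonical decomposition, but a
    // compatibility decomposition to "fi", so only NFKC/NFKD change it.
    Sample {
        input: "\u{fb01}",
        nfc: "\u{fb01}",
        nfd: "\u{fb01}",
        nfkc: "fi",
        nfkd: "fi",
    },
];

fn main() {
    for s in &SAMPLES {
        println!(
            "{:?}: NFC={:?} NFD={:?} NFKC={:?} NFKD={:?}",
            s.input, s.nfc, s.nfd, s.nfkc, s.nfkd
        );
        // Same meaning, different code points: a string "is" in a form only
        // if it is already identical to that form's output.
        println!(
            "  code points: input={} nfd={} nfkd={}",
            s.input.chars().count(),
            s.nfd.chars().count(),
            s.nfkd.chars().count()
        );
    }
}
```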

Since the set of normalization forms is fixed (there are only these four), it seemed reasonable to return the parsed form as a new `Option<NormalizationForm>` enum. This helps the caller: they don't have to check the string or case-normalise it, and can go straight into an associated `match` block.
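A minimal stand-alone sketch of that idea follows. It mirrors the design described above, but it is not the crate's actual definition: the `FromStr`/`Display` impls and the `describe` helper are hypothetical. An omitted form is `None`, and the caller resolves it to the PostgreSQL default (NFC) with `unwrap_or`, then matches on the enum without any string comparison.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-alone mirror of the four Unicode normalization forms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NormalizationForm {
    NFC,
    NFD,
    NFKC,
    NFKD,
}

impl FromStr for NormalizationForm {
    type Err = String;

    /// Case-insensitive, as SQL keywords usually are.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "NFC" => Ok(Self::NFC),
            "NFD" => Ok(Self::NFD),
            "NFKC" => Ok(Self::NFKC),
            "NFKD" => Ok(Self::NFKD),
            other => Err(format!("unknown normalization form: {other}")),
        }
    }
}

impl fmt::Display for NormalizationForm {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Self::NFC => "NFC",
            Self::NFD => "NFD",
            Self::NFKC => "NFKC",
            Self::NFKD => "NFKD",
        };
        f.write_str(name)
    }
}

/// Hypothetical caller: treat an omitted form as NFC (the PostgreSQL default)
/// and branch on the enum directly.
fn describe(form: Option<NormalizationForm>) -> &'static str {
    match form.unwrap_or(NormalizationForm::NFC) {
        NormalizationForm::NFC => "canonical decomposition + canonical composition",
        NormalizationForm::NFD => "canonical decomposition",
        NormalizationForm::NFKC => "compatibility decomposition + canonical composition",
        NormalizationForm::NFKD => "compatibility decomposition",
    }
}

fn main() {
    let parsed: Option<NormalizationForm> = "nfkc".parse().ok();
    println!("{:?} -> {}", parsed, describe(parsed));
    println!("omitted -> {}", describe(None));
}
```

With an enum, adding a new branch for a form is a compile-checked change: if a `match` misses a variant, the compiler reports it.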

(Also: fixed a few minor typos).

Examples

Default/omitted form:

strcol IS NORMALIZED
strcol IS NOT NORMALIZED

Specific form:

strcol IS NFKC NORMALIZED
strcol IS NOT NFKD NORMALIZED
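The grammar in these examples breaks down as: an expression, `IS`, an optional `NOT`, an optional form keyword, then a mandatory `NORMALIZED`. The sketch below walks that sequence over whitespace-separated words so the optional parts are easy to see. It is a toy: the real parser works on its own token stream and builds an AST node, and `IsNormalized` and `parse_is_normalized` are hypothetical names. It assumes the expression is a single word.

```rust
/// Hypothetical result of parsing `<expr> IS [NOT] [form] NORMALIZED`.
#[derive(Debug, PartialEq, Eq)]
struct IsNormalized<'a> {
    expr: &'a str,
    negated: bool,
    form: Option<&'static str>, // canonical upper-case name, None if omitted
}

const FORMS: [&str; 4] = ["NFC", "NFD", "NFKC", "NFKD"];

/// Parse a single-word expression followed by the `IS [NOT] [form] NORMALIZED`
/// suffix. Keywords are matched case-insensitively.
fn parse_is_normalized(sql: &str) -> Option<IsNormalized<'_>> {
    let mut tokens = sql.split_whitespace().peekable();
    let expr = tokens.next()?;
    if !tokens.next()?.eq_ignore_ascii_case("IS") {
        return None;
    }
    // Optional NOT.
    let negated = tokens.next_if(|t| t.eq_ignore_ascii_case("NOT")).is_some();
    // Optional form keyword, mapped to its canonical upper-case name.
    let form = tokens
        .peek()
        .and_then(|t| FORMS.iter().copied().find(|f| f.eq_ignore_ascii_case(t)));
    if form.is_some() {
        tokens.next();
    }
    // NORMALIZED is mandatory and must be the last word.
    if !tokens.next()?.eq_ignore_ascii_case("NORMALIZED") || tokens.next().is_some() {
        return None;
    }
    Some(IsNormalized { expr, negated, form })
}

fn main() {
    for sql in [
        "strcol IS NORMALIZED",
        "strcol IS NOT NORMALIZED",
        "strcol IS NFKC NORMALIZED",
        "strcol IS NOT NFKD NORMALIZED",
    ] {
        println!("{sql:?} -> {:?}", parse_is_normalized(sql));
    }
}
```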

Contributor

@alamb alamb left a comment

Thank you @alexander-beedie -- I think this looks quite nice and well tested

fyi @iffyio

@@ -1118,7 +1124,7 @@ impl fmt::Display for LambdaFunction {
 /// `OneOrManyWithParens` implements `Deref<Target = [T]>` and `IntoIterator`,
 /// so you can call slice methods on it and iterate over items
 /// # Examples
-/// Acessing as a slice:
+/// Accessing as a slice:
Contributor

Thank you for these cleanups
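The doc comment in that hunk says `OneOrManyWithParens` gets slice methods and iteration through `Deref<Target = [T]>` and `IntoIterator`. Below is a stand-alone sketch of that pattern, assuming a two-variant enum (one item or a list). It is not the crate's code, and the crate's impls may differ in detail; for example, it may also implement `IntoIterator` by value. A single item is exposed as a one-element slice via `std::slice::from_ref`, so callers never need to match on the variant.

```rust
use std::ops::Deref;

/// Hypothetical stand-alone version: a single item, or a parenthesised list.
#[derive(Debug, Clone, PartialEq)]
enum OneOrManyWithParens<T> {
    One(T),
    Many(Vec<T>),
}

impl<T> Deref for OneOrManyWithParens<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        match self {
            // View a single item as a one-element slice.
            Self::One(one) => std::slice::from_ref(one),
            Self::Many(many) => many.as_slice(),
        }
    }
}

// Iterating over `&OneOrManyWithParens<T>` yields `&T`, like a slice.
impl<'a, T> IntoIterator for &'a OneOrManyWithParens<T> {
    type Item = &'a T;
    type IntoIter = std::slice::Iter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

fn main() {
    let one = OneOrManyWithParens::One("a");
    let many = OneOrManyWithParens::Many(vec!["a", "b", "c"]);
    // `len` and `first` are slice methods reached through Deref.
    println!("{} {} {:?}", one.len(), many.len(), many.first());
    for item in &many {
        println!("{item}");
    }
}
```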

Contributor

@iffyio iffyio left a comment

Just one comment regarding the API signature, otherwise this looks good to me!

Contributor

@iffyio iffyio left a comment

LGTM! Thanks @alexander-beedie!

@iffyio iffyio merged commit e9498d5 into apache:main Jan 17, 2025
9 checks passed
@alexander-beedie alexander-beedie deleted the is-normalized branch January 17, 2025 10:00
hansott added a commit to hansott/datafusion-sqlparser-rs that referenced this pull request Jan 23, 2025
…o escape-literals

* 'main' of github.com:hansott/datafusion-sqlparser-rs:
  National strings: check if dialect supports backslash escape (apache#1672)
  Add support for Create Iceberg Table statement for Snowflake parser (apache#1664)
  Add support for Snowflake account privileges (apache#1666)
  Update rat_exclude_file.txt (apache#1670)
  Update verson to 0.54.0 and update changelog (apache#1668)
  Add support for Snowflake AT/BEFORE (apache#1667)
  Add support for qualified column names in JOIN ... USING (apache#1663)
  Add support for `IS [NOT] [form] NORMALIZED` (apache#1655)
  fix parsing of `INSERT INTO ... SELECT ... RETURNING ` (apache#1661)
  Add support for Snowflake column aliases that use SQL keywords (apache#1632)
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin added a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
ayman-sigma pushed a commit to sigmacomputing/sqlparser-rs that referenced this pull request Apr 10, 2025
3 participants