Support underscore separators in numbers for Clickhouse. Fixes #1659 #1677

Merged 9 commits on Jan 28, 2025
Changes from 3 commits
4 changes: 4 additions & 0 deletions src/dialect/clickhouse.rs
@@ -59,6 +59,10 @@ impl Dialect for ClickHouseDialect {
         true
     }

+    fn supports_numeric_literal_underscores(&self) -> bool {
+        true
+    }
+
     // ClickHouse uses this for some FORMAT expressions in `INSERT` context, e.g. when inserting
     // with FORMAT JSONEachRow a raw JSON key-value expression is valid and expected.
     //
5 changes: 5 additions & 0 deletions src/dialect/mod.rs
@@ -304,6 +304,11 @@ pub trait Dialect: Debug + Any {
         false
     }

+    /// Returns true if the dialect supports numbers containing underscores, e.g. `10_000_000`
+    fn supports_numeric_literal_underscores(&self) -> bool {
+        false
+    }
+
     /// Returns true if the dialects supports specifying null treatment
     /// as part of a window function's parameter list as opposed
     /// to after the parameter list.
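
The opt-in pattern in the two hunks above can be sketched in isolation. This is a simplified stand-in, not the crate's code: the trait below models only the new hook, `GenericLike` and `ClickHouseLike` are hypothetical dialect types, and `take_number_prefix` is a hypothetical helper that mimics the digit-scanning closure without the tokenizer's real control flow.

```rust
use std::fmt::Debug;

// Simplified stand-in for the `Dialect` trait; only the new hook is modeled.
trait Dialect: Debug {
    /// Defaults to `false`, so dialects that do not override it are unaffected.
    fn supports_numeric_literal_underscores(&self) -> bool {
        false
    }
}

// Hypothetical dialect that keeps the default.
#[derive(Debug)]
struct GenericLike;

// Hypothetical dialect that opts in, as the ClickHouse dialect does in this PR.
#[derive(Debug)]
struct ClickHouseLike;

impl Dialect for GenericLike {}

impl Dialect for ClickHouseLike {
    fn supports_numeric_literal_underscores(&self) -> bool {
        true
    }
}

/// Hypothetical helper: returns the leading run of characters that the
/// predicate from the diff would accept as part of a number.
fn take_number_prefix(dialect: &dyn Dialect, input: &str) -> String {
    input
        .chars()
        .take_while(|&ch| {
            ch.is_ascii_digit()
                || (dialect.supports_numeric_literal_underscores() && ch == '_')
        })
        .collect()
}
```

With these definitions, `take_number_prefix(&ClickHouseLike, "10_000 FROM t")` yields `10_000`, while the same call with `&GenericLike` stops at `10`. Because the predicate accepts `_` anywhere in the run, input such as `10_` is also taken whole by the opting-in dialect.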
22 changes: 21 additions & 1 deletion src/tokenizer.rs
@@ -1147,7 +1147,11 @@ impl<'a> Tokenizer<'a> {
     s.push('.');
     chars.next();
 }
-s += &peeking_take_while(chars, |ch| ch.is_ascii_digit());
+
+s += &peeking_take_while(chars, |ch| {
+    ch.is_ascii_digit()
+        || self.dialect.supports_numeric_literal_underscores() && ch == '_'
@hansott (Contributor), Jan 24, 2025:

What happens if the number starts or ends with `_`?

I think ClickHouse rejects that: https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/StringUtils.h#L171-L182

Contributor reply:

It could be worth having a test to demonstrate the behavior with a trailing underscore
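
One possible rule to test against, sketched here as an assumption rather than as ClickHouse's confirmed behavior or what this PR implements: accept an underscore only when it sits directly between two digits. `underscores_between_digits` is a hypothetical helper and is not part of sqlparser.

```rust
/// Hypothetical helper: returns true when every `_` in `s` has an ASCII digit
/// immediately before it and immediately after it. Strings without any
/// underscore pass trivially.
fn underscores_between_digits(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.iter().enumerate().all(|(i, &b)| {
        if b != b'_' {
            return true;
        }
        let prev_is_digit = i > 0 && bytes[i - 1].is_ascii_digit();
        let next_is_digit = bytes.get(i + 1).map_or(false, |n| n.is_ascii_digit());
        prev_is_digit && next_is_digit
    })
}
```

Under this rule `10_000` passes, while `_10`, `10_`, and `1__0` are rejected. A tokenizer test for the trailing case could feed `SELECT 10_` through the ClickHouse dialect and assert on whatever behavior the maintainers decide is correct.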

+});

 // No number -> Token::Period
 if s == "." {
@@ -2223,6 +2227,22 @@ mod tests {
         compare(expected, tokens);
     }

+    #[test]
+    fn tokenize_numeric_literal_underscore() {
+        let sql = String::from("SELECT 10_000");
+        let dialect = ClickHouseDialect {};
+        let mut tokenizer = Tokenizer::new(&dialect, &sql);
+        let tokens = tokenizer.tokenize().unwrap();
+
+        let expected = vec![
+            Token::make_keyword("SELECT"),
+            Token::Whitespace(Whitespace::Space),
+            Token::Number("10_000".to_string(), false),
+        ];
+
+        compare(expected, tokens);
+    }
+
     #[test]
     fn tokenize_select_exponent() {
         let sql = String::from("SELECT 1e10, 1e-10, 1e+10, 1ea, 1e-10a, 1e-10-10");
13 changes: 13 additions & 0 deletions tests/sqlparser_clickhouse.rs
@@ -1646,6 +1646,19 @@ fn parse_table_sample() {
     clickhouse().verified_stmt("SELECT * FROM tbl SAMPLE 1 / 10 OFFSET 1 / 2");
 }

+#[test]
+fn parse_numbers_with_underscore() {
+    let select = clickhouse().verified_only_select("SELECT 10_000");
+
+    assert_eq!(
+        select.projection,
+        vec![SelectItem::UnnamedExpr(Expr::Value(Value::Number(
+            "10_000".to_string(),
+            false
+        ))),]
+    )
+}
+
 fn clickhouse() -> TestedDialects {
     TestedDialects::new(vec![Box::new(ClickHouseDialect {})])
 }
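
The test above shows that the parsed value keeps the literal text `10_000`, underscores included. Code that needs the numeric value would therefore have to drop the separators itself before converting. A minimal sketch, assuming the caller wants a `u64`; `parse_underscored_u64` is a hypothetical helper, not an sqlparser API.

```rust
/// Hypothetical helper: strips `_` separators from literal text and parses
/// the remaining characters as a `u64`. Returns `None` when the remainder is
/// empty, non-numeric, or out of range.
fn parse_underscored_u64(text: &str) -> Option<u64> {
    let digits: String = text.chars().filter(|&c| c != '_').collect();
    digits.parse().ok()
}
```

For example, `parse_underscored_u64("10_000")` returns `Some(10000)`. The helper does not check where the underscores sit, so placement rules would need to be enforced separately.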