-
Notifications
You must be signed in to change notification settings - Fork 606
Don't lose precision when parsing decimal fractions #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 449
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
One thing that bothers me is that we use Value::Date(String)
in the AST for DATE literals, instead of bundling chrono; while for decimals you've elected to rely on bigdecimal. Is there a reason for this discrepancy or do you see us adding chrono for date/time values later?
Is having values parsed in the AST useful in context of an SQL engine? I haven't studied any SQL implementations, but I imagined one would have to make at least one pass over the AST (to resolve the object names), producing a different structure before inspecting the values. I'd like to know if that assumption is wrong.
I have a slight preference for adding a way to parse the literals to this crate as an optional cargo feature, but I realize that my obsession with build times can sometimes be unhealthy:)
Yes, this bothers me too! The reason I went with this is because it seemed really disruptive to remove all integer/float parsing, but I'm starting to lean towards that approach. There's a problem right now in that someone can write
Nope, it's not that useful. It was useful when we (Materialize) were just prototyping our SQL engine, but as its evolved we've found that the values parsing was too opinionated. It's fine when the parser produces exactly the literal representation we need (e.g., using this bigdecimal crate, the ordered float crate, etc.), but that approach certainly doesn't scale as we have additional users with different and conflicting requirements.
Very on board with that! How do you feel about this as the standard pub enum Value {
/// Numeric literal
Number(String),
/// 'string value'
SingleQuotedString(String),
/// N'string value'
NationalStringLiteral(String),
/// X'hex value'
HexStringLiteral(String),
/// Boolean value true or false
Boolean(bool),
/// `DATE '...'` literals
Date(String),
/// `TIME '...'` literals
Time(String),
/// `TIMESTAMP '...'` literals
Timestamp(String),
/// INTERVAL literals, roughly in the following format:
/// `INTERVAL '<value>' <leading_field> [ (<leading_precision>) ]
/// [ TO <last_field> [ (<fractional_seconds_precision>) ] ]`,
/// e.g. `INTERVAL '123:45.67' MINUTE(3) TO SECOND(2)`.
///
/// The parser does not validate the `<value>`, nor does it ensure
/// that the `<leading_field>` units >= the units in `<last_field>`,
/// so the user will have to reject intervals like `HOUR TO YEAR`.
Interval {
value: String,
leading_field: DateTimeField,
leading_precision: Option<u64>,
last_field: Option<DateTimeField>,
/// The seconds precision can be specified in SQL source as
/// `INTERVAL '__' SECOND(_, x)` (in which case the `leading_field`
/// will be `Second` and the `last_field` will be `None`),
/// or as `__ TO SECOND(x)`.
fractional_seconds_precision: Option<u64>,
},
/// `NULL` value
Null,
} It feels a bit strange at first, but seems quite sensible once you sit with it. We could retain the existing |
So a single I'm not sure I understand: why provide an option to keep existing variants, instead of just providing an optional |
Yep, this is done now! Please take a look when you get a chance.
I think we're just saying the same thing in different ways. I coded up what I meant, at least, in the second commit in this PR—let me know if that's also what you had in mind! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're just saying the same thing in different ways.
You're right, this is exactly what I had in mind. Thanks!
src/parser.rs
Outdated
@@ -1854,20 +1847,17 @@ impl Parser { | |||
} | |||
|
|||
/// Parse a LIMIT clause | |||
pub fn parse_limit(&mut self) -> Result<Option<Expr>, ParserError> { | |||
pub fn parse_limit(&mut self) -> Result<Option<u64>, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against this change, but I didn't make it sooner because at least Postgres allows expressions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point. I've backed this bit out.
src/parser.rs
Outdated
let value = self | ||
.parse_literal_uint() | ||
.map(|n| Expr::Value(Value::Long(n)))?; | ||
pub fn parse_offset(&mut self) -> Result<u64, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This was added in this PR https://github.com/andygrove/sqlparser-rs/pull/69#issuecomment-497160698
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backed this bit out too.
@@ -1176,7 +1176,10 @@ impl Parser { | |||
return parser_err!(format!("No value parser for keyword {}", k.keyword)); | |||
} | |||
}, | |||
Token::Number(ref n) => Ok(Value::Number(n.to_string())), | |||
Token::Number(ref n) => match n.parse() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused by this at first, as the IDE suggested that parse()
returns a String
. Perhaps add a comment that this is for feature=bigdecimal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I believe I've addressed all your feedback.
The SQL standard requires that numeric literals with a decimal point, like 1.23, are represented exactly, up to some precision. That means that parsing these literals into f64s is invalid, as it is impossible to represent many decimal numbers exactly in binary floating point (for example, 0.3). This commit parses all numeric literals into a new `Value` variant `Number(String)`, removing the old `Long(u64)` and `Double(f64)` variants. This is slightly less convenient for downstream consumers, but far more flexible, as numbers that do not fit into a u64 and f64 are now representable.
With `--features bigdecimal`, parse numbers into BigDecimals instead of leaving them as strings.
Yep, thank you! |
The SQL standard requires that numeric literals with a decimal point,
like 1.23, are represented exactly, up to some precision. That means
that parsing these literals into f64s is invalid, as it is impossible
to represent many decimal numbers exactly in binary floating point (for
example, 0.3).
This commit instead parses these literals into a big decimal type that
is capable of representing every decimal fraction exactly. Downstream
consumers can lower this representation into something more efficient
but less precise (like the old f64) if desired.