README: update for recent improvements #128
@@ -6,15 +6,22 @@ | |
[](https://coveralls.io/github/andygrove/sqlparser-rs?branch=master) | ||
[](https://gitter.im/sqlparser-rs/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) | ||
|
||
The goal of this project is to build a SQL lexer and parser capable of parsing SQL that conforms with the [ANSI SQL:2011](https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html#_5_1_sql_terminal_character) standard but also making it easy to support custom dialects so that this crate can be used as a foundation for vendor-specific parsers. | ||
The goal of this project is to build a SQL lexer and parser capable of parsing | ||
SQL that conforms with the [ANSI/ISO SQL standard][sql-standard] while also | ||
making it easy to support custom dialects so that this crate can be used as a | ||
foundation for vendor-specific parsers. | ||
|
||
This parser is currently being used by the [DataFusion](https://github.com/andygrove/datafusion) query engine and [LocustDB](https://github.com/cswinter/LocustDB). | ||
This parser is currently being used by the [DataFusion] query engine and | ||
[LocustDB]. | ||
|
||
## Example | ||
|
||
The current code is capable of parsing some trivial SELECT and CREATE TABLE statements. | ||
To parse a simple `SELECT` statement: | ||
|
||
```rust | ||
use sqlparser::dialect::GenericDialect; | ||
use sqlparser::parser::Parser; | ||
|
||
let sql = "SELECT a, b, 123, myfunc(b) \ | ||
FROM table_1 \ | ||
WHERE a > b AND b < 100 \ | ||
|
@@ -33,23 +40,88 @@ This outputs | |
AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name: ObjectName(["myfunc"]), args: [Identifier("b")], over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName(["table_1"]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })] | ||
``` | ||
|
||
## SQL compliance | ||
|
||
SQL was first standardized in 1987, and revisions of the standard have been | ||
published regularly since. Most revisions have added significant new features to | ||
the language, and as a result no database claims to support the full breadth of | ||
features. This parser currently supports most of the SQL-92 syntax, plus some | ||
syntax from newer versions that have been explicitly requested, plus some MSSQL- | ||
and PostgreSQL-specific syntax. Whenever possible, the [online SQL:2011 | ||
grammar][sql-2011-grammar] is used to guide what syntax to accept. (We will | ||
happily accept changes that conform to the SQL:2016 syntax as well, but that | ||
edition's grammar is not yet available online.) | ||
|
||
Unfortunately, stating anything more specific about compliance is difficult. | ||
There is no publicly available test suite that can assess compliance | ||
automatically, and doing so manually would strain the project's limited | ||
resources. Still, we are interested in eventually supporting the full SQL | ||
dialect, and we are slowly building out our own test suite. | ||
|
||
If you are assessing whether this project will be suitable for your needs, | ||
you'll likely need to experimentally verify whether it supports the subset of | ||
SQL that you need. Please file issues about any unsupported queries that you | ||
Review comment: Perhaps we should request PRs when possible, and issues, preferably with relevant bits of the grammar and/or test cases, otherwise? I feel this is more of a "we can share our efforts" project than a "we have some free resources to implement this for you" project. If that's the case, we should be upfront about it. We could also add a link to a typical PR implementing a new syntax (once there is one after the renames in 0.4).
Reply: Agreed on all points. I've adjusted this section a bit, but added the bulk of your recommendations to the "Contributing" section below.
||
discover. Doing so helps us prioritize support for the portions of the standard | ||
that are actually used. Note that if you urgently need support for a feature, | ||
you will likely need to write the implementation yourself. See the | ||
[Contributing](#Contributing) section for details. | ||
|
||
### Supporting custom SQL dialects | ||
|
||
This is a work in progress, but we have some notes on [writing a custom SQL | ||
parser](docs/custom_sql_parser.md). | ||
|
||
## Design | ||
|
||
This parser is implemented using the [Pratt Parser](https://tdop.github.io/) design, which is a top-down operator-precedence parser. | ||
The core expression parser uses the [Pratt Parser] design, which is a top-down | ||
operator-precedence (TDOP) parser, while the surrounding SQL statement parser is | ||
a traditional, hand-written recursive descent parser. Eli Bendersky has a good | ||
[tutorial on TDOP parsers][tdop-tutorial], if you are interested in learning | ||
more about the technique. | ||
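
As an illustration of the technique described in this paragraph (not code from this crate), here is a minimal sketch of a Pratt-style expression parser. The `Token`, `Expr`, and `Parser` types, the `tokenize`, `parse`, and `render` functions, and the precedence values are all hypothetical names and numbers chosen for the example; the real parser's types and precedence table may differ.

```rust
// Illustrative sketch only: every name here is hypothetical and none of it
// is taken from sqlparser-rs.
#[derive(Debug, Clone, PartialEq)]
enum Token { Number(i64), Ident(String), Op(char), LParen, RParen }

#[derive(Debug, PartialEq)]
enum Expr {
    Number(i64),
    Ident(String),
    Binary { left: Box<Expr>, op: char, right: Box<Expr> },
}

// Split the input into numbers, identifiers, operators, and parentheses.
fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            let mut n: i64 = 0;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = n * 10 + d as i64;
                chars.next();
            }
            tokens.push(Token::Number(n));
        } else if c.is_alphabetic() || c == '_' {
            let mut s = String::new();
            while let Some(&c) = chars.peek() {
                if !(c.is_alphanumeric() || c == '_') { break; }
                s.push(c);
                chars.next();
            }
            tokens.push(Token::Ident(s));
        } else {
            chars.next();
            tokens.push(match c {
                '(' => Token::LParen,
                ')' => Token::RParen,
                '+' | '-' | '*' | '/' | '<' | '>' => Token::Op(c),
                other => return Err(format!("unexpected character {:?}", other)),
            });
        }
    }
    Ok(tokens)
}

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    // Higher numbers bind tighter; 0 means "not an infix operator".
    fn precedence(op: char) -> u8 {
        match op { '<' | '>' => 10, '+' | '-' => 20, '*' | '/' => 30, _ => 0 }
    }

    // Core of the Pratt technique: parse a prefix expression, then keep
    // absorbing infix operators that bind tighter than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        loop {
            let op = match self.tokens.get(self.pos) {
                Some(Token::Op(op)) if Self::precedence(*op) > min_prec => *op,
                _ => break,
            };
            self.pos += 1;
            // Recursing with the operator's own precedence makes binary
            // operators left-associative.
            let right = self.parse_expr(Self::precedence(op))?;
            left = Expr::Binary { left: Box::new(left), op, right: Box::new(right) };
        }
        Ok(left)
    }

    // Literals, identifiers, and parenthesized sub-expressions.
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        let token = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        match token {
            Some(Token::Number(n)) => Ok(Expr::Number(n)),
            Some(Token::Ident(s)) => Ok(Expr::Ident(s)),
            Some(Token::LParen) => {
                let inner = self.parse_expr(0)?;
                match self.tokens.get(self.pos) {
                    Some(Token::RParen) => { self.pos += 1; Ok(inner) }
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => Err(format!("expected an expression, found {:?}", other)),
        }
    }
}

// Parse a whole input string, rejecting trailing tokens.
fn parse(input: &str) -> Result<Expr, String> {
    let mut parser = Parser { tokens: tokenize(input)?, pos: 0 };
    let expr = parser.parse_expr(0)?;
    if parser.pos < parser.tokens.len() {
        return Err(format!("unexpected token {:?}", parser.tokens[parser.pos]));
    }
    Ok(expr)
}

// Render the tree with explicit parentheses so the grouping is visible.
fn render(e: &Expr) -> String {
    match e {
        Expr::Number(n) => n.to_string(),
        Expr::Ident(s) => s.clone(),
        Expr::Binary { left, op, right } => {
            format!("({} {} {})", render(left), op, render(right))
        }
    }
}
```

Calling `render(&parse("a + b * 2 < 100").unwrap())` yields `((a + (b * 2)) < 100)`, which shows how the precedence table alone drives the grouping. In a layout like the one described above, a recursive descent statement parser would call something like `parse_expr(0)` wherever the grammar expects an expression.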
|
||
I am a fan of this design pattern over parser generators for the following reasons: | ||
We are fans of this design pattern over parser generators for the following | ||
reasons: | ||
|
||
- Code is simple to write and can be concise and elegant (this is far from true for this current implementation unfortunately, but I hope to fix that using some macros) | ||
- Code is simple to write and can be concise and elegant | ||
- Performance is generally better than code generated by parser generators | ||
- Debugging is much easier with hand-written code | ||
- It is far easier to extend and make dialect-specific extensions compared to using a parser generator | ||
|
||
## Supporting custom SQL dialects | ||
|
||
This is a work in progress but I started some notes on [writing a custom SQL parser](docs/custom_sql_parser.md). | ||
- It is far easier to extend and make dialect-specific extensions | ||
compared to using a parser generator | ||
|
||
## Contributing | ||
|
||
Contributors are welcome! Please see the [current issues](https://github.com/andygrove/sqlparser-rs/issues) and feel free to file more! | ||
|
||
Please run [cargo fmt](https://github.com/rust-lang/rustfmt#on-the-stable-toolchain) to ensure the code is properly formatted. | ||
Contributions are highly encouraged! | ||
|
||
Pull requests that add support for or fix a bug in a feature in the SQL | ||
standard, or a feature in a popular RDBMS, like Microsoft SQL Server or | ||
PostgreSQL, will almost certainly be accepted after a brief review. For | ||
particularly large or invasive changes, consider opening an issue first, | ||
especially if you are a first-time contributor, so that you can coordinate with | ||
the maintainers. CI will ensure that your code passes `cargo test`, | ||
`cargo fmt`, and `cargo clippy`, so you will likely want to run all three | ||
commands locally before submitting your PR. | ||
|
||
If you are unable to submit a patch, feel free to file an issue instead. Please | ||
try to include: | ||
|
||
* some representative examples of the syntax you wish to support or fix; | ||
* the relevant bits of the [SQL grammar][sql-2011-grammar], if the syntax is | ||
part of SQL:2011; and | ||
* links to documentation for the feature for a few of the most popular | ||
databases that support it. | ||
|
||
Please be aware that, while we strive to address bugs and review PRs quickly, we | ||
make no such guarantees for feature requests. If you need support for a feature, | ||
you will likely need to implement it yourself. Our goal as maintainers is to | ||
facilitate the integration of various features from various contributors, but | ||
not to provide the implementations ourselves, as we simply don't have the | ||
resources. | ||
|
||
[tdop-tutorial]: https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing | ||
[`cargo fmt`]: https://github.com/rust-lang/rustfmt#on-the-stable-toolchain | ||
[current issues]: https://github.com/andygrove/sqlparser-rs/issues | ||
[DataFusion]: https://github.com/apache/arrow/tree/master/rust/datafusion | ||
[LocustDB]: https://github.com/cswinter/LocustDB | ||
[Pratt Parser]: https://tdop.github.io/ | ||
[sql-2011-grammar]: https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html | ||
[sql-standard]: https://en.wikipedia.org/wiki/ISO/IEC_9075 |
Review comment: I've recently tried to create a new app using this crate and I instinctively tried to copy-paste the code from README to main.rs. I think adding the required imports would be helpful:
Reply: Good idea. Done.