This crate contains a lexer and parser for SQL that conforms with the ANSI/ISO SQL standard and other dialects. It is used as a foundation for SQL query engines, vendor-specific parsers, and various SQL analysis tools.
To parse a simple `SELECT` statement:

```rust
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

let sql = "SELECT a, b, 123, myfunc(b) \
           FROM table_1 \
           WHERE a > b AND b < 100 \
           ORDER BY a DESC, b";

let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...

let ast = Parser::parse_sql(&dialect, sql).unwrap();

println!("AST: {:?}", ast);
```
This outputs

```
AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name: ObjectName(["myfunc"]), args: [Identifier("b")], filter: None, over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName(["table_1"]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })]
```
The following optional crate features are available:
- `serde`: Adds Serde support by implementing `Serialize` and `Deserialize` for all AST nodes.
- `visitor`: Adds a `Visitor` capable of recursively walking the AST.

This crate provides only a syntax parser; it tries to avoid applying any SQL semantics, and accepts queries that specific databases would reject, even when using that database's specific `Dialect`. For example, `CREATE TABLE(x int, x int)` is accepted by this crate, even though most SQL engines will reject this statement due to the repeated column name `x`.
This crate avoids semantic analysis because it varies drastically between dialects and implementations. If you want to do semantic analysis, feel free to use this project as a base.
This crate allows users to recover the original SQL text (with comments removed, normalized whitespace and keyword capitalization), which is useful for tools that analyze and manipulate SQL.
This means that other than comments, whitespace and the capitalization of keywords, the following should hold true for all SQL:
```rust
// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, sql).unwrap();

// The original SQL text can be generated from the AST
assert_eq!(ast[0].to_string(), sql);
```
There are still some cases in this crate where different SQL with seemingly similar semantics are represented with the same AST. We welcome PRs to fix such issues and distinguish different syntaxes in the AST.
SQL was first standardized in 1987, and revisions of the standard have been published regularly since. Most revisions have added significant new features to the language, and as a result no database claims to support the full breadth of features. This parser currently supports most of the SQL-92 syntax, plus some syntax from newer versions that have been explicitly requested, plus some MSSQL, PostgreSQL, and other dialect-specific syntax. Whenever possible, the online SQL:2016 grammar is used to guide what syntax to accept.
Unfortunately, stating anything more specific about compliance is difficult. There is no publicly available test suite that can assess compliance automatically, and doing so manually would strain the project's limited resources. Still, we are interested in eventually supporting the full SQL dialect, and we are slowly building out our own test suite.
If you are assessing whether this project will be suitable for your needs, you'll likely need to experimentally verify whether it supports the subset of SQL that you need. Please file issues about any unsupported queries that you discover. Doing so helps us prioritize support for the portions of the standard that are actually used. Note that if you urgently need support for a feature, you will likely need to write the implementation yourself. See the Contributing section for details.
This crate contains a CLI program that can parse a file and dump the results as JSON:
```
$ cargo run --features json_example --example cli FILENAME.sql [--dialectname]
```
This parser is currently being used by the DataFusion query engine, LocustDB, Ballista, GlueSQL, Opteryx, Polars, PRQL, Qrlew, JumpWire, and ParadeDB.
If your project is using sqlparser-rs, feel free to make a PR to add it to this list.
The core expression parser uses the Pratt Parser design, which is a top-down operator-precedence (TDOP) parser, while the surrounding SQL statement parser is a traditional, hand-written recursive descent parser. Eli Bendersky has a good tutorial on TDOP parsers, if you are interested in learning more about the technique.
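The idea can be illustrated with a toy, dependency-free TDOP evaluator: each infix operator has a binding power, and the parser recurses only while the next operator binds at least as tightly as the current minimum. This is a sketch of the general technique, not sqlparser-rs's actual implementation:

```rust
// Binding power of each single-byte infix operator; higher binds tighter.
fn binding_power(op: u8) -> Option<u8> {
    match op {
        b'+' | b'-' => Some(1),
        b'*' | b'/' => Some(2),
        _ => None,
    }
}

// Parse and evaluate an expression of single-digit operands starting at
// `pos`, consuming operators whose binding power is at least `min_bp`.
fn parse_expr(input: &[u8], pos: &mut usize, min_bp: u8) -> i64 {
    // "Prefix" step: a single-digit operand.
    let mut lhs = (input[*pos] - b'0') as i64;
    *pos += 1;

    // "Infix" loop: keep going while the next operator binds tightly enough.
    while *pos < input.len() {
        let op = input[*pos];
        let bp = match binding_power(op) {
            Some(bp) if bp >= min_bp => bp,
            _ => break,
        };
        *pos += 1;
        // Recurse with a higher minimum binding power, which makes
        // same-precedence operators left-associative.
        let rhs = parse_expr(input, pos, bp + 1);
        lhs = match op {
            b'+' => lhs + rhs,
            b'-' => lhs - rhs,
            b'*' => lhs * rhs,
            _ => lhs / rhs,
        };
    }
    lhs
}

fn main() {
    let mut pos = 0;
    // 1 + 2 * 3 - 4 is evaluated as 1 + (2 * 3) - 4.
    println!("{}", parse_expr(b"1+2*3-4", &mut pos, 0)); // prints 3
}
```

A real parser would build AST nodes where this sketch folds values, but the precedence-driven recursion is the same.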
We favor this design pattern over parser generators because hand-written parsers are generally easier to debug and extend, and they produce clearer error messages.
This is a work in progress, but we have some notes on writing a custom SQL parser.
Contributions are highly encouraged! However, the bandwidth we have to maintain this crate is limited. Please read the following sections carefully.
PRs that add support for, or fix a bug in, a feature of the SQL standard or a popular RDBMS, such as Microsoft SQL Server or PostgreSQL, will likely be accepted after a brief review. Any SQL feature that is dialect-specific should be parsed by both the relevant `Dialect` as well as `GenericDialect`.
The current maintainers do not plan for any substantial changes to this crate's API. PRs proposing major refactors are not likely to be accepted.
While we hope to review PRs in a reasonably timely fashion, it may take a week or more. To speed up the process, please make sure the PR passes all CI checks and includes tests demonstrating that your code works as intended (and guarding against regressions). Remember to also test error paths.
PRs without tests will not be reviewed or merged. Since the CI ensures that `cargo test`, `cargo fmt`, and `cargo clippy` all pass, you should run all three commands locally before submitting your PR.
If you are unable to submit a patch, feel free to file an issue instead. Please try to include representative examples of the syntax you wish to support or fix, along with links to documentation for the databases that support it.
Unfortunately, if you need support for a feature, you will likely need to implement it yourself, or file a ticket described in enough detail that another member of the community can do so. Our goal as maintainers is to facilitate the integration of various features from various contributors, but not to provide the implementations ourselves, as we simply don't have the resources.
All code in this repository is licensed under the Apache Software License 2.0.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.