blob: b4b85e77a2e5fd8f376a768af7b8c000f1090bc1 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
.. _flight-sql:
================
Arrow Flight SQL
================
Arrow Flight SQL is a protocol for interacting with SQL databases
using the Arrow in-memory format and the :doc:`Flight RPC
<./Flight>` framework.
Generally, a database will implement the RPC methods according to the
specification, but does not need to implement a client-side driver. A
database client can use the provided Flight SQL client to interact
with any database that supports the necessary endpoints. Flight SQL
clients wrap the underlying Flight client to provide methods for the
new RPC methods described here.
RPC Methods
===========
Flight SQL reuses the predefined RPC methods in Arrow Flight, and
provides various commands that pair those methods with request/response
messages defined via Protobuf (see below).
SQL Metadata
------------
Flight SQL provides a variety of commands to fetch catalog metadata
about the database server.
All of these commands can be used with the GetFlightInfo and GetSchema
RPC methods. The Protobuf request message should be packed into a
google.protobuf.Any message, then serialized and packed as the ``cmd``
field in a CMD-type FlightDescriptor.
If the command is used with GetFlightInfo, the server will return a
FlightInfo response. The client should then use the Ticket(s) in the
FlightInfo with the DoGet RPC method to fetch a Arrow data containing
the results of the command. In other words, SQL metadata is returned as
Arrow data, just like query results themselves.
The Arrow schema returned by GetSchema or DoGet for a particular
command is fixed according to the specification.
``CommandGetCatalogs``
List the catalogs available in the database. The definition varies
by vendor.
``CommandGetCrossReference``
List the foreign key columns in a given table that reference
columns in a given parent table.
``CommandGetDbSchemas``
List the schemas (note: a grouping of tables, *not* an Arrow
schema) available in the database. The definition varies by
vendor.
``CommandGetExportedKeys``
List foreign key columns that reference the primary key columns of
a given table.
``CommandGetImportedKeys``
List foreign keys of a given table.
``CommandGetPrimaryKeys``
List the primary keys of a given table.
``CommandGetSqlInfo``
Fetch metadata about the database server and its supported SQL
features.
``CommandGetTables``
List tables in the database.
``CommandGetTableTypes``
List table types in the database. The list of types varies by
vendor.
Query Execution
---------------
Flight SQL also provides commands to execute SQL queries and manage
prepared statements.
Many of these commands are also used with GetFlightInfo/GetSchema and
work identically to the metadata methods above. Some of these commands
can be used with the DoPut RPC method, but the command should still be
encoded in the request FlightDescriptor in the same way.
Commands beginning with "Action" are instead used with the DoAction
RPC method, in which case the command should be packed into a
google.protobuf.Any message, then serialized and packed into the
``body`` of a Flight Action. Also, the action ``type`` should be set
to the command name (i.e. for ``ActionClosePreparedStatementRequest``,
the ``type`` should be ``ClosePreparedStatement``).
Commands that execute updates such as ``CommandStatementUpdate`` and
``CommandStatementIngest`` return a Flight SQL ``DoPutUpdateResult``
after consuming the entire FlightData stream. This message is encoded
in the ``app_metadata`` field of the Flight RPC ``PutResult`` returned.
``ActionClosePreparedStatementRequest``
Close a previously created prepared statement.
``ActionCreatePreparedStatementRequest``
Create a new prepared statement for a SQL query.
The response will contain an opaque handle used to identify the
prepared statement. It may also contain two optional schemas: the
Arrow schema of the result set, and the Arrow schema of the bind
parameters (if any). Because the schema of the result set may
depend on the bind parameters, the schemas may not necessarily be
provided here as a result, or if provided, they may not be accurate.
Clients should not assume the schema provided here will be the
schema of any data actually returned by executing the prepared
statement.
Some statements may have bind parameters without any specific type.
(As a trivial example for SQL, consider ``SELECT ?``.) It is
not currently specified how this should be handled in the bind
parameter schema above. We suggest either using a union type to
enumerate the possible types, or using the NA (null) type as a
wildcard/placeholder.
``CommandPreparedStatementQuery``
Execute a previously created prepared statement and get the results.
When used with DoPut: binds parameter values to the prepared statement.
The server may optionally provide an updated handle in the response.
Updating the handle allows the client to supply all state required to
execute the query in an ActionPreparedStatementExecute message.
For example, stateless servers can encode the bound parameter values into
the new handle, and the client will send that new handle with parameters
back to the server.
Note that a handle returned from a DoPut call with
CommandPreparedStatementQuery can itself be passed to a subsequent DoPut
call with CommandPreparedStatementQuery to bind a new set of parameters.
The subsequent call itself may return an updated handle which again should
be used for subsequent requests.
The server is responsible for detecting the case where the client does not
use the updated handle and should return an error.
When used with GetFlightInfo: execute the prepared statement. The
prepared statement can be reused after fetching results.
When used with GetSchema: get the expected Arrow schema of the
result set. If the client has bound parameter values with DoPut
previously, the server should take those values into account.
``CommandPreparedStatementUpdate``
Execute a previously created prepared statement that does not
return results.
When used with DoPut: execute the query and return the number of
affected rows. The prepared statement can be reused afterwards.
``CommandStatementQuery``
Execute an ad-hoc SQL query.
When used with GetFlightInfo: execute the query (call DoGet to
fetch results).
When used with GetSchema: return the schema of the query results.
``CommandStatementUpdate``
Execute an ad-hoc SQL query that does not return results.
When used with DoPut: execute the query and return the number of
affected rows.
``CommandStatementIngest``
Execute a bulk ingestion.
When used with DoPut: load the stream of Arrow record batches into
the specified target table and return the number of rows ingested
via a ``DoPutUpdateResult`` message.
Flight Server Session Management
--------------------------------
Flight SQL provides commands to set and update server session variables
which affect the server behaviour in various ways. Common options may
include (depending on the server implementation) ``catalog`` and
``schema``, indicating the currently-selected catalog and schema for
queries to be run against.
Clients should prefer, where possible, setting options prior to issuing
queries and other commands, as some server implementations may require
these options be set exactly once and prior to any other activity which
may trigger their implicit setting.
For compatibility with Database Connectivity drivers (JDBC, ODBC, and
others), it is strongly recommended that server implementations accept
string representations of all option values which may be provided to the
driver as part of a server connection string and passed through to the
server without further conversion. For ease of use it is also recommended
to accept and convert other numeric types to the preferred type for an
option value, however this is not required.
Sessions are persisted between the client and server using an
implementation-defined mechanism, which is typically RFC 6265 cookies.
Servers may also combine other connection state opaquely with the
session token: Consider that the lifespan and semantics of a session
should make sense for any additional uses, e.g. CloseSession would also
invalidate any authentication context persisted via the session context.
A session may be initiated upon a nonempty (or empty) SetSessionOptions
call, or at any other time of the server's choosing.
``SetSessionOptions``
Set server session option(s) by name/value.
``GetSessionOptions``
Get the current server session options, including those set by the client
and any defaulted or implicitly set by the server.
``CloseSession``
Close and invalidate the current session context.
Sequence Diagrams
=================
.. mermaid:: ./FlightSql/CommandGetTables.mmd
:caption: Listing available tables.
.. mermaid:: ./FlightSql/CommandStatementQuery.mmd
:caption: Executing an ad-hoc query.
.. mermaid:: ./FlightSql/CommandPreparedStatementQuery.mmd
:caption: Creating a prepared statement, then executing it.
.. mermaid:: ./FlightSql/CommandStatementIngest.mmd
:caption: Executing a bulk ingestion.
External Resources
==================
- `Introducing Apache Arrow Flight SQL: Accelerating Database Access
<https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/>`_
Protocol Buffer Definitions
===========================
.. literalinclude:: ../../../format/FlightSql.proto
:language: protobuf
:linenos: