| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| ================== |
| Arrow JDBC Adapter |
| ================== |
| |
| The Arrow JDBC Adapter assists with working with JDBC and Arrow |
| data. Currently, it supports reading JDBC ResultSets into Arrow |
| VectorSchemaRoots. |
| |
| ResultSet to VectorSchemaRoot Conversion |
| ======================================== |
| |
| This can be accessed via the JdbcToArrow class. The resulting |
| ArrowVectorIterator will convert a ResultSet to Arrow data in batches |
| of rows. |
| |
| .. code-block:: java |
| |
| try (ArrowVectorIterator it = JdbcToArrow.sqlToArrowVectorIterator(resultSet, allocator)) { |
| while (it.hasNext()) { |
| VectorSchemaRoot root = it.next(); |
| // Consume the root… |
| } |
| } |
| |
| The batch size and type mapping can both be customized: |
| |
| .. code-block:: java |
| |
| JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, /*calendar=*/null) |
| .setReuseVectorSchemaRoot(reuseVectorSchemaRoot) |
| .setJdbcToArrowTypeConverter((jdbcFieldInfo -> { |
| switch (jdbcFieldInfo.getJdbcType()) { |
| case Types.BIGINT: |
| // Assume actual value range is SMALLINT |
| return new ArrowType.Int(16, true); |
| default: |
| return null; |
| } |
| })) |
| .build(); |
| try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, config)) { |
| while (iter.hasNext()) { |
| VectorSchemaRoot root = iter.next(); |
| // Consume the root… |
| } |
| } |
| |
| The JDBC type can be explicitly specified, which is useful since JDBC |
| drivers can give spurious type information. For example, the Postgres |
| driver has been observed to use Decimal types with scale and precision |
| 0; these cases can be handled by specifying the type explicitly before |
| reading. Also, some JDBC drivers may return BigDecimal values with |
| inconsistent scale. A RoundingMode can be set to handle these cases: |
| |
| .. code-block:: java |
| |
| Map<Integer, JdbcFieldInfo> mapping = new HashMap<>(); |
| mapping.put(1, new JdbcFieldInfo(Types.DECIMAL, 20, 7)); |
| JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, /*calendar=*/null) |
| .setBigDecimalRoundingMode(RoundingMode.UNNECESSARY) |
| .setExplicitTypesByColumnIndex(mapping) |
| .build(); |
| try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, config)) { |
| while (iter.hasNext()) { |
| VectorSchemaRoot root = iter.next(); |
| // Consume the root… |
| } |
| } |
| |
| The mapping from JDBC type to Arrow type can be overridden via the |
| ``JdbcToArrowConfig``, but it is not possible to customize the |
| conversion from JDBC value to Arrow value itself, nor is it possible |
| to define a conversion for an unsupported type. |
| |
| Type Mapping |
| ------------ |
| |
| The JDBC to Arrow type mapping can be obtained at runtime from |
| `JdbcToArrowUtils.getArrowTypeFromJdbcType`_. |
| |
| .. _JdbcToArrowUtils.getArrowTypeFromJdbcType: https://arrow.apache.org/docs/java/reference/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.html#getArrowTypeFromJdbcType-org.apache.arrow.adapter.jdbc.JdbcFieldInfo-java.util.Calendar- |
| |
| +--------------------+--------------------+-------+ |
| | JDBC Type | Arrow Type | Notes | |
| +====================+====================+=======+ |
| | ARRAY | List | \(1) | |
| +--------------------+--------------------+-------+ |
| | BIGINT | Int64 | | |
| +--------------------+--------------------+-------+ |
| | BINARY | Binary | | |
| +--------------------+--------------------+-------+ |
| | BIT | Bool | | |
| +--------------------+--------------------+-------+ |
| | BLOB | Binary | | |
| +--------------------+--------------------+-------+ |
| | BOOLEAN | Bool | | |
| +--------------------+--------------------+-------+ |
| | CHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | CLOB | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | DATE | Date32 | | |
| +--------------------+--------------------+-------+ |
| | DECIMAL | Decimal128 | \(2) | |
| +--------------------+--------------------+-------+ |
| | DOUBLE | Double | | |
| +--------------------+--------------------+-------+ |
| | FLOAT | Float32 | | |
| +--------------------+--------------------+-------+ |
| | INTEGER | Int32 | | |
| +--------------------+--------------------+-------+ |
| | LONGVARBINARY | Binary | | |
| +--------------------+--------------------+-------+ |
| | LONGNVARCHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | LONGVARCHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | NCHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | NULL | Null | | |
| +--------------------+--------------------+-------+ |
| | NUMERIC | Decimal128 | | |
| +--------------------+--------------------+-------+ |
| | NVARCHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| | REAL | Float32 | | |
| +--------------------+--------------------+-------+ |
| | SMALLINT | Int16 | | |
| +--------------------+--------------------+-------+ |
| | STRUCT | Struct | \(3) | |
| +--------------------+--------------------+-------+ |
| | TIME | Time32[ms] | | |
| +--------------------+--------------------+-------+ |
| | TIMESTAMP | Timestamp[ms] | \(4) | |
| +--------------------+--------------------+-------+ |
| | TINYINT | Int8 | | |
| +--------------------+--------------------+-------+ |
| | VARBINARY | Binary | | |
| +--------------------+--------------------+-------+ |
| | VARCHAR | Utf8 | | |
| +--------------------+--------------------+-------+ |
| |
| * \(1) The list value type must be explicitly configured and cannot be |
| inferred. Use `setArraySubTypeByColumnIndexMap`_ or |
| `setArraySubTypeByColumnNameMap`_. |
| * \(2) By default, the scale of decimal values must match the scale in |
| the type exactly; precision is allowed to be any value greater or |
| equal to the type precision. If there is a mismatch, by default, an |
| exception will be thrown. This can be configured by setting a |
| different RoundingMode with setBigDecimalRoundingMode. |
| * \(3) Not fully supported: while the type conversion is defined, the |
| value conversion is not. See ARROW-17006_. |
| * \(4) If a Calendar is provided, then the timestamp will have the |
| timezone of the calendar, else it will be a timestamp without |
| timezone. |
| |
| .. _setArraySubTypeByColumnIndexMap: https://arrow.apache.org/docs/java/reference/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.html#setArraySubTypeByColumnIndexMap-java.util.Map- |
| .. _setArraySubTypeByColumnNameMap: https://arrow.apache.org/docs/java/reference/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.html#setArraySubTypeByColumnNameMap-java.util.Map- |
| .. _ARROW-17006: https://issues.apache.org/jira/browse/ARROW-17006 |
| |
| VectorSchemaRoot to PreparedStatement Parameter Conversion |
| ========================================================== |
| |
| The adapter can bind rows of Arrow data from a VectorSchemaRoot to |
| parameters of a JDBC PreparedStatement. This can be accessed via the |
| JdbcParameterBinder class. Each call to next() will bind parameters |
| from the next row of data, and then the application can execute the |
| statement, call addBatch(), etc. as desired. Null values will lead to |
| a setNull call with an appropriate JDBC type code (listed below). |
| |
| .. code-block:: java |
| |
| final JdbcParameterBinder binder = |
| JdbcParameterBinder.builder(statement, root).bindAll().build(); |
| while (binder.next()) { |
| statement.executeUpdate(); |
| } |
| // Use a VectorLoader to update the root |
| binder.reset(); |
| while (binder.next()) { |
| statement.executeUpdate(); |
| } |
| |
| The mapping of vectors to parameters, the JDBC type code used by the |
| converters, and the type conversions themselves can all be customized: |
| |
| .. code-block:: java |
| |
| final JdbcParameterBinder binder = |
| JdbcParameterBinder.builder(statement, root) |
| .bind(/*parameterIndex*/2, /*columnIndex*/0) |
| .bind(/*parameterIndex*/1, customColumnBinderInstance) |
| .build(); |
| |
| Type Mapping |
| ------------ |
| |
| The Arrow to JDBC type mapping can be obtained at runtime via |
| a method on ColumnBinder. |
| |
| +----------------------------+----------------------------+-------+ |
| | Arrow Type | JDBC Type | Notes | |
| +============================+============================+=======+ |
| | Binary | VARBINARY (setBytes) | | |
| +----------------------------+----------------------------+-------+ |
| | Bool | BOOLEAN (setBoolean) | | |
| +----------------------------+----------------------------+-------+ |
| | Date32 | DATE (setDate) | | |
| +----------------------------+----------------------------+-------+ |
| | Date64 | DATE (setDate) | | |
| +----------------------------+----------------------------+-------+ |
| | Decimal128 | DECIMAL (setBigDecimal) | | |
| +----------------------------+----------------------------+-------+ |
| | Decimal256 | DECIMAL (setBigDecimal) | | |
| +----------------------------+----------------------------+-------+ |
| | FixedSizeBinary | BINARY (setBytes) | | |
| +----------------------------+----------------------------+-------+ |
| | Float32 | REAL (setFloat) | | |
| +----------------------------+----------------------------+-------+ |
| | Int8 | TINYINT (setByte) | | |
| +----------------------------+----------------------------+-------+ |
| | Int16 | SMALLINT (setShort) | | |
| +----------------------------+----------------------------+-------+ |
| | Int32 | INTEGER (setInt) | | |
| +----------------------------+----------------------------+-------+ |
| | Int64 | BIGINT (setLong) | | |
| +----------------------------+----------------------------+-------+ |
| | LargeBinary | LONGVARBINARY (setBytes) | | |
| +----------------------------+----------------------------+-------+ |
| | LargeUtf8 | LONGVARCHAR (setString) | \(1) | |
| +----------------------------+----------------------------+-------+ |
| | Time[s] | TIME (setTime) | | |
| +----------------------------+----------------------------+-------+ |
| | Time[ms] | TIME (setTime) | | |
| +----------------------------+----------------------------+-------+ |
| | Time[us] | TIME (setTime) | | |
| +----------------------------+----------------------------+-------+ |
| | Time[ns] | TIME (setTime) | | |
| +----------------------------+----------------------------+-------+ |
| | Timestamp[s] | TIMESTAMP (setTimestamp) | \(2) | |
| +----------------------------+----------------------------+-------+ |
| | Timestamp[ms] | TIMESTAMP (setTimestamp) | \(2) | |
| +----------------------------+----------------------------+-------+ |
| | Timestamp[us] | TIMESTAMP (setTimestamp) | \(2) | |
| +----------------------------+----------------------------+-------+ |
| | Timestamp[ns] | TIMESTAMP (setTimestamp) | \(2) | |
| +----------------------------+----------------------------+-------+ |
| | Utf8 | VARCHAR (setString) | | |
| +----------------------------+----------------------------+-------+ |
| |
| * \(1) Strings longer than Integer.MAX_VALUE bytes (the maximum length |
| of a Java ``byte[]``) will cause a runtime exception. |
| * \(2) If the timestamp has a timezone, the JDBC type defaults to |
| TIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone, |
| technically there is not a correct conversion from Arrow value to |
| JDBC value, because a JDBC Timestamp is in UTC, and we have no |
| timezone information. In this case, the default binder will call |
| `setTimestamp(int, Timestamp) |
| <https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/PreparedStatement.html#setTimestamp(int,java.sql.Timestamp)>`_, |
| which will lead to the driver using the "default timezone" (that of |
| the Java VM). |