docs/sql-data-sources-jdbc.md

layout: global title: JDBC To Other Databases displayTitle: JDBC To Other Databases license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Table of contents {:toc}

Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources. The JDBC data source is also easier to use from Java or Python as it does not require the user to provide a ClassTag. (Note that this is different than the Spark SQL JDBC server, which allows other applications to run queries using Spark SQL).

To get started you will need to include the JDBC driver for your particular database on the spark classpath. For example, to connect to postgres from the Spark Shell you would run the following command:

{% highlight bash %} ./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar {% endhighlight %}

Data Source Option

Spark supports the following case-insensitive options for JDBC. The Data source options of JDBC can be set via:

the .option/.options methods of
- DataFrameReader
- DataFrameWriter
OPTIONS clause at CREATE TABLE USING DATA_SOURCE

For connection properties, users can specify the JDBC connection properties in the data source options. user and password are normally provided as connection properties for logging into the data sources.

Note that kerberos authentication with keytab is not always supported by the JDBC driver.
Before using keytab and principal configuration options, please make sure the following requirements are met:

The included JDBC driver version supports kerberos authentication with keytab.
There is a built-in connection provider which supports the used database.

There is a built-in connection providers for the following databases:

DB2
MariaDB
MS Sql
Oracle
PostgreSQL

If the requirements are not met, please consider using the JdbcConnectionProvider developer API to handle custom authentication.

{% highlight sql %}

CREATE TEMPORARY VIEW jdbcTable USING org.apache.spark.sql.jdbc OPTIONS ( url “jdbc:postgresql:dbserver”, dbtable “schema.tablename”, user ‘username’, password ‘password’ )

INSERT INTO TABLE jdbcTable SELECT * FROM resultTable {% endhighlight %}

Data Type Mapping

Mapping Spark SQL Data Types from MySQL

The below table describes the data type conversions from MySQL data types to Spark SQL Data Types, when reading data from a MySQL table using the built-in jdbc data source with the MySQL Connector/J as the activated JDBC Driver. Note that, different JDBC drivers, such as Maria Connector/J, which are also available to connect MySQL, may have different mapping rules.

Mapping Spark SQL Data Types to MySQL

The below table describes the data type conversions from Spark SQL Data Types to MySQL data types, when creating, altering, or writing data to a MySQL table using the built-in jdbc data source with the MySQL Connector/J as the activated JDBC Driver.

Note that, different JDBC drivers, such as Maria Connector/J, which are also available to connect MySQL, may have different mapping rules.

The Spark Catalyst data types below are not supported with suitable MYSQL types.

DayTimeIntervalType
YearMonthIntervalType
CalendarIntervalType
ArrayType
MapType
StructType
UserDefinedType
NullType
ObjectType
VariantType

Mapping Spark SQL Data Types from PostgreSQL

The below table describes the data type conversions from PostgreSQL data types to Spark SQL Data Types, when reading data from a Postgres table using the built-in jdbc data source with the PostgreSQL JDBC Driver as the activated JDBC Driver. Note that, different JDBC drivers, or different versions might result slightly different.

Mapping Spark SQL Data Types to PostgreSQL

The below table describes the data type conversions from Spark SQL Data Types to PostgreSQL data types, when creating, altering, or writing data to a PostgreSQL table using the built-in jdbc data source with the PostgreSQL JDBC Driver as the activated JDBC Driver.

The Spark Catalyst data types below are not supported with suitable PostgreSQL types.

DayTimeIntervalType
YearMonthIntervalType
CalendarIntervalType
ArrayType - if the element type is not listed above
MapType
StructType
UserDefinedType
NullType
ObjectType
VariantType

Mapping Spark SQL Data Types from Oracle

The below table describes the data type conversions from Oracle data types to Spark SQL Data Types, when reading data from an Oracle table using the built-in jdbc data source with the Oracle JDBC as the activated JDBC Driver.

Mapping Spark SQL Data Types to Oracle

The below table describes the data type conversions from Spark SQL Data Types to Oracle data types, when creating, altering, or writing data to an Oracle table using the built-in jdbc data source with the Oracle JDBC as the activated JDBC Driver.

The Spark Catalyst data types below are not supported with suitable Oracle types.

DayTimeIntervalType
YearMonthIntervalType
CalendarIntervalType
ArrayType
MapType
StructType
UserDefinedType
NullType
ObjectType
VariantType

Mapping Spark SQL Data Types from Microsoft SQL Server

The below table describes the data type conversions from Microsoft SQL Server data types to Spark SQL Data Types, when reading data from a Microsoft SQL Server table using the built-in jdbc data source with the mssql-jdbc as the activated JDBC Driver.

Mapping Spark SQL Data Types to Microsoft SQL Server

The below table describes the data type conversions from Spark SQL Data Types to Microsoft SQL Server data types, when creating, altering, or writing data to a Microsoft SQL Server table using the built-in jdbc data source with the mssql-jdbc as the activated JDBC Driver.

The Spark Catalyst data types below are not supported with suitable SQL Server types.

DayTimeIntervalType
YearMonthIntervalType
CalendarIntervalType
ArrayType
MapType
StructType
UserDefinedType
NullType
ObjectType
VariantType