layout: global title: JDBC To Other Databases displayTitle: JDBC To Other Databases license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources. The JDBC data source is also easier to use from Java or Python as it does not require the user to provide a ClassTag. (Note that this is different than the Spark SQL JDBC server, which allows other applications to run queries using Spark SQL).
To get started you will need to include the JDBC driver for your particular database on the spark classpath. For example, to connect to postgres from the Spark Shell you would run the following command:
{% highlight bash %} ./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar {% endhighlight %}
Spark supports the following case-insensitive options for JDBC. The Data source options of JDBC can be set via:
.option
/.options
methods ofDataFrameReader
DataFrameWriter
OPTIONS
clause at CREATE TABLE USING DATA_SOURCEFor connection properties, users can specify the JDBC connection properties in the data source options. user and password are normally provided as connection properties for logging into the data sources.
Note that kerberos authentication with keytab is not always supported by the JDBC driver.
Before using keytab and principal configuration options, please make sure the following requirements are met:
There is a built-in connection providers for the following databases:
If the requirements are not met, please consider using the JdbcConnectionProvider developer API to handle custom authentication.
{% highlight sql %}
CREATE TEMPORARY VIEW jdbcTable USING org.apache.spark.sql.jdbc OPTIONS ( url “jdbc:postgresql:dbserver”, dbtable “schema.tablename”, user ‘username’, password ‘password’ )
INSERT INTO TABLE jdbcTable SELECT * FROM resultTable {% endhighlight %}
The below table describes the data type conversions from MySQL data types to Spark SQL Data Types, when reading data from a MySQL table using the built-in jdbc data source with the MySQL Connector/J as the activated JDBC Driver. Note that, different JDBC drivers, such as Maria Connector/J, which are also available to connect MySQL, may have different mapping rules.
The below table describes the data type conversions from Spark SQL Data Types to MySQL data types, when creating, altering, or writing data to a MySQL table using the built-in jdbc data source with the MySQL Connector/J as the activated JDBC Driver.
Note that, different JDBC drivers, such as Maria Connector/J, which are also available to connect MySQL, may have different mapping rules.
The Spark Catalyst data types below are not supported with suitable MYSQL types.
The below table describes the data type conversions from PostgreSQL data types to Spark SQL Data Types, when reading data from a Postgres table using the built-in jdbc data source with the PostgreSQL JDBC Driver as the activated JDBC Driver. Note that, different JDBC drivers, or different versions might result slightly different.
The below table describes the data type conversions from Spark SQL Data Types to PostgreSQL data types, when creating, altering, or writing data to a PostgreSQL table using the built-in jdbc data source with the PostgreSQL JDBC Driver as the activated JDBC Driver.
The Spark Catalyst data types below are not supported with suitable PostgreSQL types.
The below table describes the data type conversions from Oracle data types to Spark SQL Data Types, when reading data from an Oracle table using the built-in jdbc data source with the Oracle JDBC as the activated JDBC Driver.
The below table describes the data type conversions from Spark SQL Data Types to Oracle data types, when creating, altering, or writing data to an Oracle table using the built-in jdbc data source with the Oracle JDBC as the activated JDBC Driver.
The Spark Catalyst data types below are not supported with suitable Oracle types.
The below table describes the data type conversions from Microsoft SQL Server data types to Spark SQL Data Types, when reading data from a Microsoft SQL Server table using the built-in jdbc data source with the mssql-jdbc as the activated JDBC Driver.
The below table describes the data type conversions from Spark SQL Data Types to Microsoft SQL Server data types, when creating, altering, or writing data to a Microsoft SQL Server table using the built-in jdbc data source with the mssql-jdbc as the activated JDBC Driver.
The Spark Catalyst data types below are not supported with suitable SQL Server types.