Apache Drill supports querying a number of RDBMS instances. This allows you to connect your traditional databases to your Drill cluster so you can have a single view of both your relational and NoSQL datasources in a single system.
As with any source, Drill supports joins within and between all systems. Drill additionally has powerful pushdown capabilities with RDBMS sources. This includes support to push down join, where, group by, intersect and other SQL operations into a particular RDBMS source (as appropriate).
Drill is designed to work with any relational datastore that provides a JDBC driver. Drill is actively tested with PostgreSQL, MySQL, Oracle, MSSQL, Apache Derby and H2. For each system, you will follow three basic steps for setup:
jars/3rdparty
directory. (You’ll need to do this on every node.)Introduced in release: 1.18
A JDBC storage plugin configuration property sourceParameters
was introduced to allow setting data source parameters supported by HikariCP. Parameters names with incorrect naming and parameter values which are of incorrect data type or illegal will cause the storage plugin to fail to start. See the Example of PostgreSQL Configuration with sourceParameters
configuration property section for the example of usage.
From release 1.20 onwards, the default sourceParameters
set on the JDBC storage plugin are set to make the initiation of outbound connections lazy rather than eager, where eager means at the time of storage config enablement. Additionally, the JDBC connection pool is now allowed by default to shrink back down to empty if all of its connections are idle. These new settings are visible in the rdbms
storage config template and may be overridden to reinstate Drill's earlier behaviour.
It is also possible to set JDBC driver connection properties (which are distinct from HikariCP's own properties) using sourceParameters
by prepending the property name with dataSource.
. For example, to set the connection property S3OutputLocation
, which is supported by the AWS Athena JDBC driver, add a property named dataSource.S3OutputLocation
to sourceParameters
. Refer to the HikariCP docs for more detail.
Drill communicates with MySQL through the JDBC driver using the configuration that you specify in the Web UI or through the REST API.
{% include startnote.html %}Verify that MySQL is running and the MySQL driver is in place before you configure the JDBC storage plugin.{% include endnote.html %}
To configure the JDBC storage plugin:
myplugin
. Each configuration registered with Drill must have a distinct name. Names are case-sensitive.{ "type": "jdbc", "driver": "com.mysql.jdbc.Driver", "url": "jdbc:mysql://localhost:3306", "username": "root", "password": "mypassword", "enabled": true }
{% include startnote.html %}The JDBC URL may differ depending on your installation and configuration. See the example configurations below for examples.{% include endnote.html %}
You can use the performance_schema database, which is installed with MySQL to query your MySQL performance_schema database. Include the names of the storage plugin configuration, the database, and table in dot notation the FROM clause as follows:
0: jdbc:drill:zk=local> select * from myplugin.performance_schema.accounts; |--------|------------|----------------------|--------------------| | USER | HOST | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | |--------|------------|----------------------|--------------------| | null | null | 18 | 20 | | jdoe | localhost | 0 | 813 | | root | localhost | 3 | 5 | |--------|------------|----------------------|--------------------| 3 rows selected (0.171 seconds)
Download and install the official ClickHouse JDBC driver on all of the nodes in your cluster.
{ "type": "jdbc", "enabled": true, "driver": "ru.yandex.clickhouse.ClickHouseDriver", "url": "jdbc:clickhouse://1.2.3.4:8123.default", "username": "user", "password": "password" }
For MySQL, Drill has been tested with MySQL's mysql-connector-java-5.1.37-bin.jar driver. Copy this to all nodes.
{ "type": "jdbc", "enabled": true, "driver": "com.mysql.jdbc.Driver", "url": "jdbc:mysql://1.2.3.4", "username": "user", "password": "password" }
Download and install Oracle's Thin ojdbc7.12.1.0.2.jar driver and copy it to all nodes in your cluster.
{ "type": "jdbc", "enabled": true, "driver:" "oracle.jdbc.OracleDriver", "url:" "jdbc:oracle:thin:user/password@1.2.3.4:1521/ORCL" }
Drill is tested with the PostgreSQL driver version 42.2.11 (any recent driver should work). Download and copy this driver jar to the jars/3rdparty
folder on all nodes.
{% include startnote.html %}You'll need to provide a database name as part of your JDBC connection string for Drill to correctly expose PostgreSQL tables.{% include endnote.html %}
{ "type": "jdbc", "enabled": true, "driver": "org.postgresql.Driver", "url": "jdbc:postgresql://1.2.3.4/mydatabase", "username": "user", "password": "password" }
You may need to qualify a table name with a schema name for Drill to return data. For example, when querying a table named ips, you must issue the query against public.ips, as shown in the following example:
0: jdbc:drill:zk=local> use pgdb; |------|----------------------------------| | ok | summary | |------|----------------------------------| | true | Default schema changed to [pgdb] | |------|----------------------------------| 0: jdbc:drill:zk=local> show tables; |--------------|--------------| | TABLE_SCHEMA | TABLE_NAME | |--------------|--------------| | pgdb.test | ips | | pgdb.test | pg_aggregate | | pgdb.test | pg_am | |--------------|--------------| 0: jdbc:drill:zk=local> select * from public.ips; |------|---------| | ipid | ipv4dot | |------|---------| | 1 | 1.2.3.4 | | 2 | 1.2.3.5 | |------|---------|
sourceParameters
configuration property{ "type": "jdbc", "enabled": true, "driver": "org.postgresql.Driver", "url": "jdbc:postgresql://1.2.3.4/mydatabase?defaultRowFetchSize=2", "username": "user", "password": "password", "sourceParameters": { "minimumIdle": 0, "autoCommit": true, "connectionTestQuery": "select version() as postgresql_version", "dataSource.cachePrepStmts": true, "dataSource.prepStmtCacheSize": 250 } }
For SQL Server, Drill has been tested with Microsoft's sqljdbc41.4.2.6420.100.jar driver. Copy this jar file to all Drillbits.
{% include startnote.html %}You'll need to provide a database name as part of your JDBC connection string for Drill to correctly expose MSSQL schemas.{% include endnote.html %}
{ "type": "jdbc", "enabled": true, "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver", "url": "jdbc:sqlserver://1.2.3.4:1433;databaseName=mydatabase", "username": "user", "password": "password" }