| <?xml version="1.0" encoding="UTF-8"?><!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="impala_jdbc"> |
| |
| <title id="jdbc">Configuring Impala to Work with JDBC</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="JDBC"/> |
| <data name="Category" value="Java"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Querying"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| <data name="Category" value="Developers"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| <indexterm audience="hidden">JDBC</indexterm> |
| Impala supports the standard JDBC interface. Through the JDBC driver, you can access Impala from custom |
| software written in Java or other programming languages, and from commercial Business Intelligence or |
| similar tools that use JDBC to communicate with various database products. |
| </p> |
| |
| <p> |
| Setting up a JDBC connection to Impala involves the following steps: |
| </p> |
| |
| <ul> |
| <li> |
| Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC |
| requests. |
| </li> |
| |
| <li> |
| Installing the JDBC driver on every system that runs the JDBC-enabled application. |
| </li> |
| |
| <li> |
| Specifying a connection string for the JDBC application to access one of the servers running the |
| <cmdname>impalad</cmdname> daemon, with the appropriate security settings. |
| </li> |
| </ul> |
| |
| <p outputclass="toc inpage"/> |
| </conbody> |
| |
| <concept id="jdbc_port"> |
| |
| <title>Configuring the JDBC Port</title> |
| |
| <conbody> |
| |
| <p> |
| By default, the Impala server accepts JDBC connections on port 21050, the port used by JDBC 2.0 and later |
| (as well as ODBC 2.x). Make sure this port is available for communication with other hosts on your network; |
| for example, verify that it is not blocked by firewall software. If your JDBC client software connects to a |
| different port, specify that alternative port number with the |
| <codeph>--hs2_port</codeph> option when starting <codeph>impalad</codeph>. See |
| <xref href="impala_processes.xml#processes"/> for details about Impala startup options. See |
| <xref href="impala_ports.xml#ports"/> for information about all ports used for communication between Impala |
| and clients or between Impala components. |
| </p> |
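| |
| <p> |
| As a rough way to confirm that the port is reachable from a client host, you could run a small check such |
| as the following sketch. The class name <codeph>PortCheck</codeph>, the method name, and the timeout are |
| illustrative assumptions, not part of Impala. A successful TCP connection shows only that something is |
| listening on the port, not that the listener is Impala. |
| </p> |
| |

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Impala): tests whether a TCP port accepts connections.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the example host and the default JDBC port used in this topic.
        String host = args.length > 0 ? args[0] : "myhost.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 21050;
        System.out.println(host + ":" + port
                + (isReachable(host, port, 3000) ? " is reachable" : " is NOT reachable"));
    }
}
```

| |
| <p> |
| If the check fails, firewall rules or a non-default <codeph>--hs2_port</codeph> setting are the first things to verify. |
| </p> |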
| </conbody> |
| </concept> |
| |
| <concept id="jdbc_driver_choice"> |
| |
| <title>Choosing the JDBC Driver</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Planning"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are |
| already using JDBC applications with an earlier Impala release, you should update |
| your JDBC driver, because the Hive 0.12 driver that was formerly the only choice |
| is not compatible with Impala 2.0 and later. |
| </p> |
| |
| <p> |
| The Hive 0.13 JDBC driver provides a substantial speed increase for JDBC |
| applications with Impala 2.0 and later, for queries that return large result sets. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> |
| |
| <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/> |
| <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/> |
| |
| </conbody> |
| </concept> |
| |
| <concept id="jdbc_setup"> |
| |
| <title>Enabling Impala JDBC Support on Client Systems</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Installing"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <section id="install_hive_driver"> |
| <title>Using the Hive JDBC Driver</title> |
| <p> |
| You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the Linux package manager, on |
| hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive. |
| </p> |
| |
| <p> |
| To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run |
| JDBC applications. <!-- TODO: Find a URL to point to for instructions and downloads --> |
| </p> |
| |
| <note> |
| The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for |
| Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13 |
| driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider |
| upgrading to the latest Hive JDBC driver for best performance with JDBC applications. |
| </note> |
| |
| <p> |
| If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install |
| procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding |
| procedure. Then download the following JAR files to each client machine that will use JDBC with Impala: |
| </p> |
| |
| <codeblock>commons-logging-X.X.X.jar |
| hadoop-common.jar |
| hive-common-X.XX.X.jar |
| hive-jdbc-X.XX.X.jar |
| hive-metastore-X.XX.X.jar |
| hive-service-X.XX.X.jar |
| httpclient-X.X.X.jar |
| httpcore-X.X.X.jar |
| libfb303-X.X.X.jar |
| libthrift-X.X.X.jar |
| log4j-X.X.XX.jar |
| slf4j-api-X.X.X.jar |
| slf4j-logXjXX-X.X.X.jar |
| </codeblock> |
| |
| <p> |
| <b>To enable JDBC support for Impala on the system where you run the JDBC application:</b> |
| </p> |
| |
| <ol> |
| <li> |
| Download the JAR files listed above to each client machine. |
| <note> |
| For Maven users, see |
| <xref keyref="Impala-JDBC-Example">this sample GitHub page</xref> for an example of the |
| dependencies you could add to a <codeph>pom</codeph> file instead of downloading the individual JARs. |
| </note> |
| </li> |
| |
| <li> |
| Store the JAR files in a location of your choosing, ideally a directory already referenced in your |
| <codeph>CLASSPATH</codeph> setting. For example: |
| <ul> |
| <li> |
| On Linux, you might use a location such as <codeph>/opt/jars/</codeph>. |
| </li> |
| |
| <li> |
| On Windows, you might use a subdirectory underneath <filepath>C:\Program Files</filepath>. |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR |
| files. This often means setting the <codeph>CLASSPATH</codeph> for the client process to include the |
| JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, |
| but some examples of how to set <codeph>CLASSPATH</codeph> variables include: |
| <ul> |
| <li> |
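| On Linux, if you stored the JARs in <codeph>/opt/jars/</codeph>, you might issue the following |
| command to prepend the JAR files to an existing classpath. The Java classpath wildcard is a bare |
| <codeph>*</codeph> after the directory name; a pattern such as <codeph>*.jar</codeph> is not expanded: |
| <codeblock>export CLASSPATH=/opt/jars/*:$CLASSPATH</codeblock> |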
| </li> |
| |
| <li> |
| On Windows, use the <b>System Properties</b> control panel item to modify the <b>Environment |
| Variables</b> for your system. Modify the environment variables to include the path to which you |
| extracted the files. |
| <note> |
| If the existing <codeph>CLASSPATH</codeph> on your client machine refers to some older version of |
| the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files |
| earlier in the listings, or delete the other references to Hive JAR files. |
| </note> |
| </li> |
| </ul> |
| </li> |
| </ol> |
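| |
| <p> |
| To confirm that the classpath setup described above took effect, you could try loading the driver class |
| from a small Java program. The following is a minimal sketch; the class name <codeph>DriverCheck</codeph> |
| and its method are illustrative assumptions. It reports only whether the class can be loaded, not whether |
| a connection will succeed. |
| </p> |
| |

```java
// Hypothetical helper (not part of Impala or Hive): checks whether a class can be loaded.
final class DriverCheck {

    // Returns true if the named class, and the classes it links against, can be loaded.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the driver JAR itself or one of its dependency JARs is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name for the Hive JDBC driver, as given in this topic.
        String driver = "org.apache.hive.jdbc.HiveDriver";
        System.out.println(driver
                + (isOnClasspath(driver) ? " found" : " NOT found on the classpath"));
    }
}
```

| |
| <p> |
| Run the program with the same <codeph>CLASSPATH</codeph> setting that the JDBC application uses, so that |
| the result reflects what the application sees. |
| </p> |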
| </section> |
| |
| </conbody> |
| </concept> |
| |
| <concept id="jdbc_connect"> |
| |
| <title>Establishing JDBC Connections</title> |
| |
| <conbody> |
| |
| <p> |
| The JDBC driver class depends on which driver you select. |
| </p> |
| |
| <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> |
| |
| <section id="class_hive_driver"> |
| <title>Using the Hive JDBC Driver</title> |
| |
| <p> |
| With the Hive JDBC driver, the class name is <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. |
| Once you have configured Impala to work with JDBC, you can establish connections from your JDBC application to Impala. |
| To do so for a cluster that does not use |
| Kerberos authentication, use a connection string of the form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>. |
| <!-- |
| Include the <codeph>auth=noSasl</codeph> argument |
| only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument. |
| --> |
| For example, you might use: |
| </p> |
| |
| <codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock> |
| |
| <p> |
| To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the |
| form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>. |
| The principal must be the same user principal you used when starting Impala. For example, you might use: |
| </p> |
| |
| <codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock> |
| |
| <p> |
| To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>. |
| For example, you might use: |
| </p> |
| |
| <codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock> |
| |
| <note> |
| <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/> |
| </note> |
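| |
| <p> |
| The three connection string forms above could be assembled and used from Java roughly as follows. This is |
| a sketch under stated assumptions: the class <codeph>ImpalaJdbcUrls</codeph> and its methods are |
| hypothetical helpers, the host and query are placeholders, and the program needs the Hive JDBC JARs on the |
| classpath and a reachable Impala daemon to run end to end. |
| </p> |
| |

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class: builds the connection strings shown in this topic.
final class ImpalaJdbcUrls {

    // Cluster without Kerberos authentication.
    static String noSasl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    // Cluster that requires Kerberos authentication.
    static String kerberos(String host, int port, String principal) {
        return "jdbc:hive2://" + host + ":" + port + "/;principal=" + principal;
    }

    // Cluster that requires LDAP authentication.
    static String ldap(String host, int port, String db, String user, String password) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";user=" + user + ";password=" + password;
    }

    // Opens a connection, runs one query, and prints the first column of each row.
    static void printFirstColumn(String url, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Load the driver explicitly in case this driver version does not self-register.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = noSasl("myhost.example.com", 21050); // placeholder host
        printFirstColumn(url, "SHOW DATABASES");
    }
}
```

| |
| <p> |
| Keeping the string assembly in one place makes it easier to switch authentication modes without touching |
| the query code. |
| </p> |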
| |
| </section> |
| |
| </conbody> |
| </concept> |
| |
| <concept rev="2.3.0" id="jdbc_odbc_notes"> |
| <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title> |
| <conbody> |
| <p> |
| Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter |
| or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between |
| the interactive shell and applications using the APIs: |
| </p> |
| <ul> |
| <li> |
| <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> |
| <ul> |
| <li> |
| <p> |
| Queries involving the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) |
| require notation that might not be available in all levels of JDBC and ODBC drivers. |
| If you have trouble querying such a table due to the driver level or |
| inability to edit the queries used by the application, you can create a view that exposes |
| a <q>flattened</q> version of the complex columns and point the application at the view. |
| See <xref href="impala_complex_types.xml#complex_types"/> for details. |
| </p> |
| </li> |
| <li> |
| <p> |
| The complex types available in <keyword keyref="impala23_full"/> and higher are supported by the |
| JDBC <codeph>getColumns()</codeph> API. |
| Both <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type <codeph>ARRAY</codeph>, |
| because this is the closest matching Java SQL type. This behavior is consistent with Hive. |
| <codeph>STRUCT</codeph> types are reported as the JDBC SQL Type <codeph>STRUCT</codeph>. |
| </p> |
| <p> |
| To be consistent with Hive's behavior, the <codeph>TYPE_NAME</codeph> field is populated |
| with the primitive type name for scalar types, and with the full <codeph>toSql()</codeph> |
| output for complex types. The resulting type names are somewhat inconsistent, |
| because nested types are printed differently than top-level types. For example, |
| the following list shows how the <codeph>toSql()</codeph> output for Impala types is |
| translated to <codeph>TYPE_NAME</codeph> values: |
| <codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL |
| CHAR(10) becomes CHAR |
| VARCHAR(10) becomes VARCHAR |
| ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)> |
| ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)> |
| ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)> |
| ]]> |
| </codeblock> |
| </p> |
| </li> |
| </ul> |
| </li> |
| </ul> |
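| |
| <p> |
| To see how a particular driver reports complex columns, an application could inspect the standard |
| <codeph>DATA_TYPE</codeph> and <codeph>TYPE_NAME</codeph> columns returned by <codeph>getColumns()</codeph>. |
| The following is a minimal sketch; the class <codeph>ColumnTypes</codeph> and its methods are hypothetical, |
| and <codeph>printColumns()</codeph> assumes a connection that has already been opened to Impala. |
| </p> |
| |

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper class: shows how complex columns appear in JDBC metadata.
final class ColumnTypes {

    // Labels the two JDBC type codes that this topic says are used for complex columns.
    static String jdbcTypeName(int dataType) {
        switch (dataType) {
            case Types.ARRAY:
                return "ARRAY";   // reported for both ARRAY and MAP columns
            case Types.STRUCT:
                return "STRUCT";
            default:
                return "other (" + dataType + ")";
        }
    }

    // Prints the column name, JDBC type, and TYPE_NAME for every column of one table.
    static void printColumns(Connection conn, String db, String table) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet cols = meta.getColumns(null, db, table, "%")) {
            while (cols.next()) {
                System.out.println(cols.getString("COLUMN_NAME")
                        + " DATA_TYPE=" + jdbcTypeName(cols.getInt("DATA_TYPE"))
                        + " TYPE_NAME=" + cols.getString("TYPE_NAME"));
            }
        }
    }

    public static void main(String[] args) {
        // Runs without a cluster: shows only the label mapping.
        System.out.println(jdbcTypeName(Types.ARRAY));
        System.out.println(jdbcTypeName(Types.STRUCT));
    }
}
```

| |
| <p> |
| Because <codeph>MAP</codeph> and <codeph>ARRAY</codeph> share one JDBC type code, an application that needs |
| to tell them apart would have to look at <codeph>TYPE_NAME</codeph> rather than <codeph>DATA_TYPE</codeph>. |
| </p> |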
| </conbody> |
| </concept> |
| |
| </concept> |