| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="impala_jdbc"> |
| <title id="jdbc">Configuring Impala to Work with JDBC</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="JDBC"/> |
| <data name="Category" value="Java"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Querying"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| <data name="Category" value="Developers"/> |
| </metadata> |
| </prolog> |
| <conbody> |
| <p> Impala supports the standard JDBC interface, allowing access from |
| commercial Business Intelligence tools and custom software written in Java |
      or other programming languages. The JDBC driver allows you to access
      Impala from a Java program that you write, or from a Business Intelligence or
      similar tool that uses JDBC to communicate with various database products. </p>
| <p> Setting up a JDBC connection to Impala involves the following steps: </p> |
| <ul> |
| <li> Verifying the communication port where the Impala daemons in your |
| cluster are listening for incoming JDBC requests. </li> |
| <li> Installing the JDBC driver on every system that runs the JDBC-enabled |
| application. </li> |
| <li> Specifying a connection string for the JDBC application to access one |
| of the servers running the <cmdname>impalad</cmdname> daemon, with the |
| appropriate security settings. </li> |
| </ul> |
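    <p> Taken together, these steps let a Java program query Impala directly. The
      following is a minimal sketch, not a definitive implementation, assuming the
      Hive JDBC driver JARs are on the classpath, a cluster without Kerberos
      authentication, and a hypothetical host <codeph>myhost.example.com</codeph>
      running <cmdname>impalad</cmdname> on the default binary port 21050: </p>
    <codeblock>import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Load the Hive JDBC driver class; its JAR files must be on the CLASSPATH.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // auth=noSasl applies only to clusters without Kerberos authentication.
    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT version()")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}</codeblock>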
| <p outputclass="toc inpage"/> |
| </conbody> |
| <concept id="jdbc_port"> |
| <title>Configuring the JDBC Port</title> |
| <conbody> |
      <p> The following table lists the default ports on which the Impala server
        accepts JDBC connections: <simpletable frame="all"
| relcolwidth="1.0* 1.03* 2.38*" id="simpletable_tr2_gnt_43b"> |
| <strow> |
| <stentry><b>Protocol</b> |
| </stentry> |
| <stentry><b>Default Port</b> |
| </stentry> |
| <stentry><b>Flag to Specify an Alternate Port</b> |
| </stentry> |
| </strow> |
| <strow> |
| <stentry>HTTP</stentry> |
| <stentry>28000</stentry> |
            <stentry><codeph>--hs2_http_port</codeph>
| </stentry> |
| </strow> |
| <strow> |
| <stentry>Binary TCP</stentry> |
| <stentry>21050</stentry> |
            <stentry><codeph>--hs2_port</codeph>
| </stentry> |
| </strow> |
| </simpletable> |
| </p> |
| <p> Make sure the port for the protocol you are using is available for |
| communication with clients, for example, that it is not blocked by |
| firewall software. </p> |
      <p> If your JDBC client software connects through a different port, specify
        that alternative port number with the corresponding flag from the table
        above when starting the <codeph>impalad</codeph> daemon. </p>
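      <p> For example, to make <codeph>impalad</codeph> listen for binary JDBC
        connections on port 21051 instead of the default, you might include the
        flag in the daemon startup options. (The exact way you pass startup flags
        depends on how your cluster is managed; this example assumes the flag is
        appended to the <cmdname>impalad</cmdname> command line.) </p>
      <codeblock>impalad --hs2_port=21051</codeblock>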
| </conbody> |
| </concept> |
| <concept id="jdbc_driver_choice"> |
| <title>Choosing the JDBC Driver</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Planning"/> |
| </metadata> |
| </prolog> |
| <conbody> |
| <p> In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC |
| driver. If you are already using JDBC applications with an earlier |
| Impala release, you should update your JDBC driver, because the Hive |
| 0.12 driver that was formerly the only choice is not compatible with |
| Impala 2.0 and later. </p> |
| <p> The Hive JDBC driver provides a substantial speed increase for JDBC |
| applications with Impala 2.0 and higher, for queries that return large |
| result sets. </p> |
| </conbody> |
| </concept> |
| <concept id="jdbc_setup"> |
| <title>Enabling Impala JDBC Support on Client Systems</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Installing"/> |
| </metadata> |
| </prolog> |
| <conbody> |
| <section id="install_hive_driver"> |
| <title>Using the Hive JDBC Driver</title> |
| <p> You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> |
| package) through the Linux package manager, on hosts within the |
| cluster. The driver consists of several JAR files. The same driver can |
| be used by Impala and Hive. </p> |
| <p> To get the JAR files, install the Hive JDBC driver on each host in |
| the cluster that will run JDBC applications. </p> |
| <note> The latest JDBC driver, corresponding to Hive 0.13, provides |
| substantial performance improvements for Impala queries that return |
| large result sets. Impala 2.0 and later are compatible with the Hive |
| 0.13 driver. If you already have an older JDBC driver installed, and |
| are running Impala 2.0 or higher, consider upgrading to the latest |
| Hive JDBC driver for best performance with JDBC applications. </note> |
      <p> If you are using JDBC-enabled applications on hosts outside the
        cluster, you cannot use the same installation procedure on those hosts.
        Install the JDBC driver on at least one cluster host using the
        preceding procedure. Then download the following JAR files to each client
        machine that will use JDBC with Impala: </p>
| <codeblock>commons-logging-X.X.X.jar |
| hadoop-common.jar |
| hive-common-X.XX.X.jar |
| hive-jdbc-X.XX.X.jar |
| hive-metastore-X.XX.X.jar |
| hive-service-X.XX.X.jar |
| httpclient-X.X.X.jar |
| httpcore-X.X.X.jar |
| libfb303-X.X.X.jar |
| libthrift-X.X.X.jar |
| log4j-X.X.XX.jar |
| slf4j-api-X.X.X.jar |
| slf4j-logXjXX-X.X.X.jar |
| </codeblock> |
| <p> |
| <b>To enable JDBC support for Impala on the system where you run the |
| JDBC application:</b> |
| </p> |
| <ol> |
| <li> Download the JAR files listed above to each client machine. |
          <note> For Maven users, see <xref keyref="Impala-JDBC-Example"
            >this sample GitHub page</xref> for an example of the
| dependencies you could add to a <codeph>pom</codeph> file instead |
| of downloading the individual JARs. </note> |
| </li> |
| <li> Store the JAR files in a location of your choosing, ideally a |
| directory already referenced in your <codeph>CLASSPATH</codeph> |
| setting. For example: <ul> |
| <li> On Linux, you might use a location such as |
| <codeph>/opt/jars/</codeph>. </li> |
| <li> On Windows, you might use a subdirectory underneath |
| <filepath>C:\Program Files</filepath>. </li> |
| </ul> |
| </li> |
| <li> To successfully load the Impala JDBC driver, client programs must |
| be able to locate the associated JAR files. This often means setting |
| the <codeph>CLASSPATH</codeph> for the client process to include the |
| JARs. Consult the documentation for your JDBC client for more |
| details on how to install new JDBC drivers, but some examples of how |
| to set <codeph>CLASSPATH</codeph> variables include: <ul> |
| <li> On Linux, if you extracted the JARs to |
| <codeph>/opt/jars/</codeph>, you might issue the following |
| command to prepend the JAR files path to an existing classpath: |
| <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock> |
| </li> |
| <li> On Windows, use the <b>System Properties</b> control panel |
| item to modify the <b>Environment Variables</b> for your system. |
| Modify the environment variables to include the path to which |
| you extracted the files. <note> If the existing |
| <codeph>CLASSPATH</codeph> on your client machine refers to |
| some older version of the Hive JARs, ensure that the new JARs |
| are the first ones listed. Either put the new JAR files |
| earlier in the listings, or delete the other references to |
| Hive JAR files. </note> |
| </li> |
| </ul> |
| </li> |
| </ol> |
| </section> |
| </conbody> |
| </concept> |
| <concept id="jdbc_connect"> |
| <title>Establishing JDBC Connections</title> |
| <conbody> |
| <p> The JDBC driver class depends on which driver you select. </p> |
| <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> |
| <section id="class_hive_driver"> |
| <title>Using the Hive JDBC Driver</title> |
| <p> For example, with the Hive JDBC driver, the class name is |
| <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have |
          configured Impala to work with JDBC, you can establish connections
          from your client application. To connect to a cluster that does not use Kerberos
| authentication, use a connection string of the form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>. |
| <!-- |
| Include the <codeph>auth=noSasl</codeph> argument |
| only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument. |
| --> |
| For example, you might use: </p> |
| <codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock> |
| <p> To connect to an instance of Impala that requires Kerberos |
| authentication, use a connection string of the form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>. |
| The principal must be the same user principal you used when starting |
| Impala. For example, you might use: </p> |
| <codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock> |
| <p> To connect to an instance of Impala that requires LDAP |
| authentication, use a connection string of the form |
| <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>. |
| For example, you might use: </p> |
| <codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock> |
| <p> To connect to an instance of Impala over HTTP, specify the HTTP |
| port, 28000 by default, and <codeph>transportMode=http</codeph> in the |
| connection string. For example: |
| <codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock> |
| </p> |
| <note> |
| <p |
| conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat" |
| /> |
| </note> |
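      <p> The connection-string forms above can also be assembled
        programmatically. The following is a minimal sketch of a helper method;
        the class and method names are hypothetical, not part of any driver
        API: </p>
      <codeblock>public class ImpalaUrlBuilder {
  /**
   * Builds a HiveServer2-protocol JDBC URL for Impala.
   * authSuffix examples: "auth=noSasl" for a cluster without Kerberos,
   * "principal=impala/host@REALM" for Kerberos, or
   * "user=fred;password=xyz123" for LDAP. Pass "" to omit db or suffix.
   */
  public static String buildUrl(String host, int port, String db, String authSuffix) {
    StringBuilder url = new StringBuilder("jdbc:hive2://");
    url.append(host).append(':').append(port).append('/');
    if (!db.isEmpty()) {
      url.append(db);
    }
    if (!authSuffix.isEmpty()) {
      url.append(';').append(authSuffix);
    }
    return url.toString();
  }
}</codeblock>
      <p> For example, <codeph>buildUrl("myhost.example.com", 21050, "",
        "auth=noSasl")</codeph> produces the non-Kerberos connection string shown
        earlier. </p>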
| </section> |
| </conbody> |
| </concept> |
| <concept rev="2.3.0" id="jdbc_odbc_notes"> |
| <title>Notes about JDBC and ODBC Interaction with Impala SQL |
| Features</title> |
| <conbody> |
    <p> Most Impala SQL features work equivalently through the
      <cmdname>impala-shell</cmdname> interpreter and the JDBC or ODBC APIs.
| The following are some exceptions to keep in mind when switching between |
| the interactive shell and applications using the APIs: </p> |
| <ul> |
| <li> |
| <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> |
| <ul> |
| <li> |
| <p> Queries involving the complex types (<codeph>ARRAY</codeph>, |
| <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require |
| notation that might not be available in all levels of JDBC and |
| ODBC drivers. If you have trouble querying such a table due to |
| the driver level or inability to edit the queries used by the |
| application, you can create a view that exposes a |
| <q>flattened</q> version of the complex columns and point the |
| application at the view. See <xref |
| href="impala_complex_types.xml#complex_types"/> for details. |
| </p> |
| </li> |
| <li> |
| <p> The complex types available in <keyword keyref="impala23_full" |
| /> and higher are supported by the JDBC |
| <codeph>getColumns()</codeph> API. Both <codeph>MAP</codeph> |
| and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type |
| <codeph>ARRAY</codeph>, because this is the closest matching |
| Java SQL type. This behavior is consistent with Hive. |
| <codeph>STRUCT</codeph> types are reported as the JDBC SQL |
| Type <codeph>STRUCT</codeph>. </p> |
| <p> To be consistent with Hive's behavior, the TYPE_NAME field is |
| populated with the primitive type name for scalar types, and |
| with the full <codeph>toSql()</codeph> for complex types. The |
| resulting type names are somewhat inconsistent, because nested |
| types are printed differently than top-level types. For example, |
            the following list shows how <codeph>toSql()</codeph> output for Impala
            types is translated to <codeph>TYPE_NAME</codeph> values: <codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
| CHAR(10) becomes CHAR |
| VARCHAR(10) becomes VARCHAR |
| ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)> |
| ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)> |
| ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)> |
| ]]> |
| </codeblock> |
| </p> |
| </li> |
| </ul> |
| </li> |
| </ul> |
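    <p> To see the reported values, you can read <codeph>TYPE_NAME</codeph>
      through the standard JDBC metadata call. The following is a minimal
      sketch, assuming an already-open connection to Impala; the database and
      table names are hypothetical: </p>
    <codeblock>import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;

public class TypeNameSketch {
  // Prints COLUMN_NAME and TYPE_NAME for each column of a table,
  // given an already-open JDBC connection to Impala.
  static void printTypeNames(Connection conn, String db, String table) throws Exception {
    DatabaseMetaData meta = conn.getMetaData();
    try (ResultSet cols = meta.getColumns(null, db, table, null)) {
      while (cols.next()) {
        System.out.println(cols.getString("COLUMN_NAME")
            + " : " + cols.getString("TYPE_NAME"));
      }
    }
  }
}</codeblock>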
| </conbody> |
| </concept> |
| <concept id="jdbc_kudu"> |
| <title>Kudu Considerations for DML Statements</title> |
| <conbody> |
| <p> Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or |
| other DML statements issued through the JDBC interface against a Kudu |
      table do not return JDBC error codes for conditions such as duplicate
      primary key values. Therefore, for applications that issue a high
| volume of DML statements, prefer to use the Kudu Java API directly |
| rather than a JDBC application. </p> |
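    <p> As a point of comparison, the Kudu Java API reports such conditions as
      per-row errors. The following is a minimal sketch, not a definitive
      implementation, assuming a hypothetical Kudu master at
      <codeph>kudu-master.example.com:7051</codeph> and a table
      <codeph>my_table</codeph> with an <codeph>INT32</codeph> primary key
      column <codeph>id</codeph>: </p>
    <codeblock>import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.OperationResponse;

public class KuduInsertSketch {
  public static void main(String[] args) throws Exception {
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
    try {
      KuduTable table = client.openTable("my_table");
      KuduSession session = client.newSession();
      Insert insert = table.newInsert();
      insert.getRow().addInt("id", 1);
      // Unlike the JDBC interface, the Kudu API surfaces per-row errors,
      // such as a duplicate primary key, directly on the response.
      OperationResponse response = session.apply(insert);
      if (response.hasRowError()) {
        System.err.println("Row error: " + response.getRowError());
      }
    } finally {
      client.shutdown();
    }
  }
}</codeblock>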
| </conbody> |
| </concept> |
| </concept> |