| --- |
| layout: global |
| title: Distributed SQL Engine |
| displayTitle: Distributed SQL Engine |
| license: | |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --- |
| |
| * Table of contents |
| {:toc} |
| |
| Spark SQL can also act as a distributed query engine using its JDBC/ODBC or command-line interface. |
| In this mode, end-users or applications can interact with Spark SQL directly to run SQL queries, |
| without the need to write any code. |
| |
| ## Running the Thrift JDBC/ODBC server |
| |
| The Thrift JDBC/ODBC server implemented here corresponds to [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) |
| in the built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or a compatible Hive distribution. |
| |
| To start the JDBC/ODBC server, run the following in the Spark directory: |
| |
| ./sbin/start-thriftserver.sh |
| |
| This script accepts all `bin/spark-submit` command-line options, plus a `--hiveconf` option to |
| specify Hive properties. Run `./sbin/start-thriftserver.sh --help` for a complete list of |
| available options. By default, the server listens on `localhost:10000`. You can override this |
| behaviour with either environment variables, for example: |
| |
| {% highlight bash %} |
| export HIVE_SERVER2_THRIFT_PORT=<listening-port> |
| export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host> |
| ./sbin/start-thriftserver.sh \ |
| --master <master-uri> \ |
| ... |
| {% endhighlight %} |
| |
| or Hive properties passed with `--hiveconf`: |
| |
| {% highlight bash %} |
| ./sbin/start-thriftserver.sh \ |
| --hiveconf hive.server2.thrift.port=<listening-port> \ |
| --hiveconf hive.server2.thrift.bind.host=<listening-host> \ |
| --master <master-uri> \ |
| ... |
| {% endhighlight %} |
| |
| Now you can use beeline to test the Thrift JDBC/ODBC server: |
| |
| ./bin/beeline |
| |
| Connect to the JDBC/ODBC server in beeline with: |
| |
| beeline> !connect jdbc:hive2://localhost:10000 |
| |
| Beeline will ask you for a username and password. In non-secure mode, simply enter the username on |
| your machine and a blank password. For secure mode, please follow the instructions given in the |
| [beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients). |
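
For scripted checks, the same connection can be made without the interactive prompt. The following is a minimal sketch, assuming the server runs in non-secure mode on the default `localhost:10000` and that beeline's `-u` (URL), `-n` (username), and `-e` (query) options are used. The `jdbc_url` helper and the `SHOW DATABASES` query are illustrative and not part of Spark.

```shell
# Hypothetical helper (not part of Spark): build a hive2 JDBC URL.
# Arguments: host (default localhost), port (default 10000).
jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "${1:-localhost}" "${2:-10000}"
}

URL="$(jdbc_url localhost 10000)"
echo "Connecting to ${URL}"

# Run one query and exit. The guard skips the call when the script
# is not run from the Spark directory.
if [ -x ./bin/beeline ]; then
  ./bin/beeline -u "$URL" -n "$(whoami)" -e 'SHOW DATABASES;'
else
  echo "./bin/beeline not found; run this from the Spark directory"
fi
```

No password is passed here, which matches the blank password used in non-secure mode.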
| |
| Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`. |
| |
| You may also use the beeline script that comes with Hive. |
| |
| The Thrift JDBC server also supports sending Thrift RPC messages over HTTP transport. |
| Use the following settings to enable HTTP mode, either as system properties or in the `hive-site.xml` file in `conf/`: |
| |
| hive.server2.transport.mode - Set this to value: http |
| hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001 |
| hive.server2.http.endpoint - HTTP endpoint; default is cliservice |
| |
| To test, use beeline to connect to the JDBC/ODBC server in HTTP mode with: |
| |
| beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint> |
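
Put together, starting and testing HTTP mode might look like the sketch below. It assumes the default HTTP port (`10001`), the default endpoint (`cliservice`), and a database named `default`. The `http_jdbc_url` helper is hypothetical. Because the URL contains `?` and `;`, it should be quoted whenever it is passed on a shell command line.

```shell
# Hypothetical helper (not part of Spark): build the HTTP-mode JDBC URL.
# Arguments: host, port, database, HTTP endpoint.
http_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=%s\n' \
    "${1:-localhost}" "${2:-10001}" "${3:-default}" "${4:-cliservice}"
}

HTTP_URL="$(http_jdbc_url localhost 10001 default cliservice)"

# Start the server in HTTP mode. The guard skips the call when the
# script is not run from the Spark directory.
if [ -x ./sbin/start-thriftserver.sh ]; then
  ./sbin/start-thriftserver.sh \
    --hiveconf hive.server2.transport.mode=http \
    --hiveconf hive.server2.thrift.http.port=10001
fi

# Once the server is up, connect with the quoted URL, for example:
echo "./bin/beeline -u '${HTTP_URL}'"
```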
| |
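| If you close a session and then run a CTAS (`CREATE TABLE ... AS SELECT`) statement, you must set `fs.%s.impl.disable.cache` to `true` in `hive-site.xml`, where `%s` is the filesystem scheme (for example, `hdfs`). |
| See [[SPARK-21067]](https://issues.apache.org/jira/browse/SPARK-21067) for more details. |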
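
The `%s` in the property name is a placeholder for the filesystem scheme, so for HDFS the property would be `fs.hdfs.impl.disable.cache`. As a rough illustration, the sketch below prints the matching `<property>` element for a given scheme. The `disable_fs_cache_property` helper is hypothetical, and the choice of `hdfs` is an assumption about the filesystem in use.

```shell
# Hypothetical helper (not part of Spark): print a hive-site.xml <property>
# element that disables the FileSystem cache for one scheme.
# Argument: filesystem scheme, which replaces "%s" (default hdfs).
disable_fs_cache_property() {
  cat <<EOF
<property>
  <name>fs.${1:-hdfs}.impl.disable.cache</name>
  <value>true</value>
</property>
EOF
}

# Print the fragment for HDFS. Paste the output inside the
# <configuration> element of conf/hive-site.xml.
disable_fs_cache_property hdfs
```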
| |
| ## Running the Spark SQL CLI |
| |
| The Spark SQL CLI is a convenient tool for running the Hive metastore service in local mode and executing |
| queries entered on the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server. |
| |
| To start the Spark SQL CLI, run the following in the Spark directory: |
| |
| ./bin/spark-sql |
| |
| Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`. |
| You may run `./bin/spark-sql --help` for a complete list of all available options. |
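
Besides the interactive prompt, the CLI can run statements non-interactively. The sketch below is a minimal example, assuming the `-e` option (run a quoted statement) and the `-f` option (run statements from a file). The queries and the temporary file are illustrative only.

```shell
# Write an illustrative query to a temporary file (the query is an assumption).
SQL_FILE="$(mktemp)"
printf 'SELECT 1 AS one;\n' > "$SQL_FILE"

# The guard skips the calls when the script is not run from the Spark directory.
if [ -x ./bin/spark-sql ]; then
  # Run a single statement given on the command line.
  ./bin/spark-sql -e 'SHOW DATABASES;'
  # Run every statement in the file.
  ./bin/spark-sql -f "$SQL_FILE"
else
  echo "./bin/spark-sql not found; run this from the Spark directory"
fi
```

The temporary file is left in place so it can be inspected; remove it with `rm "$SQL_FILE"` when it is no longer needed.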