 ---
 layout: global
 title: Distributed SQL Engine
 displayTitle: Distributed SQL Engine
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 ---

 * Table of contents
 {:toc}

 Spark SQL can also act as a distributed query engine using its JDBC/ODBC or command-line interface.
 In this mode, end-users or applications can interact with Spark SQL directly to run SQL queries,
 without the need to write any code.

 ## Running the Thrift JDBC/ODBC server

 The Thrift JDBC/ODBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
 in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or a compatible Hive.

 To start the JDBC/ODBC server, run the following in the Spark directory:

     ./sbin/start-thriftserver.sh

 This script accepts all `bin/spark-submit` command-line options, plus a `--hiveconf` option to
 specify Hive properties. You may run `./sbin/start-thriftserver.sh --help` for a complete list of
 available options. By default, the server listens on `localhost:10000`. You may override this
 behaviour via either environment variables, e.g.:

 {% highlight bash %}
 export HIVE_SERVER2_THRIFT_PORT=<listening-port>
 export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
 ./sbin/start-thriftserver.sh \
   --master <master-uri> \
   ...
 {% endhighlight %}

 or system properties:

 {% highlight bash %}
 ./sbin/start-thriftserver.sh \
   --hiveconf hive.server2.thrift.port=<listening-port> \
   --hiveconf hive.server2.thrift.bind.host=<listening-host> \
   --master <master-uri> \
   ...
 {% endhighlight %}

 Now you can use beeline to test the Thrift JDBC/ODBC server:

     ./bin/beeline

 Connect to the JDBC/ODBC server in beeline with:

     beeline> !connect jdbc:hive2://localhost:10000

 Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
 your machine and a blank password. For secure mode, please follow the instructions given in the
 [beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
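
 For scripted checks, beeline can also connect and run a statement in one shot instead of interactively. The following is a minimal sketch, assuming a non-secure server on the default `localhost:10000`. The `SHOW DATABASES` query and the variable names are illustrative choices only; `-u`, `-n`, and `-e` are beeline's options for the JDBC URL, the username, and a query string.

```bash
# Assumed values: default host/port and the current OS user (non-secure mode).
JDBC_URL="jdbc:hive2://localhost:10000"
BEELINE_USER="$(id -un)"
QUERY="SHOW DATABASES;"

# Run from the Spark directory; the call is skipped when the beeline script is absent.
if [ -x ./bin/beeline ]; then
  ./bin/beeline -u "${JDBC_URL}" -n "${BEELINE_USER}" -e "${QUERY}"
else
  echo "beeline not found; would run: ./bin/beeline -u ${JDBC_URL} -n ${BEELINE_USER} -e \"${QUERY}\""
fi
```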

 To configure Hive, place your `hive-site.xml`, `core-site.xml`, and `hdfs-site.xml` files in `conf/`.

 You may also use the beeline script that comes with Hive.

 The Thrift JDBC server also supports sending Thrift RPC messages over HTTP transport.
 Use the following settings to enable HTTP mode, either as system properties or in the `hive-site.xml` file in `conf/`:

     hive.server2.transport.mode - Set this to value: http
     hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001
     hive.server2.thrift.http.path - HTTP endpoint; default is cliservice
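
 As a sketch of the system-property route, the transport mode and HTTP port can be passed on the command line. This assumes the `--hiveconf` mechanism described earlier for the port and bind host applies to these properties as well; `HTTP_PORT` and `HTTP_OPTS` are illustrative variable names, not part of Spark.

```bash
# Assumed value: the default HTTP port listed above.
HTTP_PORT=10001

# Build the two --hiveconf flags that switch the server to HTTP transport.
HTTP_OPTS="--hiveconf hive.server2.transport.mode=http --hiveconf hive.server2.thrift.http.port=${HTTP_PORT}"
echo "${HTTP_OPTS}"

# Run from the Spark directory; the call is skipped when the start script is absent.
if [ -x ./sbin/start-thriftserver.sh ]; then
  # HTTP_OPTS is intentionally unquoted so that it splits into separate arguments.
  ./sbin/start-thriftserver.sh ${HTTP_OPTS}
fi
```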

 To test, use beeline to connect to the JDBC/ODBC server in http mode with:

     beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>
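
 When the same URL is passed to beeline from a shell rather than typed at the `beeline>` prompt, it needs quoting, because `;` and `?` are special to the shell. A minimal sketch follows; the host, the `default` database, and the variable names are assumptions for illustration, while the port and endpoint are the defaults listed above.

```bash
# Hypothetical connection details; substitute your own.
HOST="localhost"
PORT="10001"            # default HTTP port
DATABASE="default"      # assumed database name
HTTP_PATH="cliservice"  # default HTTP endpoint

# Double quotes keep ';' and '?' from being interpreted by the shell.
JDBC_URL="jdbc:hive2://${HOST}:${PORT}/${DATABASE}?hive.server2.transport.mode=http;hive.server2.thrift.http.path=${HTTP_PATH}"
echo "${JDBC_URL}"

# Run from the Spark directory; the call is skipped when the beeline script is absent.
if [ -x ./bin/beeline ]; then
  ./bin/beeline -u "${JDBC_URL}"
fi
```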

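 If you close a session and then run a CTAS (`CREATE TABLE ... AS SELECT`) statement, you must set
 `fs.%s.impl.disable.cache` to `true` in `hive-site.xml`, where `%s` stands for the file system scheme
 (for example, `fs.hdfs.impl.disable.cache`).
 See more details in [[SPARK-21067]](https://issues.apache.org/jira/browse/SPARK-21067).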

 ## Running the Spark SQL CLI

 The Spark SQL CLI is a convenient tool for running the Hive metastore service in local mode and executing
 queries entered on the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.

 To start the Spark SQL CLI, run the following in the Spark directory:

     ./bin/spark-sql

 To configure Hive, place your `hive-site.xml`, `core-site.xml`, and `hdfs-site.xml` files in `conf/`.
 You may run `./bin/spark-sql --help` for a complete list of all available options.
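
 Besides the interactive prompt, the CLI can run a query non-interactively. The sketch below uses the `-e` (inline query) and `-f` (query file) options; the query text and the temporary file are made up for illustration, and `./bin/spark-sql --help` remains the authoritative list of options for your version.

```bash
# Hypothetical query file; any readable path works.
QUERY_FILE="$(mktemp)"
echo "SELECT 1 AS one;" > "${QUERY_FILE}"

# Run from the Spark directory; the calls are skipped when the CLI script is absent.
if [ -x ./bin/spark-sql ]; then
  ./bin/spark-sql -e "SELECT 1 AS one"   # query given inline
  ./bin/spark-sql -f "${QUERY_FILE}"     # query read from a file
else
  echo "spark-sql not found; query file written to ${QUERY_FILE}"
fi
```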