 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 = Kudu-Spark example README
 :author: Kudu Team
 :homepage: https://kudu.apache.org/

 This is an example program that uses the Kudu-Spark integration to:

 - Create a table
 - Insert some rows
 - Upsert some rows
 - Scan some rows
 ** Scan rows using RDD/DataFrame methods
 ** Scan rows using SparkSQL
 - Delete the table

 To build the example, ensure Maven is installed and run
 the following from the 'spark-example' directory. This creates a Spark
 application jar in the 'target' directory:

 [source,bash]
 ----
 $ mvn package
 ----

 The kudu-spark example is configured through two Java system properties:

 - kuduMasters: A comma-separated list of Kudu master RPC endpoints, where
   each endpoint has the form '<HostName|IPAddress>[:PortNumber]' (the port
   number defaults to 7051 if not specified).
   Default: 'localhost:7051'.
 - tableName: The name of the table to use for the example program. This
   table should not exist in Kudu. Default: 'spark_test'.
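
To make the endpoint format concrete, here is a minimal shell sketch that expands a `kuduMasters` value by appending the default port 7051 to every endpoint that omits one. The function `normalize_masters` is a hypothetical helper and not part of the example program. The default port is already applied when it is left out, so this is for illustration only. It assumes host names or IPv4 addresses, not IPv6 literals.

```shell
# Hypothetical helper (not part of the Kudu example): append the default
# master RPC port to each endpoint in a comma-separated list that lacks one.
normalize_masters() {
  local input="$1" default_port=7051 out="" endpoint
  local IFS=','                  # split the list on commas
  for endpoint in $input; do
    case "$endpoint" in
      *:*) ;;                    # a port is already present, keep as-is
      *)   endpoint="${endpoint}:${default_port}" ;;
    esac
    out="${out:+${out},}${endpoint}"   # re-join with commas
  done
  printf '%s\n' "$out"
}
```

For example, `normalize_masters 'master1,master2:7151'` yields a list in which only `master1` gains the `:7051` suffix.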

 The application can be run using `spark-submit`. For example, to run the
 example against a Spark cluster running on YARN with Kudu masters at nodes
 master1, master2, master3, use a command like the following:

 [source,bash]
 ----
 $ spark-submit \
   --class org.apache.kudu.spark.examples.SparkExample \
   --master yarn \
   --driver-java-options \
     '-DkuduMasters=master1,master2,master3 -DtableName=test_table' \
   target/kudu-spark-example-1.0-SNAPSHOT.jar
 ----
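
When the same command is run against several clusters, it can be convenient to assemble the `--driver-java-options` value in one place. The sketch below does that with a small shell function. The function name `build_driver_opts` and the environment variables `KUDU_MASTERS` and `TABLE_NAME` are assumptions made for this illustration; the fallback values are the defaults documented above.

```shell
# Hypothetical helper: build the value for --driver-java-options from two
# environment variables (KUDU_MASTERS and TABLE_NAME are assumed names),
# falling back to the example's documented defaults when they are unset.
build_driver_opts() {
  printf '%s %s\n' \
    "-DkuduMasters=${KUDU_MASTERS:-localhost:7051}" \
    "-DtableName=${TABLE_NAME:-spark_test}"
}

# Usage (requires Spark, so it is left commented out here):
# spark-submit \
#   --class org.apache.kudu.spark.examples.SparkExample \
#   --master yarn \
#   --driver-java-options "$(build_driver_opts)" \
#   target/kudu-spark-example-1.0-SNAPSHOT.jar
```

Quoting the command substitution keeps both `-D` settings together as the single argument that `--driver-java-options` expects.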

 For the example to work, the Kudu cluster must be up and running and Spark
 must be correctly configured.