blob: f5c4e36c677fbd364d304ed432010b9350884fa7 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
= Kudu-Spark example README
:author: Kudu Team
:homepage: https://kudu.apache.org/
This is an example program that uses the Kudu-Spark integration to:
- Create a table
- Insert some rows
- Upsert some rows
- Scan some rows
** Scan rows using RDD/DataFrame methods
** Scan rows using SparkSQL
- Delete the table
To build the example, ensure maven is installed and execute
the following from the 'spark-example' directory. This will create a Spark
application jar in the 'target' directory:
[source,bash]
----
$ mvn package
----
To configure the kudu-spark example, there are two Java system properties
available:
- kuduMasters: A comma-separated list of Kudu master RPC endpoints, where
each endpoint is in form '<HostName|IPAddress>[:PortNumber]' (the port number
by default is 7051 if not specified).
Default: 'localhost:7051'.
- tableName: The name of the table to use for the example program. This
table should not exist in Kudu. Default: 'spark_test'.
The application can be run using `spark-submit`. For example, to run the
example against a Spark cluster running on YARN with Kudu masters at nodes
master1, master2, master3, use a command like the following:
[source.bash]
----
$ spark-submit \
--class org.apache.kudu.spark.examples.SparkExample \
--master yarn \
--driver-java-options \
'-DkuduMasters=master1,master2,master3 -DtableName=test_table' \
target/kudu-spark-example-1.0-SNAPSHOT.jar
----
You will need the Kudu cluster to be up and running and Spark correctly
configured for the example to work.