blob: 88333693955dbc788911c705c4288a1de9247805 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[developing]]
= Developing Applications With Apache Kudu
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
Kudu provides C++ and Java client APIs, as well as reference examples to illustrate
their use. A Python API is included, but it is currently considered experimental,
unstable, and subject to change at any time.
WARNING: Use of server-side or private interfaces is not supported, and interfaces
which are not part of public APIs have no stability guarantees.
== Viewing the API Documentation
include::installation.adoc[tags=view_api]
== Working Examples
Several example applications are provided in the
link:https://github.com/cloudera/kudu-examples[kudu-examples] Github
repository. Each example includes a `README` that shows how to compile and run
it. These examples illustrate correct usage of the Kudu APIs, as well as how to
set up a virtual machine to run Kudu. The following list includes some of the
examples that are available today. Check the repository itself in case this list goes
out of date.
`java-example`::
A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.
`collectl`::
A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol.
The commonly-available collectl tool can be used to send example data to the server.
`clients/python`::
An experimental Python client for Kudu.
`demo-vm-setup`::
Scripts to download and run a VirtualBox virtual machine with Kudu already installed.
See link:quickstart.html[Quickstart] for more information.
These examples should serve as helpful starting points for your own Kudu applications and integrations.
=== Maven Artifacts
The following Maven `<dependency>` element is valid for the Kudu public beta:
[source,xml]
----
<dependency>
<groupId>org.apache.kudu</groupId>
<artifactId>kudu-client</artifactId>
<version>0.5.0</version>
</dependency>
----
Because the Maven artifacts are not in Maven Central, use the following `<repository>`
element:
[source,xml]
----
<repository>
<id>cdh.repo</id>
<name>Cloudera Repositories</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
----
See subdirectories of https://github.com/cloudera/kudu-examples/tree/master/java for
example Maven pom.xml files.
== Example Impala Commands With Kudu
See link:kudu_impala_integration.html[Using Impala With Kudu] for guidance on installing
and using Impala with Kudu, including several `impala-shell` examples.
== Kudu Integration with Spark
Kudu integrates with Spark through the Data Source API as of version 0.9.
Include the kudu-spark jar using the --jars option:
[source]
----
spark-shell --jars kudu-spark-0.9.0.jar
----
then import kudu-spark and create a dataframe:
[source,scala]
----
import org.apache.kudu.spark.kudu._
// Read a table from Kudu
val df = sqlContext.read.options(Map("kudu.master" -> "kudu.master:7051","kudu.table" -> "kudu_table")).kudu
// Query using the Spark API...
df.select("id").filter("id" >= 5).show()
// ...or register a temporary table and use SQL
df.registerTempTable("kudu_table")
val filteredDF = sqlContext.sql("select id from kudu_table where id >= 5").show()
// Use KuduContext to create, delete, or write to Kudu tables
val kuduContext = new KuduContext("kudu.master:7051")
// Create a new Kudu table from a dataframe schema
// NB: No rows from the dataframe are inserted into the table
kuduContext.createTable("test_table", df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))
// Insert data
kuduContext.insertRows(df, "test_table")
// Delete data
kuduContext.deleteRows(filteredDF, "test_table")
// Upsert data
kuduContext.upsertRows(df, "test_table")
// Update data
val alteredDF = df.select("id", $"count" + 1)
kuduContext.updateRows(filteredRows, "test_table"
// Data can also be inserted into the Kudu table using the data source, though the methods on KuduContext are preferred
// NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert in the options map
// NB: Only mode Append is supported
df.write.options(Map("kudu.master"-> "kudu.master:7051", "kudu.table"-> "test_table")).mode("append").kudu
// Check for the existence of a Kudu table
kuduContext.tableExists("another_table")
// Delete a Kudu table
kuduContext.deleteTable("unwanted_table")
----
=== Spark Integration Known Issues and Limitations
- The Kudu Spark integration is tested and developed against Spark 1.6 and Scala
2.10.
- Kudu tables with a name containing upper case or non-ascii characters must be
assigned an alternate name when registered as a temporary table.
- Kudu tables with a column name containing upper case or non-ascii characters
may not be used with SparkSQL. Non-primary key columns may be renamed in Kudu
to work around this issue.
- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
Kudu, and instead will be evaluated by the Spark task.
- Kudu does not support all types supported by Spark SQL, such as `Date`,
`Decimal` and complex types.
== Integration with MapReduce, YARN, and Other Frameworks
Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in
the Hadoop ecosystem. See
link:https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/RowCounter.java[RowCounter.java]
and
link:https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java[ImportCsv.java]
for examples which you can model your own integrations on. Stay tuned for more examples
using YARN and Spark in the future.