// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[developing]]
= Developing Applications With Apache Kudu (incubating)
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
Kudu provides C++ and Java client APIs, as well as reference examples to illustrate
their use. A Python API is included, but it is currently considered experimental,
unstable, and subject to change at any time.
WARNING: Use of server-side or private interfaces is not supported, and interfaces
which are not part of public APIs have no stability guarantees.
== Viewing the API Documentation
include::installation.adoc[tags=view_api]
== Working Examples
Several example applications are provided in the
link:https://github.com/cloudera/kudu-examples[kudu-examples] GitHub
repository. Each example includes a `README` that shows how to compile and run
it. These examples illustrate correct usage of the Kudu APIs, as well as how to
set up a virtual machine to run Kudu. The following list includes some of the
examples that are available today. Check the repository itself in case this list goes
out of date.
`java-example`::
A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.
`collectl`::
A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol.
The commonly available `collectl` tool can be used to send example data to the server.
`clients/python`::
An experimental Python client for Kudu.
`demo-vm-setup`::
Scripts to download and run a VirtualBox virtual machine with Kudu already installed.
See link:quickstart.html[Quickstart] for more information.
These examples should serve as helpful starting points for your own Kudu applications and integrations.
=== Maven Artifacts
The following Maven `<dependency>` element is valid for the Kudu public beta:
[source,xml]
----
<dependency>
  <groupId>org.kududb</groupId>
  <artifactId>kudu-client</artifactId>
  <version>0.5.0</version>
</dependency>
----
Because the Maven artifacts are not in Maven Central, use the following `<repository>`
element:
[source,xml]
----
<repository>
  <id>cdh.repo</id>
  <name>Cloudera Repositories</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
----
See the subdirectories of https://github.com/cloudera/kudu-examples/tree/master/java for
example Maven `pom.xml` files.
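As a rough sketch of how the `<dependency>` and `<repository>` elements from this section fit together, a minimal `pom.xml` might look like the following. The project coordinates (`com.example`, `kudu-app`, `1.0-SNAPSHOT`) are hypothetical placeholders, not part of any Kudu example. Treat the example `pom.xml` files in the repository as the authoritative reference.
[source,xml]
----
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Hypothetical coordinates for your own application -->
  <groupId>com.example</groupId>
  <artifactId>kudu-app</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- The Kudu artifacts are not in Maven Central, so declare the repository -->
  <repositories>
    <repository>
      <id>cdh.repo</id>
      <name>Cloudera Repositories</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <!-- The Kudu Java client, as listed above -->
  <dependencies>
    <dependency>
      <groupId>org.kududb</groupId>
      <artifactId>kudu-client</artifactId>
      <version>0.5.0</version>
    </dependency>
  </dependencies>
</project>
----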
== Example Impala Commands With Kudu
See link:kudu_impala_integration.html[Using Impala With Kudu] for guidance on installing
and using Impala with Kudu, including several `impala-shell` examples.
== Kudu Integration With Spark
Kudu integrates with Spark through the Spark Data Source API as of version 0.9.
Include the `kudu-spark` JAR using the `--jars` option:
[source,bash]
----
spark-shell --jars /kudu-spark-0.9.0.jar
----
Then import `kudu-spark` and create a DataFrame:
[source,scala]
----
// Import the Kudu data source and the options class used when creating tables
import org.kududb.client.CreateTableOptions
import org.kududb.spark.kudu._
// Read a Kudu table into a DataFrame
val kuduDataFrame = sqlContext.read.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).kudu
// Query using the Spark API...
kuduDataFrame.select("id").filter("id >= 5").show()
// ...or register kuduDataFrame as a temporary table and use Spark SQL
kuduDataFrame.registerTempTable("kudu_table")
// Select from the temporary table
sqlContext.sql("select id from kudu_table where id >= 5").show()
// Create a new Kudu table from a DataFrame schema.
// Here `df` stands for any existing DataFrame whose schema includes the "key" column.
val kuduContext = new KuduContext("your.kudu.master.here")
kuduContext.createTable("testcreatetable", df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))
// Insert data into a Kudu table
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("append").kudu
// To update existing data, change the mode to 'overwrite'
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("overwrite").kudu
// Check for the existence of a Kudu table
kuduContext.tableExists("your.kudu.table.here")
// Delete a Kudu table
kuduContext.deleteTable("your.kudu.table.here")
----
== Integration with MapReduce, YARN, and Other Frameworks
Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in
the Hadoop ecosystem. See link:https://github.com/apache/incubator-kudu/blob/master/java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/RowCounter.java[RowCounter.java]
and
link:https://github.com/apache/incubator-kudu/blob/master/java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/ImportCsv.java[ImportCsv.java]
for examples which you can model your own integrations on. Stay tuned for more examples
using YARN and Spark in the future.