blob: f056229bdda354d110af7c27777283e31215436e [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Developing Applications With Apache Kudu</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li class="active">
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Releases</a>
</li>
<li >
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<li><a href="/ecosystem.html">Ecosystem</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="container">
<div class="row">
<div class="col-md-9">
<h1>Developing Applications With Apache Kudu</h1>
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Kudu provides C++, Java and Python client APIs, as well as reference examples to illustrate
their use.</p>
</div>
<div class="admonitionblock warning">
<table>
<tr>
<td class="icon">
<i class="fa icon-warning" title="Warning"></i>
</td>
<td class="content">
Use of server-side or private interfaces is not supported, and interfaces
which are not part of public APIs have no stability guarantees.
</td>
</tr>
</table>
</div>
</div>
</div>
<div class="sect1">
<h2 id="view_api"><a class="link" href="#view_api">Viewing the API Documentation</a></h2>
<div class="sectionbody">
<div class="paragraph">
<div class="title">C++ API Documentation</div>
<p>You can view the <a href="../cpp-client-api/index.html">C++ client API documentation</a>
online. Alternatively, after
<a href="installation.html#build_from_source">building Kudu from source</a>, you can
additionally build the <code>doxygen</code> target (e.g., run <code>make doxygen</code> if using
make) and use the locally generated API documentation by opening
<code>docs/doxygen/client_api/html/index.html</code> file in your favorite Web browser.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
In order to build the <code>doxygen</code> target, it&#8217;s necessary to have
doxygen of version 1.8.11 or newer with Dot (graphviz) support installed at
your build machine. If you installed doxygen after building Kudu from source,
you will need to run <code>cmake</code> again to pick up the doxygen location and generate
appropriate targets.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<div class="title">Java API Documentation</div>
<p>You can view the <a href="../apidocs/index.html">Java API documentation</a> online.
Alternatively, after <a href="installation.html#build_java_client">building
the Java client</a>, Java API documentation is available in
<code>java/kudu-client/target/apidocs/index.html</code>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_working_examples"><a class="link" href="#_working_examples">Working Examples</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Several example applications are provided in the
<a href="https://github.com/apache/kudu/tree/master/examples">examples directory</a>
of the Apache Kudu git repository. Each example includes a <code>README</code> that shows
how to compile and run it. The following list includes some of the
examples that are available today. Check the repository itself in case this list goes
out of date.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"><code>cpp/example.cc</code></dt>
<dd>
<p>A simple C++ application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p>
</dd>
<dt class="hdlist1"><code>java/java-example</code></dt>
<dd>
<p>A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p>
</dd>
<dt class="hdlist1"><code>java/collectl</code></dt>
<dd>
<p>A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol.
The commonly-available collectl tool can be used to send example data to the server.</p>
</dd>
<dt class="hdlist1"><code>java/insert-loadgen</code></dt>
<dd>
<p>A Java application that generates random insert load.</p>
</dd>
<dt class="hdlist1"><code>python/dstat-kudu</code></dt>
<dd>
<p>An example program that shows how to use the Kudu Python API to load data into a new / existing Kudu table
generated by an external program, <code>dstat</code> in this case.</p>
</dd>
<dt class="hdlist1"><code>python/graphite-kudu</code></dt>
<dd>
<p>An example plugin for using graphite-web with Kudu as a backend.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>These examples should serve as helpful starting points for your own Kudu applications and integrations.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_maven_artifacts"><a class="link" href="#_maven_artifacts">Maven Artifacts</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The following Maven <code>&lt;dependency&gt;</code> element is valid for the Apache Kudu public release
(since 1.0.0):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-xml" data-lang="xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kudu&lt;/groupId&gt;
&lt;artifactId&gt;kudu-client&lt;/artifactId&gt;
&lt;version&gt;1.10.0&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>Convenience binary artifacts for the Java client and various Java integrations (e.g. Spark, Flume)
are also available via the <a href="http://repository.apache.org">ASF Maven repository</a> and
<a href="https://mvnrepository.com/artifact/org.apache.kudu">Maven Central repository</a>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_example_impala_commands_with_kudu"><a class="link" href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>See <a href="kudu_impala_integration.html">Using Impala With Kudu</a> for guidance on installing
and using Impala with Kudu, including several <code>impala-shell</code> examples.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_kudu_integration_with_spark"><a class="link" href="#_kudu_integration_with_spark">Kudu Integration with Spark</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu integrates with Spark through the Data Source API as of version 1.0.0.
Include the kudu-spark dependency using the --packages option:</p>
</div>
<div class="paragraph">
<p>Use the kudu-spark_2.10 artifact if using Spark with Scala 2.10. Note that Spark 1 is no
longer supported in Kudu starting from version 1.6.0. So in order to use Spark 1 integrated
with Kudu, version 1.5.0 is the latest to go to.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.5.0</code></pre>
</div>
</div>
<div class="paragraph">
<p>Use kudu-spark2_2.11 artifact if using Spark 2 with Scala 2.11.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
kudu-spark versions 1.8.0 and below have slightly different syntax.
See the documentation of your version for a valid example. Versioned documentation can be found
on the <a href="http://kudu.apache.org/releases/">releases page</a>.
</td>
</tr>
</table>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.10.0</code></pre>
</div>
</div>
<div class="paragraph">
<p>Below is a minimal Spark SQL "select" example. We first import the kudu spark package,
then create a DataFrame, and then create a view from the DataFrame. After those
steps, the table is accessible from Spark SQL.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
There is also a Spark <a href="https://github.com/apache/kudu/tree/master/examples/quickstart/spark">quickstart</a>
guide and another <a href="https://github.com/apache/kudu/tree/master/examples/scala/spark-example">example</a>
available.
</td>
</tr>
</table>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
You can use the Kudu CLI tool to create table and generate data by
<code>kudu perf loadgen kudu.master:7051 -keep_auto_table</code> for the following two examples.
</td>
</tr>
</table>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.spark.kudu._
// Create a DataFrame that points to the Kudu table we want to query.
val df = spark.read.options(Map("kudu.master" -&gt; "kudu.master:7051",
"kudu.table" -&gt; "default.my_table")).format("kudu").load
// Create a view from the DataFrame to make it accessible from Spark SQL.
df.createOrReplaceTempView("my_table")
// Now we can run Spark SQL queries against our view of the Kudu table.
spark.sql("select * from my_table").show()</code></pre>
</div>
</div>
<div class="paragraph">
<p>Below is a more sophisticated example that includes both reads and writes:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.client._
import org.apache.kudu.spark.kudu.KuduContext
import collection.JavaConverters._
// Read a table from Kudu
val df = spark.read
.options(Map("kudu.master" -&gt; "kudu.master:7051", "kudu.table" -&gt; "kudu_table"))
.format("kudu").load
// Query using the Spark API...
df.select("key").filter("key &gt;= 5").show()
// ...or register a temporary table and use SQL
df.createOrReplaceTempView("kudu_table")
val filteredDF = spark.sql("select key from kudu_table where key &gt;= 5").show()
// Use KuduContext to create, delete, or write to Kudu tables
val kuduContext = new KuduContext("kudu.master:7051", spark.sparkContext)
// Create a new Kudu table from a DataFrame schema
// NB: No rows from the DataFrame are inserted into the table
kuduContext.createTable(
"test_table", df.schema, Seq("key"),
new CreateTableOptions()
.setNumReplicas(1)
.addHashPartitions(List("key").asJava, 3))
// Check for the existence of a Kudu table
kuduContext.tableExists("test_table")
// Insert data
kuduContext.insertRows(df, "test_table")
// Delete data
kuduContext.deleteRows(df, "test_table")
// Upsert data
kuduContext.upsertRows(df, "test_table")
// Update data
val updateDF = df.select($"key", ($"int_val" + 1).as("int_val"))
kuduContext.updateRows(updateDF, "test_table")
// Data can also be inserted into the Kudu table using the data source, though the methods on
// KuduContext are preferred
// NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert
// in the options map
// NB: Only mode Append is supported
df.write
.options(Map("kudu.master"-&gt; "kudu.master:7051", "kudu.table"-&gt; "test_table"))
.mode("append")
.format("kudu").save
// Delete a Kudu table
kuduContext.deleteTable("test_table")</code></pre>
</div>
</div>
<div class="sect2">
<h3 id="_upsert_option_in_kudu_spark"><a class="link" href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></h3>
<div class="paragraph">
<p>The upsert operation in kudu-spark supports an extra write option of <code>ignoreNull</code>. If set to true,
it will avoid setting existing column values in Kudu table to Null if the corresponding DataFrame
column values are Null. If unspecified, <code>ignoreNull</code> is false by default.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-scala" data-lang="scala">val dataFrame = spark.read
.options(Map("kudu.master" -&gt; "kudu.master:7051", "kudu.table" -&gt; simpleTableName))
.format("kudu").load
dataFrame.createOrReplaceTempView(simpleTableName)
dataFrame.show()
// Below is the original data in the table 'simpleTableName'
+---+---+
|key|val|
+---+---+
| 0|foo|
+---+---+
// Upsert a row with existing key 0 and val Null with ignoreNull set to true
val nullDF = spark.createDataFrame(Seq((0, null.asInstanceOf[String]))).toDF("key", "val")
val wo = new KuduWriteOptions
wo.ignoreNull = true
kuduContext.upsertRows(nullDF, simpleTableName, wo)
dataFrame.show()
// The val field stays unchanged
+---+---+
|key|val|
+---+---+
| 0|foo|
+---+---+
// Upsert a row with existing key 0 and val Null with ignoreNull default/set to false
kuduContext.upsertRows(nullDF, simpleTableName)
// Equivalent to:
// val wo = new KuduWriteOptions
// wo.ignoreNull = false
// kuduContext.upsertRows(nullDF, simpleTableName, wo)
df.show()
// The val field is set to Null this time
+---+----+
|key| val|
+---+----+
| 0|null|
+---+----+</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_using_spark_with_a_secure_kudu_cluster"><a class="link" href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></h3>
<div class="paragraph">
<p>The Kudu Spark integration is able to operate on secure Kudu clusters which have
authentication and encryption enabled, but the submitter of the Spark job must
provide the proper credentials. For Spark jobs using the default 'client' deploy
mode, the submitting user must have an active Kerberos ticket granted through
<code>kinit</code>. For Spark jobs using the 'cluster' deploy mode, a Kerberos principal
name and keytab location must be provided through the <code>--principal</code> and
<code>--keytab</code> arguments to <code>spark2-submit</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_spark_integration_best_practices"><a class="link" href="#_spark_integration_best_practices">Spark Integration Best Practices</a></h3>
<div class="sect3">
<h4 id="_avoid_multiple_kudu_clients_per_cluster"><a class="link" href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></h4>
<div class="paragraph">
<p>One common Kudu-Spark coding error is instantiating extra <code>KuduClient</code> objects.
In kudu-spark, a <code>KuduClient</code> is owned by the <code>KuduContext</code>. Spark application code
should not create another <code>KuduClient</code> connecting to the same cluster. Instead,
application code should use the <code>KuduContext</code> to access a <code>KuduClient</code> using
<code>KuduContext#syncClient</code>.</p>
</div>
<div class="paragraph">
<p>To diagnose multiple <code>KuduClient</code> instances in a Spark job, look for signs in
the logs of the master being overloaded by many <code>GetTableLocations</code> or
<code>GetTabletLocations</code> requests coming from different clients, usually around the
same time. This symptom is especially likely in Spark Streaming code,
where creating a <code>KuduClient</code> per task will result in periodic waves of master
requests from new clients.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_spark_integration_known_issues_and_limitations"><a class="link" href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></h3>
<div class="ulist">
<ul>
<li>
<p>Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration
is Java 7 compatible. Spark 2.2 is the default dependency version as of
Kudu 1.5.0.</p>
</li>
<li>
<p>Kudu tables with a name containing upper case or non-ascii characters must be
assigned an alternate name when registered as a temporary table.</p>
</li>
<li>
<p>Kudu tables with a column name containing upper case or non-ascii characters
may not be used with SparkSQL. Columns may be renamed in Kudu to work around
this issue.</p>
</li>
<li>
<p><code>&lt;&gt;</code> and <code>OR</code> predicates are not pushed to Kudu, and instead will be evaluated
by the Spark task. Only <code>LIKE</code> predicates with a suffix wildcard are pushed to
Kudu, meaning that <code>LIKE "FOO%"</code> is pushed down but <code>LIKE "FOO%BAR"</code> isn&#8217;t.</p>
</li>
<li>
<p>Kudu does not support every type supported by Spark SQL. For example,
<code>Date</code> and complex types are not supported.</p>
</li>
<li>
<p>Kudu tables may only be registered as temporary tables in SparkSQL.
Kudu tables may not be queried using HiveContext.</p>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_jvm_based_integration_testing"><a class="link" href="#_jvm_based_integration_testing">JVM-Based Integration Testing</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>As of version 1.9.0, Kudu ships with an experimental feature called the binary
test JAR. This feature gives people who want to test against Kudu the
capability to start a Kudu "mini cluster" from Java or another JVM-based
language without having to first build Kudu locally. This is possible because
the Kudu binary JAR contains relocatable Kudu binaries that are used by the
<code>KuduTestHarness</code> in the <code>kudu-test-utils</code> module. The <code>KuduTestHarness</code>
contains logic to search the classpath for the Kudu binaries and to start a
mini cluster using them.</p>
</div>
<div class="paragraph">
<p><em>Important: The <code>kudu-binary</code> module should only be used to run Kudu for
integration testing purposes. It should never be used to run an actual Kudu
service, in production or development, because the <code>kudu-binary</code> module
includes native security-related dependencies that have been copied from the
build system and will not be patched when the operating system on the runtime
host is patched.</em></p>
</div>
<div class="sect2">
<h3 id="_system_requirements"><a class="link" href="#_system_requirements">System Requirements</a></h3>
<div class="paragraph">
<p>The binary test JAR must be run on one of the
<a href="installation.html#prerequisites_and_requirements">supported Kudu platforms</a>,
which includes:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>macOS El Capitan (10.11) or later;</p>
</li>
<li>
<p>CentOS 6.6+, Ubuntu 14.04+, or another recent distribution of Linux</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The related Maven integration using <code>os-maven-plugin</code> requires Maven 3.1 or later.</p>
</div>
</div>
<div class="sect2">
<h3 id="_using_the_kudu_binary_test_jar"><a class="link" href="#_using_the_kudu_binary_test_jar">Using the Kudu Binary Test Jar</a></h3>
<div class="paragraph">
<p>Take the following steps to start a Kudu mini cluster from a Java project.</p>
</div>
<div class="paragraph">
<p><strong>1. Add build-time dependencies.</strong> The <code>kudu-binary</code> artifact contains the
native Kudu (server and command-line tool) binaries for specific operating
systems. In order to download the right artifact for the running operating
system, use the <code>os-maven-plugin</code> to detect the current runtime environment.
Finally, the <code>kudu-test-utils</code> module provides the <code>KuduTestHarness</code> class,
which runs a Kudu mini cluster.</p>
</div>
<div class="paragraph">
<p>Maven example for Kudu 1.10.0:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-xml" data-lang="xml">&lt;build&gt;
&lt;extensions&gt;
&lt;!-- Used to find the right kudu-binary artifact with the Maven
property ${os.detected.classifier} --&gt;
&lt;extension&gt;
&lt;groupId&gt;kr.motd.maven&lt;/groupId&gt;
&lt;artifactId&gt;os-maven-plugin&lt;/artifactId&gt;
&lt;version&gt;1.6.2&lt;/version&gt;
&lt;/extension&gt;
&lt;/extensions&gt;
&lt;/build&gt;
&lt;dependencies&gt;
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kudu&lt;/groupId&gt;
&lt;artifactId&gt;kudu-test-utils&lt;/artifactId&gt;
&lt;version&gt;1.10.0&lt;/version&gt;
&lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kudu&lt;/groupId&gt;
&lt;artifactId&gt;kudu-binary&lt;/artifactId&gt;
&lt;version&gt;1.10.0&lt;/version&gt;
&lt;classifier&gt;${os.detected.classifier}&lt;/classifier&gt;
&lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;
&lt;/dependencies&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>2. Write a test that starts a Kudu mini cluster using the KuduTestHarness.</strong>
It will automatically find the binary test JAR if Maven is configured correctly.</p>
</div>
<div class="paragraph">
<p>The recommended way to start a Kudu mini cluster is by using the
<code>KuduTestHarness</code> class from the <code>kudu-test-utils</code> module, which also acts as a
JUnit <code>Rule</code>. Here is an example of a Java-based integration test that starts a
Kudu cluster, creates a Kudu table on the cluster, and then exits:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java" data-lang="java">import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.test.KuduTestHarness;
import org.junit.Rule;
import org.junit.Test;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class MyKuduTest {
// The KuduTestHarness automatically starts and stops a real Kudu cluster
// when each test is run. Kudu persists its on-disk state in a temporary
// directory under a location defined by the environment variable TEST_TMPDIR
// if set, or under /tmp otherwise. That cluster data is deleted on
// successful exit of the test. The cluster output is logged through slf4j.
@Rule
public KuduTestHarness harness = new KuduTestHarness();
@Test
public void test() throws Exception {
// Get a KuduClient configured to talk to the running mini cluster.
KuduClient client = harness.getClient();
// Create a new Kudu table.
List&lt;ColumnSchema&gt; columns = new ArrayList&lt;&gt;();
columns.add(
new ColumnSchema.ColumnSchemaBuilder(
"key", Type.INT32).key(true).build());
Schema schema = new Schema(columns);
CreateTableOptions opts =
new CreateTableOptions().setRangePartitionColumns(
Collections.singletonList("key"));
client.createTable("table-1", schema, opts);
// Now we may insert rows into the newly-created Kudu table using 'client',
// scan the table, etc.
}
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For more examples of using the <code>KuduTestHarness</code>, including how to pass
configuration options to the Kudu cluster being managed by the harness, see the
<a href="https://github.com/apache/kudu/tree/master/examples/java/java-example">java-example</a>
project in the Kudu source code repository, or look at the various Kudu
integration tests under
<a href="https://github.com/apache/kudu/tree/master/java">java</a> in the Kudu source
code repository.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_kudu_python_client"><a class="link" href="#_kudu_python_client">Kudu Python Client</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The Kudu Python client provides a Python friendly interface to the C++ client API.
The sample below demonstrates the use of part of the Python client.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-python" data-lang="python">import kudu
from kudu.client import Partitioning
from datetime import datetime
# Connect to Kudu master server
client = kudu.connect(host='kudu.master', port=7051)
# Define a schema for a new table
builder = kudu.schema_builder()
builder.add_column('key').type(kudu.int64).nullable(False).primary_key()
builder.add_column('ts_val', type_=kudu.unixtime_micros, nullable=False, compression='lz4')
schema = builder.build()
# Define partitioning schema
partitioning = Partitioning().add_hash_partitions(column_names=['key'], num_buckets=3)
# Create new table
client.create_table('python-example', schema, partitioning)
# Open a table
table = client.table('python-example')
# Create a new session so that we can apply write operations
session = client.new_session()
# Insert a row
op = table.new_insert({'key': 1, 'ts_val': datetime.utcnow()})
session.apply(op)
# Upsert a row
op = table.new_upsert({'key': 2, 'ts_val': "2016-01-01T00:00:00.000000"})
session.apply(op)
# Updating a row
op = table.new_update({'key': 1, 'ts_val': ("2017-01-01", "%Y-%m-%d")})
session.apply(op)
# Delete a row
op = table.new_delete({'key': 2})
session.apply(op)
# Flush write operations, if failures occur, capture print them.
try:
session.flush()
except kudu.KuduBadStatus as e:
print(session.get_pending_errors())
# Create a scanner and add a predicate
scanner = table.scanner()
scanner.add_predicate(table['ts_val'] == datetime(2017, 1, 1))
# Open Scanner and read all tuples
# Note: This doesn't scale for large scans
result = scanner.open().read_all_tuples()</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_integration_with_mapreduce_yarn_and_other_frameworks"><a class="link" href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in
the Hadoop ecosystem. See
<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/RowCounter.java">RowCounter.java</a>
and
<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java">ImportCsv.java</a>
for examples which you can model your own integrations on. Stay tuned for more examples
using YARN and Spark in the future.</p>
</div>
</div>
</div>
</div>
<div class="col-md-3">
<div id="toc" data-spy="affix" data-offset-top="70">
<ul>
<li>
<a href="index.html">Introducing Kudu</a>
</li>
<li>
<a href="release_notes.html">Kudu Release Notes</a>
</li>
<li>
<a href="quickstart.html">Quickstart Guide</a>
</li>
<li>
<a href="installation.html">Installation Guide</a>
</li>
<li>
<a href="configuration.html">Configuring Kudu</a>
</li>
<li>
<a href="hive_metastore.html">Using the Hive Metastore with Kudu</a>
</li>
<li>
<a href="kudu_impala_integration.html">Using Impala with Kudu</a>
</li>
<li>
<a href="administration.html">Administering Kudu</a>
</li>
<li>
<a href="troubleshooting.html">Troubleshooting Kudu</a>
</li>
<li>
<span class="active-toc">Developing Applications with Kudu</span>
<ul class="sectlevel1">
<li><a href="#view_api">Viewing the API Documentation</a></li>
<li><a href="#_working_examples">Working Examples</a></li>
<li><a href="#_maven_artifacts">Maven Artifacts</a></li>
<li><a href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></li>
<li><a href="#_kudu_integration_with_spark">Kudu Integration with Spark</a>
<ul class="sectlevel2">
<li><a href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></li>
<li><a href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></li>
<li><a href="#_spark_integration_best_practices">Spark Integration Best Practices</a>
<ul class="sectlevel3">
<li><a href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></li>
</ul>
</li>
<li><a href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></li>
</ul>
</li>
<li><a href="#_jvm_based_integration_testing">JVM-Based Integration Testing</a>
<ul class="sectlevel2">
<li><a href="#_system_requirements">System Requirements</a></li>
<li><a href="#_using_the_kudu_binary_test_jar">Using the Kudu Binary Test Jar</a></li>
</ul>
</li>
<li><a href="#_kudu_python_client">Kudu Python Client</a></li>
<li><a href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></li>
</ul>
</li>
<li>
<a href="schema_design.html">Kudu Schema Design</a>
</li>
<li>
<a href="scaling_guide.html">Kudu Scaling Guide</a>
</li>
<li>
<a href="security.html">Kudu Security</a>
</li>
<li>
<a href="transaction_semantics.html">Kudu Transaction Semantics</a>
</li>
<li>
<a href="background_tasks.html">Background Maintenance Tasks</a>
</li>
<li>
<a href="configuration_reference.html">Kudu Configuration Reference</a>
</li>
<li>
<a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a>
</li>
<li>
<a href="known_issues.html">Known Issues and Limitations</a>
</li>
<li>
<a href="contributing.html">Contributing to Kudu</a>
</li>
<li>
<a href="export_control.html">Export Control Notice</a>
</li>
</ul>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2020 The Apache Software Foundation. Last updated 2020-05-18 13:51:59 PDT
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>