| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> |
| <meta name="author" content="Cloudera" /> |
| <title>Apache Kudu - Developing Applications With Apache Kudu</title> |
| <!-- Bootstrap core CSS --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" |
| integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" |
| crossorigin="anonymous"> |
| |
| <!-- Custom styles for this template --> |
| <link href="/css/kudu.css" rel="stylesheet"/> |
| <link href="/css/asciidoc.css" rel="stylesheet"/> |
| <link rel="shortcut icon" href="/img/logo-favicon.ico" /> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" /> |
| |
| |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| <div class="kudu-site container-fluid"> |
| <!-- Static navbar --> |
| <nav class="navbar navbar-default"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="logo" href="/"><img |
| src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png" |
| srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x" |
| alt="Apache Kudu"/></a> |
| |
| </div> |
| <div id="navbar" class="collapse navbar-collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li > |
| <a href="/">Home</a> |
| </li> |
| <li > |
| <a href="/overview.html">Overview</a> |
| </li> |
| <li class="active"> |
| <a href="/docs/">Documentation</a> |
| </li> |
| <li > |
| <a href="/releases/">Releases</a> |
| </li> |
| <li > |
| <a href="/blog/">Blog</a> |
| </li> |
| <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here |
| that doesn't also appear elsewhere on the site. --> |
| <li class="dropdown"> |
| <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">GET IN TOUCH</li> |
| <li><a class="icon email" href="/community.html">Mailing Lists</a></li> |
| <li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> |
| <li><a href="/committers.html">Project Committers</a></li> |
| <li><a href="/ecosystem.html">Ecosystem</a></li> |
| <!--<li><a href="/roadmap.html">Roadmap</a></li>--> |
| <li><a href="/community.html#contributions">How to Contribute</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">DEVELOPER RESOURCES</li> |
| <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> |
| <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> |
| <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">SOCIAL MEDIA</li> |
| <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> |
| <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> |
| <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> |
| </ul> |
| </li> |
| <li > |
| <a href="/faq.html">FAQ</a> |
| </li> |
| </ul><!-- /.nav --> |
| </div><!-- /#navbar --> |
| </div><!-- /.container-fluid --> |
| </nav> |
| |
| <!-- |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| |
| <div class="container"> |
| <div class="row"> |
| <div class="col-md-9"> |
| |
| <h1>Developing Applications With Apache Kudu</h1> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Kudu provides C++, Java and Python client APIs, as well as reference examples to illustrate |
| their use.</p> |
| </div> |
| <div class="admonitionblock warning"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-warning" title="Warning"></i> |
| </td> |
| <td class="content"> |
| Use of server-side or private interfaces is not supported, and interfaces |
| which are not part of public APIs have no stability guarantees. |
| </td> |
| </tr> |
| </table> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="view_api"><a class="link" href="#view_api">Viewing the API Documentation</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <div class="title">C++ API Documentation</div> |
| <p>You can view the <a href="../cpp-client-api/index.html">C++ client API documentation</a> |
| online. Alternatively, after |
| <a href="installation.html#build_from_source">building Kudu from source</a>, you can |
| additionally build the <code>doxygen</code> target (e.g., run <code>make doxygen</code> if using |
| make) and view the locally generated API documentation by opening the |
| <code>docs/doxygen/client_api/html/index.html</code> file in a web browser.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| To build the <code>doxygen</code> target, doxygen version 1.8.11 or newer with |
| Dot (Graphviz) support must be installed on your build machine. If you installed |
| doxygen after building Kudu from source, |
| you will need to run <code>cmake</code> again to pick up the doxygen location and generate |
| appropriate targets. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <div class="title">Java API Documentation</div> |
| <p>You can view the <a href="../apidocs/index.html">Java API documentation</a> online. |
| Alternatively, after <a href="installation.html#build_java_client">building |
| the Java client</a>, Java API documentation is available in |
| <code>java/kudu-client/target/apidocs/index.html</code>.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_working_examples"><a class="link" href="#_working_examples">Working Examples</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Several example applications are provided in the |
| <a href="https://github.com/apache/kudu/tree/master/examples">examples directory</a> |
| of the Apache Kudu git repository. Each example includes a <code>README</code> that shows |
| how to compile and run it. The following list includes some of the |
| examples that are available today. Check the repository itself in case this list goes |
| out of date.</p> |
| </div> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1"><code>cpp/example.cc</code></dt> |
| <dd> |
| <p>A simple C++ application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p> |
| </dd> |
| <dt class="hdlist1"><code>java/java-example</code></dt> |
| <dd> |
| <p>A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p> |
| </dd> |
| <dt class="hdlist1"><code>java/collectl</code></dt> |
| <dd> |
| <p>A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol. |
| The commonly-available collectl tool can be used to send example data to the server.</p> |
| </dd> |
| <dt class="hdlist1"><code>java/insert-loadgen</code></dt> |
| <dd> |
| <p>A Java application that generates random insert load.</p> |
| </dd> |
| <dt class="hdlist1"><code>python/dstat-kudu</code></dt> |
| <dd> |
| <p>An example program that shows how to use the Kudu Python API to load data generated by an |
| external program (<code>dstat</code> in this case) into a new or existing Kudu table.</p> |
| </dd> |
| <dt class="hdlist1"><code>python/graphite-kudu</code></dt> |
| <dd> |
| <p>An example plugin for using graphite-web with Kudu as a backend.</p> |
| </dd> |
| </dl> |
| </div> |
| <div class="paragraph"> |
| <p>These examples should serve as helpful starting points for your own Kudu applications and integrations.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_maven_artifacts"><a class="link" href="#_maven_artifacts">Maven Artifacts</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The following Maven <code><dependency></code> element is valid for the Apache Kudu public release |
| (since 1.0.0):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-xml" data-lang="xml"><dependency> |
| <groupId>org.apache.kudu</groupId> |
| <artifactId>kudu-client</artifactId> |
| <version>1.14.0</version> |
| </dependency></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Convenience binary artifacts for the Java client and various Java integrations (e.g. Spark, Flume) |
| are also available via the <a href="http://repository.apache.org">ASF Maven repository</a> and |
| <a href="https://mvnrepository.com/artifact/org.apache.kudu">Maven Central repository</a>.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_example_impala_commands_with_kudu"><a class="link" href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>See <a href="kudu_impala_integration.html">Using Impala With Kudu</a> for guidance on installing |
| and using Impala with Kudu, including several <code>impala-shell</code> examples.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_kudu_integration_with_spark"><a class="link" href="#_kudu_integration_with_spark">Kudu Integration with Spark</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Kudu integrates with Spark through the Data Source API as of version 1.0.0. |
| Include the kudu-spark dependency using the --packages option:</p> |
| </div> |
| <div class="paragraph"> |
| <p>Use the kudu-spark_2.10 artifact if using Spark with Scala 2.10. Note that Spark 1 is no |
| longer supported as of Kudu 1.6.0, so Kudu 1.5.0 is the latest version that can be used |
| with Spark 1.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.5.0</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Use the kudu-spark2_2.11 artifact if using Spark 2 with Scala 2.11.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| kudu-spark versions 1.8.0 and below have slightly different syntax. |
| See the documentation of your version for a valid example. Versioned documentation can be found |
| on the <a href="http://kudu.apache.org/releases/">releases page</a>. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.14.0</code></pre> |
| </div> |
| </div> |
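| <div class="paragraph"> |
| <p>The artifact choice described above can be sketched as a small helper. This is an |
| illustrative Python sketch only: the function name <code>kudu_spark_package</code> is hypothetical |
| and not part of any Kudu API, and it encodes just the two artifacts and the Spark 1 cutoff |
| mentioned in this section.</p> |
| </div> |

```python
def kudu_spark_package(spark_major, kudu_version):
    """Build a --packages coordinate for spark-shell (hypothetical helper)."""
    version = tuple(int(part) for part in kudu_version.split("."))
    if spark_major == 1:
        # Spark 1 is no longer supported starting from Kudu 1.6.0.
        if version >= (1, 6, 0):
            raise ValueError("Spark 1 requires Kudu 1.5.0 or earlier")
        artifact = "kudu-spark_2.10"
    elif spark_major == 2:
        artifact = "kudu-spark2_2.11"
    else:
        # Other Spark versions are outside what this section describes.
        raise ValueError("unsupported Spark major version in this sketch")
    return "org.apache.kudu:{}:{}".format(artifact, kudu_version)


# Print the command line for Spark 2 with Kudu 1.14.0.
print("spark-shell --packages " + kudu_spark_package(2, "1.14.0"))
```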
| <div class="paragraph"> |
| <p>Below is a minimal Spark SQL "select" example. We first import the kudu spark package, |
| then create a DataFrame, and then create a view from the DataFrame. After those |
| steps, the table is accessible from Spark SQL.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| There is also a Spark <a href="https://github.com/apache/kudu/tree/master/examples/quickstart/spark">quickstart</a> |
| guide and another <a href="https://github.com/apache/kudu/tree/master/examples/scala/spark-example">example</a> |
| available. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| For the following two examples, you can use the Kudu CLI tool to create a table and generate |
| data by running <code>kudu perf loadgen kudu.master:7051 -keep_auto_table</code>. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.spark.kudu._ |
| |
| // Create a DataFrame that points to the Kudu table we want to query. |
| val df = spark.read.options(Map("kudu.master" -> "kudu.master:7051", |
| "kudu.table" -> "default.my_table")).format("kudu").load |
| // Create a view from the DataFrame to make it accessible from Spark SQL. |
| df.createOrReplaceTempView("my_table") |
| // Now we can run Spark SQL queries against our view of the Kudu table. |
| spark.sql("select * from my_table").show()</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Below is a more sophisticated example that includes both reads and writes:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.client._ |
| import org.apache.kudu.spark.kudu.KuduContext |
| import collection.JavaConverters._ |
| |
| // Read a table from Kudu |
| val df = spark.read |
| .options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table")) |
| .format("kudu").load |
| |
| // Query using the Spark API... |
| df.select("key").filter("key >= 5").show() |
| |
| // ...or register a temporary table and use SQL |
| df.createOrReplaceTempView("kudu_table") |
| val filteredDF = spark.sql("select key from kudu_table where key >= 5") |
| filteredDF.show() |
| |
| // Use KuduContext to create, delete, or write to Kudu tables |
| val kuduContext = new KuduContext("kudu.master:7051", spark.sparkContext) |
| |
| // Create a new Kudu table from a DataFrame schema |
| // NB: No rows from the DataFrame are inserted into the table |
| kuduContext.createTable( |
| "test_table", df.schema, Seq("key"), |
| new CreateTableOptions() |
| .setNumReplicas(1) |
| .addHashPartitions(List("key").asJava, 3)) |
| |
| // Check for the existence of a Kudu table |
| kuduContext.tableExists("test_table") |
| |
| // Insert data |
| kuduContext.insertRows(df, "test_table") |
| |
| // Delete data |
| kuduContext.deleteRows(df, "test_table") |
| |
| // Upsert data |
| kuduContext.upsertRows(df, "test_table") |
| |
| // Update data |
| val updateDF = df.select($"key", ($"int_val" + 1).as("int_val")) |
| kuduContext.updateRows(updateDF, "test_table") |
| |
| // Data can also be inserted into the Kudu table using the data source, though the methods on |
| // KuduContext are preferred |
| // NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert |
| // in the options map |
| // NB: Only mode Append is supported |
| df.write |
| .options(Map("kudu.master"-> "kudu.master:7051", "kudu.table"-> "test_table")) |
| .mode("append") |
| .format("kudu").save |
| |
| // Delete a Kudu table |
| kuduContext.deleteTable("test_table")</code></pre> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_upsert_option_in_kudu_spark"><a class="link" href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></h3> |
| <div class="paragraph"> |
| <p>The upsert operation in kudu-spark supports an extra write option of <code>ignoreNull</code>. If set to true, |
| it will avoid setting existing column values in Kudu table to Null if the corresponding DataFrame |
| column values are Null. If unspecified, <code>ignoreNull</code> is false by default.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-scala" data-lang="scala">val dataFrame = spark.read |
| .options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> simpleTableName)) |
| .format("kudu").load |
| dataFrame.createOrReplaceTempView(simpleTableName) |
| dataFrame.show() |
| // Below is the original data in the table 'simpleTableName' |
| +---+---+ |
| |key|val| |
| +---+---+ |
| | 0|foo| |
| +---+---+ |
| |
| // Upsert a row with existing key 0 and val Null with ignoreNull set to true |
| val nullDF = spark.createDataFrame(Seq((0, null.asInstanceOf[String]))).toDF("key", "val") |
| val wo = new KuduWriteOptions |
| wo.ignoreNull = true |
| kuduContext.upsertRows(nullDF, simpleTableName, wo) |
| dataFrame.show() |
| // The val field stays unchanged |
| +---+---+ |
| |key|val| |
| +---+---+ |
| | 0|foo| |
| +---+---+ |
| |
| // Upsert a row with existing key 0 and val Null with ignoreNull default/set to false |
| kuduContext.upsertRows(nullDF, simpleTableName) |
| // Equivalent to: |
| // val wo = new KuduWriteOptions |
| // wo.ignoreNull = false |
| // kuduContext.upsertRows(nullDF, simpleTableName, wo) |
| dataFrame.show() |
| // The val field is set to Null this time |
| +---+----+ |
| |key| val| |
| +---+----+ |
| | 0|null| |
| +---+----+</code></pre> |
| </div> |
| </div> |
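| <div class="paragraph"> |
| <p>The <code>ignoreNull</code> semantics shown above can also be modelled without a cluster. |
| The following is a minimal plain-Python sketch that uses a dictionary in place of a Kudu table. |
| The <code>upsert</code> function and its parameters are hypothetical and only illustrate the |
| behaviour described in this section; they are not how kudu-spark implements it.</p> |
| </div> |

```python
def upsert(table, row, key="key", ignore_null=False):
    """Upsert `row` into `table`, a dict that maps key values to row dicts."""
    existing = table.get(row[key])
    if existing is None:
        # No row with this key yet: the upsert behaves like an insert.
        table[row[key]] = dict(row)
        return
    for column, value in row.items():
        if value is None and ignore_null:
            # Keep the stored value instead of overwriting it with null.
            continue
        existing[column] = value


table = {0: {"key": 0, "val": "foo"}}

upsert(table, {"key": 0, "val": None}, ignore_null=True)
print(table[0])  # val is still "foo"

upsert(table, {"key": 0, "val": None})
print(table[0])  # val is now None
```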
| </div> |
| <div class="sect2"> |
| <h3 id="_using_spark_with_a_secure_kudu_cluster"><a class="link" href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></h3> |
| <div class="paragraph"> |
| <p>The Kudu Spark integration is able to operate on secure Kudu clusters which have |
| authentication and encryption enabled, but the submitter of the Spark job must |
| provide the proper credentials. For Spark jobs using the default 'client' deploy |
| mode, the submitting user must have an active Kerberos ticket granted through |
| <code>kinit</code>. For Spark jobs using the 'cluster' deploy mode, a Kerberos principal |
| name and keytab location must be provided through the <code>--principal</code> and |
| <code>--keytab</code> arguments to <code>spark2-submit</code>.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_spark_integration_best_practices"><a class="link" href="#_spark_integration_best_practices">Spark Integration Best Practices</a></h3> |
| <div class="sect3"> |
| <h4 id="_avoid_multiple_kudu_clients_per_cluster"><a class="link" href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></h4> |
| <div class="paragraph"> |
| <p>One common Kudu-Spark coding error is instantiating extra <code>KuduClient</code> objects. |
| In kudu-spark, a <code>KuduClient</code> is owned by the <code>KuduContext</code>. Spark application code |
| should not create another <code>KuduClient</code> connecting to the same cluster. Instead, |
| application code should use the <code>KuduContext</code> to access a <code>KuduClient</code> using |
| <code>KuduContext#syncClient</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>To diagnose multiple <code>KuduClient</code> instances in a Spark job, look for signs in |
| the logs of the master being overloaded by many <code>GetTableLocations</code> or |
| <code>GetTabletLocations</code> requests coming from different clients, usually around the |
| same time. This symptom is especially likely in Spark Streaming code, |
| where creating a <code>KuduClient</code> per task will result in periodic waves of master |
| requests from new clients.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_spark_integration_known_issues_and_limitations"><a class="link" href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></h3> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration |
| is Java 7 compatible. Spark 2.2 is the default dependency version as of |
| Kudu 1.5.0.</p> |
| </li> |
| <li> |
| <p>Kudu tables with a name containing upper case or non-ASCII characters must be |
| assigned an alternate name when registered as a temporary table.</p> |
| </li> |
| <li> |
| <p>Kudu tables with a column name containing upper case or non-ASCII characters |
| may not be used with SparkSQL. Columns may be renamed in Kudu to work around |
| this issue.</p> |
| </li> |
| <li> |
| <p><code><></code> and <code>OR</code> predicates are not pushed to Kudu, and instead will be evaluated |
| by the Spark task. Only <code>LIKE</code> predicates with a suffix wildcard are pushed to |
| Kudu, meaning that <code>LIKE "FOO%"</code> is pushed down but <code>LIKE "FOO%BAR"</code> isn’t.</p> |
| </li> |
| <li> |
| <p>Kudu does not support every type supported by Spark SQL. For example, |
| <code>Date</code> and complex types are not supported.</p> |
| </li> |
| <li> |
| <p>Kudu tables may only be registered as temporary tables in SparkSQL. |
| Kudu tables may not be queried using HiveContext.</p> |
| </li> |
| </ul> |
| </div> |
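| <div class="paragraph"> |
| <p>Two of the limitations above lend themselves to simple pre-flight checks. The following |
| Python sketch is illustrative only: both function names are hypothetical, the checks mirror |
| only the rules stated in the list, and the <code>LIKE</code> check deliberately ignores escape |
| sequences. It does not reproduce the logic Spark or Kudu actually use.</p> |
| </div> |

```python
def is_prefix_only_like(pattern):
    """True for patterns shaped like 'FOO%': a literal prefix and one trailing %."""
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    # Assumption: an empty prefix, any other wildcard, or an escape character
    # means the pattern is not treated as prefix-only in this sketch.
    return prefix != "" and not any(ch in prefix for ch in "%_\\")


def needs_alternate_name(name):
    """True if a table name contains upper case or non-ASCII characters."""
    return (not name.isascii()) or any(ch.isupper() for ch in name)


for pattern in ("FOO%", "FOO%BAR"):
    print(pattern, is_prefix_only_like(pattern))

for name in ("my_table", "MyTable"):
    print(name, needs_alternate_name(name))
```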
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_jvm_based_integration_testing"><a class="link" href="#_jvm_based_integration_testing">JVM-Based Integration Testing</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>As of version 1.9.0, Kudu ships with an experimental feature called the binary |
| test JAR. This feature gives people who want to test against Kudu the |
| capability to start a Kudu "mini cluster" from Java or another JVM-based |
| language without having to first build Kudu locally. This is possible because |
| the Kudu binary JAR contains relocatable Kudu binaries that are used by the |
| <code>KuduTestHarness</code> in the <code>kudu-test-utils</code> module. The <code>KuduTestHarness</code> |
| contains logic to search the classpath for the Kudu binaries and to start a |
| mini cluster using them.</p> |
| </div> |
| <div class="paragraph"> |
| <p><em>Important: The <code>kudu-binary</code> module should only be used to run Kudu for |
| integration testing purposes. It should never be used to run an actual Kudu |
| service, in production or development, because the <code>kudu-binary</code> module |
| includes native security-related dependencies that have been copied from the |
| build system and will not be patched when the operating system on the runtime |
| host is patched.</em></p> |
| </div> |
| <div class="sect2"> |
| <h3 id="_system_requirements"><a class="link" href="#_system_requirements">System Requirements</a></h3> |
| <div class="paragraph"> |
| <p>The binary test JAR must be run on one of the |
| <a href="installation.html#prerequisites_and_requirements">supported Kudu platforms</a>, |
| which include:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>macOS El Capitan (10.11) or later</p> |
| </li> |
| <li> |
| <p>CentOS 6.6+, Ubuntu 14.04+, or another recent distribution of Linux</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The related Maven integration using <code>os-maven-plugin</code> requires Maven 3.1 or later.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_using_the_kudu_binary_test_jar"><a class="link" href="#_using_the_kudu_binary_test_jar">Using the Kudu Binary Test Jar</a></h3> |
| <div class="paragraph"> |
| <p>Take the following steps to start a Kudu mini cluster from a Java project.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>1. Add build-time dependencies.</strong> The <code>kudu-binary</code> artifact contains the |
| native Kudu (server and command-line tool) binaries for specific operating |
| systems. In order to download the right artifact for the running operating |
| system, use the <code>os-maven-plugin</code> to detect the current runtime environment. |
| Finally, the <code>kudu-test-utils</code> module provides the <code>KuduTestHarness</code> class, |
| which runs a Kudu mini cluster.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Maven example for Kudu 1.14.0:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-xml" data-lang="xml"><build> |
| <extensions> |
| <!-- Used to find the right kudu-binary artifact with the Maven |
| property ${os.detected.classifier} --> |
| <extension> |
| <groupId>kr.motd.maven</groupId> |
| <artifactId>os-maven-plugin</artifactId> |
| <version>1.6.2</version> |
| </extension> |
| </extensions> |
| </build> |
| |
| <dependencies> |
| <dependency> |
| <groupId>org.apache.kudu</groupId> |
| <artifactId>kudu-test-utils</artifactId> |
| <version>1.14.0</version> |
| <scope>test</scope> |
| </dependency> |
| <dependency> |
| <groupId>org.apache.kudu</groupId> |
| <artifactId>kudu-binary</artifactId> |
| <version>1.14.0</version> |
| <classifier>${os.detected.classifier}</classifier> |
| <scope>test</scope> |
| </dependency> |
| </dependencies></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p><strong>2. Write a test that starts a Kudu mini cluster using the KuduTestHarness.</strong> |
| It will automatically find the binary test JAR if Maven is configured correctly.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The recommended way to start a Kudu mini cluster is by using the |
| <code>KuduTestHarness</code> class from the <code>kudu-test-utils</code> module, which also acts as a |
| JUnit <code>Rule</code>. Here is an example of a Java-based integration test that starts a |
| Kudu cluster, creates a Kudu table on the cluster, and then exits:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-java" data-lang="java">import org.apache.kudu.ColumnSchema; |
| import org.apache.kudu.Schema; |
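| import org.apache.kudu.Type; |
| import org.apache.kudu.client.CreateTableOptions; |
| import org.apache.kudu.client.KuduClient; |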
| import org.apache.kudu.test.KuduTestHarness; |
| import org.junit.Rule; |
| import org.junit.Test; |
| |
| import java.util.ArrayList; |
| import java.util.Collections; |
| import java.util.List; |
| |
| public class MyKuduTest { |
| // The KuduTestHarness automatically starts and stops a real Kudu cluster |
| // when each test is run. Kudu persists its on-disk state in a temporary |
| // directory under a location defined by the environment variable TEST_TMPDIR |
| // if set, or under /tmp otherwise. That cluster data is deleted on |
| // successful exit of the test. The cluster output is logged through slf4j. |
| @Rule |
| public KuduTestHarness harness = new KuduTestHarness(); |
| |
| @Test |
| public void test() throws Exception { |
| // Get a KuduClient configured to talk to the running mini cluster. |
| KuduClient client = harness.getClient(); |
| |
| // Create a new Kudu table. |
| List<ColumnSchema> columns = new ArrayList<>(); |
| columns.add( |
| new ColumnSchema.ColumnSchemaBuilder( |
| "key", Type.INT32).key(true).build()); |
| Schema schema = new Schema(columns); |
| CreateTableOptions opts = |
| new CreateTableOptions().setRangePartitionColumns( |
| Collections.singletonList("key")); |
| client.createTable("table-1", schema, opts); |
| |
| // Now we may insert rows into the newly-created Kudu table using 'client', |
| // scan the table, etc. |
| } |
| }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>For more examples of using the <code>KuduTestHarness</code>, including how to pass |
| configuration options to the Kudu cluster being managed by the harness, see the |
| <a href="https://github.com/apache/kudu/tree/master/examples/java/java-example">java-example</a> |
| project in the Kudu source code repository, or look at the various Kudu |
| integration tests under |
| <a href="https://github.com/apache/kudu/tree/master/java">java</a> in the Kudu source |
| code repository.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_kudu_python_client"><a class="link" href="#_kudu_python_client">Kudu Python Client</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The Kudu Python client provides a Python-friendly interface to the C++ client API. |
| The sample below demonstrates part of the Python client API.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="highlight"><code class="language-python" data-lang="python">import kudu |
| from kudu.client import Partitioning |
| from datetime import datetime |
| |
| # Connect to Kudu master server |
| client = kudu.connect(host='kudu.master', port=7051) |
| |
| # Define a schema for a new table |
| builder = kudu.schema_builder() |
| builder.add_column('key').type(kudu.int64).nullable(False).primary_key() |
| builder.add_column('ts_val', type_=kudu.unixtime_micros, nullable=False, compression='lz4') |
| schema = builder.build() |
| |
| # Define partitioning schema |
| partitioning = Partitioning().add_hash_partitions(column_names=['key'], num_buckets=3) |
| |
| # Create new table |
| client.create_table('python-example', schema, partitioning) |
| |
| # Open a table |
| table = client.table('python-example') |
| |
| # Create a new session so that we can apply write operations |
| session = client.new_session() |
| |
| # Insert a row |
| op = table.new_insert({'key': 1, 'ts_val': datetime.utcnow()}) |
| session.apply(op) |
| |
| # Upsert a row |
| op = table.new_upsert({'key': 2, 'ts_val': "2016-01-01T00:00:00.000000"}) |
| session.apply(op) |
| |
| # Update a row |
| op = table.new_update({'key': 1, 'ts_val': ("2017-01-01", "%Y-%m-%d")}) |
| session.apply(op) |
| |
| # Delete a row |
| op = table.new_delete({'key': 2}) |
| session.apply(op) |
| |
| # Flush write operations. If failures occur, print the pending errors. |
| try: |
| session.flush() |
| except kudu.KuduBadStatus as e: |
| print(session.get_pending_errors()) |
| |
| # Create a scanner and add a predicate |
| scanner = table.scanner() |
| scanner.add_predicate(table['ts_val'] == datetime(2017, 1, 1)) |
| |
| # Open Scanner and read all tuples |
| # Note: This doesn't scale for large scans |
| result = scanner.open().read_all_tuples()</code></pre> |
| </div> |
| </div> |
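| <div class="paragraph"> |
| <p>The sample above passes <code>unixtime_micros</code> values in three forms: a |
| <code>datetime</code>, a string, and a (string, format) tuple. The following standard-library |
| sketch shows how each form can map to microseconds since the Unix epoch. The helper name, the |
| default string format, and the choice to treat naive datetimes as UTC are assumptions made for |
| illustration; the client's own conversion may differ.</p> |
| </div> |

```python
from datetime import datetime, timezone

# Assumed default format for plain strings, matching the string used in the sample.
DEFAULT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(value):
    """Convert a datetime, a string, or a (string, format) tuple to microseconds."""
    if isinstance(value, tuple):
        text, fmt = value
        value = datetime.strptime(text, fmt)
    elif isinstance(value, str):
        value = datetime.strptime(value, DEFAULT_FORMAT)
    if value.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    delta = value - EPOCH
    # Integer arithmetic avoids floating-point rounding of microseconds.
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds


print(to_unixtime_micros(datetime(2017, 1, 1)))
print(to_unixtime_micros("2016-01-01T00:00:00.000000"))
print(to_unixtime_micros(("2017-01-01", "%Y-%m-%d")))
```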
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_integration_with_mapreduce_yarn_and_other_frameworks"><a class="link" href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in |
| the Hadoop ecosystem. See |
| <a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/RowCounter.java">RowCounter.java</a> |
| and |
| <a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java">ImportCsv.java</a> |
| for examples that you can use as models for your own integrations. Stay tuned for more examples |
| using YARN and Spark in the future.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| |
| <div id="toc" data-spy="affix" data-offset-top="70"> |
| <ul> |
| |
| <li> |
| |
| <a href="index.html">Introducing Kudu</a> |
| </li> |
| <li> |
| |
| <a href="release_notes.html">Kudu Release Notes</a> |
| </li> |
| <li> |
| |
| <a href="quickstart.html">Quickstart Guide</a> |
| </li> |
| <li> |
| |
| <a href="installation.html">Installation Guide</a> |
| </li> |
| <li> |
| |
| <a href="configuration.html">Configuring Kudu</a> |
| </li> |
| <li> |
| |
| <a href="hive_metastore.html">Using the Hive Metastore with Kudu</a> |
| </li> |
| <li> |
| |
| <a href="kudu_impala_integration.html">Using Impala with Kudu</a> |
| </li> |
| <li> |
| |
| <a href="administration.html">Administering Kudu</a> |
| </li> |
| <li> |
| |
| <a href="troubleshooting.html">Troubleshooting Kudu</a> |
| </li> |
| <li> |
| <span class="active-toc">Developing Applications with Kudu</span> |
| <ul class="sectlevel1"> |
| <li><a href="#view_api">Viewing the API Documentation</a></li> |
| <li><a href="#_working_examples">Working Examples</a></li> |
| <li><a href="#_maven_artifacts">Maven Artifacts</a></li> |
| <li><a href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></li> |
| <li><a href="#_kudu_integration_with_spark">Kudu Integration with Spark</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></li> |
| <li><a href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></li> |
| <li><a href="#_spark_integration_best_practices">Spark Integration Best Practices</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></li> |
| </ul> |
| </li> |
| <li><a href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></li> |
| </ul> |
| </li> |
| <li><a href="#_jvm_based_integration_testing">JVM-Based Integration Testing</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_system_requirements">System Requirements</a></li> |
| <li><a href="#_using_the_kudu_binary_test_jar">Using the Kudu Binary Test Jar</a></li> |
| </ul> |
| </li> |
| <li><a href="#_kudu_python_client">Kudu Python Client</a></li> |
| <li><a href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></li> |
| </ul> |
| </li> |
| <li> |
| |
| <a href="schema_design.html">Kudu Schema Design</a> |
| </li> |
| <li> |
| |
| <a href="scaling_guide.html">Kudu Scaling Guide</a> |
| </li> |
| <li> |
| |
| <a href="security.html">Kudu Security</a> |
| </li> |
| <li> |
| |
| <a href="transaction_semantics.html">Kudu Transaction Semantics</a> |
| </li> |
| <li> |
| |
| <a href="background_tasks.html">Background Maintenance Tasks</a> |
| </li> |
| <li> |
| |
| <a href="configuration_reference.html">Kudu Configuration Reference</a> |
| </li> |
| <li> |
| |
| <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> |
| </li> |
| <li> |
| |
| <a href="metrics_reference.html">Kudu Metrics Reference</a> |
| </li> |
| <li> |
| |
| <a href="known_issues.html">Known Issues and Limitations</a> |
| </li> |
| <li> |
| |
| <a href="contributing.html">Contributing to Kudu</a> |
| </li> |
| <li> |
| |
| <a href="export_control.html">Export Control Notice</a> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| <footer class="footer"> |
| <div class="row"> |
| <div class="col-md-9"> |
| <p class="small"> |
| Copyright © 2020 The Apache Software Foundation. Last updated 2021-05-13 00:14:36 -0700 |
| </p> |
| <p class="small"> |
| Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu |
| project logo are either registered trademarks or trademarks of The |
| Apache Software Foundation in the United States and other countries. |
| </p> |
| </div> |
| <div class="col-md-3"> |
| <a class="pull-right" href="https://www.apache.org/events/current-event.html"> |
| <img src="https://www.apache.org/events/current-event-234x60.png"/> |
| </a> |
| </div> |
| </div> |
| </footer> |
| </div> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> |
| <script> |
| // Try to detect touch-screen devices. Note: Many laptops have touch screens. |
| $(document).ready(function() { |
| if ("ontouchstart" in document.documentElement) { |
| $(document.documentElement).addClass("touch"); |
| } else { |
| $(document.documentElement).addClass("no-touch"); |
| } |
| }); |
| </script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" |
| integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" |
| crossorigin="anonymous"></script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-68448017-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script> |
| <script> |
| anchors.options = { |
| placement: 'right', |
| visible: 'touch', |
| }; |
| anchors.add(); |
| </script> |
| </body> |
| </html> |
| |