<h2 id="apache-kudu-ecosystem">Apache Kudu Ecosystem</h2>
<p>While the Apache Kudu project provides client bindings that allow users to
mutate and fetch data, more complex access patterns are often written via SQL
and compute engines. This is a non-exhaustive list of projects that integrate
with Kudu to enhance ingest, querying capabilities, and orchestration.</p>
<h3 id="frequently-used">Frequently used</h3>
<p>The following integrations are among the most commonly used with Apache Kudu
(sorted alphabetically).</p>
<ul>
<li><a href="#apache-impala">Apache Impala</a></li>
<li><a href="#apache-nifi">Apache NiFi</a></li>
<li><a href="#apache-spark-sql">Apache Spark SQL</a></li>
<li><a href="#presto">Presto</a></li>
</ul>
<h3 id="sql">SQL</h3>
<h4 id="apache-drill"><a href="">Apache Drill</a></h4>
<p>Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL, and cloud
storage. See the <a href="">Drill Kudu API documentation</a>
for more details.</p>
<h4 id="apache-hive"><a href="">Apache Hive</a></h4>
<p>The Apache Hive™ data warehouse software facilitates reading, writing, and
managing large datasets residing in distributed storage using SQL. See the
<a href="">Hive Kudu integration documentation</a>
for more details.</p>
<h4 id="apache-impala"><a href="">Apache Impala</a></h4>
<p>Apache Impala is the open source, native analytic database for Apache Hadoop.
See the <a href="">Kudu Impala integration
documentation</a> for
more details.</p>
<h4 id="apache-spark-sql"><a href="">Apache Spark SQL</a></h4>
<p>Spark SQL is a Spark module for structured data processing. See the
<a href="">Kudu Spark integration documentation</a>
for more details.</p>
<h4 id="presto"><a href="">Presto</a></h4>
<p>Presto is an open source distributed SQL query engine for running interactive
analytic queries against data sources of all sizes ranging from gigabytes to
petabytes. See the <a href="">Presto Kudu connector
documentation</a> for more details.</p>
<h3 id="computation">Computation</h3>
<h4 id="apache-beam"><a href="">Apache Beam</a></h4>
<p>Apache Beam is a unified model for defining both batch and streaming
data-parallel processing pipelines, as well as a set of language-specific SDKs
for constructing pipelines and Runners for executing them on distributed
processing backends. See the <a href="">Beam Kudu source and sink documentation</a>
for more details.</p>
<h4 id="apache-spark"><a href="">Apache Spark</a></h4>
<p>Apache Spark is a unified analytics engine for large-scale data processing. See
the <a href="">Kudu Spark integration documentation</a>
for more details.</p>
<h4 id="pandas"><a href="">Pandas</a></h4>
<p>Pandas is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming
language. Kudu Python scanners can be converted to Pandas DataFrames. See
<a href="">Kudu’s Python tests</a>
for example usage.</p>
<h4 id="talend-big-data"><a href="">Talend Big Data</a></h4>
<p>Talend simplifies and automates big data integration projects with on-demand
serverless Spark and machine learning. See <a href="">Talend’s Kudu component documentation</a>
for more details.</p>
<h3 id="ingest">Ingest</h3>
<h4 id="akka"><a href="">Akka</a></h4>
<p>Akka facilitates building highly concurrent, distributed, and resilient
message-driven applications on the JVM. See the <a href="">Alpakka Kudu connector
documentation</a> for more details.</p>
<h4 id="apache-flink"><a href="">Apache Flink</a></h4>
<p>Apache Flink is a framework and distributed processing engine for stateful
computations over unbounded and bounded data streams. See the <a href="">Flink Kudu documentation</a>
for more details.</p>
<h4 id="apache-nifi"><a href="">Apache NiFi</a></h4>
<p>Apache NiFi supports powerful and scalable directed graphs of data routing,
transformation, and system mediation logic. See the <a href="">PutKudu processor documentation</a>
for more details.</p>
<h4 id="apache-spark-streaming"><a href="">Apache Spark Streaming</a></h4>
<p>Spark Streaming is an extension of the core Spark API that enables scalable,
high-throughput, fault-tolerant stream processing of live data streams.
See <a href="">Kudu’s Spark Streaming tests</a>
for example usage.</p>
<h4 id="confluent-platform-kafka"><a href="">Confluent Platform Kafka</a></h4>
<p>Apache Kafka is an open-source distributed event streaming platform used by
thousands of companies for high-performance data pipelines, streaming
analytics, data integration, and mission-critical applications. See the <a href="">Kafka
Kudu connector documentation</a>
for more details.</p>
<h4 id="streamsets-data-collector"><a href="">StreamSets Data Collector</a></h4>
<p>StreamSets Data Collector is a lightweight, powerful engine that streams data
in real time. See the <a href="">StreamSets Data Collector Kudu destination
documentation</a> for more details.</p>
<h4 id="striim"><a href="">Striim</a></h4>
<p>Striim is real-time data integration software that enables continuous data
ingestion, in-flight stream processing, and delivery. See the <a href="">Striim Kudu
documentation</a> for
more details.</p>
<h4 id="tibco-streambase"><a href="">TIBCO StreamBase</a></h4>
<p>TIBCO StreamBase® is an event processing platform for applying mathematical and
relational processing to real-time data streams. See the <a href="">StreamBase Kudu documentation</a>
for more details.</p>
<h3 id="deployment-and-orchestration">Deployment and Orchestration</h3>
<h4 id="apache-camel"><a href="">Apache Camel</a></h4>
<p>Camel is an open source integration framework that empowers you to quickly and
easily integrate various systems consuming or producing data. See the <a href="">Camel
Kudu component documentation</a>
for more details.</p>
<h4 id="cloudera-manager"><a href="">Cloudera Manager</a></h4>
<p>Cloudera Manager is an end-to-end application for managing CDH clusters. See
the <a href="">Cloudera Manager documentation for Kudu</a>
for more details.</p>
<h4 id="docker"><a href="">Docker</a></h4>
<p>Docker facilitates packaging software into standardized units for development,
shipment, and deployment. See the official <a href="">Apache Kudu
Dockerhub</a> and the <a href="">Apache Kudu Docker
Quickstart</a> for more details.</p>
<h4 id="wavefront"><a href="">Wavefront</a></h4>
<p>Wavefront is a high-performance streaming analytics platform that supports 3D
observability. See the <a href="">Wavefront Kudu integration
documentation</a> for more details.</p>
<h3 id="visualization">Visualization</h3>
<h4 id="zoomdata"><a href="">Zoomdata</a></h4>
<p>Zoomdata provides a high-performance BI engine and visually engaging,
interactive dashboards. See <a href="">Zoomdata’s Kudu
page</a> for
more details.</p>
<h2 id="distribution-and-support">Distribution and Support</h2>
<p>While Kudu is an Apache-licensed open source project, software vendors may
package and license it with other components to facilitate consumption. These
offerings are typically bundled with support, including help with tuning:</p>
<ul>
<li><a href="">Cloudera CDH</a></li>
<li><a href="">phData</a></li>
</ul>
