| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <p> Here is a description of a few of the popular use cases for Apache Kafka®. |
| For an overview of a number of these areas in action, see <a href="https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/">this blog post</a>. </p> |
| |
| <h4 class="anchor-heading"><a id="uses_messaging" class="anchor-link"></a><a href="#uses_messaging">Messaging</a></h4> |
| |
| Kafka works well as a replacement for a more traditional message broker. |
| Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). |
| In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good |
| solution for large scale message processing applications. |
| <p> |
| In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong |
| durability guarantees Kafka provides. |
| <p> |
| In this domain Kafka is comparable to traditional messaging systems such as <a href="http://activemq.apache.org">ActiveMQ</a> or |
| <a href="https://www.rabbitmq.com">RabbitMQ</a>. |
| |
| <h4 class="anchor-heading"><a id="uses_website" class="anchor-link"></a><a href="#uses_website">Website Activity Tracking</a></h4> |
| |
| The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. |
| This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. |
| These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or |
| offline data warehousing systems for offline processing and reporting. |
| <p> |
| Activity tracking is often very high volume as many activity messages are generated for each user page view. |
| |
| <h4 class="anchor-heading"><a id="uses_metrics" class="anchor-link"></a><a href="#uses_metrics">Metrics</a></h4> |
| |
| Kafka is often used for operational monitoring data. |
| This involves aggregating statistics from distributed applications to produce centralized feeds of operational data. |
| |
| <h4 class="anchor-heading"><a id="uses_logs" class="anchor-link"></a><a href="#uses_logs">Log Aggregation</a></h4> |
| |
| Many people use Kafka as a replacement for a log aggregation solution. |
| Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. |
| Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. |
| This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. |
| |
| In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, |
| and much lower end-to-end latency. |
| |
| <h4 class="anchor-heading"><a id="uses_streamprocessing" class="anchor-link"></a><a href="#uses_streamprocessing">Stream Processing</a></h4> |
| |
| Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then |
| aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. |
| For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; |
| further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; |
| a final processing stage might attempt to recommend this content to users. |
| Such processing pipelines create graphs of real-time data flows based on the individual topics. |
| Starting in 0.10.0.0, a light-weight but powerful stream processing library called <a href="/documentation/streams">Kafka Streams</a> |
| is available in Apache Kafka to perform such data processing as described above. |
| Apart from Kafka Streams, alternative open source stream processing tools include <a href="https://storm.apache.org/">Apache Storm</a> and |
| <a href="http://samza.apache.org/">Apache Samza</a>. |
| |
| <h4 class="anchor-heading"><a id="uses_eventsourcing" class="anchor-link"></a><a href="#uses_eventsourcing">Event Sourcing</a></h4> |
| |
| <a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a> is a style of application design where state changes are logged as a |
| time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style. |
| |
| <h4 class="anchor-heading"><a id="uses_commitlog" class="anchor-link"></a><a href="#uses_commitlog">Commit Log</a></h4> |
| |
| Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing |
| mechanism for failed nodes to restore their data. |
| The <a href="/documentation.html#compaction">log compaction</a> feature in Kafka helps support this usage. |
| In this usage Kafka is similar to <a href="https://bookkeeper.apache.org/">Apache BookKeeper</a> project. |