Prompt: Is Apache Kafka supported in Apache Beam?
Response: Apache Kafka is a distributed streaming platform that is used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, and fast.
Apache Beam provides a built-in KafkaIO connector that enables pipelines to read data from and write data to Kafka topics. The connector is supported in both batch and streaming pipelines and can serve as both a source and a sink; a read example and a write sketch are shown below. For more information, refer to the KafkaIO connector documentation for your preferred programming language.
To get started with Apache Kafka and Apache Beam, refer to the Dataflow Cookbook repository.
Here is an example of Apache Beam pipeline code for reading data from a Kafka topic:
import logging

import apache_beam as beam
from apache_beam import Map
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Configure the pipeline; streaming=True keeps the unbounded Kafka read running.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    output = (
        p
        # Connect to the broker at localhost:9092 and consume the topic my_topic.
        | "Read from Kafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "localhost:9092"},
            topics=["my_topic"],
            with_metadata=False,
        )
        # Log each record.
        | "Log Data" >> Map(logging.info)
    )
This code reads data from the Kafka topic my_topic on the broker at localhost:9092 and logs each record to the console.
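For the sink side, here is a minimal sketch of writing records to a Kafka topic with the Python WriteToKafka transform. The broker address localhost:9092 and the topic my_topic are placeholders, and the elements are created as (bytes, bytes) key-value pairs to match the default byte-array serializers.
import apache_beam as beam
from apache_beam.io.kafka import WriteToKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    _ = (
        p
        # WriteToKafka expects key-value pairs; with the default serializers
        # both the key and the value are byte arrays.
        | "Create records" >> beam.Create([(b"key1", b"value1"), (b"key2", b"value2")])
        | "Write to Kafka" >> WriteToKafka(
            producer_config={"bootstrap.servers": "localhost:9092"},
            topic="my_topic",
        )
    )
Note that ReadFromKafka and WriteToKafka in the Python SDK are cross-language transforms backed by the Java KafkaIO implementation, so a Java runtime must be available for the expansion service when the pipeline is constructed.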
For a detailed demonstration of using the KafkaIO connector, refer to the KafkaIO example in the Apache Beam GitHub repository.