Prompt: What are the metrics in Apache Beam?
Response: In the Apache Beam model, metrics provide insight into the current state of your pipeline, including while the pipeline is still executing.
Metrics are named and scoped to a specific step in the pipeline. They can be created dynamically during pipeline execution. If a runner doesn't support some part of reporting metrics, the fallback behavior is to drop the metric updates rather than fail the pipeline.
Apache Beam provides a number of built-in metric types, including counters, distributions, and gauges.
To declare a metric, use the beam.metrics.Metrics class. For example:
    self.words_counter = Metrics.counter(self.__class__, 'words')
    self.word_lengths_counter = Metrics.counter(self.__class__, 'word_lengths')
    self.word_lengths_dist = Metrics.distribution(self.__class__, 'word_len_dist')
    self.empty_line_counter = Metrics.counter(self.__class__, 'empty_lines')
For implementation details, see the WordCount example with metrics in the Apache Beam GitHub repository.
You can export metrics to external sinks. The Spark and Flink runners support exporting to REST HTTP and Graphite sinks.