Prompt: What are the metrics in Apache Beam?
Response: In the Apache Beam model, metrics provide insights into the current state of your pipeline, including during pipeline execution.
Metrics are named and scoped to a specific step in the pipeline. They can be created dynamically during pipeline execution. If a runner doesn't support some part of reporting metrics, the fallback behavior is to drop the metric updates rather than fail the pipeline.
Beam provides a number of built-in metric types:
To declare a metric, use the beam.metrics.Metrics class. For example:
self.words_counter = Metrics.counter(self.__class__, 'words') self.word_lengths_counter = Metrics.counter(self.__class__, 'word_lengths') self.word_lengths_dist = Metrics.distribution(self.__class__, 'word_len_dist') self.empty_line_counter = Metrics.counter(self.__class__, 'empty_lines')
For implementation details, see the WordCount example with metrics.
You can export metrics to external sinks. Spark and Flink runners support REST HTTP and Graphite.