Gobblin ETL comes equipped with instrumentation using [Gobblin Metrics](Gobblin Metrics), as well as extension points that make it easy to add to this instrumentation.
The following configurations are used for metrics and event emission:
Configuration Key | Definition | Default |
---|---|---|
metrics.enabled | Whether metrics are enabled. If false, will not report metrics. | true |
metrics.report.interval | Metrics report interval in milliseconds. | 30000 |
metrics.reporting.file.enabled | Whether metrics will be reported to a file. | false |
metrics.log.dir | If file enabled, the directory where metrics will be written. If missing, will not report to file. | N/A |
metrics.reporting.kafka.enabled | Whether metrics will be reported to Kafka. | false |
metrics.reporting.kafka.brokers | Kafka brokers for Kafka metrics emission. | N/A |
metrics.reporting.kafka.topic.metrics | Kafka topic where metrics (but not events) will be reported. | N/A |
metrics.reporting.kafka.topic.events | Kafka topic where events (but not metrics) will be reported. | N/A |
metrics.reporting.kafka.format | Format of metrics / events emitted to Kafka. (Options: json, avro) | json |
metrics.reporting.kafka.avro.use.schema.registry | Whether to use a schema registry for Kafka emitting. | false |
kafka.schema.registry.url | If using schema registry, the url of the schema registry. | N/A |
metrics.reporting.jmx.enabled | Whether to report metrics to JMX. | false |
metrics.reporting.custom.builders | Comma-separated list of classes for custom metrics reporters. (See Custom Reporters) |
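
As an illustration of how the keys in the table combine, the following minimal sketch collects a Kafka reporting setup in a `java.util.Properties` object. The key names come from the table; the broker address, the topic names, and the 60-second interval are placeholder values chosen for the example, and the class name `MetricsConfigExample` is hypothetical. In a real deployment these keys would live wherever your Gobblin configuration is kept.

```java
import java.util.Properties;

// Sketch only: gathers metrics-related settings from the table above.
class MetricsConfigExample {

    // Broker address and topic names are placeholders, not Gobblin defaults.
    static Properties kafkaMetricsConfig() {
        Properties props = new Properties();
        props.setProperty("metrics.enabled", "true");
        props.setProperty("metrics.report.interval", "60000"); // milliseconds
        props.setProperty("metrics.reporting.kafka.enabled", "true");
        props.setProperty("metrics.reporting.kafka.brokers", "localhost:9092");
        props.setProperty("metrics.reporting.kafka.topic.metrics", "GobblinMetrics");
        props.setProperty("metrics.reporting.kafka.topic.events", "GobblinEvents");
        props.setProperty("metrics.reporting.kafka.format", "avro");
        return props;
    }

    public static void main(String[] args) {
        Properties props = kafkaMetricsConfig();
        // Print the keys in sorted order so the listing is stable.
        props.stringPropertyNames().stream().sorted()
            .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Metrics and events are sent to separate topics here because the table defines one key for each.
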
Each construct in a Gobblin ETL run computes metrics regarding its performance and progress. Each metric is tagged by default with the following tags:
This is the list of operational metrics implemented by default, grouped by construct.
The Gobblin ETL runtime emits events marking its progress. All events have the following metadata:
This is the list of events that are emitted by the Gobblin runtime:
These events give information on timing on certain parts of the execution. Each timing event contains the following metadata:
The following timing events are emitted:
When using a custom construct (for example, a custom extractor for your data source), you get the instrumentation described above for free. However, you may want to implement additional metrics. To help with this, instead of extending the usual class `Extractor`, you can extend the class `gobblin.instrumented.extractor.InstrumentedExtractor`. Similarly, each construct has an instrumented version that allows extension of the default metrics (`InstrumentedExtractor`, `InstrumentedConverter`, `InstrumentedForkOperator`, `InstrumentedRowLevelPolicy`, and `InstrumentedDataWriter`).
All of the instrumented constructs have Javadoc providing additional information. In general, when extending an instrumented construct, you implement a different method than you would on the plain construct. For example, when extending an `InstrumentedExtractor`, instead of implementing `readRecord`, you implement `readRecordImpl`. To make this clear, attempting to implement `readRecord` causes a compilation error, and the Javadoc of each method specifies the method that should be implemented instead.
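
The split between `readRecord` and `readRecordImpl` can be pictured with the following self-contained sketch. The classes here are simplified stand-ins written for this example, not Gobblin's actual classes: `SketchInstrumentedExtractor`, `ListExtractor`, and the record counter are all hypothetical, and the method signatures are reduced to the bare minimum. The sketch assumes the compilation error comes from the public method being declared `final`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for an instrumented extractor. NOT Gobblin's class;
// it only mirrors the readRecord / readRecordImpl split.
abstract class SketchInstrumentedExtractor<D> {
    private final AtomicLong recordsRead = new AtomicLong();

    // final: a subclass that tries to override readRecord fails to compile.
    public final D readRecord() {
        D record = readRecordImpl();
        if (record != null) {
            recordsRead.incrementAndGet(); // default metric kept by the base class
        }
        return record;
    }

    // Subclasses put their extraction logic here; null signals end of data.
    protected abstract D readRecordImpl();

    public long getRecordsRead() {
        return recordsRead.get();
    }
}

// Hypothetical custom extractor that reads from an in-memory list.
class ListExtractor extends SketchInstrumentedExtractor<String> {
    private final Iterator<String> source;

    ListExtractor(List<String> records) {
        this.source = records.iterator();
    }

    @Override
    protected String readRecordImpl() {
        return source.hasNext() ? source.next() : null;
    }
}

class ReadRecordImplExample {
    public static void main(String[] args) {
        ListExtractor extractor = new ListExtractor(Arrays.asList("a", "b", "c"));
        while (extractor.readRecord() != null) {
            // Records are consumed here; the base class counts them.
        }
        System.out.println("records read: " + extractor.getRecordsRead());
    }
}
```

Because the base class owns the public entry point, the default metric is updated no matter what the subclass does inside `readRecordImpl`.
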
Instrumented constructs implement the interface `Instrumentable`. It contains the following methods:

* `getMetricContext()`: gets the default metric context generated for that instance of the construct, with all the appropriate tags. Use this metric context to create any additional metrics.
* `isInstrumentationEnabled()`: returns true if instrumentation is enabled.
* `switchMetricContext(List<Tag<?>>)`: switches the default metric context returned by `getMetricContext()` to a metric context containing the supplied tags. All default metrics will be reported to the new metric context. This method is useful when the state of a construct changes during execution and the user wants the emitted tags to reflect that change (for example, a Kafka extractor can handle multiple topics in the same extractor, and the metrics should show which topic is being read).
* `switchMetricContext(MetricContext)`: similar to the method above, but uses the supplied metric context instead of generating a new one. It is the responsibility of the caller to ensure the new metric context has the correct tags and parent.

The following method can be re-implemented by the user:

* `generateTags(State)`: this method should return a list of tags to use for metric contexts created for this construct. When overriding this method, it is a good idea to call the superclass implementation and only append tags to the list it returns.

Instrumented constructs have a set of callback methods that are called at different points in the processing of each record and that can be used to update metrics. For example, `InstrumentedExtractor` has the callbacks `beforeRead()`, `afterRead(D, long)`, and `onException(Exception)`. The Javadoc for the instrumented constructs describes each callback further. Users should always call the superclass implementation when overriding these callbacks, because the default metrics depend on it.
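
The advice to delegate to the superclass, both when generating tags and when overriding callbacks, can be sketched as follows. This is an illustration under stated assumptions rather than Gobblin's API: `SketchInstrumentedBase` and `TopicAwareExtractor` are hypothetical stand-ins, tags are plain `key=value` strings instead of `Tag<?>` objects, `generateTags` takes no `State` argument, and the `long` parameter of `afterRead` is assumed to be the time at which the read started.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in base class (NOT Gobblin's API). It owns the default tags and the
// default metric updates, which is why overrides must delegate to super.
class SketchInstrumentedBase {
    protected long recordsRead = 0;
    protected long failures = 0;

    public List<String> generateTags() {
        List<String> tags = new ArrayList<>();
        tags.add("construct=extractor"); // placeholder default tag
        return tags;
    }

    public void afterRead(Object record, long startTime) {
        if (record != null) {
            recordsRead++; // default metric
        }
    }

    public void onException(Exception e) {
        failures++; // default metric
    }
}

// Hypothetical custom construct that adds one tag and one metric.
class TopicAwareExtractor extends SketchInstrumentedBase {
    long charsRead = 0; // additional, user-defined metric

    @Override
    public List<String> generateTags() {
        List<String> tags = super.generateTags(); // keep the defaults
        tags.add("topic=example-topic");          // append only
        return tags;
    }

    @Override
    public void afterRead(Object record, long startTime) {
        super.afterRead(record, startTime); // default metrics are still updated
        if (record instanceof String) {
            charsRead += ((String) record).length();
        }
    }
}

class CallbackOverrideExample {
    public static void main(String[] args) {
        TopicAwareExtractor extractor = new TopicAwareExtractor();
        extractor.afterRead("hello", System.currentTimeMillis());
        System.out.println(extractor.generateTags());
        System.out.println("recordsRead=" + extractor.recordsRead
            + " charsRead=" + extractor.charsRead);
    }
}
```

If `afterRead` in the subclass skipped the `super` call, `recordsRead` would stay at zero even though records were processed, which is the failure mode the text warns about.
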
Besides the reporters implemented by default (file, Kafka, and JMX), users can add custom reporters to the classpath and instruct Gobblin to use them. To do this, implement the interface `CustomReporterFactory` and specify a comma-separated list of `CustomReporterFactory` classes in the configuration key `metrics.reporting.custom.builders`.
Gobblin will automatically search for these `CustomReporterFactory` implementations, instantiate each one with a parameterless constructor, and then call the method `newScheduledReporter(MetricContext, Properties)`, where the properties contain all of the input configurations supplied to Gobblin. Gobblin will then manage this `ScheduledReporter`.
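
The loading sequence described above (read the class list, instantiate each factory through its parameterless constructor, then ask it for a reporter) can be approximated with plain reflection. The sketch below is an assumption about the general mechanism, not Gobblin's implementation: `SketchReporterFactory` stands in for `CustomReporterFactory`, its method returns a `Runnable` rather than a `ScheduledReporter` and takes no `MetricContext`, and `ConsoleReporterFactory` is a hypothetical user-supplied class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Stand-in for the factory contract (NOT Gobblin's CustomReporterFactory).
interface SketchReporterFactory {
    Runnable newReporter(Properties props);
}

// Hypothetical factory. The public no-arg constructor is what makes
// reflective instantiation possible.
class ConsoleReporterFactory implements SketchReporterFactory {
    public ConsoleReporterFactory() {}

    @Override
    public Runnable newReporter(Properties props) {
        String interval = props.getProperty("metrics.report.interval", "30000");
        return () -> System.out.println("reporting every " + interval + " ms");
    }
}

class CustomReporterLoadingExample {
    static final String BUILDERS_KEY = "metrics.reporting.custom.builders";

    // Instantiates every factory named in the comma-separated list and
    // collects the reporters they create.
    static List<Runnable> buildReporters(Properties props) {
        List<Runnable> reporters = new ArrayList<>();
        for (String name : props.getProperty(BUILDERS_KEY, "").split(",")) {
            String className = name.trim();
            if (className.isEmpty()) {
                continue; // key missing or stray comma
            }
            try {
                SketchReporterFactory factory = (SketchReporterFactory)
                    Class.forName(className).getDeclaredConstructor().newInstance();
                reporters.add(factory.newReporter(props));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create " + className, e);
            }
        }
        return reporters;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(BUILDERS_KEY, ConsoleReporterFactory.class.getName());
        buildReporters(props).forEach(Runnable::run);
    }
}
```

Passing the full `Properties` object to the factory mirrors the statement that the properties contain all of the input configurations, so a custom reporter can read its own keys from the same place.
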