Prometheus imposes strict constraints on the content sent to remote-write, including label format and ordering, sample time ordering, etc.
For efficiency, the sink does not do any validation or reordering of the input. It is the responsibility of the application to ensure that the input is well-formed.
Any malformed data will be rejected on write to Prometheus. Depending on the error handling behavior configured, the sink will either throw an exception, stopping the job (default), or drop the entire write-request, log the fact, and continue.
For complete details about these constraints, refer to the remote-write specifications.
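For illustration, here is a minimal sketch of building a well-formed input record, assuming the connector's `PrometheusTimeSeries` builder API; the metric name, labels, and values are made up for the example. Labels are added in lexicographic order and samples in timestamp order, because the sink will not reorder them:

```java
// A minimal sketch, assuming the connector's PrometheusTimeSeries builder API.
// Metric name, labels, and values are illustrative.
// Labels are added in lexicographic order and samples in timestamp order,
// because the sink performs no validation or reordering.
PrometheusTimeSeries timeSeries = PrometheusTimeSeries.builder()
        .withMetricName("car_speed_km_h")            // the metric name (the __name__ label in remote-write)
        .addLabel("model", "Speedster")
        .addLabel("vin", "ABC123")
        .addSample(42.0, System.currentTimeMillis()) // (value, timestamp in ms)
        .build();
```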
The sink batches multiple time-series into a single write-request, retaining the order.
Batching is based on the number of samples. Each write-request contains up to 500 samples, with a max buffering time of 5 seconds (both configurable). The number of time-series doesn't matter.
As per the Prometheus Remote-Write specification, the sink retries 5xx and 429 responses. Retrying is blocking, to retain sample ordering, and uses an exponential backoff.
The exponential backoff starts with an initial delay (default 30 ms) and increases it exponentially up to a max retry delay (default 5 sec). It continues retrying until the max number of retries is reached (default: retry forever).
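The growth factor of the backoff is an internal detail; the sketch below simply assumes the delay doubles on each attempt, only to illustrate how the initial delay and max delay interact:

```java
// Illustrative only: assumes the retry delay doubles on each attempt and is
// capped at the configured maximum. The actual growth factor is internal to the sink.
long initialRetryDelayMs = 30L;
long maxRetryDelayMs = 5000L;
for (int attempt = 0; attempt < 10; attempt++) {
    long delay = Math.min(initialRetryDelayMs << attempt, maxRetryDelayMs);
    System.out.println("retry " + (attempt + 1) + ": wait " + delay + " ms");
}
// 30, 60, 120, 240, 480, 960, 1920, 3840, 5000, 5000 ms
```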
On a non-retryable error response (4xx, except 429, or a non-retryable exception), the sink always discards the write-request and continues (`DISCARD_AND_CONTINUE` behavior; see details below).
On reaching the retry limit, depending on the error handling behavior configured for "Max retries exceeded", the sink will either throw an exception (`FAIL`, the default behavior), or discard the entire write-request, log a warning, and continue. See error handling behavior, below, for further details.
Example of sink initialisation (for documentation purposes, we are setting all parameters to their default values):

```java
PrometheusSink sink = PrometheusSink.builder()
        .setMaxBatchSizeInSamples(500)              // Batch size (write-request size), in samples (default: 500)
        .setMaxRecordSizeInSamples(500)             // Max sink input record size, in samples (default: 500), must be <= maxBatchSizeInSamples - If exceeded the job will continuously fail and restart!
        .setMaxTimeInBufferMS(5000)                 // Max time a time-series is buffered for batching (default: 5000 ms)
        .setRetryConfiguration(RetryConfiguration.builder()
                .setInitialRetryDelayMS(30L)        // Initial retry delay (default: 30 ms)
                .setMaxRetryDelayMS(5000L)          // Maximum retry delay, with exponential backoff (default: 5000 ms)
                .setMaxRetryCount(100)              // Max number of retries (default: 100)
                .build())
        .setSocketTimeoutMs(5000)                   // Http client socket timeout (default: 5000 ms)
        .setPrometheusRemoteWriteUrl(prometheusRemoteWriteUrl) // Remote-write URL
        .setRequestSigner(new AmazonManagedPrometheusWriteRequestSigner(prometheusRemoteWriteUrl, prometheusRegion)) // Optional request signer (AMP request signer in this example)
        .setErrorHandlingBehaviorConfiguration(SinkWriterErrorHandlingBehaviorConfiguration.builder()
                // Error handling behaviors. See description below for more details.
                // Default is DISCARD_AND_CONTINUE for non-retryable errors
                .onPrometheusNonRetryableError(OnErrorBehavior.DISCARD_AND_CONTINUE)
                // Default is FAIL for other error types
                .onMaxRetryExceeded(OnErrorBehavior.FAIL)
                .build())
        .setMetricGroupName("Prometheus")           // Customizable metric-group suffix (default: "Prometheus")
        .build();
```
When the sink has parallelism > 1, the stream must be partitioned so that all time-series with the same set of labels go to the same sink operator subtask. Otherwise, samples may be written out of order and be rejected by Prometheus.
A `keyBy()` using the provided key selector, `PrometheusTimeSeriesLabelsAndMetricNameKeySelector`, automatically partitions the time-series by labels.
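A minimal sketch of wiring this up (the upstream `DataStream<PrometheusTimeSeries>` is assumed to be produced elsewhere in the job):

```java
// Key the stream by labels and metric name, so that each time-series is always
// handled by the same sink subtask, then attach the sink.
DataStream<PrometheusTimeSeries> prometheusTimeSeries = ...; // produced upstream

prometheusTimeSeries
        .keyBy(new PrometheusTimeSeriesLabelsAndMetricNameKeySelector())
        .sinkTo(sink);
```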
The sink supports an optional request signer for authentication, implementing the `PrometheusRequestSigner` interface.
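For example, the Amazon Managed Prometheus signer used in the full example above can be configured as follows (a sketch; the URL and region are placeholder values):

```java
// Placeholders: replace with your workspace remote-write URL and region.
String prometheusRemoteWriteUrl =
        "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-REPLACE-ME/api/v1/remote_write";
String prometheusRegion = "eu-west-1";

PrometheusSink sink = PrometheusSink.builder()
        .setPrometheusRemoteWriteUrl(prometheusRemoteWriteUrl)
        // Signs every write-request (SigV4, for Amazon Managed Prometheus)
        .setRequestSigner(new AmazonManagedPrometheusWriteRequestSigner(prometheusRemoteWriteUrl, prometheusRegion))
        // ...
        .build();
```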
In compliance with the Prometheus remote-write specification, the sink does not retry requests that return a `4xx` status code (except `429`), and retries requests that return `5xx` or `429`, with an exponential backoff strategy.
The retry strategy can be configured, as shown in the following snippet:
```java
PrometheusSink sink = PrometheusSink.builder()
        .setRetryConfiguration(RetryConfiguration.builder()
                .setInitialRetryDelayMS(30L)    // Initial retry delay (default: 30 ms)
                .setMaxRetryDelayMS(5000L)      // Maximum retry delay, with exponential backoff (default: 5000 ms)
                .setMaxRetryCount(100)          // Max number of retries (default: 100)
                .build())
        // ...
        .build();
```
The behavior of the sink when an unrecoverable error happens while writing to the Prometheus remote-write endpoint is configurable.
The possible behaviors are:

- `FAIL`: throw a `PrometheusSinkWriteException`, causing the job to fail.
- `DISCARD_AND_CONTINUE`: log the reason of the error, discard the offending write-request, and continue.

There are two error conditions:

- Non-retryable error: the write-request is rejected with a `4xx` status code (except `429`). Default: `DISCARD_AND_CONTINUE`.
- Max retries exceeded: the error is retryable (`5xx` or `429`) but the max retry limit is exceeded. Default: `FAIL`.

The error handling behaviors can be configured when creating the instance of the sink, as shown in this snippet:
```java
PrometheusSink sink = PrometheusSink.builder()
        // ...
        .setErrorHandlingBehaviorConfiguration(SinkWriterErrorHandlingBehaviorConfiguration.builder()
                .onPrometheusNonRetryableError(OnErrorBehavior.DISCARD_AND_CONTINUE)
                .onMaxRetryExceeded(OnErrorBehavior.DISCARD_AND_CONTINUE)
                .build())
        .build();
```
When configured for `DISCARD_AND_CONTINUE`, the sink will do the following:

- log a line at `WARN` level, with information about the problem and the number of time-series and samples dropped;
- discard the entire write-request and continue with the next one.

Note that there is no partial-failure condition: the entire write-request is discarded regardless of what data in the request is causing the problem. Prometheus does not return sufficient information to automatically handle partial requests.
In the current connector version, `DISCARD_AND_CONTINUE` is the only supported behavior for non-retryable errors; the behavior cannot be set to `FAIL`. Failing on a non-retryable error would make it impossible for the application to restart from checkpoint: restarting from checkpoint causes some duplicates, which are rejected by Prometheus as out of order, causing in turn another non-retryable error, in an endless loop.
Remote-write endpoint responses 403 (Forbidden) and 404 (Not found) are always considered fatal, regardless of the error handling configuration.
Any I/O error during the communication with the endpoint is also fatal.
The sink exposes custom metrics, counting the samples and write-requests (batches) successfully written or discarded.
- `numSamplesOut`: number of samples successfully written to Prometheus
- `numWriteRequestsOut`: number of write-requests successfully written to Prometheus
- `numWriteRequestsRetries`: number of write-requests retried due to a retryable error (e.g. throttling)
- `numSamplesDropped`: number of samples dropped, for any reason
- `numSamplesNonRetryableDropped` (when `onPrometheusNonRetryableError` is set to `DISCARD_AND_CONTINUE`): number of samples dropped due to non-retryable errors
- `numSamplesRetryLimitDropped` (when `onMaxRetryExceeded` is set to `DISCARD_AND_CONTINUE`): number of samples dropped due to reaching the max number of retries
- `numWriteRequestsPermanentlyFailed`: number of write-requests permanently failed, for any reason (non-retryable error, max number of retries reached)

Note: `numBytesOut` does not measure the number of bytes, due to an internal limitation of the base sink. This metric should be ignored; rely on `numSamplesOut` and `numWriteRequestsOut` instead.
These custom metrics are exposed in a partially customizable scope. By default, the scope is `Sink__Writer.Prometheus`; it can be customized to any `Sink__Writer.<metric-group>`.
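For example, this sketch changes the scope to `Sink__Writer.MyPrometheusSink` (the group name is arbitrary):

```java
PrometheusSink sink = PrometheusSink.builder()
        // ...
        .setMetricGroupName("MyPrometheusSink")  // metrics scope becomes Sink__Writer.MyPrometheusSink
        .build();
```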
The connector includes the Protobuf-generated classes `Remote`, `Types`, and `GoGoProtos`.
You can find a complete application example using the connector in `DataStreamExample.java`.