PIP-264, which can also be viewed here, describes at a high level a plan to greatly enhance Pulsar's metric system by replacing it with OpenTelemetry. You can read in the PIP about the numerous existing problems PIP-264 solves.
You can read here why OpenTelemetry was chosen.
Since OpenTelemetry (a.k.a. OTel) is an emerging industry standard, there are plenty of good articles, videos, and documentation about it. In this short paragraph I'll describe what you need to know about OTel from this PIP's perspective.
OpenTelemetry is a project that aims to standardize the way we instrument, collect, and ship metrics from applications to telemetry backends, be they databases (e.g. Prometheus, Cortex, Thanos) or vendors (e.g. Datadog, Logz.io). It is divided into API, SDK, and Collector:
Just for some context: the Pulsar codebase will use the OTel API to create counters/histograms and record values to them, and so will Pulsar plugins and Pulsar Function authors. Pulsar itself will create the SDK and use it to hand over an implementation of the API wherever it is needed in Pulsar. The Collector is up to the choice of the user, as OTel provides a way to expose the metrics on a `/metrics` endpoint on a configured port, so Prometheus-compatible scrapers can grab them directly. Metrics can also be sent via OTLP to an OTel Collector.
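As an illustration, here is roughly how code instrumented against the OTel API looks. This is a hedged sketch: the instrument name, attribute key, and class name are hypothetical, not the final Pulsar instrumentation.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class TopicMetricsExample {

    // Instrumented code depends only on the OTel API; the OpenTelemetry
    // instance is handed over by whoever assembled the SDK.
    static LongCounter buildReceivedSizeCounter(OpenTelemetry openTelemetry) {
        Meter meter = openTelemetry.getMeter("org.apache.pulsar.broker");
        return meter.counterBuilder("pulsar.broker.messaging.topic.received.size")
                .setUnit("By")
                .build();
    }

    public static void main(String[] args) {
        // With a no-op OpenTelemetry instance, recording does nothing,
        // which is the behavior when OTel is disabled.
        LongCounter counter = buildReceivedSizeCounter(OpenTelemetry.noop());
        counter.add(1024,
                Attributes.of(AttributeKey.stringKey("pulsar.topic"), "my-topic"));
    }
}
```

Note that the instrumented code never touches the SDK directly, so the same call sites work whether OTel is enabled, disabled, or exporting via Prometheus or OTLP.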
PIP-264 clearly outlined that there will be two layers of metrics, collected and exported side by side: OpenTelemetry and the existing metric system, which currently exports in Prometheus format. This PIP explains in detail how that will work. The basic premise is that you will be able to enable or disable OTel metrics alongside the existing Prometheus metric exporting.
As specified in PIP-264, the OpenTelemetry Java SDK has several fixes the Pulsar community must complete before it can be used in production; they are documented in PIP-264. The most important one is reducing memory allocations to be negligible. The OTel SDK is built upon immutability, hence it allocates memory in O(#topics), which is a performance killer for a low-latency application like Pulsar.
You can track the proposal and progress the Pulsar and OTel communities are making in this issue.
Today, the Pulsar metrics endpoint `/metrics` can optionally be protected by the configured `AuthenticationProvider`. The configuration option is named `authenticateMetricsEndpoint` in the broker and `authenticateMetricsEndpoint` in the proxy.
Implementing PIP-264 consists of a long list of steps, which are detailed in this issue. The first step is to add all the bare-bones infrastructure needed to use OpenTelemetry in Pulsar, so that subsequent PRs can use it to start translating existing metrics to their OTel form. This means the same metrics will co-exist in the codebase, and also at runtime if OTel is enabled.
OpenTelemetry, like any good telemetry library (e.g. log4j, logback), has its own configuration mechanisms:
Pulsar doesn't need to introduce any additional configuration. Using OTel configuration, the user can decide things like:
Pulsar will use `AutoConfiguredOpenTelemetrySdk`, which uses all the above configuration mechanisms (documented here). This class builds an `OpenTelemetrySdk` based on those configurations. It is the entry point to the OpenTelemetry API, as it implements the `OpenTelemetry` API interface.
There are some configuration options whose defaults we wish to change, while still allowing users to override them. We think these default values will make for a much better user experience.
`otel.experimental.metrics.cardinality.limit` - value: 10,000

This property sets an upper bound on the number of unique `Attributes` sets an instrument can have. Take Pulsar for example: for an instrument like `pulsar.broker.messaging.topic.received.size`, the number of unique `Attributes` sets would equal the number of active topics in the broker. Since Pulsar can handle up to 1M topics, it makes more sense to set the default value to 10k, which translates to 10k topics.

`AutoConfiguredOpenTelemetrySdkBuilder` allows adding properties using the method `addPropertiesSupplier`. System properties and environment variables override those supplied properties. The file-based configuration doesn't take the supplied properties into account yet, but it will.
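A sketch of how such defaults could be supplied. The class name is illustrative; the property map below also includes `otel.sdk.disabled`, the other default this PIP adds, and both remain overridable by system properties and environment variables.

```java
import java.util.Map;

import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class OtelDefaultsExample {

    static OpenTelemetrySdk buildSdk() {
        return AutoConfiguredOpenTelemetrySdk.builder()
                // Supplied properties act as defaults only:
                // system properties and env vars still win.
                .addPropertiesSupplier(() -> Map.of(
                        "otel.experimental.metrics.cardinality.limit", "10000",
                        "otel.sdk.disabled", "true"))
                .build()
                .getOpenTelemetrySdk();
    }
}
```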
We would like the ability to toggle OpenTelemetry-based metrics, as they are still new. We won't need any special Pulsar configuration, as the OpenTelemetry SDK comes with a configuration key to do exactly that. Since OTel support is still experimental, it will have to be opt-in, hence we will make the following property the default using the mechanism described above:

`otel.sdk.disabled` - value: true

This property value disables OpenTelemetry. With OTel disabled, the user remains with the existing metrics system. OTel in a disabled state operates in no-op mode: instruments do get built, but the instrument builders return the same instance of a no-op instrument, which does nothing in its value-recording methods (e.g. `add(number)`, `record(number)`). The no-op `MeterProvider` has no registered `MetricReader`, hence no metric collection will be made. The memory impact is almost zero, and the same goes for the CPU impact.
The current metric system doesn't have a toggle that causes all existing data structures to stop collecting data. Introducing one would require changes in many places, since there is no single place through which all metric instruments are created (one of the motivations for PIP-264). The current system does have the toggle `exposeTopicLevelMetricsInPrometheus`: it allows toggling off topic-level metrics, which means the highest-cardinality metrics will be at the namespace level. Once that toggle is `false`, the number of data structures occupying memory would be in the range of a few thousand, which shouldn't pose a memory burden. If the user refrains from calling `/metrics`, that will also reduce the CPU and memory cost associated with collecting metrics.
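For example, in broker.conf:

```properties
# Disable topic-level metrics in the existing Prometheus pipeline;
# the highest-cardinality metrics become namespace-level.
exposeTopicLevelMetricsInPrometheus=false
```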
When the user enables OTel, there will be a memory increase; but if the user has disabled topic-level metrics in the existing system, as specified above, the majority of the memory increase will be due to topic-level metrics in OTel, at the expense of not having them in the existing metric system.
A broker is part of a cluster, configured via the Pulsar configuration key `clusterName`. When the broker is part of a cluster, it shares the topics defined in that cluster (persisted in the metadata service, e.g. ZooKeeper) with the other brokers of that cluster.
Today, almost every unique time series emitted in Prometheus metrics contains the `cluster` label (it is added manually). We want the same with OTel: that attribute present in each exported unique time series.
OTel has the perfect location for attributes shared across all time series: the Resource. An application has a Resource holding one or more attributes. You define it once, in OTel initialization or configuration. It can contain attributes like the hostname, AWS region, etc. The default contains the service name and some info on the SDK version.
Attributes can be added dynamically, through `addResourceCustomizer()` in `AutoConfiguredOpenTelemetrySdkBuilder`. We will use that to inject the cluster attribute, taken from the configuration.
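A sketch of injecting the cluster attribute via the resource customizer. The class name is illustrative, and the sketch disables the SDK via a supplied default property purely so it can run standalone without any exporter configured.

```java
import java.util.Map;

import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class ClusterResourceExample {

    static OpenTelemetrySdk buildSdk(String clusterName) {
        return AutoConfiguredOpenTelemetrySdk.builder()
                // Disabled by default, so this sketch runs standalone.
                .addPropertiesSupplier(() -> Map.of("otel.sdk.disabled", "true"))
                // Add the cluster name, taken from Pulsar configuration,
                // as a Resource attribute shared by every time series.
                .addResourceCustomizer((resource, configProperties) ->
                        resource.toBuilder().put("pulsar.cluster", clusterName).build())
                .build()
                .getOpenTelemetrySdk();
    }
}
```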
We submitted a proposal to the OpenTelemetry specification, which was merged, to allow copying resource attributes into each exported unique time series in the Prometheus exporter. We plan to contribute its implementation to the OTel Java SDK.
In the Prometheus exporter, the Resource is exported as `target_info{} 1`, with the resource attributes added to that single time series. Retrieving them requires joins, making them extremely difficult to use. The other alternative was to introduce our own `PulsarAttributesBuilder` class on top of OTel's `AttributesBuilder`. Getting every contributor to know and use this class is hard; getting it across to Pulsar Functions or plugin authors would be immensely hard. Also, when exporting via OTLP, it is very inefficient to repeat the attribute across all unique time series instead of stating it once in the Resource. Hence, this needed to be solved in the Prometheus exporter, as we did in the proposal.
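For illustration, without the specification change a Resource carrying the cluster attribute would surface only as a single info series (label values hypothetical):

```
target_info{pulsar_cluster="my-cluster"} 1
```

and every query over regular series would need a join against it to recover the cluster label.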
The attribute will be named `pulsar.cluster`, as both the proxy and the broker are part of the cluster.
All attribute names will be prefixed with `pulsar.`. Example: `pulsar.topic`, `pulsar.cluster`. We should have a clear hierarchy, hence use the following prefixes:

- `pulsar.broker`
- `pulsar.proxy`
- `pulsar.function_worker`
It's customary to use reverse domain names for meter names. Hence, we'll use:

- `org.apache.pulsar.broker`
- `org.apache.pulsar.proxy`
- `org.apache.pulsar.function_worker`
The OTel meter name is converted to the attribute name `otel_scope_name` and added to each unique time series by the Prometheus exporter.
We won't specify a meter version, as it is used solely to signify the version of the instrumentation; since this is the first version, we won't use it.
`OpenTelemetryService` class

`PulsarBrokerOpenTelemetry` class
- Creates an `OpenTelemetryService` using the cluster name taken from the broker configuration
- `getMeter()` returns the `Meter` for the broker

`PulsarProxyOpenTelemetry` class
- Same as `PulsarBrokerOpenTelemetry`, but for the Pulsar proxy

`PulsarWorkerOpenTelemetry` class
- Same as `PulsarBrokerOpenTelemetry`, but for the Pulsar function worker

`/metrics` endpoint on a user-defined port, if the user chose to use it

`AuthenticationProvider`
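A rough sketch of how these pieces could fit together. The constructor shape and method names are illustrative, not the final API; it wires up the defaults and the cluster resource attribute described earlier in this PIP.

```java
import java.io.Closeable;
import java.util.Map;

import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class OpenTelemetryService implements Closeable {

    private final OpenTelemetrySdk sdk;

    public OpenTelemetryService(String clusterName) {
        this.sdk = AutoConfiguredOpenTelemetrySdk.builder()
                // Opt-in default and cardinality cap, both user-overridable
                // via system properties or environment variables.
                .addPropertiesSupplier(() -> Map.of(
                        "otel.sdk.disabled", "true",
                        "otel.experimental.metrics.cardinality.limit", "10000"))
                // Shared cluster attribute placed on the Resource.
                .addResourceCustomizer((resource, config) ->
                        resource.toBuilder().put("pulsar.cluster", clusterName).build())
                .build()
                .getOpenTelemetrySdk();
    }

    public Meter getMeter(String meterName) {
        return sdk.getMeter(meterName);
    }

    @Override
    public void close() {
        sdk.close();
    }
}
```

The per-component wrappers (`PulsarBrokerOpenTelemetry`, etc.) would then hand a `Meter` with the appropriate reverse-domain name to the rest of the code, so call sites depend only on the OTel API.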