= Camel K Operator Monitoring

NOTE: The Camel K monitoring architecture relies on https://prometheus.io[Prometheus] and the eponymous operator. Make sure you've checked the xref:observability/monitoring.adoc#prerequisites[Camel K monitoring prerequisites].

[[installation]]
== Installation

The `kamel install` command provides the `--monitoring` flag, which automatically creates the default resources required to monitor the Camel K operator, for example:

[source,console]
----
$ kamel install --monitoring=true
----

This creates:

* a `PodMonitor` resource targeting the operator _metrics_ endpoint, so that the Prometheus server can scrape the <<metrics>> exposed by the operator;
* a `PrometheusRule` resource with default alerting rules based on the exposed metrics. The <<alerting>> section provides more details about these default rules.

The `kamel install` command also provides the `--monitoring-port` option, which changes the port of the operator monitoring endpoint, for example:

[source,console]
----
$ kamel install --monitoring=true --monitoring-port=8888
----

Refer to the <<discovery>> and <<alerting>> sections if you don't want to rely on the default monitoring configuration.

[[metrics]]
== Metrics

The Camel K operator monitoring endpoint exposes the following metrics:

.Camel K operator metrics
[cols="3,2,2,2,4",options="header"]
|===
|Name |Type |Description |Buckets |Labels

| `camel_k_reconciliation_duration_seconds`
| `HistogramVec`
| Reconciliation request duration
| 0.25s, 0.5s, 1s, 5s
| `namespace`, `group`, `version`, `kind`, `result`: `Reconciled`\|`Errored`\|`Requeued`, `tag`: `""`\|`PlatformError`\|`UserError`

| `camel_k_build_duration_seconds`
| `HistogramVec`
| Build duration
| 30s, 1m, 1.5m, 2m, 5m, 10m
| `result`: `Succeeded`\|`Error`

| `camel_k_build_recovery_attempts`
| `Histogram`
| Build recovery attempts
| 0, 1, 2, 3, 4, 5
| `result`: `Succeeded`\|`Error`

| `camel_k_build_queue_duration_seconds`
| `Histogram`
| Build queue duration
| 5s, 15s, 30s, 1m, 5m
| N/A

| `camel_k_integration_first_readiness_seconds`
| `Histogram`
| Time to first integration readiness
| 5s, 10s, 30s, 1m, 2m
| N/A
|===
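
The monitoring endpoint typically serves these metrics in the Prometheus text exposition format. In that format each histogram appears as a set of cumulative `<name>_bucket{le="..."}` series, plus `<name>_sum` and `<name>_count`. The following Python sketch shows how such cumulative buckets can be read and turned into per-bucket counts. The `parse_buckets` and `per_bucket` helpers and the `SAMPLE` payload are hypothetical. The sample values are made up, and only the bucket boundaries follow the `camel_k_build_queue_duration_seconds` row above.

```python
import re

# One histogram bucket sample in the Prometheus text exposition format, e.g.
#   camel_k_build_queue_duration_seconds_bucket{le="30"} 9
# optionally followed by a timestamp. Simplification: assumes `le` is the
# only label, which holds for label-less histograms; HistogramVec samples
# carry extra labels and are not matched by this pattern.
BUCKET_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)_bucket\{le="(?P<le>[^"]+)"\}'
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)

# Hypothetical scrape output: made-up values, bucket bounds 5s .. 5m.
SAMPLE = """\
# TYPE camel_k_build_queue_duration_seconds histogram
camel_k_build_queue_duration_seconds_bucket{le="5"} 4
camel_k_build_queue_duration_seconds_bucket{le="15"} 7
camel_k_build_queue_duration_seconds_bucket{le="30"} 9
camel_k_build_queue_duration_seconds_bucket{le="60"} 10
camel_k_build_queue_duration_seconds_bucket{le="300"} 10
camel_k_build_queue_duration_seconds_bucket{le="+Inf"} 10
camel_k_build_queue_duration_seconds_sum 123.4
camel_k_build_queue_duration_seconds_count 10
"""


def parse_buckets(text: str, metric: str) -> dict[float, float]:
    """Return the cumulative bucket counts of `metric`, keyed by upper bound."""
    buckets: dict[float, float] = {}
    for line in text.splitlines():
        match = BUCKET_LINE.match(line.strip())
        if match and match.group("name") == metric:
            # float("+Inf") parses to infinity, so the +Inf bucket sorts last.
            buckets[float(match.group("le"))] = float(match.group("value"))
    return dict(sorted(buckets.items()))


def per_bucket(cumulative: dict[float, float]) -> dict[float, float]:
    """Convert cumulative counts (observations <= le) into per-bucket counts."""
    counts: dict[float, float] = {}
    previous = 0.0
    for bound, total in cumulative.items():
        counts[bound] = total - previous
        previous = total
    return counts


buckets = parse_buckets(SAMPLE, "camel_k_build_queue_duration_seconds")
print(per_bucket(buckets))
```

Because the buckets are cumulative, the share of observations above a boundary is `1 - bucket{le=boundary} / count`. The `CamelKIntegrationTimeToReadiness` example rule in the <<alerting>> section uses this form.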

[[discovery]]
== Discovery

A `PodMonitor` resource must be created for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the Camel K operator _metrics_ endpoint.

For example, here is the `PodMonitor` resource created by the `kamel install --monitoring=true` command:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: camel-k-operator
  labels: # <1>
    app: "camel-k"
    camel.apache.org/component: operator
spec:
  selector:
    matchLabels: # <2>
      app: "camel-k"
      camel.apache.org/component: operator
  podMetricsEndpoints:
    - port: metrics
----
<1> The labels must match the `podMonitorSelector` field of the `Prometheus` resource.
<2> This label selector matches the labels of the Camel K operator Deployment pods.
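
A `PodMonitor` selects pods with a standard Kubernetes label selector. With `matchLabels`, a pod is selected only when every listed key/value pair is present on it, and any extra pod labels are ignored. The Python sketch below models that equality-based rule. It assumes the selector is `app: "camel-k"` plus `camel.apache.org/component: operator`. The `selector_matches` helper and the pod labels are hypothetical, and set-based `matchExpressions` are deliberately not modeled.

```python
def selector_matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based selection: every matchLabels pair must be present on the pod.

    Extra labels on the pod are ignored, and an empty selector selects every pod.
    """
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Assumed selector, mirroring the matchLabels of the PodMonitor example above.
selector = {"app": "camel-k", "camel.apache.org/component": "operator"}

# Hypothetical pod labels, for illustration only.
operator_pod = {
    "app": "camel-k",
    "camel.apache.org/component": "operator",
    "pod-template-hash": "abc123",
}
other_pod = {"app": "camel-k"}

print(selector_matches(selector, operator_pod))  # True
print(selector_matches(selector, other_pod))  # False
```

If the operator pods are missing any of the selector labels, Prometheus never discovers the _metrics_ endpoint. This is a common first thing to check when troubleshooting.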

The Prometheus Operator _getting started_ guide documents the discovery mechanism, as well as the relationship between the operator resources.

If your operator metrics are not discovered, refer to the Prometheus Operator _Troubleshooting `ServiceMonitor` changes_ guide, which also applies to troubleshooting `PodMonitor` resources.

[[alerting]]
== Alerting

NOTE: The Prometheus Operator declares the `Alertmanager` resource, which can be used to configure _Alertmanager_ instances alongside `Prometheus` instances. The following section assumes an `Alertmanager` resource already exists in your cluster.

A `PrometheusRule` resource can be created for the Prometheus Operator to reconcile. The managed Prometheus instance then evaluates the alerting rules against the metrics exposed by the Camel K operator and sends the resulting alerts to Alertmanager.

For example, the following table lists the alerting rules defined in the `PrometheusRule` resource created by the `kamel install --monitoring=true` command:

.Camel K operator alerts
[cols="2,1,4",options="header"]
|===
|Name |Severity |Description

| `CamelKReconciliationDuration`
| warning
| More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min.

| `CamelKReconciliationFailure`
| warning
| More than 1% of the reconciliation requests have failed over at least 10 min.

| `CamelKSuccessBuildDuration2m`
| warning
| More than 10% of the successful builds have their duration above 2 min over at least 1 min.

| `CamelKSuccessBuildDuration5m`
| critical
| More than 1% of the successful builds have their duration above 5 min over at least 1 min.

| `CamelKBuildError`
| critical
| More than 1% of the builds have errored over at least 10 min.

| `CamelKBuildQueueDuration1m`
| warning
| More than 1% of the builds have been queued for more than 1 min over at least 1 min.

| `CamelKBuildQueueDuration5m`
| critical
| More than 1% of the builds have been queued for more than 5 min over at least 1 min.
|===
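
The "over at least N min" wording in the table above presumably corresponds to the `for` clause of each rule, like the `for: 1m` in the custom rule example below. In Prometheus, an alerting rule whose expression starts returning results first becomes _pending_. It turns _firing_ only once the expression has held on every evaluation for the `for` duration, and a single negative evaluation resets it. The Python sketch below is a simplified model of that state machine, not Prometheus code. The `alert_states` helper is hypothetical, and features such as `keep_firing_for` and the retention of resolved alerts are ignored.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


def alert_states(evaluations: list[tuple[float, bool]], for_seconds: float) -> list[State]:
    """Replay rule evaluations and return the alert state after each one.

    `evaluations` holds (timestamp in seconds, whether the expression returned
    a result) pairs in chronological order.
    """
    states: list[State] = []
    active_since = None  # timestamp of the first evaluation of the current streak
    for timestamp, condition in evaluations:
        if not condition:
            # Any negative evaluation clears the pending/firing streak.
            active_since = None
            states.append(State.INACTIVE)
            continue
        if active_since is None:
            active_since = timestamp
        # Fires once the condition has held for at least `for_seconds`.
        if timestamp - active_since >= for_seconds:
            states.append(State.FIRING)
        else:
            states.append(State.PENDING)
    return states


# Hypothetical 30s evaluation interval with `for: 1m`.
history = [(0, False), (30, True), (60, True), (90, True), (120, False)]
print([state.value for state in alert_states(history, for_seconds=60)])
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

The `for` delay trades detection latency for noise reduction. A brief spike that clears before the duration elapses never reaches Alertmanager.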

You can also register your own `PrometheusRule` resources, so that Prometheus evaluates custom alerting rules and sends the resulting alerts to Alertmanager, for example:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-alerts
spec:
  groups:
    - name: camel-k-alerts
      rules:
        - alert: CamelKIntegrationTimeToReadiness
          expr: |
            (
            1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
            /
            sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the integrations
              for {{ $labels.job }} have their first time to readiness above 1m.
----
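
Both `rate()` calls in the expression use the same `[5m]` window, so their ratio is essentially the ratio of counter increases over that window. That ratio is the share of integrations that became ready within 60s (`le="60"`). One minus that share, times 100, gives the percentage of integrations that took longer than 1 min, and the rule fires when this value stays above 10 for at least 1m. The Python sketch below reproduces that arithmetic for a single `job`. The `slow_readiness_percentage` helper and the counter values are hypothetical, and `rate()` extrapolation and counter-reset handling are ignored.

```python
import math


def slow_readiness_percentage(
    bucket_le_60: tuple[float, float], count: tuple[float, float]
) -> float:
    """Percentage of integrations whose first readiness took longer than 60s.

    Each argument holds the (window start, window end) values of a cumulative
    counter: the `le="60"` bucket and the `_count` series. Dividing both
    increases by the same window length would not change their ratio, so
    plain increases stand in for the rates here.
    """
    ready_within_60s = bucket_le_60[1] - bucket_le_60[0]
    total = count[1] - count[0]
    if total == 0:
        # PromQL yields NaN for 0/0, and NaN > 10 is false, so no alert fires.
        return math.nan
    return (1 - ready_within_60s / total) * 100


# Hypothetical counters over a 5m window: 3 of 4 new integrations ready within 60s.
percentage = slow_readiness_percentage(bucket_le_60=(10, 13), count=(10, 14))
print(percentage, percentage > 10)  # 25.0 True
```

The `le` value must be one of the histogram's bucket boundaries. The 60s boundary exists for `camel_k_integration_first_readiness_seconds` (its 1m bucket), which is why the rule thresholds on exactly 1 min.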

More information can be found in the Prometheus Operator _Alerting_ user guide.

You can also find more details in the _Creating alerting rules_ section of the OpenShift documentation.