
 [[operator-monitoring]]
 = Camel K Operator Monitoring

 NOTE: The Camel K monitoring architecture relies on https://prometheus.io[Prometheus] and the eponymous operator. Make sure you've checked the xref:observability/monitoring.adoc#prerequisites[Camel K monitoring prerequisites].

 [[installation]]
 == Installation

 The `kamel install` command provides the `--monitoring` option flag, which can be used to automatically create the default resources required to monitor the Camel K operator, e.g.:

 [source,sh]
 ----
 $ kamel install --monitoring=true
 ----

 This creates:

 * a `PodMonitor` resource targeting the operator _metrics_ endpoint, so that the Prometheus server can scrape the <<metrics>> exposed by the operator;
 * a `PrometheusRule` resource with default alerting rules based on the exposed metrics. The <<alerting>> section provides more details about these default rules.

 The `kamel install` command also provides the `--monitoring-port` option, which can be used to change the port of the operator monitoring endpoint, e.g.:

 [source,sh]
 ----
 $ kamel install --monitoring=true --monitoring-port=8888
 ----

 You can refer to the <<discovery>> and <<alerting>> sections in case you don't want to rely on the default monitoring configuration.

 [[metrics]]
 == Metrics

 The Camel K operator monitoring endpoint exposes the following metrics:

 .Camel K operator metrics
 |===
 |Name |Type |Description |Buckets |Labels

 | `camel_k_reconciliation_duration_seconds`
 | `HistogramVec`
 | Reconciliation request duration
 | 0.25s, 0.5s, 1s, 5s
 | `namespace`, `group`, `version`, `kind`, `result`: `Reconciled`\|`Errored`\|`Requeued`, `tag`: `""`\|`PlatformError`\|`UserError`

 | `camel_k_build_duration_seconds`
 | `HistogramVec`
 | Build duration
 | 30s, 1m, 1.5m, 2m, 5m, 10m
 | `result`: `Succeeded`\|`Error`

 | `camel_k_build_recovery_attempts`
 | `Histogram`
 | Build recovery attempts
 | 0, 1, 2, 3, 4, 5
 | `result`: `Succeeded`\|`Error`

 | `camel_k_build_queue_duration_seconds`
 | `Histogram`
 | Build queue duration
 | 5s, 15s, 30s, 1m, 5m
 | N/A

 | `camel_k_integration_first_readiness_seconds`
 | `Histogram`
 | Time to first integration readiness
 | 5s, 10s, 30s, 1m, 2m
 | N/A

 |===
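
 As a sketch of how these histograms can be queried, the following `PrometheusRule` defines two recording rules: an estimate of the 90th percentile of the reconciliation duration, and the fraction of reconciliation requests that errored. The resource name, the rule names, and the 5-minute rate window are illustrative assumptions; none of them is created by `kamel install`:

 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   name: camel-k-recording-rules # hypothetical name
 spec:
   groups:
     - name: camel-k-recording-rules
       rules:
         # 90th percentile of the reconciliation duration, estimated from the histogram buckets
         - record: camel_k:reconciliation_duration_seconds:p90 # hypothetical rule name
           expr: |
             histogram_quantile(0.9,
               sum(rate(camel_k_reconciliation_duration_seconds_bucket[5m])) by (le)
             )
         # Fraction of reconciliation requests whose result is Errored
         - record: camel_k:reconciliation_errors:ratio # hypothetical rule name
           expr: |
             sum(rate(camel_k_reconciliation_duration_seconds_count{result="Errored"}[5m]))
             /
             sum(rate(camel_k_reconciliation_duration_seconds_count[5m]))
 ----

 Because the largest finite bucket of `camel_k_reconciliation_duration_seconds` is 5s, a quantile estimated this way cannot exceed 5s, even if some requests take longer.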

 [[discovery]]
 == Discovery

 A `PodMonitor` resource must be created for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the Camel K operator _metrics_ endpoint.

 As an example, the following is the `PodMonitor` resource that is created when executing the `kamel install --monitoring=true` command:

 .operator-pod-monitor.yaml
 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: PodMonitor
 metadata:
   name: camel-k-operator
   labels: # <1>
     ...
 spec:
   selector:
     matchLabels: # <2>
       app: "camel-k"
       camel.apache.org/component: operator
   podMetricsEndpoints:
     - port: metrics
 ----
 <1> The labels must match the `podMonitorSelector` field from the `Prometheus` resource
 <2> This label selector matches the Camel K operator Deployment labels
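
 To illustrate the first callout, the following sketch shows a `Prometheus` resource whose `podMonitorSelector` would select a `PodMonitor` carrying a matching label. The resource name and the `monitoring: camel-k` label are hypothetical; use whatever labels your own `Prometheus` resource selects on:

 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: Prometheus
 metadata:
   name: prometheus # hypothetical name
 spec:
   podMonitorSelector:
     matchLabels:
       monitoring: camel-k # hypothetical label, to be set in the PodMonitor metadata.labels
 ----

 With such a selector, the `PodMonitor` above would need `monitoring: camel-k` among its `metadata.labels` to be picked up.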

 The Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/getting-started.md#related-resources[getting started] guide documents the discovery mechanism, as well as the relationship between the operator resources.

 If your operator metrics are not discovered, refer to https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/troubleshooting.md#troubleshooting-servicemonitor-changes[Troubleshooting `ServiceMonitor` changes], which also applies to troubleshooting `PodMonitor` resources.

 [[alerting]]
 == Alerting

 NOTE: The Prometheus Operator declares the `AlertManager` resource that can be used to configure _Alertmanager_ instances, along with `Prometheus` instances. The following section assumes an `AlertManager` resource already exists in your cluster.

 A `PrometheusRule` resource can be created for the Prometheus Operator to reconcile, so that the managed AlertManager instance can trigger alerts, based on the metrics exposed by the Camel K operator.

 As an example, the following alerting rules are defined in the `PrometheusRule` resource that is created when executing the `kamel install --monitoring=true` command:

 .Camel K operator alerts
 |===
 |Name |Severity |Description

 | `CamelKReconciliationDuration`
 | warning
 | More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min.

 | `CamelKReconciliationFailure`
 | warning
 | More than 1% of the reconciliation requests have failed over at least 10 min.

 | `CamelKSuccessBuildDuration2m`
 | warning
 | More than 10% of the successful builds have their duration above 2 min over at least 1 min.

 | `CamelKSuccessBuildDuration5m`
 | critical
 | More than 1% of the successful builds have their duration above 5 min over at least 1 min.

 | `CamelKBuildError`
 | critical
 | More than 1% of the builds have errored over at least 10 min.

 | `CamelKBuildQueueDuration1m`
 | warning
 | More than 1% of the builds have been queued for more than 1 min over at least 1 min.

 | `CamelKBuildQueueDuration5m`
 | critical
 | More than 1% of the builds have been queued for more than 5 min over at least 1 min.

 |===
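
 To illustrate how a description in this table maps to a rule, `CamelKBuildError` could be expressed roughly as follows. This is a sketch derived from the description and the `camel_k_build_duration_seconds` metric, not the exact rule installed by the operator; the resource name, the 5-minute rate window, and the grouping by `job` are assumptions:

 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   name: camel-k-build-error-sketch # hypothetical name
 spec:
   groups:
     - name: camel-k-build-error-sketch
       rules:
         - alert: CamelKBuildError
           # Percentage of builds whose result is Error, compared to the 1% threshold
           expr: |
             (
             sum(rate(camel_k_build_duration_seconds_count{result="Error"}[5m])) by (job)
             /
             sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
             )
             * 100
             > 1
           # The condition must hold for 10 minutes before the alert fires
           for: 10m
           labels:
             severity: critical
 ----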

 You can register your own `PrometheusRule` resources, to be used by Prometheus AlertManager instances to trigger alerts, e.g.:

 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   name: camel-k-alerts
 spec:
   groups:
     - name: camel-k-alerts
       rules:
         - alert: CamelKIntegrationTimeToReadiness
           expr: |
             (
             1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
             /
             sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
             )
             * 100
             > 10
           for: 1m
           labels:
             severity: warning
           annotations:
             message: |
               {{ printf "%0.0f" $value }}% of the integrations
               for {{ $labels.job }} have their first time to readiness above 1m.
 ----
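
 As with `PodMonitor` resources, a `PrometheusRule` is only loaded if the `Prometheus` resource selects it. The following sketch assumes a hypothetical `role: camel-k-alerts` label, added to the rule metadata and matched by the `ruleSelector` field of the `Prometheus` resource; the `Prometheus` resource name is also hypothetical:

 [source,yaml]
 ----
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   name: camel-k-alerts
   labels:
     role: camel-k-alerts # hypothetical label
 # spec omitted, same as in the example above
 ---
 apiVersion: monitoring.coreos.com/v1
 kind: Prometheus
 metadata:
   name: prometheus # hypothetical name
 spec:
   ruleSelector:
     matchLabels:
       role: camel-k-alerts # must match the PrometheusRule labels
 ----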

 More information can be found in the Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/alerting.md[Alerting] user guide.
 You can also find more details in https://docs.openshift.com/container-platform/4.4/monitoring/monitoring-your-own-services.html#creating-alerting-rules_monitoring-your-own-services[Creating alerting rules] from the OpenShift documentation.