To configure the OAP Server, we propose two CRDs: `OAPServerConfig`, which holds the static configuration (environment variables and files), and `OAPServerDynamicConfig`, which holds the dynamic configuration.

The `spec` of `OAPServerConfig` has the following fields:

Field Name | Description |
---|---|
Version | The version of the OAP server. The default value is 9.5.0. |
Env | The environment variables of the OAP server. |
File | The static files in the OAP server. Each entry contains three fields: `file.path`, `file.name`, and `file.data`. `file.path` joined with `file.name` is the file to be replaced in the container image, and `file.data` is the final content of that file. |

The `status` of `OAPServerConfig` has the following fields:

Field Name | Description |
---|---|
Desired | The number of OAP servers that need to be configured. |
Ready | The number of OAP servers that were configured successfully. |
CreationTime | The time the OAPServerConfig was created. |
LastUpdateTime | The last time this condition was updated. |


When using the `file` field, do not give two entries the same name.

    # static configuration of OAPServer
    apiVersion: operator.skywalking.apache.org/v1alpha1
    kind: OAPServerConfig
    metadata:
      name: oapserverconfig-sample
      namespace: skywalking-system
    spec:
      # The version of OAPServer
      version: 9.5.0
      # The env configuration of OAPServer
      env:
        - name: JAVA_OPTS
          value: -Xmx2048M
        - name: SW_CLUSTER
          value: kubernetes
        - name: SW_CLUSTER_K8S_NAMESPACE
          value: skywalking-system
        # enable the dynamic configuration
        - name: SW_CONFIGURATION
          value: k8s-configmap
        # set the labelselector of the dynamic configuration
        - name: SW_CLUSTER_K8S_LABEL
          value: app=collector,release=skywalking
        - name: SW_TELEMETRY
          value: prometheus
        - name: SW_HEALTH_CHECKER
          value: default
        - name: SKYWALKING_COLLECTOR_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: SW_LOG_LAL_FILES
          value: test1
        - name: SW_LOG_MAL_FILES
          value: test2
      # The file configuration of OAPServer
      # we should avoid setting the same file name in the file
      file:
        - name: test1.yaml
          path: /skywalking/config/lal
          data: |
            rules:
              - name: example
                dsl: |
                  filter {
                    text {
                      abortOnFailure false // for test purpose, we want to persist all logs
                      regexp $/(?s)(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) \[TID:(?<tid>.+?)] \[(?<thread>.+?)] (?<level>\w{4,}) (?<logger>.{1,36}) (?<msg>.+)/$
                    }
                    extractor {
                      metrics {
                        timestamp log.timestamp as Long
                        labels level: parsed.level, service: log.service, instance: log.serviceInstance
                        name "log_count"
                        value 1
                      }
                    }
                    sink {
                    }
                  }
        - name: test2.yaml
          path: /skywalking/config/log-mal-rules
          data: |
            expSuffix: instance(['service'], ['instance'], Layer.GENERAL)
            metricPrefix: log
            metricsRules:
              - name: count_info
                exp: log_count.tagEqual('level', 'INFO').sum(['service', 'instance']).downsampling(SUM)

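
The two rules for `spec.file` (the target file is `file.path` joined with `file.name`, and no two entries may share a name) can be checked before a manifest is applied. The following is a minimal Python sketch, assuming the entries have already been loaded into plain dictionaries. The helpers `target_file` and `duplicate_file_names` are hypothetical and are not part of the operator.

```python
import posixpath
from collections import Counter


def target_file(entry):
    """Join file.path and file.name into the path replaced in the image."""
    return posixpath.join(entry["path"], entry["name"])


def duplicate_file_names(entries):
    """Return, sorted, every file name used by more than one entry."""
    counts = Counter(entry["name"] for entry in entries)
    return sorted(name for name, count in counts.items() if count > 1)


# Entries shaped like spec.file in the demo above; the data is shortened here.
files = [
    {"name": "test1.yaml", "path": "/skywalking/config/lal", "data": "rules: []\n"},
    {"name": "test2.yaml", "path": "/skywalking/config/log-mal-rules", "data": "metricPrefix: log\n"},
]

for entry in files:
    print(target_file(entry))

# An empty list means no name is reused.
print(duplicate_file_names(files))
```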

The `spec` of `OAPServerDynamicConfig` has the following fields:

Field Name | Description |
---|---|
Version | The version of the OAP server. The default value is 9.5.0. |
LabelSelector | The label selector of the specific configmap. The default value is `app=collector,release=skywalking`. |
Data | The name and value of every configuration item. |

The `status` of `OAPServerDynamicConfig` has the following fields:

Field Name | Description |
---|---|
State | The state of the dynamic configuration, either running or stopped. |
CreationTime | The time the OAPServerDynamicConfig was created. |
LastUpdateTime | The last time this condition was updated. |

Note that the CR's name cannot contain capital letters.
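
This restriction is consistent with the general Kubernetes rule that most object names must be lowercase RFC 1123 DNS subdomains. Assuming that rule is the source of the restriction, a name can be checked before the CR is created. The sketch below is illustrative only, and `is_valid_cr_name` is a hypothetical helper, not an operator API.

```python
import re

# One DNS label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
# A DNS subdomain: one or more labels separated by dots.
_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_valid_cr_name(name):
    """Check a name against the RFC 1123 DNS subdomain rules (max 253 chars)."""
    return len(name) <= 253 and _SUBDOMAIN.fullmatch(name) is not None


print(is_valid_cr_name("agent-analyzer.default"))
print(is_valid_cr_name("core.default.endpoint-name-grouping-openapi.customerAPI-v1"))
```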
Users can split the configurations across several CRs. With OAPServerDynamicConfig, a CR can hold several configurations or just one, and each `spec.data.name` in a CR represents one dynamic configuration.

    apiVersion: operator.skywalking.apache.org/v1alpha1
    kind: OAPServerDynamicConfig
    metadata:
      name: oapserverdynamicconfig-sample
    spec:
      # The version of OAPServer
      version: 9.5.0
      # The labelselector of OAPServer's dynamic configuration, it should be the same as labelSelector of OAPServerConfig
      labelSelector: app=collector,release=skywalking
      data:
        - name: agent-analyzer.default.slowDBAccessThreshold
          value: default:200,mongodb:50
        - name: alarm.default.alarm-settings
          value: |-
            rules:
              # Rule unique name, must be ended with `_rule`.
              service_resp_time_rule:
                metrics-name: service_resp_time
                op: ">"
                threshold: 1000
                period: 10
                count: 3
                silence-period: 5
                message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
              service_sla_rule:
                # Metrics value need to be long, double or int
                metrics-name: service_sla
                op: "<"
                threshold: 8000
                # The length of time to evaluate the metrics
                period: 10
                # How many times after the metrics match the condition, will trigger alarm
                count: 2
                # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
                silence-period: 3
                message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
              service_resp_time_percentile_rule:
                # Metrics value need to be long, double or int
                metrics-name: service_percentile
                op: ">"
                threshold: 1000,1000,1000,1000,1000
                period: 10
                count: 3
                silence-period: 5
                message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
              service_instance_resp_time_rule:
                metrics-name: service_instance_resp_time
                op: ">"
                threshold: 1000
                period: 10
                count: 2
                silence-period: 5
                message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
              database_access_resp_time_rule:
                metrics-name: database_access_resp_time
                threshold: 1000
                op: ">"
                period: 10
                count: 2
                message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
              endpoint_relation_resp_time_rule:
                metrics-name: endpoint_relation_resp_time
                threshold: 1000
                op: ">"
                period: 10
                count: 2
                message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
              # Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
              # Because the number of endpoint is much more than service and instance.
              #
              #  endpoint_resp_time_rule:
              #    metrics-name: endpoint_resp_time
              #    op: ">"
              #    threshold: 1000
              #    period: 10
              #    count: 2
              #    silence-period: 5
              #    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes
            webhooks:
            #  - http://127.0.0.1/notify/
            #  - http://127.0.0.1/go-wechat/
        - name: core.default.apdexThreshold
          value: |-
            default: 500
            # example:
            # the threshold of service "tomcat" is 1s
            # tomcat: 1000
            # the threshold of service "springboot1" is 50ms
            # springboot1: 50
        - name: agent-analyzer.default.uninstrumentedGateways
          value: |-
            #gateways:
            #  - name: proxy0
            #    instances:
            #      - host: 127.0.0.1 # the host/ip of this gateway instance
            #        port: 9099 # the port of this gateway instance, defaults to 80

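
The comment in the sample above says that `labelSelector` should be the same as the label selector of the OAPServerConfig. The sketch below compares the two, assuming the OAPServerConfig side is the `SW_CLUSTER_K8S_LABEL` environment variable (the demo comments it as the label selector of the dynamic configuration) and that both manifests are already loaded as dictionaries. All helper names are hypothetical.

```python
DEFAULT_SELECTOR = "app=collector,release=skywalking"


def parse_label_selector(selector):
    """Parse 'k1=v1,k2=v2' into a dict, ignoring spaces around each item."""
    labels = {}
    for item in selector.split(","):
        key, sep, value = item.strip().partition("=")
        if not key or not sep:
            raise ValueError(f"malformed label selector item: {item!r}")
        labels[key] = value
    return labels


def env_value(oap_config, name):
    """Return the literal value of an env entry in spec.env, or None."""
    for item in oap_config["spec"].get("env", []):
        if item.get("name") == name:
            return item.get("value")
    return None


def label_selectors_match(oap_config, dynamic_config):
    """Compare the static and dynamic label selectors, ignoring item order."""
    static_selector = env_value(oap_config, "SW_CLUSTER_K8S_LABEL")
    if static_selector is None:
        return False
    # Fall back to the documented default when labelSelector is omitted.
    dynamic_selector = dynamic_config["spec"].get("labelSelector", DEFAULT_SELECTOR)
    return parse_label_selector(static_selector) == parse_label_selector(dynamic_selector)
```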
Set the dynamic configuration `agent-analyzer.default.slowDBAccessThreshold` as follows.

    apiVersion: operator.skywalking.apache.org/v1alpha1
    kind: OAPServerDynamicConfig
    metadata:
      name: agent-analyzer.default
    spec:
      # The version of OAPServer
      version: 9.5.0
      # The labelselector of OAPServer's dynamic configuration, it should be the same as labelSelector of OAPServerConfig
      labelSelector: app=collector,release=skywalking
      data:
        - name: slowDBAccessThreshold
          value: default:200,mongodb:50

Set the dynamic configurations `core.default.endpoint-name-grouping-openapi.customerAPI-v1` and `core.default.endpoint-name-grouping-openapi.productAPI-v1` as follows.

    apiVersion: operator.skywalking.apache.org/v1alpha1
    kind: OAPServerDynamicConfig
    metadata:
      name: core.default.endpoint-name-grouping-openapi
    spec:
      # The version of OAPServer
      version: 9.5.0
      # The labelselector of OAPServer's dynamic configuration, it should be the same as labelSelector of OAPServerConfig
      labelSelector: app=collector,release=skywalking
      data:
        - name: customerAPI-v1
          value: value of customerAPI-v1
        - name: productAPI-v1
          value: value of productAPI-v1
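
Both examples above follow the same naming pattern: the CR name carries the configuration key up to the last dot, and `spec.data.name` carries the final segment. The following Python sketch groups full keys into CR manifests along that split, assuming the pattern holds for other keys as well. `split_key` and `build_crs` are hypothetical helpers, not operator APIs.

```python
from collections import defaultdict

API_VERSION = "operator.skywalking.apache.org/v1alpha1"


def split_key(full_key):
    """Split a full key at its last dot into (CR name, data name)."""
    cr_name, _, data_name = full_key.rpartition(".")
    if not cr_name or not data_name:
        raise ValueError(f"cannot split configuration key: {full_key!r}")
    return cr_name, data_name


def build_crs(configs, version="9.5.0",
              label_selector="app=collector,release=skywalking"):
    """Group {full key: value} pairs into one manifest dict per CR name."""
    grouped = defaultdict(list)
    for full_key, value in configs.items():
        cr_name, data_name = split_key(full_key)
        grouped[cr_name].append({"name": data_name, "value": value})
    return [
        {
            "apiVersion": API_VERSION,
            "kind": "OAPServerDynamicConfig",
            "metadata": {"name": cr_name},
            "spec": {
                "version": version,
                "labelSelector": label_selector,
                "data": data,
            },
        }
        for cr_name, data in grouped.items()
    ]
```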