SkyWalking uses the Cilium Fetcher to gather traffic data between services from Cilium Hubble via the Observe API. It then leverages the OAL System for metrics and entity analysis.
SkyWalking fetches Cilium node and observability data from the gRPC API, analyzes it to generate entities, and uses OAL to generate metrics.
Set `selector=default` in the YAML, or set `SW_CILIUM_FETCHER=default` through the system environment variable.

    cilium-fetcher:
      selector: ${SW_CILIUM_FETCHER:default}
      default:
        # Host name and port of Hubble peer component
        peerHost: ${SW_CILIUM_FETCHER_PEER_HOST:hubble-peer.kube-system.svc.cluster.local}
        peerPort: ${SW_CILIUM_FETCHER_PEER_PORT:80}
        fetchFailureRetrySecond: ${SW_CILIUM_FETCHER_FETCH_FAILURE_RETRY_SECOND:10}
        sslConnection: ${SW_CILIUM_FETCHER_SSL_CONNECTION:false}
        sslPrivateKeyFile: ${SW_CILIUM_FETCHER_PRIVATE_KEY_FILE_PATH:}
        sslCertChainFile: ${SW_CILIUM_FETCHER_CERT_CHAIN_FILE_PATH:}
        sslCaFile: ${SW_CILIUM_FETCHER_CA_FILE_PATH:}
        convertClientAsServerTraffic: ${SW_CILIUM_FETCHER_CONVERT_CLIENT_AS_SERVER_TRAFFIC:true}
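Each value in the configuration above has the form `${NAME:default}`: the environment variable `NAME` is used when it is set, otherwise the text after the colon applies. The following is a minimal Python sketch of that substitution rule, written only to illustrate the behavior; it is not SkyWalking's own implementation, and the `resolve` helper is a hypothetical name.

```python
import os
import re

# Matches ${NAME:default}. The default may be empty,
# as in ${SW_CILIUM_FETCHER_CA_FILE_PATH:}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve(value, env=None):
    """Replace each ${NAME:default} with env[NAME] if present, else the default.

    Hypothetical helper for illustration; `env` defaults to the process
    environment.
    """
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# Without an override, the default after the colon is used.
print(resolve("${SW_CILIUM_FETCHER_PEER_PORT:80}", env={}))  # prints 80

# With the variable set, its value takes precedence.
print(resolve("${SW_CILIUM_FETCHER_PEER_PORT:80}",
              env={"SW_CILIUM_FETCHER_PEER_PORT": "443"}))  # prints 443
```

In practice this means the YAML file can stay unchanged while a deployment overrides individual settings through environment variables.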
When TLS is enabled on Hubble, update the following settings:

- `peerPort`: usually should be updated to `443`.
- `sslConnection`: should be set to `true`.
- `sslPrivateKeyFile`: the path of the private key file.
- `sslCertChainFile`: the path of the certificate chain file.
- `sslCaFile`: the path of the CA file.

The following rule files are also available:

- `cilium-rules/exclude.yaml`: configures which endpoints should be excluded from monitoring. Please read exclude rules selection for more detail.
- `cilium-rules/metadata-service-mapping.yaml`: configures the mapping between service names and endpoints.

The exclude configuration in Cilium rules specifies which Cilium endpoints are excluded from the topology map and from the generation of metrics and other data.

    namespaces:
      # define with traffic from which namespace should be excluded
      - kube-system
    labels:
      # define with traffic from which endpoint labels should be excluded, if matches any labels, the traffic would be excluded.
      - k8s:io.cilium.k8s.namespace.labels.istio-injection: "enabled"
        # Each labels is a key-value pair, the key is the label key, the value is the label value.
        k8s:security.istio.io/tlsMode: istio

By default, all traffic from `kube-system` and all traffic managed by the Istio mesh is excluded.
NOTE: Traffic is excluded only when both the source and the destination endpoints match the exclude rules. Otherwise, the traffic is still included.
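The rule in the note above can be illustrated with a small Python sketch. The `Endpoint`, `ExcludeRules`, and `is_excluded` names are hypothetical and exist only for this example. The sketch also assumes that a label entry matches when the endpoint carries every key-value pair of that entry; the source does not spell out this detail, so treat it as an assumption rather than the fetcher's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical view of a Cilium endpoint: its namespace and labels."""
    namespace: str
    labels: dict = field(default_factory=dict)


@dataclass
class ExcludeRules:
    namespaces: list
    labels: list  # each entry is a dict of label key -> label value

    def matches(self, ep):
        """True if the endpoint matches a namespace or any label entry."""
        if ep.namespace in self.namespaces:
            return True
        # Assumption: an entry matches only if the endpoint has all its pairs.
        return any(
            all(ep.labels.get(key) == value for key, value in entry.items())
            for entry in self.labels
        )


def is_excluded(rules, source, destination):
    """Traffic is excluded only when BOTH sides match the exclude rules."""
    return rules.matches(source) and rules.matches(destination)


# Rules mirroring the default exclude configuration shown above.
rules = ExcludeRules(
    namespaces=["kube-system"],
    labels=[{
        "k8s:io.cilium.k8s.namespace.labels.istio-injection": "enabled",
        "k8s:security.istio.io/tlsMode": "istio",
    }],
)

dns = Endpoint("kube-system")
proxy = Endpoint("kube-system")
app = Endpoint("default", {"k8s:app": "checkout"})

print(is_excluded(rules, dns, proxy))  # both sides match
print(is_excluded(rules, dns, app))    # only the source matches
```

With these rules, traffic between two `kube-system` endpoints is dropped from analysis, while traffic between a `kube-system` endpoint and an application endpoint is still reported.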
SkyWalking fetches the flow from Cilium and analyzes the source and destination endpoints to parse out the following corresponding entities:
For each of the above-mentioned entities, L4 and L7 protocol metrics can be analyzed.
Records the relevant metrics for the packages that each service reads from and writes to other services.
Name | Unit | Description |
---|---|---|
Read Package CPM | Count | Packages read from other services, per minute. |
Write Package CPM | Count | Packages written to other services, per minute. |
Drop Package CPM | Count | Packages dropped in traffic with other services, per minute. |
Drop Package Reason Count | Labeled Count | Dropped packages in traffic with other services, per minute, labeled by drop reason. |
Based on the analysis of each data transfer, SkyWalking extracts Layer 7 (L7) network protocol information.
NOTE: By default, Cilium only reports L4 metrics. If you need L7 metrics, they must be explicitly specified in each service's CiliumNetworkPolicy. For details please refer to this document.
HTTP metrics:

Name | Unit | Description |
---|---|---|
CPM | Count | HTTP requests per minute. |
Duration | Nanoseconds | Total duration of HTTP responses. |
Success CPM | Count | Successful HTTP responses (status < 500) per minute. |
Status 1/2/3/4/5xx | Count | HTTP responses grouped by status code class: 1xx/2xx/3xx/4xx/5xx. |
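To make the HTTP definitions above concrete, here is a minimal Python sketch that derives the same figures from a list of status codes observed within one minute. The function names and the one-minute input list are assumptions made for this example; SkyWalking computes these metrics through OAL, not through code like this.

```python
from collections import Counter


def status_group(status):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    return f"{status // 100}xx"


def is_success(status):
    """A response counts as successful when its status is below 500."""
    return status < 500


def summarize(statuses):
    """Summarize the status codes seen during a single one-minute window."""
    return {
        "cpm": len(statuses),
        "success_cpm": sum(1 for s in statuses if is_success(s)),
        "groups": dict(Counter(status_group(s) for s in statuses)),
    }


summary = summarize([200, 201, 404, 503])
print(summary)
```

Note that a 404 response still counts toward Success CPM under this definition, because only 5xx responses are treated as failures.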
DNS metrics:

Name | Unit | Description |
---|---|---|
CPM | Count | DNS requests per minute. |
Duration | Nanoseconds | Total duration of DNS responses. |
Success CPM | Count | Successful DNS responses (code == 0) per minute. |
Error Count | Labeled Count | DNS response errors, labeled by error description. |
Kafka metrics:

Name | Unit | Description |
---|---|---|
CPM | Count | Kafka requests per minute. |
Duration | Nanoseconds | Total duration of Kafka responses. |
Success CPM | Count | Successful Kafka responses (errorCode == 0) per minute. |
Error Count | Labeled Count | Kafka response errors, labeled by error description. |
The Cilium Fetcher module relies on the Cluster module. When the Cilium Fetcher module starts up on an OAP node, it obtains information about all Cilium nodes, and about the nodes in the OAP cluster, through the Peers API.
It then distributes the collected Cilium nodes evenly across the OAP nodes and ensures that no single Cilium node is monitored by more than one OAP node.
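One way to satisfy both properties (an even spread, and exactly one owner per Cilium node) is a deterministic round-robin over sorted node lists, so that every OAP node computes the same assignment independently. The sketch below illustrates that idea only. The source does not describe the actual algorithm, so the `assign` function and the round-robin strategy are assumptions.

```python
def assign(cilium_nodes, oap_nodes):
    """Assign every Cilium node to exactly one OAP node, as evenly as possible.

    Hypothetical sketch: both lists are sorted first, so every OAP node
    that runs this with the same inputs derives the same assignment.
    """
    if not oap_nodes:
        raise ValueError("at least one OAP node is required")
    oaps = sorted(oap_nodes)
    assignment = {oap: [] for oap in oaps}
    for index, node in enumerate(sorted(cilium_nodes)):
        # Round-robin: consecutive Cilium nodes go to consecutive OAP nodes.
        assignment[oaps[index % len(oaps)]].append(node)
    return assignment


plan = assign(
    ["node-c", "node-a", "node-e", "node-b", "node-d"],
    ["oap-2", "oap-1"],
)
print(plan)
```

Each OAP node would then monitor only the list stored under its own name. Because the per-node counts differ by at most one, no OAP node carries a noticeably larger share.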
You can customize your own metrics and dashboard panels. The metric definitions and expression rules are found in `/config/oal/cilium.oal`; please refer to the Scope Declaration Documentation. The Cilium dashboard panel configurations are found in `/config/ui-initialized-templates/cilium_service`.