AWS Cloud EKS monitoring

SkyWalking leverages the OpenTelemetry Collector with the AWS Container Insights Receiver to collect EKS metrics and transfer them to the SkyWalking OpenTelemetry receiver, which feeds them into the Meter System.

Data flow

  1. OpenTelemetry Collector fetches metrics from EKS via AWS Container Insights Receiver and pushes metrics to SkyWalking OAP Server via the OpenCensus gRPC Exporter or OpenTelemetry gRPC exporter.
  2. The SkyWalking OAP Server parses the expression with MAL to filter/calculate/aggregate and store the results.

Set up

  1. Deploy amazon/aws-otel-collector with the AWS Container Insights Receiver to EKS.
  2. Configure the SkyWalking OpenTelemetry receiver.
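
For step 2, the receiver is enabled in the OAP's application.yml. The fragment below is a minimal sketch, assuming an OAP release whose receiver-otel module uses the enabledHandlers and enabledOtelRules keys. Key names and accepted handler values have changed between OAP releases, so treat both as assumptions and compare them with the application.yml shipped with your version.

```yaml
# application.yml (SkyWalking OAP)
# Assumption: key names match your OAP release; verify before use.
receiver-otel:
  selector: default
  default:
    # Accept metrics sent by the collector's OpenTelemetry gRPC exporter
    enabledHandlers: "otlp"
    # Activate the EKS rule files under /config/otel-rules/aws-eks/
    enabledOtelRules: "aws-eks/*"
```

If other rule sets are already enabled in your deployment, append aws-eks/* to the existing comma-separated list rather than replacing it.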

EKS Monitoring

The AWS Container Insights Receiver provides metrics across multiple dimensions of EKS: cluster, node, service, etc. Accordingly, SkyWalking observes the status and payload of the EKS cluster, which is cataloged as a LAYER: AWS_EKS Service in the OAP. The k8s nodes are recognized as LAYER: AWS_EKS instances, and the k8s services are recognized as endpoints.

Specify Job Name

SkyWalking distinguishes AWS Cloud EKS metrics by the attribute job_name, whose value must be aws-cloud-eks-monitoring. You can use an OTEL Collector processor to add the attribute as follows:

processors:
  resource/job-name:
    attributes:
      - key: job_name
        value: aws-cloud-eks-monitoring
        action: insert

Note: if you don't specify the job_name attribute, the SkyWalking OAP will ignore the metrics.

Supported Metrics

| Monitoring Panel | Unit | Metric Name | Catalog | Description | Data Source |
|---|---|---|---|---|---|
| Node Count | | eks_cluster_node_count | Service | The node count of the EKS cluster | AWS Container Insights Receiver |
| Failed Node Count | | eks_cluster_failed_node_count | Service | The failed node count of the EKS cluster | AWS Container Insights Receiver |
| Pod Count (namespace dimension) | | eks_cluster_namespace_count | Service | The count of pods in the EKS cluster (namespace dimension) | AWS Container Insights Receiver |
| Pod Count (service dimension) | | eks_cluster_service_count | Service | The count of pods in the EKS cluster (service dimension) | AWS Container Insights Receiver |
| Network RX Dropped Count (per second) | count/s | eks_cluster_net_rx_dropped | Service | Network RX dropped count | AWS Container Insights Receiver |
| Network RX Error Count (per second) | count/s | eks_cluster_net_rx_error | Service | Network RX error count | AWS Container Insights Receiver |
| Network TX Dropped Count (per second) | count/s | eks_cluster_net_tx_dropped | Service | Network TX dropped count | AWS Container Insights Receiver |
| Network TX Error Count (per second) | count/s | eks_cluster_net_tx_error | Service | Network TX error count | AWS Container Insights Receiver |
| Pod Count | | eks_cluster_node_pod_number | Instance | The count of pods running on the node | AWS Container Insights Receiver |
| CPU Utilization | percent | eks_cluster_node_cpu_utilization | Instance | The CPU utilization of the node | AWS Container Insights Receiver |
| Memory Utilization | percent | eks_cluster_node_memory_utilization | Instance | The memory utilization of the node | AWS Container Insights Receiver |
| Network RX | bytes/s | eks_cluster_node_net_rx_bytes | Instance | Network RX bytes of the node | AWS Container Insights Receiver |
| Network RX Error Count | count/s | eks_cluster_node_net_rx_error | Instance | Network RX error count of the node | AWS Container Insights Receiver |
| Network TX | bytes/s | eks_cluster_node_net_tx_bytes | Instance | Network TX bytes of the node | AWS Container Insights Receiver |
| Network TX Error Count | count/s | eks_cluster_node_net_tx_error | Instance | Network TX error count of the node | AWS Container Insights Receiver |
| Disk IO Write | bytes/s | eks_cluster_node_disk_io_write | Instance | The IO write bytes of the node | AWS Container Insights Receiver |
| Disk IO Read | bytes/s | eks_cluster_node_disk_io_read | Instance | The IO read bytes of the node | AWS Container Insights Receiver |
| FS Utilization | percent | eks_cluster_node_fs_utilization | Instance | The filesystem utilization of the node | AWS Container Insights Receiver |
| CPU Utilization | percent | eks_cluster_node_pod_cpu_utilization | Instance | The CPU utilization of the pods running on the node | AWS Container Insights Receiver |
| Memory Utilization | percent | eks_cluster_node_pod_memory_utilization | Instance | The memory utilization of the pods running on the node | AWS Container Insights Receiver |
| Network RX | bytes/s | eks_cluster_node_pod_net_rx_bytes | Instance | Network RX bytes of the pods running on the node | AWS Container Insights Receiver |
| Network RX Error Count | count/s | eks_cluster_node_pod_net_rx_error | Instance | Network RX error count of the pods running on the node | AWS Container Insights Receiver |
| Network TX | bytes/s | eks_cluster_node_pod_net_tx_bytes | Instance | Network TX bytes of the pods running on the node | AWS Container Insights Receiver |
| Network TX Error Count | count/s | eks_cluster_node_pod_net_tx_error | Instance | Network TX error count of the pods running on the node | AWS Container Insights Receiver |
| CPU Utilization | percent | eks_cluster_service_pod_cpu_utilization | Endpoint | The CPU utilization of the pods that belong to the service | AWS Container Insights Receiver |
| Memory Utilization | percent | eks_cluster_service_pod_memory_utilization | Endpoint | The memory utilization of the pods that belong to the service | AWS Container Insights Receiver |
| Network RX | bytes/s | eks_cluster_service_pod_net_rx_bytes | Endpoint | Network RX bytes of the pods that belong to the service | AWS Container Insights Receiver |
| Network RX Error Count | count/s | eks_cluster_service_pod_net_rx_error | Endpoint | Network RX error count of the pods that belong to the service | AWS Container Insights Receiver |
| Network TX | bytes/s | eks_cluster_service_pod_net_tx_bytes | Endpoint | Network TX bytes of the pods that belong to the service | AWS Container Insights Receiver |
| Network TX Error Count | count/s | eks_cluster_service_pod_net_tx_error | Endpoint | Network TX error count of the pods that belong to the service | AWS Container Insights Receiver |

Customizations

You can customize your own metrics/expression/dashboard panel. The metrics definition and expression rules are found in /config/otel-rules/aws-eks/. The AWS Cloud EKS dashboard panel configurations are found in /config/ui-initialized-templates/aws_eks.
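
As an illustration of such a customization, the following is a minimal sketch of an additional MAL rule file. The file name, the metric name cpu_limit, the source metric node_cpu_limit, and the label names ClusterName and NodeName are all assumptions made for this example; compare them with the rule files already present in /config/otel-rules/aws-eks/ and with the metrics your collector actually emits before relying on them.

```yaml
# Hypothetical file: /config/otel-rules/aws-eks/custom-node.yaml
# Only process metrics tagged with the EKS job name
filter: "{ tags -> tags.job_name == 'aws-cloud-eks-monitoring' }"
# Bind each sample to a service (cluster) and an instance (node) in the AWS_EKS layer.
# Assumption: ClusterName and NodeName are labels on the incoming metrics.
expSuffix: instance(['ClusterName'], ['NodeName'], Layer.AWS_EKS)
# Stored metric names become <metricPrefix>_<name>
metricPrefix: eks_cluster_node
metricsRules:
  # Hypothetical metric, stored as eks_cluster_node_cpu_limit
  - name: cpu_limit
    # Assumption: node_cpu_limit is emitted by the receiver
    exp: node_cpu_limit
```

A new metric defined this way only shows up in the UI once a matching panel is added to the dashboard templates in /config/ui-initialized-templates/aws_eks.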

OTEL Configuration Sample With AWS Container Insights Receiver

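extensions:
  health_check:
receivers:
  # Collects EKS cluster, node, pod, and service metrics
  awscontainerinsightreceiver:
processors:
  # Adds the job_name attribute that SkyWalking OAP uses to identify EKS metrics
  resource/job-name:
    attributes:
      - key: job_name
        value: aws-cloud-eks-monitoring
        action: insert
exporters:
  # Pushes metrics to the SkyWalking OAP gRPC endpoint
  otlp:
    endpoint: oap-service:11800
    tls:
      insecure: true
  # Prints exported metrics to the collector log for debugging
  logging:
    loglevel: debug
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [resource/job-name]
      exporters: [otlp, logging]
  extensions: [health_check]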

Refer to the AWS Container Insights Receiver documentation for more information.