SkyWalking already provides access log collection at the Agent layer and the Service Mesh layer, and can generate the corresponding topology maps and metrics from that data. However, the Kubernetes layer still lacks comparable access log collection and analysis.
This proposal is dedicated to collecting and analyzing network access logs in Kubernetes.
There is no significant architecture-level change: the Rover project still collects the data and reports it to the SkyWalking OAP over gRPC.
Based on the content in Motivation, if we want to ignore the application type (the programming language) and monitor only network logs, eBPF is a good choice, mainly for the following reasons:
Based on these reasons and the data to be collected, this can be implemented in SkyWalking Rover, with collection and monitoring following these steps:
For content transmitted over TLS, Rover detects whether the current process uses a library such as OpenSSL. If so, it asynchronously attaches to the relevant OpenSSL functions when the process starts, so the plaintext data can be observed.
However, this approach does not work for Java: Java does not use the OpenSSL library but performs encryption and decryption in Java code, and eBPF currently cannot intercept Java method calls. As a result, TLS traffic from Java processes cannot be decoded.
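The library-detection step above can be sketched as follows. This is only an illustration, not Rover's actual implementation (Rover is written in Go and works against live processes); the helper names are hypothetical. The idea is to scan a process's memory mappings for a mapped `libssl` shared object before deciding to attach OpenSSL uprobes.

```python
def uses_openssl(maps_content: str) -> bool:
    """Return True if any mapped region of the process points to a libssl
    shared object (as listed in /proc/<pid>/maps)."""
    return any("libssl" in line and ".so" in line
               for line in maps_content.splitlines())


def maps_path(pid: int) -> str:
    """Path of the memory-mapping table for a given process."""
    return f"/proc/{pid}/maps"
```

In a real collector, a positive result would trigger attaching uprobes to functions such as `SSL_write`/`SSL_read`; a negative result (e.g. a JVM doing TLS in Java code) is exactly the case the paragraph above describes as undecodable.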
If the service is deployed with an Istio sidecar, each process is still monitored individually. If the service is a Java service using TLS, the relevant traffic can instead be analyzed in the sidecar (Envoy).
No new library is planned to be added to the codebase.
Regarding the protocol, there should be no breaking changes, only enhancements:
- Rover: adds a new gRPC data collection protocol for reporting the access logs.
- OAP: no protocol updates; the existing query protocols are already sufficient for querying Kubernetes topology and metric data.

Service entity:

| column | data type | value description |
|---|---|---|
| name | string | Kubernetes service name |
| short_name | string | same as name |
| service_id | string | base64(name).1 |
| group | string | empty string |
| layer | string | KUBERNETES |
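The `service_id` format in the tables (the Base64-encoded service name plus the literal suffix `.1`) can be sketched as below; the helper name and the example service name are hypothetical:

```python
import base64


def service_id(service_name: str) -> str:
    """Build the entity ID as described in the tables: base64(name).1"""
    encoded = base64.b64encode(service_name.encode("utf-8")).decode("ascii")
    return f"{encoded}.1"


# Example (hypothetical service name):
# service_id("default::my-service") -> "ZGVmYXVsdDo6bXktc2VydmljZQ==.1"
```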
Instance entity:

| column | data type | value description |
|---|---|---|
| service_id | string | base64(service_name).1 |
| name | string | pod name |
| last_ping | long | last access log message timestamp (milliseconds) |
| properties | json | empty string |
Endpoint entity:

| column | data type | value description |
|---|---|---|
| service_id | string | base64(service_name).1 |
| name | string | access log endpoint name (for HTTP/1, the URI) |
All entity information is built from connections. If the target address is remote, the name is resolved in the order listed below. Different entities display remote addresses differently; see the following table.
| table name | remote info (displayed in the following order) |
|---|---|
| service_relation | service name, remote IP address |
| instance_relation | pod name, remote IP address |
NOTICE: Internal data interactions within a pod, such as the exchange between a service container and its sidecar (Envoy), do not generate traffic; traffic is generated only for interactions with external pods.
If a service IP is used to send requests upstream, we use eBPF to resolve the real target Pod IP from the relevant conntrack records.
However, if the system does not use conntrack, the real target IP address is difficult to determine. In that case, the affected instance-relation data is dropped, but the count of dropped relations is recorded in a metric for better visibility.
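To illustrate the conntrack-based translation (Rover performs this in eBPF against kernel conntrack state, not by parsing files), the same mapping can be sketched in user space from a `/proc/net/nf_conntrack`-style entry. A conntrack entry carries two tuples, original (request) and reply; under DNAT, the original destination is the service (cluster) IP, and the reply source is the real Pod IP:

```python
import re
from typing import Optional


def real_target_ip(conntrack_line: str, service_ip: str) -> Optional[str]:
    """If the original destination matches the given service (cluster) IP,
    return the DNAT-translated real Pod IP from the reply tuple."""
    srcs = re.findall(r"src=(\S+)", conntrack_line)
    dsts = re.findall(r"dst=(\S+)", conntrack_line)
    if len(dsts) >= 2 and len(srcs) >= 2 and dsts[0] == service_ip:
        return srcs[1]  # reply src = the pod that actually answered
    return None
```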
Integrate the data into the OAL system and generate the corresponding metrics from the predefined sources combined with OAL statements.
This proposal only adds a module to Rover that handles the access log configuration, plus changes to the Kubernetes module in the UI.
In the Kubernetes UI, users can see the following additions: