Profiling is an on-demand diagnostic method for locating the bottlenecks of services. The following typical scenarios are usually suitable for profiling with various profiling tools:
In the SkyWalking landscape, we provide three ways to support profiling at a reasonable resource cost.
In-process profiling is primarily provided by auto-instrument agents in the VM-based runtime.
This feature resolves issue <1> by periodically capturing snapshots of the thread stacks. The OAP aggregates the thread stacks per RPC request and provides a hierarchy graph that indicates the slow methods based on the continuous snapshots.
The period is usually 10 to 100 milliseconds. A shorter period is not recommended, because each capture usually causes a classical stop-the-world pause in the VM, which impacts the performance of the whole process.
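The snapshot-and-aggregate idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the agent's or the OAP's actual implementation: the function names (`capture_stack`, `aggregate`, `render`, `profile`) and the default period are assumptions made for the example. It samples one thread's stack at a fixed period, counts how many snapshots contain each call path, and renders the counts as an indented hierarchy.

```python
import sys
import threading
import time
from collections import Counter


def capture_stack(thread_id=None):
    """Return the call stack of one thread as function names, outermost first."""
    if thread_id is None:
        thread_id = threading.get_ident()
    frame = sys._current_frames().get(thread_id)
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back
    stack.reverse()
    return stack


def aggregate(snapshots):
    """Count how many snapshots contain each call path (each prefix of a stack)."""
    counts = Counter()
    for stack in snapshots:
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])] += 1
    return counts


def render(counts):
    """Render the aggregated paths as an indented hierarchy, one line per method."""
    lines = []
    # Sorting tuples lexicographically places every path right after its parent.
    for path in sorted(counts):
        lines.append("  " * (len(path) - 1) + f"{path[-1]} ({counts[path]})")
    return lines


def profile(thread_id, period_s=0.01, duration_s=0.2):
    """Sample one thread every period_s seconds (hypothetical defaults)."""
    snapshots = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        stack = capture_stack(thread_id)
        if stack:
            snapshots.append(stack)
        time.sleep(period_s)
    return aggregate(snapshots)
```

Methods that appear in many snapshots are the ones where the request spent most of its time, so a line with a high count deep in the hierarchy points at a slow method.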
Learn more technical details from the post Use Profiling to Fix the Blind Spot of Distributed Tracing.
For now, Java and Python agents support this.
Java App Profiling uses Async Profiler for sampling.
Async Profiler is a low-overhead sampling profiler for Java that does not suffer from the safepoint bias problem. It features HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM.
Async Profiler can trace the following kinds of events:
Only the Java agent supports this.
Out-of-process profiling leverages eBPF, a technology that originated in the Linux kernel. eBPF provides a way to extend the capabilities of the kernel safely and efficiently.
On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high.
The more often a stack is dumped, the more CPU resources that thread stack occupies.
This is similar to in-process profiling in resolving issue <1>, but it runs out-of-process and is based on Linux eBPF. It is also intended for languages without a VM mechanism, which in-process agents therefore cannot support, such as C/C++ and Rust. Golang is a special case: it exposes the metadata of the VM to eBPF, so it can be profiled.
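The relationship between dump counts and CPU usage can be illustrated with a small sketch. It assumes the dumped stacks are already available as lists of symbol names. The function name `cpu_share` and the semicolon-joined stack key are choices made for this example, not part of SkyWalking. Each distinct stack is mapped to the fraction of dumps in which it appeared, which approximates its share of on-CPU time.

```python
from collections import Counter


def cpu_share(dumped_stacks):
    """Map each distinct stack to the fraction of dumps in which it appeared.

    Stacks are keyed as "outer;inner" strings and ordered from the most
    frequently dumped to the least.
    """
    counts = Counter(";".join(stack) for stack in dumped_stacks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {stack: n / total for stack, n in counts.most_common()}
```

For instance, if `main;encode` appears in three of four dumps, its estimated share is 0.75, so it is the first place to look when CPU usage is high.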
Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage but may still occur under high CPU load. This profiling aims to resolve issue <2>.
For example,
Off-CPU profiling provides two perspectives
Learn more technical details about On-CPU and Off-CPU profiling from the post Pinpoint Service Mesh Critical Performance Impact by using eBPF.
Network profiling captures network packets to analyze traffic at L4 (TCP) and L7 (HTTP), recognizing the network traffic of a specific process or a k8s pod. This traffic analysis helps locate the root causes of issues <3> and <4>.
Network profiling provides
Learn more technical details from the post Diagnose Service Mesh Network Performance with eBPF.
Continuous profiling monitors the system, processes, and network, and automatically initiates profiling tasks when conditions meet the configured thresholds and time windows.
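One way to picture the combination of a threshold and a time window is a sliding-window check. The sketch below is a simplified illustration under assumed semantics, not SkyWalking's actual policy engine: the class name `ThresholdTrigger` and its parameters `threshold`, `period`, and `count` are hypothetical. A task is triggered when at least `count` of the last `period` samples reach the threshold.

```python
from collections import deque


class ThresholdTrigger:
    """Fire when at least `count` of the last `period` samples reach `threshold`."""

    def __init__(self, threshold, period, count):
        if not 1 <= count <= period:
            raise ValueError("count must be between 1 and period")
        self.threshold = threshold
        self.count = count
        # Bounded deque: the oldest sample is dropped automatically.
        self.window = deque(maxlen=period)

    def observe(self, value):
        """Record one metric sample; return True if a profiling task should start."""
        self.window.append(value >= self.threshold)
        return sum(self.window) >= self.count
```

Requiring several matching samples inside the window, rather than a single spike, keeps short bursts from starting a profiling task.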
Continuous profiling periodically collects the following types of performance metrics for processes and systems:
When the collected metric data matches the configured threshold, the following types of profiling tasks can be triggered: