ALS Load Balance

Using Satellite as a load balancer between Envoy and OAP can effectively prevent OAP instances from receiving an unbalanced share of messages.

In this case, we mainly use memory queues for intermediate data storage.
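
As a rough illustration of the idea, the sketch below buffers incoming messages in partitioned in-memory queues and hands drained batches to OAP instances in turn. It is a conceptual model only, not Satellite's actual implementation; the function name `balance`, the `partitions` and `batch_size` parameters, and the OAP instance names are all hypothetical.

```python
from itertools import cycle
from collections import deque


def balance(messages, oap_instances, partitions=4, batch_size=3):
    """Spread messages over OAP instances through in-memory partition queues.

    Conceptual sketch only; names and parameters are assumptions.
    Returns a dict mapping each OAP instance to the number of messages it got.
    """
    # Each partition is a plain in-memory FIFO queue.
    queues = [deque() for _ in range(partitions)]
    for i, msg in enumerate(messages):
        queues[i % partitions].append(msg)

    received = {oap: 0 for oap in oap_instances}
    targets = cycle(oap_instances)

    # Drain every partition in batches; each batch goes to the next OAP in turn.
    for q in queues:
        while q:
            batch = [q.popleft() for _ in range(min(batch_size, len(q)))]
            received[next(targets)] += len(batch)
    return received
```

With 120 messages, three OAP instances, four partitions, and a batch size of 5, every instance ends up with 40 messages, regardless of which sender produced them.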

Both the number of Envoy instances and OAP performance can affect Satellite's transmit performance.

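| Envoy Instance | Concurrent User | ALS OPS | Satellite CPU | Satellite Memory |
|---|---|---|---|---|
| 150 | 100 | ~50K | 1.2C | 0.5-1.0G |
| 150 | 300 | ~80K | 1.8C | 1.0-1.5G |
| 300 | 100 | ~50K | 1.4C | 0.8-1.2G |
| 300 | 300 | ~100K | 2.2C | 1.3-2.0G |
| 800 | 100 | ~50K | 1.5C | 0.9-1.5G |
| 800 | 300 | ~100K | 2.6C | 1.7-2.7G |
| 1500 | 100 | ~50K | 1.7C | 1.4-2.4G |
| 1500 | 300 | ~100K | 2.7C | 2.3-3.0G |
| 2300 | 150 | ~50K | 1.8C | 1.9-3.1G |
| 2300 | 300 | ~90K | 2.5C | 2.3-4.0G |
| 2300 | 500 | ~110K | 3.2C | 2.8-4.7G |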
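
The measurements above can be used as a rough sizing lookup. The following is a minimal sketch, assuming the approximate OPS figures are treated as exact numbers; the `MEASURED` list and the `cheapest_covering_measurement` helper are hypothetical names. It only selects among the measured rows and does not interpolate or predict.

```python
# Rows copied from the results table:
# (envoy_instances, concurrent_users, approx_als_ops, cpu_cores, mem_low_gb, mem_high_gb)
MEASURED = [
    (150, 100, 50_000, 1.2, 0.5, 1.0),
    (150, 300, 80_000, 1.8, 1.0, 1.5),
    (300, 100, 50_000, 1.4, 0.8, 1.2),
    (300, 300, 100_000, 2.2, 1.3, 2.0),
    (800, 100, 50_000, 1.5, 0.9, 1.5),
    (800, 300, 100_000, 2.6, 1.7, 2.7),
    (1500, 100, 50_000, 1.7, 1.4, 2.4),
    (1500, 300, 100_000, 2.7, 2.3, 3.0),
    (2300, 150, 50_000, 1.8, 1.9, 3.1),
    (2300, 300, 90_000, 2.5, 2.3, 4.0),
    (2300, 500, 110_000, 3.2, 2.8, 4.7),
]


def cheapest_covering_measurement(envoy_instances, als_ops):
    """Return the measured row with the lowest CPU usage that covers the load.

    A row "covers" the load when it was measured with at least as many Envoy
    instances and at least as many ALS operations per second. Returns None
    when no measured row covers the request.
    """
    covering = [
        row for row in MEASURED
        if row[0] >= envoy_instances and row[2] >= als_ops
    ]
    # Prefer the lowest CPU usage, then the lowest memory upper bound.
    return min(covering, key=lambda row: (row[3], row[5]), default=None)
```

For example, `cheapest_covering_measurement(800, 100_000)` returns the row measured with 800 Envoy instances at about 100K OPS (2.6C, 1.7-2.7G). Treat the result as a starting point, since OAP performance also influences these figures.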

Detail

Environment

The cluster was built on GKE using Helm.

| Module | Version | Replica Count | CPU Limit | Memory Limit | Description |
|---|---|---|---|---|---|
| OAP | 8.9.0 | 6 | 12C | 32Gi | Using ElasticSearch as Storage |
| Satellite | 0.4.0 | 1 | 8C | 16Gi | |
| ElasticSearch | 7.5.1 | 3 | 8C | 16Gi | |

Setting

Test load: 800 Envoy instances, 100K QPS of ALS messages.

| Module | Environment Config | Value Used | Default Value | Description | Recommended Value |
|---|---|---|---|---|---|
| Satellite | SATELLITE_QUEUE_PARTITION | 50 | 4 | Number of goroutines that consume the queue concurrently | Satellite CPU number * 4-6. A higher value can improve throughput, but the default value can also handle 800 Envoy instances and 100K QPS of ALS messages. |
| Satellite | SATELLITE_QUEUE_EVENT_BUFFER_SIZE | 3000 | 1000 | Size of the queue for each concurrent consumer | Depends on the number of Envoys. If the number of Envoys is large, increasing the value is recommended. |
| Satellite | SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME | 3000 | 1000 | How long (in milliseconds) Satellite collects received ALS messages before merging them into one Event | If some delay is acceptable, a larger value effectively reduces CPU usage and makes Satellite more stable. |
| Satellite | SATELLITE_ENVOY_ALS_V3_PIPE_SENDER_FLUSH_TIME | 3000 | 1000 | How often (in milliseconds) each goroutine summarizes its memory queue data and sends it to OAP | Depends on the amount of data in your queue. You can keep it consistent with SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME. |
| OAP | SW_CORE_GRPC_MAX_CONCURRENT_CALL | 50 | 4 | Number of parallel requests supported on one connection between Satellite and OAP | Same as SATELLITE_QUEUE_PARTITION in Satellite. |
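
The recommendations above can be turned into a small helper that derives a consistent set of environment values. This is only a sketch: the function names `partition_range` and `build_env` are hypothetical, "Satellite CPU number" is assumed to mean the CPU cores available to Satellite, and the defaults are the default values listed in the table. The environment variable names themselves come from the table.

```python
def partition_range(cpu_cores):
    """Return the (low, high) range for SATELLITE_QUEUE_PARTITION.

    Follows the "Satellite CPU number * 4-6" recommendation; interpreting
    "CPU number" as available cores is an assumption.
    """
    return cpu_cores * 4, cpu_cores * 6


def build_env(partitions=4, buffer_size=1000, flush_ms=1000):
    """Build environment variables for Satellite and OAP as two dicts.

    The sender flush time is kept equal to the receiver flush time, and the
    OAP concurrent-call limit mirrors the Satellite partition count, as the
    table recommends.
    """
    satellite = {
        "SATELLITE_QUEUE_PARTITION": str(partitions),
        "SATELLITE_QUEUE_EVENT_BUFFER_SIZE": str(buffer_size),
        "SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME": str(flush_ms),
        "SATELLITE_ENVOY_ALS_V3_PIPE_SENDER_FLUSH_TIME": str(flush_ms),
    }
    oap = {"SW_CORE_GRPC_MAX_CONCURRENT_CALL": str(partitions)}
    return satellite, oap


if __name__ == "__main__":
    # Values used in this test: 50 partitions, 3000 buffer size, 3000 ms flush.
    satellite_env, oap_env = build_env(partitions=50, buffer_size=3000, flush_ms=3000)
    for name, value in {**satellite_env, **oap_env}.items():
        print(f"{name}={value}")
```

Keeping both flush times and both concurrency values in one place makes it harder for the Satellite and OAP settings to drift apart when one of them is tuned.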