The following are the highlights in this release:
We exposed many metadata of a Pegasus cluster and some system states of a Pegasus server through HTTP interfaces. Thanks to @Skysheepwang.
Related PR: XiaoMi/rdsn#280, #360, XiaoMi/rdsn#321, XiaoMi/rdsn#296
Related Docs: https://pegasus.apache.org/api/http
In XiaoMi as our user base grows, improvements on our multi-tenant support become increasingly needed for stability. For example, we lack monitoring support for table-level latency (only server-level latency). When a table is observed unreasonably slow, we need to refine our slow query mechanism in order to dynamically configure the “slow” threshold per table without system reboot. Another requirement is to support size-based write throttling to reduce the influence of a high-throughput-low-QPS table to other tables in the same cluster.
Related PR: XiaoMi/rdsn#314, XiaoMi/rdsn#298, #400, XiaoMi/rdsn#336, XiaoMi/rdsn#340
Docs on table-level latency: TBD
Docs on table-level slow query: TBD
Docs on size-based throttling: TBD
Pegasus is continuously lowering our access costs for community users. In this release, we provide our experimental support on Promentheus, thanks to @ChenQShmily.
Related PR: XiaoMi/rdsn#287, XiaoMi/rdsn#284, #397, #368
Related Docs: TBD
Thanks to @linlinhaohao888.
Related PR: XiaoMi/rdsn#290
Related Docs: TBD
No configuration update is needed in this release.
For table-level latency, some new perf-counters are added. Every read/write operation to a table has two perf-counters (p99&p999) for latency.
replica*eon.replica*table.level.RPC_RRDB_RRDB_PUT.latency(ns)@${for.each.table}
replica*eon.replica*table.level.RPC_RRDB_RRDB_PUT.latency(ns)@${for.each.table}.p999
replica*eon.replica*table.level.RPC_RRDB_RRDB_GET.latency(ns)@${for.each.table}
replica*eon.replica*table.level.RPC_RRDB_RRDB_GET.latency(ns)@${for.each.table}.p999