refactor(HTTP): Improve the help description of HTTP APIs (#1959)

This patch improves the help description of HTTP APIs.

Before this patch, the path and the parameters were output for each API. The path is redundant because all paths can already be obtained directly (e.g. `curl 127.0.0.1:34601/`).

After this patch, the help descriptions are more readable and we no longer have to maintain them separately in the website docs, e.g.
```
$ curl 127.0.0.1:34601/
{
    "/": "List all supported calls.",
    "/config": "name=<config_name>. Gets the details of a specified config. Only the configs which are registered by DSN_DEFINE_xxx macro can be queried.",
    "/configs": "List all configs. Only the configs which are registered by DSN_DEFINE_xxx macro can be queried.",
    "/meta/app": "name=<app_name>[&detail]. Query app info.",
    "/meta/app/duplication": "name=<app_name>. Query app duplication info.",
    "/meta/app/query_bulk_load": "name=<app_name>. Query app bulk load info.",
    "/meta/app/start_bulk_load": "A JSON format of start_bulk_load_request structure. Start bulk load on an app.",
    "/meta/app/start_compaction": "A JSON format of manual_compaction_info structure. Start compaction for an app.",
    "/meta/app/usage_scenario": "A JSON format of usage_scenario_info structure. Update usage scenario of an app.",
    "/meta/app_envs": "name=<app_name>. Query app environments.",
    "/meta/apps": "[detail]. List all apps in the cluster.",
    "/meta/backup_policy": "name=<app_name1>&name=<app_name2>. Query backup policy by policy names.",
    "/meta/cluster": "Query the cluster info.",
    "/meta/nodes": "[detail]. Query the replica servers info.",
    "/metrics": "[with_metric_fields=field1,field2,...][&types=type1,type2,...][&ids=id1,id2,...][&attributes=attr1,value1,attr2,value2,...][&metrics=metric1,metric2,...][&detail=true|false]Query the node metrics.",
    "/pprof/cmdline": "Query the process' cmdline.",
    "/pprof/growth": "Query the stack traces that caused growth in the address space size.",
    "/pprof/heap": "[seconds=<heap_profile_seconds>]. Query a sample of live objects and the stack traces that allocated these objects (an environment variable TCMALLOC_SAMPLE_PARAMETER should set to a positive value, such as 524288), or the current heap profiling information if 'seconds' parameter is specified.",
    "/pprof/profile": "[seconds=<cpu_profile_seconds>]. Query the CPU profile. 'seconds' is 60 if not specified.",
    "/pprof/symbol": "[symbol_address]. Query the process' symbols. Return the symbol count of the process if using GET, return the symbol of the 'symbol_address' if using POST.",
    "/recentStartTime": "Get the server start time.",
    "/updateConfig": "<key>=<new_value>. Update the config to the new value.",
    "/version": "Get the server version."
}
```
or
```
$ curl 127.0.0.1:34801/
{
    "/": "List all supported calls.",
    "/config": "name=<config_name>. Gets the details of a specified config. Only the configs which are registered by DSN_DEFINE_xxx macro can be queried.",
    "/configs": "List all configs. Only the configs which are registered by DSN_DEFINE_xxx macro can be queried.",
    "/metrics": "[with_metric_fields=field1,field2,...][&types=type1,type2,...][&ids=id1,id2,...][&attributes=attr1,value1,attr2,value2,...][&metrics=metric1,metric2,...][&detail=true|false]Query the node metrics.",
    "/pprof/cmdline": "Query the process' cmdline.",
    "/pprof/growth": "Query the stack traces that caused growth in the address space size.",
    "/pprof/heap": "[seconds=<heap_profile_seconds>]. Query a sample of live objects and the stack traces that allocated these objects (an environment variable TCMALLOC_SAMPLE_PARAMETER should set to a positive value, such as 524288), or the current heap profiling information if 'seconds' parameter is specified.",
    "/pprof/profile": "[seconds=<cpu_profile_seconds>]. Query the CPU profile. 'seconds' is 60 if not specified.",
    "/pprof/symbol": "[symbol_address]. Query the process' symbols. Return the symbol count of the process if using GET, return the symbol of the 'symbol_address' if using POST.",
    "/recentStartTime": "Get the server start time.",
    "/replica/data_version": "app_id=<app_id>. Query the data version of an app.",
    "/replica/duplication": "appid=<appid>. Query the duplication status of an app.",
    "/replica/manual_compaction": "app_id=<app_id>. Query the manual compaction status of an app.",
    "/updateConfig": "<key>=<new_value>. Update the config to the new value.",
    "/version": "Get the server version."
}
```
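As an illustration of how the parameter hints above map to actual requests, a specific endpoint can be queried by appending the documented parameters to its path. The table name `temp` below is only a placeholder for this example:
```
# Follows the "/meta/app": "name=<app_name>[&detail]" hint on the meta server.
$ curl '127.0.0.1:34601/meta/app?name=temp&detail'

# "/meta/cluster" takes no parameters.
$ curl '127.0.0.1:34601/meta/cluster'
```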
README.md

Note: The master branch may be in an unstable or even broken state during development. Please use GitHub Releases instead of the master branch in order to get stable binaries.

Apache Pegasus is a distributed key-value storage system which is designed to be:

  • horizontally scalable: distributed using hash-based partitioning
  • strongly consistent: ensured by PacificA consensus protocol
  • high-performance: using RocksDB as underlying storage engine
  • simple: well-defined, easy-to-use APIs

Background

Pegasus aims to fill the gap between Redis and HBase: the former is in-memory and low-latency, but does not provide a strong-consistency guarantee; unlike the latter, Pegasus is written entirely in C++ and its write path relies only on the local filesystem.

Apart from the performance requirements, we also need a storage system to ensure multiple-level data safety and support fast data migration between data centers, automatic load balancing, and online partition split.

Features

  • Persistence of data: Each write is replicated three-way to different ReplicaServers before responding to the client. Using PacificA protocol, Pegasus has the ability for strong consistent replication and membership changes.

  • Automatic load balancing over ReplicaServers: Load balancing is a built-in function of the MetaServer, which manages the distribution of replicas. When the cluster is in an imbalanced state, the administrator can invoke a simple rebalance command that automatically schedules the replica migration (see the sketch after this list).

  • Cold Backup: Pegasus supports an extensible backup and restore mechanism to ensure data safety. Snapshots can be stored on a distributed filesystem like HDFS or on the local filesystem. A snapshot stored in the filesystem can be further used for analysis based on pegasus-spark.

  • Eventually-consistent inter-datacenter replication: This is a feature we call duplication. It makes a change made in the local cluster accessible in the remote cluster after a short time period. It helps achieve higher availability of your service and better performance by accessing only the local cluster.
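A minimal sketch of the rebalance step mentioned above, assuming the `set_meta_level` command of the Pegasus shell described in the Pegasus docs (the command name and the level values are assumptions here, not defined in this README):
```
# Open the Pegasus interactive shell against a running cluster.
$ ./run.sh shell

# Inside the shell: let the MetaServer schedule replica migration automatically.
# "lively" enables balancing; switching back to "steady" stops it (assumed levels).
>>> set_meta_level lively
```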

To start using Pegasus

See our documentation on the Pegasus Website.

Client drivers

Pegasus has client support for several languages, including Go, Java, Node.js, Python, and Scala (see the corresponding *-client directories in the repository).

Contact us

  • Send emails to the Apache Pegasus developer mailing list: dev@pegasus.apache.org. This is the place where topics around development, community, and problems are officially discussed. Please remember to subscribe to the mailing list via dev-subscribe@pegasus.apache.org.

  • GitHub Issues: submit an issue when you have ideas for improving Pegasus, or when you encounter bugs or problems.

Related Projects

Test tools:

Data import/export tools:

License

Copyright 2022 The Apache Software Foundation. Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0