Vermeer API Documentation

1. Installation and Deployment

1.1 Run Configuration

The computing framework has two roles, master and worker, but a single executable; the role is selected by run_mode (master/worker). The --env parameter specifies which configuration file to use, for example --env=master selects master.ini; configuration files are placed in the config/ directory. The master needs to specify its listening ports, and the worker needs to specify its listening ports plus the master's ip:port.

|Configuration item|Description|Example|
|---|---|---|
|log_level|Log level|debug/info/warn/error|
|debug_mode|Debug mode|release|
|http_peer|HTTP listening address|0.0.0.0:6688|
|grpc_peer|gRPC listening address|0.0.0.0:6689|
|master_peer|Master gRPC address|127.0.0.1:6689|
|run_mode|Run mode|master/worker|

Running the example:

./vermeer --env=master
./vermeer --env=worker01
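
For reference, a minimal sketch of what master.ini and worker01.ini might contain, assuming plain key=value INI syntax and reusing the example values from the table above. The worker's own port numbers here are placeholders; the exact keys should be checked against the files shipped in config/:

# config/master.ini (sketch)
run_mode=master
http_peer=0.0.0.0:6688
grpc_peer=0.0.0.0:6689
log_level=info
debug_mode=release

# config/worker01.ini (sketch)
run_mode=worker
http_peer=0.0.0.0:6788
grpc_peer=0.0.0.0:6789
master_peer=127.0.0.1:6689
log_level=info
debug_mode=release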

1.2 Authentication Access

Vermeer has authentication enabled by default since version 0.2.0. All HTTP requests to the Vermeer interfaces must carry a valid token; requests without one are denied access to the service. The token is provided by the Vermeer development team. To use it, add the Authorization field to the request headers.

Postman example:

Host: <calculated when request is sent>
User-Agent: PostmanRuntime/7.29.4
Accept: gzip,deflate,br
Accept-Encoding: gzip,deflate,br
Connection: keep-alive
Authorization: 2Bq2ToHwLnci5wAyvXbhhYwz1YJhb4QLt1P2ttmfD...

Go example:

func setAuth(request *http.Request, token string) {
    request.Header.Set("Authorization", token)
}
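
A minimal sketch of using setAuth when creating a task. The master address, request body and token below are placeholders; only the Authorization header handling follows the description above.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func setAuth(request *http.Request, token string) {
	request.Header.Set("Authorization", token)
}

func main() {
	// Placeholder master address, graph name and token; replace with your own values.
	body := []byte(`{"task_type": "load", "graph": "testdb", "params": {}}`)
	req, err := http.NewRequest("POST", "http://10.81.116.77:8688/tasks/create", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	setAuth(req, "your-token-here")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(data))
}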

2. Task-creation REST APIs

2.1 General Description

This class of REST APIs provides all task-creation functions, including reading graph data and the various compute functions. Two interfaces are offered, asynchronous return and synchronous return, and the returned content contains the information of the created task.

The overall flow of using Vermeer is to first create a task that reads a graph, and then, once the graph is loaded, create compute tasks that run on it. The graph is not deleted automatically, so there is no need to read it again to run multiple compute tasks on the same graph; if you need to remove it, use the graph deletion interface. For compute tasks, the results can be fetched iteratively in batches through the interface in section 3.9.

Task status is divided into load task status and compute task status. A client usually only needs to understand four states: created, task in progress, task finished, and task error. The graph status is the basis for judging whether a graph is available: if the graph is still loading or its status is error, it cannot be used to create a compute task. The graph deletion interface is only available when the graph is in the loaded or error state and no compute task is running on it.

Asynchronous return interface: POST http://master_ip:port/tasks/create , only returns whether the task was created successfully; you need to query the task status yourself to determine whether it has finished.
Synchronous return interface: POST http://master_ip:port/tasks/create/sync , returns after the task is completed.

|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_type|Yes|string|-|load/compute|Task type|
|graph|Yes|string|-|-|Graph name|
|params|No|json|{}|-|Dynamic parameters; see the related task for the specific parameters|

Load task status:

stateDiagram
    [*] --> error
    error --> load_vertex : an error may occur
    load_vertex --> load_vertex_ok : vertices loaded
    load_vertex_ok --> load_scatter : continue to the next step
    load_scatter --> load_scatter_ok : scatter loaded
    load_scatter_ok --> load_edge : load edges
    load_edge --> load_edge_ok : edges loaded
    load_edge_ok --> load_degree : compute degrees
    load_degree --> load_degree_ok : degrees computed
    load_degree_ok --> loaded : loading complete

Compute task status:

stateDiagram
    [*] --> error
    error --> created : task created
    created --> step_doing : compute step in progress
    step_doing --> step_done : compute step finished
    step_done --> complete : computation complete (steps may repeat until the computation ends)

Graph status:

stateDiagram
    [*] --> created
    created --> loading : graph loading
    loading --> loaded : loading complete
    created --> error : an error occurred

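Since the asynchronous creation interface only confirms that the task was created, a client typically polls the task until it reaches one of the terminal states shown above. A minimal Go sketch, assuming the single-task query interface from section 3.8 and the status values from the response examples in sections 3.7 and 3.8:

package client

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// pollTask polls GET /task/{task_id} (section 3.8) until the task reaches a terminal state.
// The status values checked here ("loaded", "complete", "error") follow the response
// examples in sections 3.7 and 3.8.
func pollTask(master string, taskID int, token string) (string, error) {
	for {
		req, err := http.NewRequest("GET", fmt.Sprintf("http://%s/task/%d", master, taskID), nil)
		if err != nil {
			return "", err
		}
		req.Header.Set("Authorization", token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return "", err
		}
		var out struct {
			Errcode int `json:"errcode"`
			Task    struct {
				Status string `json:"status"`
			} `json:"task"`
		}
		err = json.NewDecoder(resp.Body).Decode(&out)
		resp.Body.Close()
		if err != nil {
			return "", err
		}
		switch out.Task.Status {
		case "loaded", "complete", "error":
			return out.Task.Status, nil
		}
		time.Sleep(2 * time.Second) // poll interval; adjust as needed
	}
}
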
2.2 Loading graph data

Vermeer supports a reload feature that lets you reload a graph to update its data: set "task_type": "reload". If params is omitted, the parameters of the last load task for that graph are reused; any parameters you do pass in override the stored ones. The prerequisite is that the graph must already exist. Graph vertex data is stored in an on-disk database by default; for small data volumes (fewer than 100 million vertices), you can set "load.vertex_backend" to "mem" for faster computation.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_type|Yes|string|-|load/reload|Task type|
|graph|Yes|string|-|-|Graph name|
|params|No|json|{}|-|Dynamic parameters, see the table below for the specific parameters|

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|load.parallel|No|int|1|Greater than 0|Number of loading threads|
|load.type|No|string|local|local/hdfs/afs/hugegraph|Data loading method|
|load.edges_per_vertex|No|int|10|Greater than or equal to 0|Estimated number of edges per vertex; affects loading speed|
|load.edge_files|Required for local/hdfs/afs mode|string|-|-|Edge file names. The local method requires specifying the worker IP address, e.g. {"10.81.116.77":"e_00"}; the hdfs/afs methods support wildcards, e.g. "/data/twitter-2010.e_*"|
|load.use_property|No|int|0|0/1|Whether to use attributes|
|load.vertex_property|No (required when reading from hugegraph with load.use_property=1)|json|-|-|Vertex attributes in JSON format, example: {"label": "2,0", "name": "..."}; when reading from hugegraph, pass a string instead, example: "1,2,10001"|
|load.edge_property|No (required when reading from hugegraph with load.use_property=1)|json|-|-|Edge attributes in JSON format, example: {"label": "2,0", "name": "..."}; when reading from hugegraph, pass a string instead, example: "1,2,10001"|
|load.use_outedge|No|int|0|0/1|Whether to generate vertex out-edge data|
|load.use_out_degree|No|int|0|0/1|Whether to generate vertex out-degree data|
|load.use_undirected|No|int|0|0/1|Whether to generate undirected graph data|
|load.vertex_backend|No|string|db|db/mem|Vertex storage backend; can be set to mem for small data volumes (<100 million vertices)|
|load.delimiter|No|string|space|-|Column delimiter of the import file|
|load.hdfs_conf_path|Required for HDFS read|string|-|-|HDFS configuration file path|
|load.hdfs_namenode|Required for HDFS read|string|-|-|HDFS namenode|
|load.hdfs_use_krb|No|int|0|0/1|Whether to use Kerberos authentication|
|load.krb_name|No|string|-|-|Kerberos username|
|load.krb_realm|No|string|-|-|Kerberos realm|
|load.krb_conf_path|No|string|-|-|Kerberos configuration file path|
|load.krb_keytab_path|No|string|-|-|Kerberos keytab path|
|load.hg_pd_peers|Required for hugegraph read|string|-|-|Peers of the hugegraph PD to import data from, example: ["10.14.139.69:8686"]|
|load.hugegraph_name|Required for hugegraph read|string|-|-|Name of the hugegraph to import data from|
|load.hugegraph_username|Required for hugegraph read|string|-|-|hugegraph username for importing data|
|load.hugegraph_password|Required for hugegraph read|string|-|-|hugegraph password for importing data|
|load.hugegraph_vertex_condition|No|string|-|-|Vertex filter expression for the import, example: "element.id > 100"; the string is a Boolean expression|
|load.hugegraph_edge_condition|No|string|-|-|Edge filter expression for the import, example: "element.weight > 0.5"; the string is a Boolean expression|
|load.hg_partitions|No|string|-|-|hugestore partitions to import, example: "load.hg_partitions": "0,1,2" pulls partitions 0, 1 and 2; if not configured, all partitions are pulled|
|load.hugestore_batch_timeout|No|int|120|>=60|Timeout for pulling a batch from hugestore during import|
|load.hugestore_batchsize|No|int|100000|>0|Size of each batch pulled during import|
|load.afs_uri|Required for AFS read|string|-|-|URI of the AFS cluster to import data from|
|load.afs_username|Required for AFS read|string|-|-|AFS username for importing data|
|load.afs_password|Required for AFS read|string|-|-|AFS password for importing data|
|load.afs_config_path|No|string|-|-|afs-api configuration file path; generally no need to change|

Request example:

// load.type: local
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "load",
    "graph": "testdb",
    "params": {
        "load.parallel": "50",
        "load.type": "local",
        "load.vertex_files": "{\"10.81.116.77\":\"data/twitter-2010.v_[0,99]\"}",
        "load.edge_files": "{\"10.81.116.77\":\"data/twitter-2010.e_[0,99]\"}",
        "load.use_out_degree": "1",
        "load.use_outedge": "1"
    }
}
// load.type: hdfs
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "load",
    "graph": "testdb",
    "params": {
        "load.parallel": "50",
        "load.type": "hdfs",
        "load.vertex_files": "/data/twitter-2010.v_*",
        "load.edge_files": "/data/twitter-2010.e_*",
        "load.hdfs_namenode": "hdfs://10.81.116.77:8020",
        "load.hdfs_conf_path": "/home/hadoop/hadoop-3.3.1/etc/hadoop/",
        "load.hdfs_use_krb": "0",
        "load.krb_realm": "",
        "load.krb_name": "",
        "load.krb_keytab_path": "",
        "load.krb_conf_path": "",
        "load.use_out_degree": "1",
        "load.use_outedge": "1"
    }
}
// load.type: afs
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "load",
    "graph": "testdb",
    "params": {
        "load.parallel": "50",
        "load.type": "afs",
        "load.vertex_files": "/user/vermeer_test/twitter-2010.v_*",
        "load.edge_files": "/user/vermeer_test/twitter-2010.e_*",
        "load.afs_username": "test",
        "load.afs_password": "xxxxxx",
        "load.afs_uri": "afs://wudang.afs.baidu.com:9902/",
        "load.use_out_degree": "1",
        "load.use_outedge": "1"
    }
}
// load.type: hugegraph
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "load",
    "graph": "testdb",
    "params": {
        "load.parallel": "50",
        "load.type": "hugegraph",
        "load.hg_pd_peers": "[\"10.14.139.69:8686\"]",
        "load.hugegraph_name": "DEFAULT/hugegraph2/g",
        "load.hugegraph_password":"xxxxx",
        "load.use_out_degree": "1",
        "load.use_outedge": "1"
    }
}
// reload
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "reload",
    "graph": "testdb"
}
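
If the reload should use different parameters than the previous load, supply them in params and they will override the stored ones. A hedged example (the parameter values are illustrative only), switching the vertex backend to memory for a small graph:

// reload with overridden params
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "reload",
    "graph": "testdb",
    "params": {
        "load.parallel": "50",
        "load.vertex_backend": "mem"
    }
}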

2.3 Output calculation result description

All compute tasks support multiple ways of outputting results (the per-algorithm examples only show local output); the output method can be chosen from local, hdfs, afs or hugegraph. Add the corresponding parameters to params when sending the request for them to take effect. Statistics over the computation results can also be output: set output.need_statistics to 1 and the statistics are written into the task information returned by the task interfaces. The statistics mode operator currently supports "count" and "modularity"; "modularity" only applies to community detection algorithms.

|Dynamic parameter|Required|Type|Default|Value range|
|---|---|---|---|---|
|output.parallel|No|int|1|Greater than 0|
|output.delimiter|No|string|space|-|
|output.file_path|Required for local/hdfs/afs mode|string|-|-|
|output.type|No|string|local|local/hdfs/afs/hugegraph|
|output.need_query|No|int|0|0/1|
|output.hdfs_conf_path|Required for hdfs mode|string|-|-|
|output.hdfs_namenode|Required for hdfs mode|string|-|-|
|output.hdfs_use_krb|No|int|0|0/1|
|output.krb_name|No|string|-|-|
|output.krb_realm|No|string|-|-|
|output.krb_conf_path|No|string|-|-|
|output.krb_keytab_path|No|string|-|-|
|output.afs_uri|Required for afs mode|string|-|-|
|output.afs_username|Required for afs mode|string|-|-|
|output.afs_password|Required for afs mode|string|-|-|
|output.afs_config_path|No|string|-|-|
|output.hg_pd_peers|Required for hugegraph mode|string|-|-|
|output.hugegraph_name|Required for hugegraph mode|string|-|-|
|output.hugegraph_username|Required for hugegraph mode|string|-|-|
|output.hugegraph_password|Required for hugegraph mode|string|-|-|
|output.hugegraph_property|Required for hugegraph mode|string|-|-|
|output.hugegraph_write_type|No|string|OLAP_COMMON|OLAP_COMMON / OLAP_SECONDARY / OLAP_RANGE|
|output.need_statistics|No|int|0|0/1|
|output.statistics_mode|No|string|-|count/modularity|

Example:

// local
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "compute.max_step":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/pagerank"
    }
}
// hdfs
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "compute.max_step":"10",
        "output.type":"hdfs",
        "output.parallel":"1",
        "output.file_path":"data/result/pagerank",
        "output.hdfs_namenode": "hdfs://10.81.116.77:8020",
        "output.hdfs_conf_path": "/home/hadoop/etc/hadoop",
        "output.hdfs_use_krb": "0"
    }
}
// afs
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "compute.max_step":"10",
        "output.type":"afs",
        "output.parallel":"1",
        "output.file_path":"/user/vermeer_test/result/pagerank",
        "output.afs_username": "test",
        "output.afs_password": "xxxxxx",
        "output.afs_uri": "afs://wudang.afs.baidu.com:9902/"
    }
}
// hugegraph
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "compute.max_step":"10",
        "output.type":"hugegraph",
        "output.parallel":"10",
        "output.hg_pd_peers": "[\"10.14.139.69:8686\"]",
        "output.hugegraph_username":"admin", 
        "output.hugegraph_password":"xxxxx",
        "output.hugegraph_name": "DEFAULT/hugegraph0/g",
        "output.hugegraph_property": "pagerank"
    }
}
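
The statistics output described at the start of this section can be added to any of the requests above. A hedged example that asks for "count" statistics on an LPA (community detection) run; the statistics are written into the task information returned by the task query interfaces:

// statistics output
POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "lpa",
        "compute.parallel":"10",
        "compute.max_step":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/lpa",
        "output.need_statistics":"1",
        "output.statistics_mode":"count"
    }
}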

2.4 PageRank

The PageRank algorithm, also known as the web page ranking algorithm, reflects the relevance and importance of web pages (nodes) by calculating the hyperlinks between web pages (nodes). It is suitable for scenarios such as web page sorting and discovering key figures in social networks. If a web page is linked to by many other web pages, its PageRank value is high; when a web page with a high PageRank value links to other web pages, the PageRank value of the linked web page will be correspondingly increased.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_type|Yes|string|-|compute|Task type|
|graph|Yes|string|-|-|Graph name|
|params|No|json|{}|-|Dynamic parameters, see the table below for the specific parameters|

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|pagerank|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|
|pagerank.damping|No|float|0.85|0 - 1|Damping coefficient, the fraction passed on to the next vertex|
|pagerank.diff_threshold|No|float|0.00001|0 - 1|Convergence accuracy: the upper limit of the summed absolute change of all vertices per iteration; the algorithm stops once the change falls below this value|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/pagerank",
        "compute.max_step":"10"
    }
}

2.5 WCC (weakly connected component)

Computes all connected subgraphs of the graph treated as undirected and outputs, for each vertex, the id of the weakly connected component it belongs to. This indicates the connectivity between vertices and distinguishes different connected communities.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|wcc|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "wcc",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/wcc",
        "compute.max_step":"10"
    }
}

2.6 LPA (Label Propagation)

The label propagation algorithm is a graph clustering algorithm that is often used to discover potential communities in social networks.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|lpa|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|
|lpa.vertex_weight_property|No|string|-|-|Vertex weight attribute name; must be an int or float attribute|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "lpa",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/lpa",
        "compute.max_step":"10"
    }
}

2.7 Degree Centrality

This algorithm is used to calculate the degree centrality value of each node in the graph, and supports both undirected and directed graphs. In undirected graphs, the degree centrality is calculated based on the edge information and the number of node occurrences; in directed graphs, the in-degree value or out-degree value of the node is obtained by filtering according to the direction of the edge and counting the number of node occurrences based on the input edge or output edge information. The more edges a node has with other nodes, the larger the degree centrality value is, and the higher its importance in the graph.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|degree|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|degree.direction|No|string|out|out/in/both|Edge direction|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "degree",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/degree",
        "degree.direction":"both"
    }
}

2.8 Closeness Centrality

Calculate the inverse of the shortest distance from a node to all other reachable nodes, and accumulate and normalize the values. Closeness centrality measures the time it takes for information to be transmitted from the node to other nodes. The greater the “Closeness Centrality” of a node, the closer it is to the center of the graph, which is suitable for scenarios such as key node discovery in social networks.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|closeness_centrality|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|closeness_centrality.sample_rate|No|float|1.0|0 - 1|Edge sampling rate. This algorithm requires high computing power; set a reasonable sampling rate according to your needs to obtain approximate results|
|closeness_centrality.wf_improved|No|int|1|0/1|Whether to use the Wasserman and Faust centrality formula|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "closeness_centrality",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/closeness_centrality",
        "closeness_centrality.sample_rate":"0.01"
    }
}

2.9 Betweenness Centrality

The betweenness centrality algorithm is used to determine whether a node has value as a "bridge" node: the larger the value, the more likely the node lies on necessary paths between pairs of vertices in the graph. Typical examples include commonly followed accounts in social networks; it is suitable for measuring the degree of community aggregation around a node.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|betweenness_centrality|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|betweenness_centrality.sample_rate|No|float|1.0|0 - 1|Edge sampling rate. This algorithm requires high computing power; set a reasonable sampling rate as needed to obtain approximate results|
|betweenness_centrality.use_endpoint|No|int|0|0/1|Whether to include endpoints in the computation|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "betweenness_centrality",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/betweenness_centrality",
        "betweenness_centrality.sample_rate":"0.01"
    }
}

2.10 Triangle Count

Used to count the number of triangles passing through each vertex. In social networks, triangles represent cohesive communities, which helps to understand the clustering and interconnection of individuals or groups in the network; in financial networks or transaction networks, the presence of triangles may indicate suspicious or fraudulent activities, which can help identify transaction patterns that require further investigation. The output result is a Triangle Count for each vertex, that is, the number of triangles in which each vertex is located. This algorithm is an undirected graph algorithm and ignores the direction of the edge.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|triangle_count|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "triangle_count",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/triangle_count"
    }
}

2.11 K-Core

The K-Core algorithm marks all vertices that belong to the k-core, i.e. the vertices that remain after repeatedly removing vertices with degree less than K. It is suitable for graph pruning and finding the core part of the graph.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|kcore|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|kcore.degree_k|No|int|3|Greater than 0|Minimum degree threshold|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "kcore",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/kcore",
        "kcore.degree_k":"5"
    }
}

2.12 SSSP (Single-Source Shortest Path)

The single-source shortest path algorithm is used to find the shortest distance from a point to all other points.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|sssp|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|sssp.source|Yes|string|-|-|Starting vertex ID|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iterations|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "sssp",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/degree",
        "sssp.source":"tom"
    }
}

2.13 Kout

Starting from a given vertex, obtains the vertices in the k-th layer (k hops) around that vertex.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|kout|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|kout.source|Yes|string|-|-|Starting vertex ID|
|compute.max_step|No|int|10|Greater than 0|The maximum step count is used as the value of K|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "kout",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/kout",
        "kout.source":"tom",
        "compute.max_step":"6"
    }
}

2.14 Louvain 

The Louvain algorithm is a modularity-based community detection algorithm. Its basic idea is that each node in the network tries the community labels of all its neighbors and picks the label that maximizes the modularity gain. Once modularity is maximized, each community is collapsed into a new node and the process repeats until modularity no longer increases. The distributed Louvain implementation in Vermeer is affected by node order, parallel computation and other factors, and the randomness of its traversal order also makes community compression somewhat random, so repeated runs may produce different results, but the overall trend does not change much.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|louvain|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iteration steps (1000 recommended)|
|louvain.threshold|No|float|0.0000001|(0,1)|Modularity convergence accuracy; the algorithm stops when the modularity difference between rounds falls below this threshold|
|louvain.resolution|No|float|1.0|(0,1]|Louvain resolution parameter|
|louvain.step|No|int|10|Greater than 0|Number of Louvain rounds|

Note: compute.max_step does not correspond to the number of Louvain rounds in the single-machine version; one Louvain round consists of a variable number of compute.max_step steps. In tests on the twitter-2010 dataset, 300 to 400 steps were needed to converge. It is recommended to set the number of iteration steps relatively large and to use the threshold to control whether the algorithm stops early. The number of Louvain rounds can be controlled via louvain.step; the algorithm exits as soon as any one of compute.max_step, louvain.step or louvain.threshold is satisfied.

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "louvain",
        "compute.parallel":"10",
        "compute.max_step":"1000",
        "louvain.threshold":"0.0000001",
        "louvain.resolution":"1.0",
        "louvain.step":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/louvain"
    }
}

2.15 Jaccard Similarity Coefficient

The Jaccard index, also known as the Jaccard similarity coefficient, is used to compare the similarity and difference between finite sample sets. The larger the Jaccard coefficient, the higher the sample similarity. This function computes the Jaccard similarity coefficient between a given source vertex and all other vertices in the graph.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|jaccard|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iteration steps (the computation finishes in only 2 steps)|
|jaccard.source|Yes|string|-|-|The given source vertex|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "jaccard",
        "compute.parallel":"10",
        "compute.max_step":"2",
        "jaccard.source":"123",
        "output.type":"local",
        "output.file_path":"result/jaccard"
    }
}

2.16 Personalized PageRank 

Personalized PageRank computes the relevance of all nodes relative to a user u. A random walk starts from the node corresponding to u; at each node it either stops and restarts from u with probability 1 - d, or continues with probability d, choosing the next node uniformly at random among the nodes the current node points to. Given a starting vertex, it computes the personalized PageRank scores of walks starting from that vertex, which is suitable for scenarios such as social recommendation. Note: since the computation requires out-degrees, "load.use_out_degree": "1" must be set when loading the graph.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|ppr|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iteration steps|
|ppr.source|Yes|string|-|-|The given starting vertex|
|ppr.damping|No|float|0.85|0 - 1|Damping coefficient, the fraction passed on to the next vertex|
|ppr.diff_threshold|No|float|0.00001|0 - 1|Convergence accuracy: the upper limit of the summed absolute change of all vertices per iteration; the algorithm stops once the change falls below this value|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "ppr",
        "compute.parallel":"100",
        "compute.max_step":"10",
        "ppr.source":"123",
        "ppr.damping":"0.85",
        "ppr.diff_threshold":"0.00001",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/ppr"
    }
}

2.17 Whole-graph Kout

Computes the k-hop neighbors of every node in the graph (excluding the node itself and its 1 to k-1 hop neighbors). Since the whole-graph kout algorithm causes significant memory expansion, k is currently limited to 1 and 2. In addition, the whole-graph kout algorithm supports filtering (e.g. "compute.filter": "risk_level==1"): the filter condition is evaluated when computing the k-th hop, vertices matching the filter enter the final result set, and the final output of the algorithm is the number of neighbors matching the condition.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|kout_all|Algorithm name|
|compute.max_step|No|int|10|Greater than 0|The maximum step count is used as the value of K|
|compute.filter|No|string|-|-|Properties support numeric and text comparisons. Numeric property example: level==1 (all comparison operators supported); text property example: name=="test" (text properties only support == and !=)|

Note: if the compute.filter parameter is absent, no filtering is applied.

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "kout_all",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"10",
        "output.file_path":"result/kout",
        "compute.max_step":"2",
        "compute.filter":"risk_level==1" 
    }
}

2.18 Clustering Coefficient

The clustering coefficient measures the degree to which nodes in a graph cluster together. In real-world networks, nodes tend to form tightly knit groups. The clustering coefficient algorithm computes the degree of clustering of the nodes in a graph; this implementation is the local clustering coefficient, which measures the clustering around each individual node.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|clustering_coefficient|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iteration steps (the computation needs only 2 steps)|

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "clustering_coefficient",
        "compute.parallel":"100",
        "compute.max_step":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/cc"
    }
}

2.19 SCC (Strongly Connected Components)

In the mathematical theory of directed graphs, a graph is strongly connected if every vertex is reachable from every other vertex. The parts of an arbitrary directed graph that are strongly connected are called strongly connected components. The result indicates the connectivity between vertices and distinguishes different connected communities.

Asynchronous return interface: POST http://master_ip:port/tasks/create Synchronous return interface: POST http://master_ip:port/tasks/create/sync

|Dynamic parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|compute.algorithm|Yes|string|-|scc|Algorithm name|
|compute.parallel|No|int|1|Greater than 0|Number of worker compute threads|
|compute.max_step|No|int|10|Greater than 0|Maximum number of iteration steps|

Note: the strongly connected components algorithm finds SCC subgraphs through repeated forward and backward propagation and needs many rounds to converge. It is recommended to set the number of iteration steps relatively large; the algorithm ends automatically after converging. In tests, the twitter dataset needed 120 rounds to converge.

POST http://10.81.116.77:8688/tasks/create
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "scc",
        "compute.parallel":"10",
        "output.type":"local",
        "output.parallel":"1",
        "output.file_path":"result/scc",
        "compute.max_step":"200"
    }
}

3. Other REST APIs

3.1 Get the graph list

Gets the list of all graphs in the database.

  • uri: GET http://master_ip:port/graphs
  • params: none
  • response: 
{
    "errcode": 0,
    "graphs": [
        {
            "name": "testdb",
            "status": "loaded",
            "create_time": "2022-11-01T17:08:43.100831+08:00",
            "update_time": "2022-11-01T17:08:43.137682+08:00",
            "vertex_count": 10,
            "edge_count": 18,
            "workers": [
                {
                    "Name": "1587370926564790272",
                    "VertexCount": 4,
                    "VertIdStart": 0,
                    "EdgeCount": 7,
                    "IsSelf": false,
                    "ScatterOffset": 0
                },
                {
                    "Name": "1587370993006116864",
                    "VertexCount": 6,
                    "VertIdStart": 4,
                    "EdgeCount": 11,
                    "IsSelf": false,
                    "ScatterOffset": 0
                }
            ]
        }
    ]
}

3.2 Get information about a specified graph

Gets the data of a specified graph in the database.

  • uri: GET http://master_ip:port/graphs/{$db_name}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|db_name|Yes|string|-|-|Name of the graph to query|
  • response: 
{
    "errcode": 0,
    "graph": {
        "name": "testdb",
        "status": "loaded",
        "create_time": "2022-11-01T17:08:43.100831+08:00",
        "update_time": "2022-11-01T17:08:43.137682+08:00",
        "vertex_count": 10,
        "edge_count": 18,
        "workers": [
            {
                "Name": "1587370926564790272",
                "VertexCount": 4,
                "IsSelf": false,
                "ScatterOffset": 0
            },
            {
                "Name": "1587370993006116864",
                "VertexCount": 6,
                "VertIdStart": 4,
                "EdgeCount": 11,
                "IsSelf": false,
                "ScatterOffset": 0
            }
        ],
        "use_out_edges": true,
        "use_out_degree": true
    }
}

3.3 Delete a graph

Deletes the graph and frees its space. Only available when the graph status is "loaded" or "error".

  • uri: DELETE http://master_ip:port/graphs/{$db_name}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|db_name|Yes|string|-|-|Name of the graph to delete|
  • response: 
{
    "errcode": 0,
    "deleted": "ok"
}

3.4 Query a vertex's edge information

Business workloads should not depend on this interface; it is only intended for debugging. Conditions for the interface to be valid: 1. the graph's load task must have finished and the graph status must be loaded; 2. the interface is only valid while a compute task is running on this graph. When computation runs on other graphs, graphs that are not being computed are flushed to disk to free memory, after which their detailed data can no longer be queried until the next compute task on this graph loads it back from disk.

  • uri: GET http://master_ip:port/graphs/{$db_name}/edges?vertex_id={$vertex_id}&direction={$direction}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|db_name|Yes|string|-|-|Name of the graph to query|
|vertex_id|Yes|string|-|-|A vertex in the graph|
|direction|Yes|string|-|in/out/both|Incoming edges, outgoing edges, or both|
  • response: 
{
    "errcode": 0,
    "in_edges": [
        "13426",
        "13441",
        "13450",
        "700623"
    ],
    "out_edges": [
        "13441",
        "13450",
        "700623"
    ]
}

3.5 Query worker information

  • Description: query worker information such as id, name and ip.
  • uri: GET http://master_ip:port/workers
  • params: none
  • response: 
{
    "workers": [
        {
            "id": 1,
            "name": "1590231886607126528",
            "grpc_peer": "10.157.12.67:8397",
            "ip_addr": "10.157.12.67",
            "launch_time": "2022-12-26T11:27:33.59824+08:00"
        },
        {
            "id": 2,
            "name": "1590232206384066560",
            "grpc_peer": "10.157.12.68:8395",
            "ip_addr": "10.157.12.68",
            "state": "READY",
            "version": "0.0.1",
            "launch_time": "2022-12-26T11:27:33.59824+08:00"
        }
    ]
}
  • Example:
GET http://10.81.116.77:8688/workers

3.6 Query master information

  • Description: query information about the master.
  • uri: GET http://master_ip:port/master
  • params: none
  • response: 
{
    "master": {
        "grpc_peer": "0.0.0.0:6689",
        "ip_addr": "0.0.0.0:6688",
        "debug_mod": "release",
        "launch_time": "2022-12-26T11:27:30.031169+08:00"
    }
}
  • Example:
GET http://0.0.0.0:6688/master

3.7 Get the list of all tasks

  • Description: query all created tasks.
  • uri: GET http://master_ip:port/tasks?type={$type}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|type|Yes|string|all|all/todo|Task scope; all: all tasks, todo: pending tasks|
  • response: 
{
    "errcode": 0,
    "tasks": [
        {
            "id": 1,
            "status": "loaded",
            "create_time": "2022-11-10T11:50:57.101218+08:00",
            "update_time": "2022-11-10T11:51:14.445569+08:00",
            "graph_name": "testGraph",
            "task_type": "load",
            "params": {
                "load.delimiter": " ",
                "load.type": "local",
                "load.use_out_degree": "1",
                "load.use_outedge": "1",
                "load.use_property": "0",
                "load.use_undirected": "0",
                "load.vertex_files": "{\"127.0.0.1\":\"test_case/vertex/vertex_[0,29]\"}",
                "load.edge_files": "{\"127.0.0.1\":\"test_case/edge/edge_[0,29]\"}"
            },
            "workers": [
                {
                    "name": "1590552492724137984",
                    "status": "loaded"
                },
                {
                    "name": "1590552530107645952",
                    "status": "loaded"
                }
            ]
        },
        {
            "id": 2,
            "status": "complete",
            "create_time": "2022-11-10T11:51:20.098327+08:00",
            "update_time": "2022-11-10T11:51:22.95215+08:00",
            "graph_name": "testGraph",
            "task_type": "compute",
            "params": {
                "compute.algorithm": "pagerank",
                "compute.max_step": "10",
                "compute.parallel": "30",
                "output.delimiter": ",",
                "output.file_path": "./data/pagerank",
                "output.parallel": "1",
                "output.type": "local",
                "pagerank.damping": "0.85"
            },
            "workers": [
                {
                    "name": "1590552492724137984",
                    "status": "complete"
                },
                {
                    "name": "1590552530107645952",
                    "status": "complete"
                }
            ]
        }
    ]
}
  • Example:
GET http://0.0.0.0:6688/tasks

3.8 Get information about a single task

  • Description: query the task information for the specified task_id.
  • uri: GET http://master_ip:port/task/{$task_id}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_id|Yes|int|-|-|Id of the task to query|
  • response: 
{
    "errcode": 0,
    "task": {
        "id": 1,
        "status": "loaded",
        "create_time": "2022-11-10T11:50:57.101218+08:00",
        "update_time": "2022-11-10T11:51:14.445569+08:00",
        "graph_name": "testGraph",
        "task_type": "load",
        "params": {
            "load.delimiter": " ",
            "load.type": "local",
            "load.use_out_degree": "1",
            "load.use_outedge": "1",
            "load.use_property": "0",
            "load.use_undirected": "0",
            "load.vertex_files": "{\"127.0.0.1\":\"test_case/vertex/vertex_[0,29]\"}",
            "load.edge_files": "{\"127.0.0.1\":\"test_case/edge/edge_[0,29]\"}"
        },
        "workers": [
            {
                "name": "1590552492724137984",
                "status": "loaded"
            },
            {
                "name": "1590552530107645952",
                "status": "loaded"
            }
        ]
    }
}
  • Example:
GET http://0.0.0.0:6688/task/1

3.9 Iteratively query compute task results

  • Description: after a compute task finishes, its results can be queried in batches through an iterator. If no new query request is received within ten minutes, the result-query function automatically releases the memory it holds.
  • uri: POST http://master_ip:port/tasks/value/{$task_id}?cursor={$cursor}&limit={$limit}
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_id|Yes|int|-|-|Task id|
|cursor|No|int|0|0 - total vertex count|Cursor index|
  • response: see the example
  • Example: create a compute task and wait for it to finish; the parameter output.need_query must be passed in when creating it.
POST http://10.81.116.77:8688/tasks/create/sync
{
    "task_type": "compute",
    "graph": "testdb",
    "params": {
        "compute.algorithm": "pagerank",
        "compute.parallel":"10",
        "output.type":"local",
        "output.need_query":"1",
        "output.parallel":"1",
        "output.file_path":"result/pagerank",
        "compute.max_step":"10"
    }
}

Response example:

{
    "errcode": 0,
    "task": {
        "id": 3,
        "status": "complete",
        "create_type": "sync",
        "create_time": "2022-12-05T16:42:07.669406+08:00",
        "update_time": "2022-12-05T16:42:10.441145+08:00",
        "graph_name": "testGraph",
        "task_type": "compute",
        "params": {
            "compute.algorithm": "pagerank",
            "compute.max_step": "10",
            "compute.parallel": "100",
            "output.delimiter": ",",
            "output.parallel": "1",
            "output.type": "local"
        },
        "workers": [
            {
                "name": "1599650248728154112",
                "status": "complete"
            },
            {
                "name": "1599650253686800384",
                "status": "complete"
            },
            {
                "name": "1599650242993369088",
                "status": "complete"
            }
        ]
    }
}

The initial cursor value is 0. Each query returns a new cursor value; pass the cursor returned by the previous query into the next request to iterate through the results. After the compute task completes, the results can be queried iteratively by task id, passing the task id into the following uri:

POST http://master_ip:port/tasks/value/{$task_id}?cursor={$cursor}&limit={$limit}

Example of iteratively obtaining the calculation result response:

{
    "errcode": 0,
    "vertices": [
        {
            "ID": "646354",
            "Value": "6.712826E-07"
        },
        {
            "ID": "646357",
            "Value": "4.7299116E-07"
        },
        {
            "ID": "646362",
            "Value": "9.832202E-07"
        },
        {
            "ID": "646365",
            "Value": "1.2438179E-06"
        },
        {
            "ID": "646368",
            "Value": "4.4179902E-07"
        }
    ],
    "cursor": 10
}
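
A minimal Go sketch of driving the cursor loop from a client, assuming the task has completed and was created with output.need_query set to 1. The field names follow the response example above; the loop stops when an empty batch comes back, which should be verified against your deployment:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type vertexValue struct {
	ID    string `json:"ID"`
	Value string `json:"Value"`
}

type valueResp struct {
	Errcode  int           `json:"errcode"`
	Vertices []vertexValue `json:"vertices"`
	Cursor   int           `json:"cursor"`
}

func main() {
	master := "10.81.116.77:8688" // placeholder master address
	taskID := 3                   // placeholder task id
	token := "your-token-here"    // placeholder token
	cursor := 0
	for {
		url := fmt.Sprintf("http://%s/tasks/value/%d?cursor=%d&limit=1000", master, taskID, cursor)
		req, err := http.NewRequest("POST", url, nil)
		if err != nil {
			panic(err)
		}
		req.Header.Set("Authorization", token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		var out valueResp
		decodeErr := json.NewDecoder(resp.Body).Decode(&out)
		resp.Body.Close()
		if decodeErr != nil {
			panic(decodeErr)
		}
		if len(out.Vertices) == 0 {
			break // no more results
		}
		for _, v := range out.Vertices {
			fmt.Println(v.ID, v.Value)
		}
		cursor = out.Cursor // pass the returned cursor into the next request
	}
}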

3.10 Interrupt Task Interface

  • Description : Interrupts the specified ongoing task according to its task id. Interruption may fail if the task has already completed, the task does not exist, or the task status is error.
  • type : GET http://ip:port/task/cancel/{$task_id}
  • params
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|task_id|Yes|int|-|-|Id of the task to interrupt|
  • response : see example
  • Example :
GET http://10.81.116.77:8688/task/cancel/10
// interruption succeeded
// 200 OK
response:
{ 
    "errcode": 0,
    "message": "cancel task: ok"
}
GET http://10.81.116.77:8688/task/cancel/10
// interruption failed
// 400 Bad Request
response:
{
    "errcode": -1,
    "message": "task already complete"
}

3.11 healthcheck

  • Description : Check whether the master or worker program is normal.
  • type : GET http://ip:port/healthcheck
  • params : None
  • response
{
    "code": 200
}
  • Example :
GET http://10.81.116.77:8688/healthcheck

3.12 metrics

Available on both master and worker. In addition to Go's built-in metrics, custom metrics are added for HTTP request information, graph information, task information, etc. See the example below for details.

  • uri: GET http://ip:port/metrics
  • params : None
  • response
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.7791e-05
go_gc_duration_seconds{quantile="0.25"} 5.2001e-05
go_gc_duration_seconds{quantile="0.5"} 0.000126126
go_gc_duration_seconds{quantile="0.75"} 0.000167708
go_gc_duration_seconds{quantile="1"} 0.013796375
go_gc_duration_seconds_sum 0.014740127
go_gc_duration_seconds_count 9
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 39
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.18.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.429872e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.696e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.461579e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 271491
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.4024e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.429872e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.1165696e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.658944e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 29651
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.414912

3.13 Query vertex property information

Business workloads should not depend on this interface; it is only intended for debugging.

  • Conditions for the interface to be valid: 1. the graph's load task must have finished and the graph status must be loaded; 2. the interface is only valid while a compute task is running on this graph. When computation runs on other graphs, graphs that are not being computed are flushed to disk to free memory, after which their detailed data can no longer be queried until the next compute task on this graph loads it back from disk.
  • Description: query the property information of vertices in batches by passing parameters.
  • uri: POST http://master_ip:port/graphs/{$graph_name}/vertices
  • params:
|Parameter|Required|Type|Default|Value range|Remark|
|---|---|---|---|---|---|
|graph_name|Yes|string|-|-|Graph name|
|vertices|Yes|array|-|-|Set of vertex ids|
  • Example:
POST http://10.81.116.77:8688/graphs/{$graph_name}/vertices
request:
{
    "vertices": [
        "100",
        "101",
        "102",
        "103"
    ]
}
response:
{
    "errcode": 0,
    "vertices": [
        {
            "id": "100",
            "property": {
                "label": "user"
            }
        },
        {
            "id": "101",
            "property": {
                "label": "user"
            }
        },
        {
            "id": "102",
            "property": {
                "label": "user"
            }
        },
        {
            "id": "103",
            "property": {
                "label": "user"
            }
        }
    ]
}

4. Appendix: Task creation parameter list

|Parameter name|Type|Default|Value range|Remark|
|---|---|---|---|---|
|load.parallel|int|1|Greater than 0|Number of data loading threads|
|load.type|string|local|local/hdfs/afs|Loading method|
|load.vertex_files|string|-|-|Vertex file names. Multiple files use the [n,m] format, where n is the starting number, e.g. v_[0,99] represents v_00 through v_99. In local mode only local files can be read, e.g. {"10.81.116.77":"v_[0,49]", "10.81.116.78":"v_[50,99]"}|
|load.edge_files|string|-|-|Edge file names. Multiple files use the [n,m] format, where n is the starting number, e.g. e_[0,99] represents e_00 through e_99. In local mode only local files can be read, e.g. {"10.81.116.77":"e_[0,49]", "10.81.116.78":"e_[50,99]"}|
|load.use_property|int|0|0/1|Whether to use attributes|
|load.vertex_property|json|-|-|Vertex attribute list in JSON format, where the key is the attribute name, e.g. {"label": "2,0", "name":...}; when reading from hugegraph, pass only a string containing the attribute list to read (if omitted, all attributes are read), e.g. "1,2,10001"|
|load.edge_property|json|-|-|Edge attribute list in JSON format, where the key is the attribute name, e.g. {"label": "2,0", "name":...}; when reading from hugegraph, pass only a string containing the attribute list to read (if omitted, all attributes are read), e.g. "1,2,10001"|
|load.use_outedge|int|0|0/1|Whether to generate vertex out-edge data|
|load.use_out_degree|int|0|0/1|Whether to generate vertex out-degree data|
|load.use_undirected|int|0|0/1|Whether to generate undirected graph data|
|load.delimiter|string|space|-|Column delimiter of the import file|
|load.hdfs_conf_path|string|-|-|HDFS configuration file path|
|load.hdfs_namenode|string|-|-|HDFS namenode|
|load.hdfs_use_krb|int|0|0/1|Whether to use Kerberos authentication|
|load.krb_name|string|-|-|Kerberos username|
|load.krb_realm|string|-|-|Kerberos realm|
|load.krb_conf_path|string|-|-|Kerberos configuration file path|
|load.krb_keytab_path|string|-|-|Kerberos keytab path|
|load.hg_pd_peers|string|-|-|Peers of the hugegraph PD to import data from, example: ["10.14.139.69:8686"]|
|load.hugegraph_name|string|-|-|Name of the hugegraph to import data from|
|load.hugegraph_username|string|-|-|hugegraph username for importing data|
|load.hugegraph_password|string|-|-|hugegraph password for importing data|
|load.afs_uri|string|-|-|URI of the AFS cluster to import data from|
|load.afs_username|string|-|-|AFS username for importing data|
|load.afs_password|string|-|-|AFS password for importing data|
|load.afs_config_path|string|-|-|AFS configuration file path for importing data|
|output.parallel|int|1|Greater than 0|Number of result output threads|
|output.delimiter|string|space|-|Column delimiter of the result file|
|output.file_path|string|-|-|Result file path|
|output.type|string|local|local/hdfs/afs|Export method|
|output.need_query|int|0|0/1|Whether the computation results can be queried through the master HTTP interface|
|output.hdfs_conf_path|string|-|-|HDFS configuration file path|
|output.hdfs_namenode|string|-|-|HDFS namenode|
|output.hdfs_use_krb|int|0|0/1|Whether to use Kerberos authentication|
|output.krb_name|string|-|-|Kerberos username|
|output.krb_realm|string|-|-|Kerberos realm|
|output.krb_conf_path|string|-|-|Kerberos configuration file path|
|output.krb_keytab_path|string|-|-|Kerberos keytab path|
|output.afs_uri|string|-|-|URI of the AFS cluster for exporting data|
|output.afs_username|string|-|-|AFS username for exporting data|
|output.afs_password|string|-|-|AFS password for exporting data|
|output.afs_config_path|string|-|-|AFS configuration file path for exporting data|
|output.hg_pd_peers|string|-|-|Peers of the hugegraph PD to export data to|
|output.hugegraph_name|string|-|-|Name of the hugegraph to export data to|
|output.hugegraph_username|string|-|-|hugegraph username for exporting data|
|output.hugegraph_password|string|-|-|hugegraph password for exporting data|
|output.hugegraph_property|string|-|-|hugegraph property that the exported data is written to|
|compute.max_step|int|10|Greater than 0|Number of supersteps to compute|
|compute.parallel|int|1|Greater than 0|Number of worker compute threads|
|pagerank.damping|float|0.85|0 - 1|Damping coefficient, the fraction passed on to the next vertex|
|pagerank.diff_threshold|float|0.00001|0 - 1|Convergence accuracy: the upper limit of the summed absolute change of all vertices per iteration; the algorithm stops once the change falls below this value|
|kout.source|string|-|-|Starting vertex ID|
|degree.direction|string|out|in/out/both|Edge direction|
|closeness_centrality.sample_rate|float|1.0|0 - 1|Edge sampling rate. Since the algorithm's cost grows exponentially, the computing power requirement is very high; set a reasonable sampling rate according to business needs to obtain an approximate result|
|closeness_centrality.wf_improved|int|1|0/1|Whether to use the Wasserman and Faust centrality formula|
|betweenness_centrality.sample_rate|float|1.0|0 - 1|Edge sampling rate. Since the algorithm's cost grows exponentially, the computing power requirement is very high; set a reasonable sampling rate according to business needs to obtain an approximate result|
|betweenness_centrality.use_endpoint|int|0|0/1|Whether to include endpoints in the computation|
|sssp.source|string|-|-|Starting vertex ID|
|kcore.degree_k|int|3|Greater than 0|Minimum degree threshold|