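HugeGraph-Computer is a distributed graph processing system (OLAP). It is an implementation of Pregel and can run on Kubernetes (K8s) or Yarn. (It targets graph computation on datasets at the scale of tens of billions to hundreds of billions, and it uses disk for sorting and acceleration. This is one of its biggest differences from Vermeer.)
Computer must be started in an environment with Java 11 or later, and you then configure it yourself.
Before reading further, be sure to run the java -version command to check your JDK version.
To run algorithms with HugeGraph-Computer, Java 11 or later must be installed.
You also need to deploy HugeGraph-Server and Etcd first.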
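The Java requirement above can be checked in a script rather than by eye. The following is a minimal sketch, not part of HugeGraph-Computer: the function names `java_major_version` and `installed_java_version` are hypothetical helpers introduced here, and the parsing assumes the usual `java -version` output with the version in double quotes.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "11.0.20" (gives 11) or the legacy "1.8.0_292" form (gives 8).
java_major_version() {
    local version="$1"
    local first="${version%%.*}"        # text before the first dot
    if [ "$first" = "1" ]; then
        # Legacy scheme "1.x.y": the major version is the second field.
        local rest="${version#*.}"
        first="${rest%%.*}"
    fi
    # Drop any non-numeric suffix, e.g. "21-ea" becomes "21".
    echo "${first%%[!0-9]*}"
}

# Read the quoted version from `java -version` (it prints to stderr).
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Usage:
#   major="$(java_major_version "$(installed_java_version)")"
#   [ "$major" -ge 11 ] || echo "Java 11 or later is required" >&2
```

Keeping the parsing separate from the call to `java` makes the version logic easy to check against sample strings without a JDK installed.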
There are two ways to get HugeGraph-Computer:
Download the latest HugeGraph-Computer release package:
wget https://downloads.apache.org/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
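The download commands rely on a `${version}` variable being set, and `tar -C` fails if the target directory does not exist yet. A minimal sketch that handles both is shown here; the helper names `computer_release_url` and `fetch_computer_release` are hypothetical, and the URL pattern is copied from the wget command as-is.

```shell
#!/usr/bin/env bash
# Build the release tarball URL from a version string, following the
# URL pattern used in the wget command of this guide.
computer_release_url() {
    local version="$1"
    echo "https://downloads.apache.org/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz"
}

# Download one release and unpack it into ./hugegraph-computer.
fetch_computer_release() {
    local version="$1"
    local url
    url="$(computer_release_url "$version")"
    wget "$url" || return 1
    # tar -C requires the target directory to exist already.
    mkdir -p hugegraph-computer
    # "${url##*/}" is the file name part of the URL.
    tar zxvf "${url##*/}" -C hugegraph-computer
}

# Usage (replace the placeholder with a real release version):
#   fetch_computer_release "<version>"
```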
Clone the latest HugeGraph-Computer source code:
$ git clone https://github.com/apache/hugegraph-computer.git
Build the tar package:
cd hugegraph-computer
mvn clean package -DskipTests
You can use the -c parameter to specify the configuration file. For more computer configuration, see: Computer Config Options
cd hugegraph-computer
bin/start-computer.sh -d local -r master
bin/start-computer.sh -d local -r worker
2.5.1 Enable OLAP index query for server
If the OLAP index is not enabled, you need to enable it. For more details, see: modify-graphs-read-mode
PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
"ALL"
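The PUT request above can be issued from the command line with curl. This is a sketch under assumptions: the `Content-Type: application/json` header is assumed rather than taken from this guide, the helper names are hypothetical, and a server with authentication enabled would need credentials added.

```shell
#!/usr/bin/env bash
# Build the graph_read_mode endpoint for a server base URL and graph name.
graph_read_mode_url() {
    local base="${1%/}"   # strip one trailing slash, if any
    local graph="$2"
    echo "${base}/graphs/${graph}/graph_read_mode"
}

# Send the PUT request with the JSON string "ALL" as the body.
# Assumption: the server accepts a JSON content type for this body.
enable_olap_read_mode() {
    curl -X PUT -H "Content-Type: application/json" \
         -d '"ALL"' "$(graph_read_mode_url "$1" "$2")"
}

# Usage:
#   enable_olap_read_mode http://localhost:8080 hugegraph
```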
3.1.5.2 Query the page_rank property value:
curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
To run algorithms with HugeGraph-Computer, you need to deploy HugeGraph-Server first.
# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
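The choice between the two CRD manifests depends only on the cluster version, so it can be scripted. The sketch below takes the major and minor version as arguments; `crd_manifest_url` and `CRD_BASE` are hypothetical names introduced here, and how you obtain the version numbers (for example from `kubectl version`) is left to you.

```shell
#!/usr/bin/env bash
CRD_BASE="https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest"

# Print the CRD manifest URL for a Kubernetes server version given as
# major and minor numbers, e.g. `crd_manifest_url 1 22`.
crd_manifest_url() {
    local major="$1"
    # Some clusters report a minor version like "22+"; keep the digits only.
    local minor="${2%%[!0-9]*}"
    if [ "$major" -gt 1 ] || [ "$minor" -ge 16 ]; then
        echo "${CRD_BASE}/hugegraph-computer-crd.v1.yaml"
    else
        echo "${CRD_BASE}/hugegraph-computer-crd.v1beta1.yaml"
    fi
}

# Usage:
#   kubectl apply -f "$(crd_manifest_url 1 22)"
```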
kubectl get crd
NAME                                         CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
kubectl get pod -n hugegraph-computer-operator-system
NAME                                                              READY   STATUS    RESTARTS   AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h
For more on the computer CRD spec, see: Computer CRD
For more Computer configuration, see: Computer Config Options
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-sample
spec:
  jobId: *jobName
  algorithmName: page_rank
  image: hugegraph/hugegraph-computer:latest # algorithm image url
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar # algorithm jar path
  pullPolicy: Always
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5
  computerConf:
    job.partitions_count: "20"
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port} # hugegraph server url
    hugegraph.name: hugegraph # hugegraph graph name
EOF
kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system
NAME              JOBID             JOBSTATUS
pagerank-sample   pagerank-sample   RUNNING
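When a script needs to wait for the job, the JOBSTATUS column of this table can be read programmatically. The sketch below parses the table format shown in this guide; `job_status_from_table` is a hypothetical helper, and RUNNING is the only status value taken from this guide, so the loop simply waits until the status is anything else.

```shell
#!/usr/bin/env bash
# Read `kubectl get hcjob/<name>` table output on stdin and print the
# JOBSTATUS column of the first data row. The column is located by its
# header name, so extra columns do not break the lookup.
job_status_from_table() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "JOBSTATUS") col = i }
         NR == 2 && col { print $col }'
}

# Usage: poll every 10 seconds while the job reports RUNNING.
#   while [ "$(kubectl get hcjob/pagerank-sample \
#               -n hugegraph-computer-operator-system \
#               | job_status_from_table)" = "RUNNING" ]; do
#       sleep 10
#   done
```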
# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system
# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system
# Show diagnostic log of a job
# NOTE: the diagnostic log exists only when the job fails, and it is kept for only one hour.
kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
NOTE: it will only be saved for one hour
kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
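If the output goes to HugeGraph-Server, the procedure is the same as in local mode. If the output goes to HDFS, check the result files under the hugegraph-computerresults{jobId} directory.
For more algorithms, see: Built-In algorithms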
TODO
TODO
Run mvn compile to generate the corresponding classes in advance.