HugeGraph-Computer is a distributed graph processing system (OLAP) for HugeGraph. It is an implementation of Pregel and runs on Kubernetes (K8s). It focuses on supporting graph data volumes from hundreds of billions to trillions, using disk for sorting and acceleration, which is one of its biggest differences from Vermeer.
HugeGraph-Computer requires Java 11 or later, which you must install and configure yourself.
Run `java -version` to check your JDK version before reading on.
To run an algorithm with HugeGraph-Computer, you need Java 11 or later installed.
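The Java requirement can be checked from a script rather than by eye. The following is a minimal sketch, not part of HugeGraph-Computer: the function names `java_major_from_string`, `installed_java_version`, and `check_java_11` are hypothetical helpers, and the parsing assumes the usual `java -version` output, where the version appears in double quotes on stderr.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.x" scheme (1.8.0_292 -> 8) and the modern scheme (11.0.2 -> 11).
java_major_from_string() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}"; echo "${v%%.*}" ;;
        *)   echo "${v%%[.+-]*}" ;;
    esac
}

# Read the quoted version string that `java -version` prints on stderr.
installed_java_version() {
    java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Return 0 when the installed JDK is Java 11 or later, non-zero otherwise.
check_java_11() {
    major="$(java_major_from_string "$(installed_java_version)")"
    [ -n "$major" ] && [ "$major" -ge 11 ]
}
```

After sourcing these functions, `check_java_11 || echo "Java 11 or later is required"` gives a quick pre-flight check before starting the master or workers.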
You also need to deploy HugeGraph-Server and etcd.
There are two ways to get HugeGraph-Computer:
Download the latest version of the HugeGraph-Computer release package:
wget https://downloads.apache.org/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
Clone the latest version of HugeGraph-Computer source package:
$ git clone https://github.com/apache/hugegraph-computer.git
Compile and generate tar package:
cd hugegraph-computer
mvn clean package -DskipTests
Edit conf/computer.properties to configure the connection to HugeGraph-Server and etcd:
# Job configuration
job.id=local_pagerank_001
job.partitions_count=4

# HugeGraph connection (✅ Correct configuration keys)
hugegraph.url=http://localhost:8080
hugegraph.name=hugegraph
# If authentication is enabled on HugeGraph-Server
hugegraph.username=
hugegraph.password=

# BSP coordination (✅ Correct key: bsp.etcd_endpoints)
bsp.etcd_endpoints=http://localhost:2379
bsp.max_super_step=10

# Algorithm parameters (⚠️ Required)
algorithm.params_class=org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
Important Configuration Notes:
- Use `bsp.etcd_endpoints` (NOT `bsp.etcd.url`) for the etcd connection
- `algorithm.params_class` is required for all algorithms
- For multiple etcd endpoints, use a comma-separated list: `http://host1:2379,http://host2:2379`
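These notes can be turned into a small pre-flight check on the properties file. The sketch below is illustrative only: `conf_has_key` and `check_computer_conf` are hypothetical helper names, and treating a missing `bsp.etcd_endpoints` as a problem is an assumption made for local mode.

```shell
# Return 0 if the properties file ($1) defines the exact key ($2).
# Comment lines never match because their "key" part starts with '#'.
conf_has_key() {
    awk -F '=' -v k="$2" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == k { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Check a computer.properties file against the notes above.
# Prints one line per problem and returns non-zero if any problem was found.
check_computer_conf() {
    conf="$1"
    problems=0
    for key in bsp.etcd_endpoints algorithm.params_class; do
        if ! conf_has_key "$conf" "$key"; then
            echo "missing key: $key"
            problems=1
        fi
    done
    if conf_has_key "$conf" "bsp.etcd.url"; then
        echo "unsupported key: bsp.etcd.url (use bsp.etcd_endpoints)"
        problems=1
    fi
    return "$problems"
}
```

For example, `check_computer_conf conf/computer.properties` prints nothing and returns 0 when both keys are present and the old key is absent.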
You can use the `-c` parameter to specify the configuration file. For more configuration options, see: Computer Config Options
cd hugegraph-computer
bin/start-computer.sh -d local -r master
bin/start-computer.sh -d local -r worker
3.1.6.1 Enable OLAP index query for server
If the OLAP index is not enabled, enable it first. For details, see: modify-graphs-read-mode
PUT http://localhost:8080/graphs/hugegraph/graph_read_mode

"ALL"
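The request above can also be sent from the command line with `curl`. This is a sketch under stated assumptions: `graph_read_mode_url` and `enable_olap_read_mode` are hypothetical helper names, and the `Content-Type: application/json` header is assumed to be what the server expects for the JSON string body.

```shell
# Build the graph_read_mode URL from a server base URL ($1) and a graph name ($2).
graph_read_mode_url() {
    echo "${1%/}/graphs/$2/graph_read_mode"
}

# Send the PUT request; the body is the JSON string "ALL".
enable_olap_read_mode() {
    curl -sS -X PUT \
         -H "Content-Type: application/json" \
         -d '"ALL"' \
         "$(graph_read_mode_url "$1" "$2")"
}
```

With the server from the example, the call would be `enable_olap_read_mode http://localhost:8080 hugegraph`.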
3.1.6.2 Query page_rank property value:
curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
To run an algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first
# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
kubectl get crd

NAME                                         CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
kubectl get pod -n hugegraph-computer-operator-system

NAME                                                              READY   STATUS    RESTARTS   AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h
For details on the Computer CRD, see: Computer CRD
For more configuration options, see: Computer Config Options
Basic Example:
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-sample
spec:
  jobId: *jobName
  algorithmName: page_rank  # ✅ Correct: use underscore format (matches algorithm implementation)
  image: hugegraph/hugegraph-computer:latest
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
  pullPolicy: Always
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5
  computerConf:
    job.partitions_count: "20"
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port}
    hugegraph.name: hugegraph
EOF
Complete Example with Advanced Features:
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-advanced
spec:
  jobId: *jobName
  algorithmName: page_rank  # ✅ Correct: underscore format
  image: hugegraph/hugegraph-computer:latest
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
  pullPolicy: Always

  # Resource limits
  masterCpu: "2"
  masterMemory: "2Gi"
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5

  # JVM options
  jvmOptions: "-Xmx3g -Xms3g -XX:+UseG1GC"

  # Environment variables (optional)
  envVars:
    - name: REMOTE_JAR_URI
      value: "http://example.com/custom-algorithm.jar"  # Download custom algorithm JAR
    - name: LOG_LEVEL
      value: "INFO"

  # Computer configuration
  computerConf:
    # Job settings
    job.partitions_count: "20"

    # Algorithm parameters (⚠️ Required)
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    page_rank.alpha: "0.85"  # PageRank damping factor

    # HugeGraph connection
    hugegraph.url: http://hugegraph-server:8080
    hugegraph.name: hugegraph
    hugegraph.username: ""  # Fill if authentication is enabled
    hugegraph.password: ""

    # BSP configuration (⚠️ System-managed in K8s, do not override)
    # bsp.etcd_endpoints is automatically set by operator
    bsp.max_super_step: "20"
    bsp.log_interval: "30000"

    # Snapshot configuration (optional)
    snapshot.write: "true"   # Enable snapshot writing
    snapshot.load: "false"   # Do not load from snapshot this time
    snapshot.name: "pagerank-snapshot-v1"
    snapshot.minio_endpoint: "http://minio:9000"
    snapshot.minio_access_key: "minioadmin"
    snapshot.minio_secret_key: "minioadmin"
    snapshot.minio_bucket_name: "hugegraph-snapshots"

    # Output configuration
    output.result_name: "page_rank"
    output.batch_size: "500"
    output.with_adjacent_edges: "false"
EOF
Configuration Notes:
| Configuration Key | ⚠️ Important Notes |
|---|---|
| `algorithmName` | Must use `page_rank` (underscore format), matches the algorithm's `name()` method return value |
| `bsp.etcd_endpoints` | System-managed in K8s - automatically set by operator, do not override in `computerConf` |
| `algorithm.params_class` | Required - must specify for all algorithms |
| `REMOTE_JAR_URI` | Optional environment variable to download custom algorithm JAR from remote URL |
| `snapshot.*` | Optional - enable snapshots for checkpoint recovery or repeated computations |
kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system

NAME              JOBID             JOBSTATUS
pagerank-sample   pagerank-sample   RUNNING
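When the status is needed in a script, the `JOBSTATUS` column can be pulled out of the table output. The helper below is a hypothetical sketch (`job_status_from_table` is not part of any tool). It locates the column by its header name rather than by position, and it assumes no column in the row is empty, since an empty cell would shift the whitespace-separated fields.

```shell
# Read `kubectl get hcjob` table output on stdin and print the JOBSTATUS
# value for the job named $1. Prints nothing if the job is not listed.
job_status_from_table() {
    awk -v name="$1" '
        NR == 1 {
            # Find the JOBSTATUS column in the header row.
            for (i = 1; i <= NF; i++) if ($i == "JOBSTATUS") col = i
            next
        }
        col && $1 == name { print $col; exit }
    '
}
```

A typical call would be `kubectl get hcjob -n hugegraph-computer-operator-system | job_status_from_table pagerank-sample`, which prints `RUNNING` for the output shown above.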
# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system

# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system

# Show diagnostic log of a job
# NOTE: the diagnostic log exists only when the job fails, and it is only kept for one hour.
kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
NOTE: the success event is only kept for one hour.
kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
If the results are written to HugeGraph-Server, query them the same way as in local mode. If they are written to HDFS, check the result files under the /hugegraph-computer/results/{jobId} directory.
Understanding the differences helps you choose the right deployment mode for your use case.
| Feature | Local Mode | Kubernetes Mode |
|---|---|---|
| Configuration | conf/computer.properties file | CRD YAML computerConf field |
| Etcd Management | Manual deployment of external etcd | Operator auto-deploys etcd StatefulSet |
| Worker Scaling | Manual start of multiple processes | CRD workerInstances field auto-scales |
| Resource Isolation | Shared host resources | Pod-level CPU/Memory limits |
| Remote JAR | JAR_FILE_PATH environment variable | CRD remoteJarUri or envVars.REMOTE_JAR_URI |
| Log Viewing | Local logs/ directory | kubectl logs command |
| Fault Recovery | Manual process restart | K8s auto-restarts failed pods |
| Use Cases | Development, testing, small datasets | Production, large-scale data |
Local Mode Prerequisites:
K8s Mode Prerequisites:
Configuration Key Differences:
# Local Mode (computer.properties)
# ✅ User-configured
bsp.etcd_endpoints=http://localhost:2379
# User-configured
job.workers_count=4
# K8s Mode (CRD)
spec:
  workerInstances: 5  # Overrides job.workers_count
  computerConf:
    # bsp.etcd_endpoints is auto-set by operator, do NOT configure
    job.partitions_count: "20"
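When moving a job from local mode to Kubernetes, most keys carry over into `computerConf` unchanged, while the two keys shown above do not. The sketch below illustrates that mapping with a hypothetical helper, `properties_to_computer_conf`. It assumes simple `key=value` lines with no continuation lines and no double quotes in values, and it drops `bsp.etcd_endpoints` and `job.workers_count` because the operator and `workerInstances` take over those settings.

```shell
# Print the entries of a computer.properties file ($1) as YAML lines that can be
# pasted under `computerConf:` in the CRD.
properties_to_computer_conf() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        /^[ \t]*$/ { next }    # skip blank lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            # These two are managed by the operator / workerInstances in K8s mode.
            if (key == "bsp.etcd_endpoints" || key == "job.workers_count") next
            printf "    %s: \"%s\"\n", key, val
        }
    ' "$1"
}
```

Running `properties_to_computer_conf conf/computer.properties` prints one quoted `key: "value"` line per remaining setting, indented to sit under `computerConf:`.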
Error: “Failed to connect to etcd”
Symptoms: Master or Worker cannot connect to etcd
Local Mode Solutions:
# Check configuration key name (common mistake)
grep "bsp.etcd_endpoints" conf/computer.properties
# Should output: bsp.etcd_endpoints=http://localhost:2379
# ❌ WRONG: bsp.etcd.url (old/incorrect key)
# ✅ CORRECT: bsp.etcd_endpoints

# Test etcd connectivity
curl http://localhost:2379/version
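When `bsp.etcd_endpoints` holds a comma-separated list, each endpoint can be probed in turn with the same `/version` request. This is a rough sketch: `split_endpoints` and `check_etcd_endpoints` are hypothetical helper names, and the 3-second timeout is an arbitrary choice.

```shell
# Split a comma-separated endpoint list into one endpoint per line.
split_endpoints() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Probe the /version path of every endpoint; return non-zero if any probe fails.
check_etcd_endpoints() {
    failed=0
    for endpoint in $(split_endpoints "$1"); do
        if curl -sf --max-time 3 "$endpoint/version" >/dev/null; then
            echo "ok: $endpoint"
        else
            echo "unreachable: $endpoint"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_etcd_endpoints "http://host1:2379,http://host2:2379"` reports each endpoint separately, which helps when only one member of the list is down.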
K8s Mode Solutions:
# Check Operator etcd service
kubectl get svc hugegraph-computer-operator-etcd -n hugegraph-computer-operator-system

# Verify etcd pod is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running status

# Test connectivity from worker pod
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
  curl http://hugegraph-computer-operator-etcd:2379/version
Error: “Algorithm class not found”
Symptoms: Cannot find algorithm implementation class
Cause: Incorrect algorithmName format
# ❌ WRONG formats:
algorithmName: pageRank   # Camel case
algorithmName: PageRank   # Title case

# ✅ CORRECT format (matches PageRank.name() return value):
algorithmName: page_rank  # Underscore lowercase
Verification:
# Check algorithm implementation in source code
# File: computer-algorithm/.../PageRank.java
# Method: public String name() { return "page_rank"; }
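A quick format check can catch the camel-case and title-case mistakes before a job is submitted. The helper below, `is_snake_case_name`, is hypothetical and only tests the lowercase-underscore shape. It does not confirm that an algorithm with that name exists, so the `name()` method in the source remains the authority.

```shell
# Return 0 if $1 is lowercase words joined by single underscores (e.g. page_rank).
is_snake_case_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9]*(_[a-z0-9]+)*$'
}
```

For instance, `is_snake_case_name page_rank` succeeds, while `is_snake_case_name pageRank` and `is_snake_case_name PageRank` both fail.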
Error: “Required option ‘algorithm.params_class’ is missing”
Solution:
computerConf:
  algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams  # ⚠️ Required
Issue: REMOTE_JAR_URI not working
Solution:
spec:
  envVars:
    - name: REMOTE_JAR_URI
      value: "http://example.com/my-algorithm.jar"
Issue: Etcd connection timeout in K8s
Check Operator etcd:
# Verify etcd is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running

# From worker pod, test etcd connectivity
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
  curl http://hugegraph-computer-operator-etcd:2379/version
Issue: Snapshot/MinIO configuration problems
Verify MinIO service:
# Test MinIO reachability
kubectl run -it --rm debug --image=alpine --restart=Never -- sh
wget -O- http://minio:9000/minio/health/live

# Test bucket permissions (requires MinIO client)
mc config host add myminio http://minio:9000 minioadmin minioadmin
mc ls myminio/hugegraph-snapshots
Check job overall status:
kubectl get hcjob pagerank-sample -n hugegraph-computer-operator-system

# Output example:
# NAME              JOBSTATUS   SUPERSTEP   MAXSUPERSTEP   SUPERSTEPSTAT
# pagerank-sample   Running     5           20             COMPUTING
Check detailed events:
kubectl describe hcjob pagerank-sample -n hugegraph-computer-operator-system
Check failure reasons:
kubectl get events --field-selector reason=ComputerJobFailed \
  --field-selector involvedObject.name=pagerank-sample \
  -n hugegraph-computer-operator-system
Real-time master logs:
kubectl logs -f -l component=pagerank-sample-master -n hugegraph-computer-operator-system
All worker logs:
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system --all-containers=true
For more algorithms, see: Built-In algorithms
TODO
TODO
Run `mvn compile` in advance to generate the corresponding classes.