Deployment Guide

This guide describes how to deploy HugeGraph Store in environments ranging from a single-node development setup to multi-zone production clusters.

Deployment Topologies
Configuration Reference
Deployment Steps
Docker Deployment
Kubernetes Deployment
Verification and Testing

Deployment Topologies

Topology 1: Minimal Development Setup

Use Case: Local development and testing

Components:

1 PD node (fake-pd mode or real PD)
1 Store node
1 Server node (optional)

Configuration:

Store Node (with fake-pd):

pdserver:
  address: localhost:8686

grpc:
  host: 127.0.0.1
  port: 8500

raft:
  address: 127.0.0.1:8510

app:
  data-path: ./storage
  fake-pd: true  # Built-in PD mode

Characteristics:

✅ Simple setup, fast startup
✅ No external PD cluster required
❌ No high availability
❌ No data replication
❌ Not for production

Topology 2: Small Production Cluster

Use Case: Small production deployments, testing environments

Components:

3 PD nodes
3 Store nodes
2-3 Server nodes

Architecture:

┌─────────────────────────────────────────────────┐
│  Client Applications                             │
└──────────────┬──────────────────────────────────┘
               │
        ┌──────┴──────┬──────────────┐
        │             │              │
   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
   │ Server1 │   │ Server2 │   │ Server3 │
   │ :8080   │   │ :8080   │   │ :8080   │
   └────┬────┘   └────┬────┘   └────┬────┘
        │             │              │
        └──────┬──────┴──────┬───────┘
               │             │
        ┌──────▼─────────────▼──────┐
        │   PD Cluster (3 nodes)    │
        │   192.168.1.10:8686       │
        │   192.168.1.11:8686       │
        │   192.168.1.12:8686       │
        └──────┬────────────────────┘
               │
        ┌──────┴──────┬──────────────┐
        │             │              │
   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
   │ Store1  │   │ Store2  │   │ Store3  │
   │ :8500   │   │ :8500   │   │ :8500   │
   │ Raft:   │◄──┤ Raft:   │◄──┤ Raft:   │
   │ :8510   │   │ :8510   │   │ :8510   │
   └─────────┘   └─────────┘   └─────────┘

IP Allocation Example:

PD: 192.168.1.10-12
Store: 192.168.1.20-22
Server: 192.168.1.30-32

Partition Configuration (in PD):

partition:
  default-shard-count: 3     # 3 replicas per partition
  store-max-shard-count: 12  # Max 12 partitions per Store

Capacity:

Data size: Up to 1TB (with 3 replicas on 3 Stores, every Store holds a full copy, so each node needs disk for the whole dataset)
QPS: ~5,000-10,000 queries/second
Availability: Tolerates 1 node failure per component
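
The "tolerates 1 node failure" figure follows from the Raft majority rule: a group stays available while more than half of its members are reachable. A minimal sketch of that arithmetic; the function names are illustrative and not part of HugeGraph:

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a Raft group with `nodes` members."""
    if nodes < 1:
        raise ValueError("a Raft group needs at least one member")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


# Compare the group sizes used by the topologies in this guide.
for n in (1, 3, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

An even member count adds cost without adding tolerance (4 nodes still tolerate only 1 failure), which is why the topologies here use 3 or 5 PD nodes.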

Characteristics:

✅ High availability (HA)
✅ Data replication (3 replicas)
✅ Automatic failover
✅ Production-ready
⚠️ Limited horizontal scalability

Topology 3: Medium Production Cluster

Use Case: Medium-scale production deployments

Components:

3 PD nodes
6-9 Store nodes
3-6 Server nodes

Architecture:

Load Balancer (Nginx/HAProxy)
        │
   ┌────┴────┬────────┬────────┬────────┐
   │         │        │        │        │
Server1  Server2  Server3  Server4  Server5
   │         │        │        │        │
   └────┬────┴────┬───┴────┬───┴────┬───┘
        │         │        │        │
    PD Cluster (3 nodes)
        │
   ┌────┴────┬────────┬────────┬────────┬────────┐
   │         │        │        │        │        │
Store1   Store2   Store3   Store4   Store5   Store6
(Rack 1) (Rack 1) (Rack 2) (Rack 2) (Rack 3) (Rack 3)

Rack-Aware Placement (configured in PD):

Distribute replicas across racks for fault isolation
Each partition has replicas on different racks
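
To make the placement rule concrete, here is a small sketch that assigns every partition one replica per rack. It only illustrates the constraint; it is not PD's actual scheduling algorithm, and the `place_replicas` helper and store names are hypothetical:

```python
from itertools import cycle


def place_replicas(stores_by_rack, partitions, replicas=3):
    """Assign each partition `replicas` stores, each on a different rack."""
    racks = sorted(stores_by_rack)
    if replicas > len(racks):
        raise ValueError("need at least as many racks as replicas")
    # Rotate through the stores inside each rack so load spreads evenly.
    next_store = {rack: cycle(stores_by_rack[rack]) for rack in racks}
    placement = {}
    for p in range(partitions):
        # Shift the starting rack per partition so every rack gets used
        # even when there are more racks than replicas.
        chosen = [racks[(p + i) % len(racks)] for i in range(replicas)]
        placement[p] = [next(next_store[rack]) for rack in chosen]
    return placement


racks = {
    "rack1": ["store1", "store2"],
    "rack2": ["store3", "store4"],
    "rack3": ["store5", "store6"],
}
layout = place_replicas(racks, partitions=4)
for partition, stores in layout.items():
    print(partition, stores)
```

With this layout, losing an entire rack removes at most one replica of any partition, so every partition keeps a 2-of-3 majority.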

Partition Configuration:

partition:
  default-shard-count: 3        # 3 replicas
  store-max-shard-count: 20     # More partitions per Store

Capacity:

Data size: 5-10TB
QPS: ~20,000-50,000 queries/second
Availability: Tolerates rack-level failures

Characteristics:

✅ High availability with rack isolation
✅ Better horizontal scalability
✅ Higher throughput
⚠️ More complex deployment

Topology 4: Large-Scale Cluster

Use Case: Large-scale production deployments with high throughput

Components:

5 PD nodes
12+ Store nodes
6+ Server nodes

Architecture:

         Load Balancer Layer
                │
        ┌───────┴───────┐
        │               │
   Server Pool     Server Pool
   (Zone A)        (Zone B)
        │               │
        └───────┬───────┘
                │
         PD Cluster (5 nodes)
         (Multi-Zone)
                │
        ┌───────┴───────────┐
        │                   │
  Store Pool (Zone A)  Store Pool (Zone B)
  6-12 nodes           6-12 nodes

Multi-Zone Deployment:

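PD: 5 nodes across 2-3 availability zones; use 3 zones (for example 2+2+1) if PD must survive the loss of any single zone, because with only 2 zones one of them holds the Raft majority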
Store: Distributed across zones with zone-aware replica placement
Server: Load-balanced across zones
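
Whether a given PD layout tolerates a zone outage can be checked against the Raft majority rule. A minimal sketch, with hypothetical zone names and a hypothetical helper:

```python
def survives_any_zone_loss(nodes_per_zone):
    """True if a Raft majority remains after losing any single zone."""
    total = sum(nodes_per_zone.values())
    majority = total // 2 + 1
    return all(total - count >= majority for count in nodes_per_zone.values())


two_zones = {"zone-a": 3, "zone-b": 2}
three_zones = {"zone-a": 2, "zone-b": 2, "zone-c": 1}

print("3+2 layout survives a zone loss:", survives_any_zone_loss(two_zones))
print("2+2+1 layout survives a zone loss:", survives_any_zone_loss(three_zones))
```

The same check applies to the replicas of a Store partition: three replicas spread over three zones keep a majority when one zone fails, while three replicas over two zones do not if the zone holding two of them is lost.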

Partition Configuration:

partition:
  default-shard-count: 3
  store-max-shard-count: 30  # Pick one value in the 30-50 range; a high partition count spreads load

Capacity:

Data size: 20TB+
QPS: 100,000+ queries/second
Availability: Tolerates zone-level failures

Characteristics:

✅ Maximum availability and scalability
✅ Zone-level fault tolerance
✅ Elastic scaling
⚠️ Complex operational overhead

Topology 5: Co-located Deployment

Use Case: Resource optimization, smaller deployments

Components:

3 nodes, each running: PD + Store + Server

Architecture:

Node 1 (192.168.1.10)          Node 2 (192.168.1.11)          Node 3 (192.168.1.12)
┌─────────────────────┐        ┌─────────────────────┐        ┌─────────────────────┐
│ Server :8080        │        │ Server :8080        │        │ Server :8080        │
│ PD     :8686, :8620 │        │ PD     :8686, :8620 │        │ PD     :8686, :8620 │
│ Store  :8500, :8510 │        │ Store  :8500, :8510 │        │ Store  :8500, :8510 │
└─────────────────────┘        └─────────────────────┘        └─────────────────────┘

Port Allocation (per node):

Server: 8080 (REST), 8182 (Gremlin)
PD: 8686 (gRPC), 8620 (REST), 8610 (Raft)
Store: 8500 (gRPC), 8520 (REST), 8510 (Raft)
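
Because all three components share one host in this topology, it is worth confirming that no port is claimed twice before starting anything. A small sketch using the port numbers listed above; the helper name is illustrative:

```python
PORTS = {
    "server": {"rest": 8080, "gremlin": 8182},
    "pd": {"grpc": 8686, "rest": 8620, "raft": 8610},
    "store": {"grpc": 8500, "rest": 8520, "raft": 8510},
}


def find_port_conflicts(ports):
    """Return {port: [component.role, ...]} for ports claimed more than once."""
    owners = {}
    for component, roles in ports.items():
        for role, port in roles.items():
            owners.setdefault(port, []).append(f"{component}.{role}")
    return {port: names for port, names in owners.items() if len(names) > 1}


conflicts = find_port_conflicts(PORTS)
print(conflicts or "no port conflicts")
```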

Characteristics:

✅ Lower hardware cost (fewer machines)
✅ Simplified networking
⚠️ Resource contention between components
⚠️ Lower fault isolation (node failure affects all components)

Recommendations:

Use for small to medium workloads
Ensure sufficient CPU (16+ cores) and memory (64GB+) per node
Use separate disks for Store data and PD metadata

Configuration Reference

PD Configuration

File: hugegraph-pd/conf/application.yml

# PD gRPC Server
grpc:
  host: 192.168.1.10          # Bind address (use actual IP)
  port: 8686                  # gRPC port

# PD REST API
server:
  port: 8620

# Raft Configuration
raft:
  address: 192.168.1.10:8610                                   # This PD's Raft address
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610  # All PD nodes

# PD Data Path
pd:
  data-path: ./pd_data
  initial-store-count: 3      # Min stores before auto-activation
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500  # Auto-activate stores

# Partition Settings
partition:
  default-shard-count: 3      # Replicas per partition
  store-max-shard-count: 20   # Max partitions per Store node

# Store Monitoring
store:
  max-down-time: 172800       # Seconds before marking Store permanently offline (48h)
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
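
The two partition settings above bound how many partitions a cluster can host: each Store holds at most `store-max-shard-count` replicas, and each partition consumes `default-shard-count` of them. The sketch below computes that upper bound. It is plain arithmetic derived from the comments in the configuration, not a statement about how PD chooses the actual partition count:

```python
def max_partitions(stores, store_max_shard_count, default_shard_count):
    """Upper bound on partitions the cluster can host with full replication."""
    if default_shard_count < 1:
        raise ValueError("default_shard_count must be at least 1")
    if stores < default_shard_count:
        raise ValueError("need at least one Store per replica")
    return stores * store_max_shard_count // default_shard_count


# Values from the small and medium topologies in this guide.
print("3 Stores, max 12 shards each:", max_partitions(3, 12, 3))
print("6 Stores, max 20 shards each:", max_partitions(6, 20, 3))
```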

Store Configuration

File: hugegraph-store/conf/application.yml

# PD Connection
pdserver:
  address: 192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686  # PD cluster endpoints

# Store gRPC Server
grpc:
  host: 192.168.1.20                    # Bind address (use actual IP)
  port: 8500                            # gRPC port for client connections
  max-inbound-message-size: 1000MB      # Max request size
  netty-server-max-connection-idle: 3600000  # Connection idle timeout (ms)

# Store REST API
server:
  port: 8520                            # REST API for management/metrics

# Raft Configuration
raft:
  address: 192.168.1.20:8510            # Raft RPC address
  snapshotInterval: 1800                # Snapshot interval (seconds)
  disruptorBufferSize: 1024             # Raft log buffer
  max-log-file-size: 10737418240        # Max log file: 10GB

# Data Storage
app:
  data-path: ./storage                  # Data directory (supports multiple paths: ./storage,/data1,/data2)
  fake-pd: false                        # Use real PD cluster

File: hugegraph-store/conf/application-pd.yml (RocksDB tuning)

rocksdb:
  # Memory Configuration
  total_memory_size: 32000000000        # Total memory for RocksDB (32GB)
  write_buffer_size: 134217728          # Memtable size (128MB)
  max_write_buffer_number: 6            # Max memtables
  min_write_buffer_number_to_merge: 2   # Min memtables to merge

  # Compaction
  level0_file_num_compaction_trigger: 4
  max_background_jobs: 8                # Background compaction/flush threads

  # Block Cache
  block_cache_size: 16000000000         # Block cache (16GB)

  # SST File Size
  target_file_size_base: 134217728      # Target SST size (128MB)
  max_bytes_for_level_base: 1073741824  # L1 size (1GB)
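
A quick sanity check for these values is to confirm that memtables plus block cache fit inside `total_memory_size`. The sketch below is a rough estimate only: how the Store enforces `total_memory_size` is not described in this guide, and the `column_families` multiplier is a hypothetical parameter, since RocksDB keeps separate memtables per column family:

```python
ROCKSDB = {
    "total_memory_size": 32_000_000_000,
    "write_buffer_size": 134_217_728,
    "max_write_buffer_number": 6,
    "block_cache_size": 16_000_000_000,
}


def memory_budget(cfg, column_families=1):
    """Estimate worst-case memtable usage and the headroom that remains."""
    memtables = (
        cfg["write_buffer_size"] * cfg["max_write_buffer_number"] * column_families
    )
    used = memtables + cfg["block_cache_size"]
    return {
        "memtables": memtables,
        "used": used,
        "headroom": cfg["total_memory_size"] - used,
    }


budget = memory_budget(ROCKSDB)
for name, value in budget.items():
    print(f"{name}: {value / 1e9:.2f} GB")
```

A negative headroom means the settings ask for more memory than the configured total, in which case either the block cache or the memtable count should be reduced.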

Server Configuration

File: hugegraph-server/conf/graphs/hugegraph.properties

# Backend Type
backend=hstore
serializer=binary

# Store Connection
store.provider=org.apache.hugegraph.backend.store.hstore.HstoreProvider
store.pd_peers=192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686

# Connection Pool
store.max_sessions=4
store.session_timeout=30000

# Graph Configuration
graph.name=hugegraph

Deployment Steps

Step 1: Prerequisites

On all nodes:

# Check Java version (11+ required)
java -version

# Check Maven (for building from source)
mvn -version

# Check network connectivity
ping 192.168.1.10
ping 192.168.1.11

# Check available disk space
df -h

# Open required ports (firewall)
# PD: 8620, 8686, 8610
# Store: 8500, 8510, 8520
# Server: 8080, 8182
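
Once the firewall rules are in place and a component is running, the ports can be probed from another node. A minimal sketch using only the Python standard library; the helper names are illustrative, and the port lists are the ones given above:

```python
import socket

REQUIRED_PORTS = {
    "pd": [8620, 8686, 8610],
    "store": [8500, 8510, 8520],
    "server": [8080, 8182],
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def unreachable_ports(host, component):
    """List the required ports of `component` that cannot be reached on `host`."""
    return [p for p in REQUIRED_PORTS[component] if not port_open(host, p)]
```

For example, `unreachable_ports("192.168.1.10", "pd")` returns an empty list when all three PD ports accept connections. A port also shows as unreachable when the service is simply not started yet, so run the check after startup.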

Disk Recommendations:

PD: 50GB+ (for metadata and Raft logs)
Store: 500GB+ per node (depends on data size)
Server: 20GB (for logs and temp data)
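
The Store figure depends on dataset size, replica count, and node count. The sketch below estimates raw disk per Store, assuming replicas are spread evenly; the 30% headroom for compaction and Raft logs is an illustrative assumption, not a value from this guide:

```python
import math


def disk_per_store_gb(logical_data_gb, replicas, stores, headroom=0.3):
    """Raw disk (GB) needed on each Store, keeping `headroom` of it free."""
    if stores < replicas:
        raise ValueError("need at least one Store per replica")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    replicated = logical_data_gb * replicas / stores
    return math.ceil(replicated / (1 - headroom))


print("1 TB on 3 Stores:", disk_per_store_gb(1000, replicas=3, stores=3), "GB per Store")
print("5 TB on 6 Stores:", disk_per_store_gb(5000, replicas=3, stores=6), "GB per Store")
```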

Step 2: Deploy PD Cluster

On each PD node:

# Extract PD distribution
tar -xzf apache-hugegraph-pd-incubating-1.7.0.tar.gz
cd apache-hugegraph-pd-incubating-1.7.0

# Edit configuration
vi conf/application.yml
# Update grpc.host, raft.address, raft.peers-list

Node 1 (192.168.1.10):

grpc:
  host: 192.168.1.10
  port: 8686
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

Node 2 (192.168.1.11):

grpc:
  host: 192.168.1.11
  port: 8686
raft:
  address: 192.168.1.11:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

Node 3 (192.168.1.12):

grpc:
  host: 192.168.1.12
  port: 8686
raft:
  address: 192.168.1.12:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
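
Only `grpc.host` and `raft.address` differ between nodes; `raft.peers-list` must be identical everywhere. Generating the values from one host list avoids copy-and-paste mistakes. A minimal sketch with a hypothetical helper, which produces the values to paste into conf/application.yml rather than writing the file:

```python
def pd_node_settings(pd_hosts, grpc_port=8686, raft_port=8610):
    """Return the per-node settings for each PD host, in order."""
    peers = ",".join(f"{host}:{raft_port}" for host in pd_hosts)
    return [
        {
            "grpc.host": host,
            "grpc.port": grpc_port,
            "raft.address": f"{host}:{raft_port}",
            "raft.peers-list": peers,
        }
        for host in pd_hosts
    ]


settings = pd_node_settings(["192.168.1.10", "192.168.1.11", "192.168.1.12"])
for node in settings:
    print(node)
```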

Start PD nodes:

# On each PD node
bin/start-hugegraph-pd.sh

# Check logs
tail -f logs/hugegraph-pd.log

# Verify PD is running
curl http://localhost:8620/actuator/health

Verify PD cluster:

# Check cluster members
curl http://192.168.1.10:8620/v1/members

# Sample output (this capture shows a single local PD; addresses, states,
# and member count will differ on a three-node cluster):
{
  "message":"OK",
  "data":{
    "pdLeader":null,
    "pdList":[{
      "raftUrl":"127.0.0.1:8610",
      "grpcUrl":"",
      "restUrl":"",
      "state":"Offline",
      "dataPath":"",
      "role":"Leader",
      "replicateState":"",
      "serviceName":"-PD",
      "serviceVersion":"1.7.0",
      "startTimeStamp":1761818483830
    }],
    "stateCountMap":{
      "Offline":1
    },
    "numOfService":1,
    "state":"Cluster_OK",
    "numOfNormalService":0
  },
  "status":0
}
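
Reading this response by eye gets tedious with several nodes. The sketch below summarizes a members response; it relies only on field names visible in the output above (`pdList`, `state`, `role`, `raftUrl`), and the embedded sample is a trimmed copy of that output:

```python
import json
from collections import Counter

SAMPLE = """
{"message":"OK","data":{"pdLeader":null,"pdList":[
  {"raftUrl":"127.0.0.1:8610","state":"Offline","role":"Leader"}],
  "numOfService":1,"state":"Cluster_OK","numOfNormalService":0},"status":0}
"""


def summarize_members(payload):
    """Condense a /v1/members response into counts and leader addresses."""
    data = json.loads(payload)["data"]
    members = data["pdList"]
    return {
        "cluster_state": data["state"],
        "member_count": len(members),
        "states": dict(Counter(m["state"] for m in members)),
        "leaders": [m["raftUrl"] for m in members if m["role"] == "Leader"],
    }


print(summarize_members(SAMPLE))
```

On a three-node cluster, a `member_count` other than 3 or a `leaders` list that does not contain exactly one address is worth investigating.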

Step 3: Deploy Store Cluster

On each Store node:

# Extract Store distribution
tar -xzf apache-hugegraph-store-incubating-1.7.0.tar.gz
cd apache-hugegraph-store-incubating-1.7.0

# Edit configuration
vi conf/application.yml

Store Node 1 (192.168.1.20):

pdserver:
  address: 192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686

grpc:
  host: 192.168.1.20
  port: 8500

raft:
  address: 192.168.1.20:8510

app:
  data-path: ./storage
  fake-pd: false

Store Node 2 (192.168.1.21):

pdserver:
  address: 192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686

grpc:
  host: 192.168.1.21
  port: 8500

raft:
  address: 192.168.1.21:8510

app:
  data-path: ./storage
  fake-pd: false

Store Node 3 (192.168.1.22):

pdserver:
  address: 192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686

grpc:
  host: 192.168.1.22
  port: 8500

raft:
  address: 192.168.1.22:8510

app:
  data-path: ./storage
  fake-pd: false

Start Store nodes:

# On each Store node
bin/start-hugegraph-store.sh

# Check logs
tail -f logs/hugegraph-store.log

# Verify Store is running
curl http://localhost:8520/v1/health

Verify Store registration with PD:

# Query PD for registered stores
curl http://192.168.1.10:8620/v1/stores

# Sample output (captured from a single local Store; expect one entry per
# Store node, each with its own address):
{
  "message":"OK",
  "data":{
    "stores":[{
      "storeId":"1783423547167821026",
      "address":"192.168.1.10:8500",
      "raftAddress":"192.168.1.10:8510",
      "version":"",
      "state":"Up",
      "deployPath":"/Users/user/incubator-hugegraph/hugegraph-store/hg-store-node/target/classes/",
      "dataPath":"./storage",
      "startTimeStamp":1761818547335,
      "registedTimeStamp":1761818547335,
      "lastHeartBeat":1761818727631,
      "capacity":245107195904,
      "available":118497292288,
      "partitionCount":0,
      "graphSize":0,
      "keyCount":0,
      "leaderCount":0,
      "serviceName":"192.168.1.10:8500-store",
      "serviceVersion":"",
      "serviceCreatedTimeStamp":1761818547000,
      "partitions":[]
    }],
    "stateCountMap":{"Up":1},
    "numOfService":1,
    "numOfNormalService":1
  },
  "status":0
}
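
Before moving on to the Server, it helps to confirm that every Store has registered and reports "Up". A minimal sketch that checks a stores response; it uses only the `stores`, `address`, and `state` fields shown above, and the embedded sample is a trimmed copy of that output:

```python
import json

SAMPLE = """
{"message":"OK","data":{"stores":[
  {"address":"192.168.1.10:8500","state":"Up","partitionCount":0}],
  "stateCountMap":{"Up":1},"numOfService":1,"numOfNormalService":1},"status":0}
"""


def store_problems(payload, expected):
    """Return a list of problems; an empty list means all Stores are ready."""
    stores = json.loads(payload)["data"]["stores"]
    problems = [
        f"{s['address']} is {s['state']}" for s in stores if s["state"] != "Up"
    ]
    if len(stores) < expected:
        problems.append(f"only {len(stores)} of {expected} stores registered")
    return problems


print(store_problems(SAMPLE, expected=3))
```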

Step 4: Deploy HugeGraph Server

On each Server node:

# Extract Server distribution
tar -xzf apache-hugegraph-incubating-1.7.0.tar.gz
cd apache-hugegraph-incubating-1.7.0

# Configure backend
vi conf/graphs/hugegraph.properties

Configuration:

backend=hstore
serializer=binary

store.provider=org.apache.hugegraph.backend.store.hstore.HstoreProvider
store.pd_peers=192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686

store.max_sessions=4
store.session_timeout=30000

graph.name=hugegraph

Initialize and start:

# Initialize the backend store (only needed once)
bin/init-store.sh

# Start Server
bin/start-hugegraph.sh

# Check logs
tail -f logs/hugegraph-server.log

# Verify Server is running
curl http://localhost:8080/versions
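
The Server can take a while to accept requests after the start script returns, so a scripted deployment should poll rather than check once. A minimal sketch with the standard library; the helper names, attempt count, and delay are illustrative choices:

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=3.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so refused
        # connections, timeouts, and error responses all land here.
        return False


def wait_until(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call `probe` until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For example, `wait_until(lambda: http_ok("http://localhost:8080/versions"))` returns the attempt on which the Server first answered, or `None` if it never did within the allotted attempts.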

Docker Deployment

Docker Compose: Complete Cluster

File: docker-compose.yml

version: '3.8'

services:
  # PD Cluster (3 nodes)
  pd1:
    image: hugegraph/hugegraph-pd:1.7.0
    container_name: hugegraph-pd1
    ports:
      - "8686:8686"
      - "8620:8620"
      - "8610:8610"
    environment:
      - GRPC_HOST=pd1
      - RAFT_ADDRESS=pd1:8610
      - RAFT_PEERS=pd1:8610,pd2:8610,pd3:8610
    networks:
      - hugegraph-net

  pd2:
    image: hugegraph/hugegraph-pd:1.7.0
    container_name: hugegraph-pd2
    ports:
      - "8687:8686"
    environment:
      - GRPC_HOST=pd2
      - RAFT_ADDRESS=pd2:8610
      - RAFT_PEERS=pd1:8610,pd2:8610,pd3:8610
    networks:
      - hugegraph-net

  pd3:
    image: hugegraph/hugegraph-pd:1.7.0
    container_name: hugegraph-pd3
    ports:
      - "8688:8686"
    environment:
      - GRPC_HOST=pd3
      - RAFT_ADDRESS=pd3:8610
      - RAFT_PEERS=pd1:8610,pd2:8610,pd3:8610
    networks:
      - hugegraph-net

  # Store Cluster (3 nodes)
  store1:
    image: hugegraph/hugegraph-store:1.7.0
    container_name: hugegraph-store1
    ports:
      - "8500:8500"
      - "8510:8510"
      - "8520:8520"
    environment:
      - PD_ADDRESS=pd1:8686,pd2:8686,pd3:8686
      - GRPC_HOST=store1
      - RAFT_ADDRESS=store1:8510
    volumes:
      - store1-data:/hugegraph-store/storage
    depends_on:
      - pd1
      - pd2
      - pd3
    networks:
      - hugegraph-net

  store2:
    image: hugegraph/hugegraph-store:1.7.0
    container_name: hugegraph-store2
    ports:
      - "8501:8500"
    environment:
      - PD_ADDRESS=pd1:8686,pd2:8686,pd3:8686
      - GRPC_HOST=store2
      - RAFT_ADDRESS=store2:8510
    volumes:
      - store2-data:/hugegraph-store/storage
    depends_on:
      - pd1
      - pd2
      - pd3
    networks:
      - hugegraph-net

  store3:
    image: hugegraph/hugegraph-store:1.7.0
    container_name: hugegraph-store3
    ports:
      - "8502:8500"
    environment:
      - PD_ADDRESS=pd1:8686,pd2:8686,pd3:8686
      - GRPC_HOST=store3
      - RAFT_ADDRESS=store3:8510
    volumes:
      - store3-data:/hugegraph-store/storage
    depends_on:
      - pd1
      - pd2
      - pd3
    networks:
      - hugegraph-net

  # Server (2 nodes)
  server1:
    image: hugegraph/hugegraph:1.7.0
    container_name: hugegraph-server1
    ports:
      - "8080:8080"
    environment:
      - BACKEND=hstore
      - PD_PEERS=pd1:8686,pd2:8686,pd3:8686
    depends_on:
      - store1
      - store2
      - store3
    networks:
      - hugegraph-net

  server2:
    image: hugegraph/hugegraph:1.7.0
    container_name: hugegraph-server2
    ports:
      - "8081:8080"
    environment:
      - BACKEND=hstore
      - PD_PEERS=pd1:8686,pd2:8686,pd3:8686
    depends_on:
      - store1
      - store2
      - store3
    networks:
      - hugegraph-net

networks:
  hugegraph-net:
    driver: bridge

volumes:
  store1-data:
  store2-data:
  store3-data:

Deploy:

# Start cluster
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f store1

# Stop cluster
docker-compose down

Kubernetes Deployment

StatefulSet: Store Cluster

File: hugegraph-store-statefulset.yaml

apiVersion: v1
kind: Service
metadata:
  name: hugegraph-store
  labels:
    app: hugegraph-store
spec:
  clusterIP: None  # Headless service
  selector:
    app: hugegraph-store
  ports:
    - name: grpc
      port: 8500
    - name: raft
      port: 8510
    - name: rest
      port: 8520

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hugegraph-store
spec:
  serviceName: hugegraph-store
  replicas: 3
  selector:
    matchLabels:
      app: hugegraph-store
  template:
    metadata:
      labels:
        app: hugegraph-store
    spec:
      containers:
      - name: store
        image: hugegraph/hugegraph-store:1.7.0
        ports:
        - containerPort: 8500
          name: grpc
        - containerPort: 8510
          name: raft
        - containerPort: 8520
          name: rest
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: PD_ADDRESS
          value: "hugegraph-pd-0.hugegraph-pd:8686,hugegraph-pd-1.hugegraph-pd:8686,hugegraph-pd-2.hugegraph-pd:8686"
        - name: GRPC_HOST
          value: "$(POD_NAME).hugegraph-store"
        - name: RAFT_ADDRESS
          value: "$(POD_NAME).hugegraph-store:8510"
        volumeMounts:
        - name: data
          mountPath: /hugegraph-store/storage
        resources:
          requests:
            cpu: "2"
            memory: "8Gi"
          limits:
            cpu: "4"
            memory: "16Gi"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 500Gi
      storageClassName: fast-ssd

Deploy:

# Create namespace
kubectl create namespace hugegraph

# Deploy PD cluster (prerequisite)
kubectl apply -f hugegraph-pd-statefulset.yaml -n hugegraph

# Deploy Store cluster
kubectl apply -f hugegraph-store-statefulset.yaml -n hugegraph

# Check pods
kubectl get pods -n hugegraph

# Check Store logs
kubectl logs -f hugegraph-store-0 -n hugegraph

# Access Store service
kubectl port-forward svc/hugegraph-store 8500:8500 -n hugegraph

Verification and Testing

Health Check

# PD health
curl http://192.168.1.10:8620/v1/health

# Store health
curl http://192.168.1.20:8520/v1/health

Cluster Status

# PD cluster members
curl http://192.168.1.10:8620/v1/members

# Registered stores
curl http://192.168.1.10:8620/v1/stores

# Partitions
curl http://192.168.1.10:8620/v1/partitions

# Graph list
curl http://192.168.1.10:8620/v1/graphs

Basic Operations Test

# Create vertex via Server
curl -X POST "http://192.168.1.30:8080/graphspaces/{graphspace_name}/graphs/{graph_name}/graph/vertices" \
     -H "Content-Type: application/json" \
     -d '{
         "label": "person",
         "properties": {
             "name": "marko",
             "age": 29
         }
     }'

# Query vertex (add -u if auth is enabled)
curl -u admin:admin \
     -X GET "http://192.168.1.30:8080/graphspaces/{graphspace_name}/graphs/{graph_name}/graph/vertices/{vertex_id}"
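
The same create request can be built programmatically, which makes it easier to fill in the placeholders consistently. The sketch below only constructs the request and does not send it. The graphspace name `DEFAULT` is a placeholder assumption, `hugegraph` is the graph name from the Server configuration, and the helper name is illustrative:

```python
import json
import urllib.request
from urllib.parse import quote


def vertex_create_request(base_url, graphspace, graph, label, properties):
    """Build (but do not send) the POST request that creates a vertex."""
    path = (
        f"/graphspaces/{quote(graphspace, safe='')}"
        f"/graphs/{quote(graph, safe='')}/graph/vertices"
    )
    body = json.dumps({"label": label, "properties": properties}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = vertex_create_request(
    "http://192.168.1.30:8080", "DEFAULT", "hugegraph",
    "person", {"name": "marko", "age": 29},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` sends it. The `person` label and its property keys must already exist in the graph schema for the request to succeed.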

Performance Baseline Test

# Install HugeGraph-Loader (for bulk loading)
tar -xzf apache-hugegraph-loader-1.7.0.tar.gz
cd apache-hugegraph-loader-1.7.0

# Load the example dataset as a rough baseline
bin/hugegraph-loader.sh -g hugegraph -f ./example/struct.json -s ./example/schema.groovy

For production monitoring and troubleshooting, see Operations Guide.

For performance tuning, see Best Practices.
