
HugeGraph-Struct

Overview

hugegraph-struct is a foundational data structures module that defines the core abstractions and type definitions shared across HugeGraph's distributed components. It serves as the “data contract layer” enabling type-safe communication between hugegraph-pd (Placement Driver), hugegraph-store (distributed storage), and hugegraph-server (graph engine).

Key Characteristics:

  • Pure data structure definitions without business logic
  • Lightweight and stateless (no HugeGraph instance dependencies)
  • Shared type system for distributed RPC communication
  • Binary serialization for efficient storage and network transmission
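As a rough illustration of these characteristics, the sketch below shows a stateless value type that serializes itself to a compact binary form. `LabelRecord` and its byte layout are hypothetical and exist only for this example; the module's own serialization classes (BytesBuffer, BinarySerializer) are not used here, only `java.nio.ByteBuffer` from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example type, not part of hugegraph-struct.
// It holds plain data only: no graph instance, no business logic.
final class LabelRecord {
    final long id;
    final String name;

    LabelRecord(long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Assumed layout: 8-byte id, 4-byte name length, UTF-8 name bytes.
    byte[] toBytes() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + nameBytes.length);
        buffer.putLong(id);
        buffer.putInt(nameBytes.length);
        buffer.put(nameBytes);
        return buffer.array();
    }

    // Reads the fields back in the same order they were written.
    static LabelRecord fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long id = buffer.getLong();
        byte[] nameBytes = new byte[buffer.getInt()];
        buffer.get(nameBytes);
        return new LabelRecord(id, new String(nameBytes, StandardCharsets.UTF_8));
    }
}
```

Because the type carries no reference to a running graph, any component (PD, Store, or the server) could decode the same bytes without pulling in the graph engine.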

Why hugegraph-struct?

The Problem

Originally, all data structures and graph engine logic resided in hugegraph-server/hugegraph-core. As HugeGraph evolved toward a distributed architecture, this created several challenges:

  1. Tight Coupling: PD and Store components needed schema definitions but not the entire graph engine
  2. Circular Dependencies: Distributed components couldn't share types without pulling in heavy dependencies
  3. Build Inefficiency: Changes to core required rebuilding all dependent modules
  4. Large Dependencies: PD/Store had to depend on Jersey, JRaft, K8s client, and other server-specific libraries

The Solution

We extracted stateless data structures from hugegraph-core into a separate hugegraph-struct module:

Before (Monolithic):
hugegraph-server/hugegraph-core (everything together)
  ├─ Data structures (schema, types, IDs)
  ├─ Graph engine (traversal, optimization)
  ├─ Transactions (GraphTransaction, SchemaTransaction)
  ├─ Storage backends (memory, raft, cache)
  └─ Business logic (jobs, tasks, auth)

After (Modular):
hugegraph-struct (shared foundation)
  ├─ Schema definitions (VertexLabel, EdgeLabel, PropertyKey, IndexLabel)
  ├─ Type system (HugeType, DataType, IdStrategy)
  ├─ Data structures (BaseVertex, BaseEdge, BaseProperty)
  ├─ Serialization (BytesBuffer, BinarySerializer)
  ├─ Query abstractions (Query, ConditionQuery, Aggregate)
  └─ Utilities (ID generation, text analyzers, exceptions)

hugegraph-core (graph engine only)
  ├─ Depends on hugegraph-struct
  ├─ Implements graph engine logic
  ├─ Manages transactions and storage
  └─ Provides TinkerPop API

Module Responsibilities

| Module | Purpose | Dependencies |
|---|---|---|
| hugegraph-struct | Shared data structures, type definitions, serialization | Minimal (Guava, TinkerPop, serialization libs) |
| hugegraph-core | Graph engine, traversal, transactions, storage abstraction | hugegraph-struct + heavy libs (Jersey, JRaft, K8s) |
| hugegraph-pd | Metadata coordination, service discovery | hugegraph-struct only |
| hugegraph-store | Distributed storage with Raft | hugegraph-struct only |

Dependency Architecture

                    hugegraph-struct (foundational)
                           ↑
        ┌──────────────────┼──────────────────┐
        │                  │                  │
   hugegraph-pd      hugegraph-store    hugegraph-core
        │                  │                  │
        └──────────────────┼──────────────────┘
                           ↓
                   hugegraph-server (REST API)

Build Order:

# 1. Build struct first (required dependency)
mvn install -pl hugegraph-struct -am -DskipTests

# 2. Then build dependent modules
mvn install -pl hugegraph-pd -am -DskipTests
mvn install -pl hugegraph-store -am -DskipTests
mvn install -pl hugegraph-server -am -DskipTests

Migration Plan

Current Status (Transition Period):

Both hugegraph-struct and hugegraph-core currently contain overlapping data structure definitions, kept in hugegraph-core for backward compatibility. This duplication is temporary and will be removed as the migration proceeds.

Future Direction:

  • hugegraph-struct: Will become the single source of truth for all data structure definitions
  • ⚠️ hugegraph-core: Data structure definitions will be gradually removed and replaced with references to hugegraph-struct
  • 🎯 End Goal: hugegraph-core will only contain graph engine logic and depend on hugegraph-struct for all type definitions

Migration Strategy:

  1. Phase 1 (Current): Both modules coexist; new features use struct
  2. Phase 2 (In Progress): Gradually migrate core's data structures to import from struct
  3. Phase 3 (Future): Remove duplicate definitions from core completely

Example Migration:

// OLD (hugegraph-core)
import org.apache.hugegraph.schema.SchemaElement;  // ❌ Will be deprecated

// NEW (hugegraph-struct)
import org.apache.hugegraph.struct.schema.SchemaElement;  // ✅ Use this

Developer Guide

When to Use hugegraph-struct

Use struct when:

  • Building distributed components (PD, Store)
  • Defining data transfer objects (DTOs) for RPC
  • Implementing serialization/deserialization logic
  • Working with type definitions, schema elements, or IDs
  • Creating shared utilities needed across modules

When to Use hugegraph-core

Use core when:

  • Implementing graph engine features
  • Working with TinkerPop API (Gremlin traversal)
  • Managing transactions or backend storage
  • Implementing graph algorithms or jobs
  • Building server-side business logic

Adding New Data Structures

Rule: All new shared data structures should go into hugegraph-struct, not hugegraph-core.

Example:

// ✅ Correct: Add to hugegraph-struct/src/main/java/org/apache/hugegraph/struct/
public class NewSchemaType extends SchemaElement {
    // Pure data structure, no HugeGraph dependency
}

// ❌ Wrong: Don't add to hugegraph-core unless it's graph engine logic

Modifying Existing Structures

If you need to modify a data structure:

  1. If it exists in struct: Modify the struct version
  2. If it only exists in core: Consider migrating it to struct first
  3. Update serialization: Keep the binary format compatible with existing data, or provide a migration path
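One common way to handle step 3 is to prefix the serialized form with a format version, so that readers can still decode data written before the change. The sketch below assumes a hypothetical `EdgeWeightCodec` whose payload changed from a 4-byte int (version 1) to an 8-byte double (version 2); none of these names or layouts come from hugegraph-struct.

```java
import java.nio.ByteBuffer;

// Hypothetical codec, for illustration only.
final class EdgeWeightCodec {
    static final byte V1 = 1; // legacy payload: 4-byte int weight
    static final byte V2 = 2; // current payload: 8-byte double weight

    // New data is always written in the current format.
    static byte[] encode(double weight) {
        return ByteBuffer.allocate(1 + Double.BYTES)
                         .put(V2)
                         .putDouble(weight)
                         .array();
    }

    // The leading version byte selects how the payload is read,
    // so bytes written by older code remain decodable.
    static double decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte version = buffer.get();
        switch (version) {
            case V1:
                return buffer.getInt();
            case V2:
                return buffer.getDouble();
            default:
                throw new IllegalArgumentException("Unsupported format version: " + version);
        }
    }
}
```

Rejecting unknown versions explicitly, rather than guessing at the layout, makes an incompatible writer visible as an error instead of silently corrupting data.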

Package Structure

org.apache.hugegraph/
├── struct/schema/          # Schema definitions (VertexLabel, EdgeLabel, etc.)
├── structure/              # Graph elements (BaseVertex, BaseEdge, BaseProperty)
├── type/                   # Type system (HugeType, DataType, IdStrategy)
├── id/                     # ID generation and management
├── serializer/             # Binary serialization (BytesBuffer, BinarySerializer)
├── query/                  # Query abstractions (Query, ConditionQuery, Aggregate)
├── analyzer/               # Text analyzers (8 Chinese NLP implementations)
├── auth/                   # Auth utilities (JWT, constants)
├── backend/                # Backend abstractions (BinaryId, BackendColumn, Shard)
├── options/                # Configuration options
├── util/                   # Utilities (encoding, compression, collections)
└── exception/              # Exception hierarchy

Key Design Principles

  1. Stateless: No HugeGraph instance dependencies in struct
  2. Minimal Dependencies: Only essential libraries (no Jersey, JRaft, K8s)
  3. Serialization-Friendly: All structures support binary serialization
  4. Type Safety: Strong typing for distributed RPC communication
  5. Backward Compatible: Careful versioning to avoid breaking changes
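To make principles 3 and 4 more concrete, here is a minimal sketch of a type enum with a stable one-byte wire code, which is one way to keep type information both strongly typed in Java and compact on the wire. `ElementKind` and its code values are assumptions made for this example; they do not describe the module's actual HugeType.

```java
// Hypothetical enum, for illustration only.
enum ElementKind {
    VERTEX_LABEL((byte) 1),
    EDGE_LABEL((byte) 2),
    PROPERTY_KEY((byte) 3),
    INDEX_LABEL((byte) 4);

    // Explicit code instead of ordinal(), so reordering constants
    // does not change the serialized value.
    private final byte code;

    ElementKind(byte code) {
        this.code = code;
    }

    byte code() {
        return code;
    }

    // Maps a wire code back to its constant; unknown codes are rejected.
    static ElementKind fromCode(byte code) {
        for (ElementKind kind : values()) {
            if (kind.code == code) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown element kind code: " + code);
    }
}
```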

Building and Testing

# Build struct module (run from the hugegraph-struct directory)
mvn clean install -DskipTests

# Build with tests (when tests are added)
mvn clean install

# From parent directory
cd /path/to/hugegraph
mvn install -pl hugegraph-struct -am -DskipTests

Contributing

When contributing to hugegraph-struct:

  1. No Business Logic: Keep it pure data structures
  2. No Graph Instances: Avoid HugeGraph graph fields
  3. Document Changes: Update AGENTS.md if adding new packages
  4. Binary Compatibility: Consider serialization impact
  5. Minimal Dependencies: Justify any new dependency additions

License

Apache License 2.0 - See LICENSE file for details.