# HugeGraph-Struct
### Overview
**hugegraph-struct** is a foundational data structures module that defines the core abstractions and type definitions shared across HugeGraph's distributed components. It serves as the "data contract layer" enabling type-safe communication between hugegraph-pd (Placement Driver), hugegraph-store (distributed storage), and hugegraph-server (graph engine).
**Key Characteristics**:
- Pure data structure definitions without business logic
- Lightweight and stateless (no `HugeGraph` instance dependencies)
- Shared type system for distributed RPC communication
- Binary serialization for efficient storage and network transmission
### Why hugegraph-struct?
#### The Problem
Originally, all data structures and graph engine logic resided in `hugegraph-server/hugegraph-core`. As HugeGraph evolved toward a distributed architecture, this created several challenges:
1. **Tight Coupling**: PD and Store components needed schema definitions but not the entire graph engine
2. **Circular Dependencies**: Distributed components couldn't share types without pulling in heavy dependencies
3. **Build Inefficiency**: Changes to core required rebuilding all dependent modules
4. **Large Dependencies**: PD/Store had to depend on Jersey, JRaft, K8s client, and other server-specific libraries
#### The Solution
We extracted **stateless data structures** from `hugegraph-core` into a separate `hugegraph-struct` module:
```
Before (Monolithic):
hugegraph-server/hugegraph-core (everything together)
├─ Data structures (schema, types, IDs)
├─ Graph engine (traversal, optimization)
├─ Transactions (GraphTransaction, SchemaTransaction)
├─ Storage backends (memory, raft, cache)
└─ Business logic (jobs, tasks, auth)
After (Modular):
hugegraph-struct (shared foundation)
├─ Schema definitions (VertexLabel, EdgeLabel, PropertyKey, IndexLabel)
├─ Type system (HugeType, DataType, IdStrategy)
├─ Data structures (BaseVertex, BaseEdge, BaseProperty)
├─ Serialization (BytesBuffer, BinarySerializer)
├─ Query abstractions (Query, ConditionQuery, Aggregate)
└─ Utilities (ID generation, text analyzers, exceptions)
hugegraph-core (graph engine only)
├─ Depends on hugegraph-struct
├─ Implements graph engine logic
├─ Manages transactions and storage
└─ Provides TinkerPop API
```
### Module Responsibilities
| Module | Purpose | Dependencies |
|--------|---------|--------------|
| **hugegraph-struct** | Shared data structures, type definitions, serialization | Minimal (Guava, TinkerPop, serialization libs) |
| **hugegraph-core** | Graph engine, traversal, transactions, storage abstraction | hugegraph-struct + heavy libs (Jersey, JRaft, K8s) |
| **hugegraph-pd** | Metadata coordination, service discovery | hugegraph-struct only |
| **hugegraph-store** | Distributed storage with Raft | hugegraph-struct only |
### Dependency Architecture
```
               hugegraph-struct (foundational)
          ┌───────────────────┼───────────────────┐
          │                   │                   │
    hugegraph-pd       hugegraph-store     hugegraph-core
          │                   │                   │
          └───────────────────┼───────────────────┘
                 hugegraph-server (REST API)
```
**Build Order**:
```bash
# 1. Build struct first (required dependency)
mvn install -pl hugegraph-struct -am -DskipTests
# 2. Then build dependent modules
mvn install -pl hugegraph-pd -am -DskipTests
mvn install -pl hugegraph-store -am -DskipTests
mvn install -pl hugegraph-server -am -DskipTests
```
### Migration Plan
**Current Status (Transition Period)**:
Both `hugegraph-struct` and `hugegraph-core` contain similar data structures for backward compatibility. This is a **temporary state** during the migration period.
**Future Direction**:
- **hugegraph-struct**: Will become the **single source of truth** for all data structure definitions
- ⚠️ **hugegraph-core**: Data structure definitions will be **gradually removed** and replaced with references to hugegraph-struct
- 🎯 **End Goal**: hugegraph-core will only contain graph engine logic and depend on hugegraph-struct for all type definitions
**Migration Strategy**:
1. **Phase 1 (Current)**: Both modules coexist; new features use struct
2. **Phase 2 (In Progress)**: Gradually migrate core's data structures to import from struct
3. **Phase 3 (Future)**: Remove duplicate definitions from core completely
**Example Migration**:
```java
// OLD (hugegraph-core)
import org.apache.hugegraph.schema.SchemaElement; // ❌ Will be deprecated
// NEW (hugegraph-struct)
import org.apache.hugegraph.struct.schema.SchemaElement; // ✅ Use this
```
### Developer Guide
#### When to Use hugegraph-struct
Use struct when:
- Building distributed components (PD, Store)
- Defining data transfer objects (DTOs) for RPC
- Implementing serialization/deserialization logic
- Working with type definitions, schema elements, or IDs
- Creating shared utilities needed across modules
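As a rough illustration of the kind of class that belongs in struct, the sketch below shows a stateless DTO with a hand-rolled binary round trip. `LabelInfo` and its byte layout are hypothetical, and it uses `java.nio.ByteBuffer` from the JDK rather than the module's own `BytesBuffer`, whose API is not described here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical DTO for illustration only; not a class from hugegraph-struct.
// It holds plain data and has no reference to a HugeGraph instance.
final class LabelInfo {
    private final long id;
    private final String name;

    LabelInfo(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long id() { return id; }

    String name() { return name; }

    // Assumed layout: 8-byte id, 4-byte name length, UTF-8 name bytes.
    byte[] toBytes() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + nameBytes.length);
        buffer.putLong(id);
        buffer.putInt(nameBytes.length);
        buffer.put(nameBytes);
        return buffer.array();
    }

    // Reads the fields back in the same order they were written.
    static LabelInfo fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long id = buffer.getLong();
        byte[] nameBytes = new byte[buffer.getInt()];
        buffer.get(nameBytes);
        return new LabelInfo(id, new String(nameBytes, StandardCharsets.UTF_8));
    }
}
```

Because the class depends only on the JDK, a component such as PD or Store could in principle encode and decode it without pulling in the graph engine.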
#### When to Use hugegraph-core
Use core when:
- Implementing graph engine features
- Working with TinkerPop API (Gremlin traversal)
- Managing transactions or backend storage
- Implementing graph algorithms or jobs
- Building server-side business logic
#### Adding New Data Structures
**Rule**: All new shared data structures should go into `hugegraph-struct`, not `hugegraph-core`.
Example:
```java
// ✅ Correct: Add to hugegraph-struct/src/main/java/org/apache/hugegraph/struct/
public class NewSchemaType extends SchemaElement {
    // Pure data structure: fields and accessors only, no HugeGraph dependency
}

// ❌ Wrong: Don't add to hugegraph-core unless it's graph engine logic
```
#### Modifying Existing Structures
If you need to modify a data structure:
1. **Check if it exists in struct**: Modify the struct version
2. **If it only exists in core**: Consider migrating it to struct first
3. **Update serialization**: Keep the binary format compatible with existing data, or provide a migration path for it
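One common way to keep a binary format compatible when a field is added is to prefix each record with a version byte and have the decoder accept every known layout. The sketch below illustrates that idea with a hypothetical `VersionedCounter`; it is not how hugegraph-struct itself versions its formats, which this document does not specify.

```java
import java.nio.ByteBuffer;

// Hypothetical structure for illustration only; not part of hugegraph-struct.
final class VersionedCounter {
    static final byte V1 = 1; // assumed layout: version, 8-byte count
    static final byte V2 = 2; // assumed layout: version, 8-byte count, 4-byte TTL

    final long count;
    final int ttlSeconds; // added in V2; 0 means "no TTL"

    VersionedCounter(long count, int ttlSeconds) {
        this.count = count;
        this.ttlSeconds = ttlSeconds;
    }

    // Always writes the newest layout (V2).
    byte[] toBytes() {
        return ByteBuffer.allocate(1 + Long.BYTES + Integer.BYTES)
                         .put(V2)
                         .putLong(count)
                         .putInt(ttlSeconds)
                         .array();
    }

    // Accepts both layouts so that data written before the change stays readable.
    static VersionedCounter fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte version = buffer.get();
        long count = buffer.getLong();
        switch (version) {
            case V1:
                return new VersionedCounter(count, 0); // default for the missing field
            case V2:
                return new VersionedCounter(count, buffer.getInt());
            default:
                throw new IllegalArgumentException("Unknown version: " + version);
        }
    }
}
```

Writers only ever emit the newest layout, while readers fall back to a default for fields that older records lack.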
### Package Structure
```
org.apache.hugegraph/
├── struct/schema/ # Schema definitions (VertexLabel, EdgeLabel, etc.)
├── structure/ # Graph elements (BaseVertex, BaseEdge, BaseProperty)
├── type/ # Type system (HugeType, DataType, IdStrategy)
├── id/ # ID generation and management
├── serializer/ # Binary serialization (BytesBuffer, BinarySerializer)
├── query/ # Query abstractions (Query, ConditionQuery, Aggregate)
├── analyzer/ # Text analyzers (8 Chinese NLP implementations)
├── auth/ # Auth utilities (JWT, constants)
├── backend/ # Backend abstractions (BinaryId, BackendColumn, Shard)
├── options/ # Configuration options
├── util/ # Utilities (encoding, compression, collections)
└── exception/ # Exception hierarchy
```
### Key Design Principles
1. **Stateless**: No `HugeGraph` instance dependencies in struct
2. **Minimal Dependencies**: Only essential libraries (no Jersey, JRaft, K8s)
3. **Serialization-Friendly**: All structures support binary serialization
4. **Type Safety**: Strong typing for distributed RPC communication
5. **Backward Compatible**: Careful versioning to avoid breaking changes
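To make principles 3 to 5 more concrete, here is a minimal sketch of an enum that carries an explicit, stable wire code instead of relying on `ordinal()`. `ElementKind` and its code values are hypothetical; the real type definitions in the `type/` package may look different.

```java
// Hypothetical enum for illustration only; not a type from hugegraph-struct.
enum ElementKind {
    VERTEX((byte) 1),
    EDGE((byte) 2),
    PROPERTY((byte) 3);

    // Explicit code written to the wire, so reordering or inserting
    // constants later does not change how existing data is decoded.
    private final byte code;

    ElementKind(byte code) {
        this.code = code;
    }

    byte code() {
        return code;
    }

    // Maps a wire code back to its constant and rejects unknown codes.
    static ElementKind fromCode(byte code) {
        for (ElementKind kind : values()) {
            if (kind.code == code) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown element kind code: " + code);
    }
}
```

Rejecting unknown codes explicitly means a peer running a newer version fails loudly instead of being silently misread.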
### Building and Testing
```bash
# Build struct module (run from the hugegraph-struct directory)
mvn clean install -DskipTests
# Build with tests (when tests are added)
mvn clean install
# From parent directory
cd /path/to/hugegraph
mvn install -pl hugegraph-struct -am -DskipTests
```
### Contributing
When contributing to hugegraph-struct:
1. **No Business Logic**: Keep it pure data structures
2. **No Graph Instances**: Avoid `HugeGraph graph` fields
3. **Document Changes**: Update AGENTS.md if adding new packages
4. **Binary Compatibility**: Consider serialization impact
5. **Minimal Dependencies**: Justify any new dependency additions
### License
Apache License 2.0 - See [LICENSE](../LICENSE) file for details.