AGENTS.md

This file provides guidance to an AI coding tool when working with code in this repository.

Module Overview

hugegraph-struct is a foundational data structures module that defines the core abstractions shared across HugeGraph distributed components. This module must be built before hugegraph-pd and hugegraph-store as they depend on its structure definitions.

Key Responsibilities:

Schema element definitions (VertexLabel, EdgeLabel, PropertyKey, IndexLabel)
Graph element structures (BaseVertex, BaseEdge, BaseProperty)
Binary serialization/deserialization for efficient storage and RPC
Type system definitions (HugeType enum, data types, ID strategies)
Query abstractions (Query, ConditionQuery, IdQuery, Aggregate)
Chinese text analyzers (multiple implementations: Jieba, IK, HanLP, etc.)
Authentication utilities (JWT token generation, constants)

Build Commands

Building This Module

# From hugegraph-struct directory
mvn clean install -DskipTests

# Build with tests (if any exist in future)
mvn clean install

# From parent directory (hugegraph root)
mvn install -pl hugegraph-struct -am -DskipTests

Dependency Chain

This module is a critical dependency for distributed components:

# Correct build order for distributed components:
# 1. Build hugegraph-struct first
mvn install -pl hugegraph-struct -am -DskipTests

# 2. Then build PD
mvn clean package -pl hugegraph-pd -am -DskipTests

# 3. Then build Store
mvn clean package -pl hugegraph-store -am -DskipTests

Code Architecture

Package Structure

org.apache.hugegraph/
├── struct/schema/          # Schema element definitions
│   ├── SchemaElement       # Base class for all schema types
│   ├── VertexLabel         # Vertex label definitions
│   ├── EdgeLabel           # Edge label definitions
│   ├── PropertyKey         # Property key definitions
│   ├── IndexLabel          # Index label definitions
│   └── builder/            # Builder pattern implementations
├── structure/              # Graph element structures
│   ├── BaseElement         # Base class for vertices/edges
│   ├── BaseVertex          # Vertex implementation
│   ├── BaseEdge            # Edge implementation
│   ├── BaseProperty        # Property implementation
│   └── builder/            # Element builders
├── type/                   # Type system
│   ├── HugeType            # Enum for all graph types (VERTEX, EDGE, etc.)
│   ├── GraphType           # Type interface
│   ├── Namifiable          # Name-based types
│   ├── Idfiable            # ID-based types
│   └── define/             # Type definitions (DataType, IdStrategy, etc.)
├── id/                     # ID generation and management
│   ├── Id                  # ID interface
│   ├── IdGenerator         # ID generation utilities
│   ├── EdgeId              # Edge-specific ID handling
│   └── IdUtil              # ID utility methods
├── serializer/             # Binary serialization
│   ├── BytesBuffer         # Buffer for binary I/O
│   ├── BinaryElementSerializer    # Element serialization
│   └── DirectBinarySerializer     # Direct binary access
├── query/                  # Query abstractions
│   ├── Query               # Base query interface
│   ├── ConditionQuery      # Conditional queries
│   ├── IdQuery             # ID-based queries
│   ├── Condition           # Query conditions
│   └── Aggregate           # Aggregation queries
├── analyzer/               # Text analyzers (Chinese NLP)
│   ├── Analyzer            # Base analyzer interface
│   ├── AnalyzerFactory     # Factory for creating analyzers
│   ├── IKAnalyzer          # IK Chinese word segmentation
│   ├── JiebaAnalyzer       # Jieba segmentation
│   ├── HanLPAnalyzer       # HanLP NLP
│   ├── AnsjAnalyzer        # Ansj segmentation
│   ├── WordAnalyzer        # Word-based analysis
│   ├── JcsegAnalyzer       # Jcseg segmentation
│   ├── MMSeg4JAnalyzer     # MMSeg4J segmentation
│   └── SmartCNAnalyzer     # Lucene SmartCN
├── auth/                   # Authentication utilities
│   ├── TokenGenerator      # JWT token generation
│   └── AuthConstant        # Auth constants
├── backend/                # Backend abstractions
│   ├── BinaryId            # Binary ID representation
│   ├── BackendColumn       # Column abstraction
│   └── Shard               # Shard information
├── options/                # Configuration options
│   ├── CoreOptions         # Core configuration
│   └── AuthOptions         # Auth configuration
├── util/                   # Utilities
│   ├── StringEncoding      # String encoding utilities
│   ├── GraphUtils          # Graph utility methods
│   ├── LZ4Util             # LZ4 compression
│   ├── Blob                # Binary blob handling
│   └── collection/         # Collection utilities (IdSet, CollectionFactory)
└── exception/              # Exception hierarchy
    ├── HugeException       # Base exception
    ├── BackendException    # Backend errors
    ├── NotSupportException # Unsupported operations
    ├── NotFoundException   # Not found errors
    └── NotAllowException   # Permission errors

Key Architectural Concepts

1. Two-Layer Schema System

The module defines a dual schema hierarchy:

struct.schema.*: Schema element definitions (VertexLabel, EdgeLabel, etc.) - these are metadata about the graph structure
structure.*: Actual graph elements (BaseVertex, BaseEdge, etc.) - these are data instances

The schema layer defines the “blueprint” while the structure layer implements the “instances”.

2. Type System

The HugeType enum (type/HugeType.java) defines all possible types:

Schema types: VERTEX_LABEL, EDGE_LABEL, PROPERTY_KEY, INDEX_LABEL
Data types: VERTEX, EDGE, PROPERTY, AGGR_PROPERTY_V, AGGR_PROPERTY_E
Special types: META, COUNTER, TASK, OLAP, INDEX

3. ID Management

IDs are critical for distributed systems:

Id interface provides abstraction over different ID types
IdGenerator creates IDs based on strategy (AUTO_INCREMENT, PRIMARY_KEY, CUSTOMIZE)
EdgeId uses special encoding: source vertex ID + edge label ID + sort values + target vertex ID
Binary serialization optimizes ID storage

4. Binary Serialization

BytesBuffer and serializers enable:

Efficient storage in RocksDB and other backends
Fast gRPC message passing between PD/Store/Server
Compact on-disk and in-memory representation

5. Query Abstraction

Query classes provide backend-agnostic query building:

Query: Base interface with limit, offset, ordering
ConditionQuery: Supports conditions (EQ, GT, LT, IN, CONTAINS, etc.)
IdQuery: Direct ID-based lookups
Aggregate: Aggregation operations (SUM, MAX, MIN, AVG)

Dependencies

Critical Dependencies

hg-pd-client (${project.version}): PD client for metadata coordination
hugegraph-common (${project.version}): Shared utilities
Apache TinkerPop 3.5.1: Graph computing framework
Guava 25.1-jre: Google utilities
Eclipse Collections 10.4.0: High-performance collections
fastutil 8.1.0: Fast primitive collections

Text Analysis Dependencies

Multiple Chinese NLP libraries for different use cases:

jieba-analysis 1.0.2: Popular Chinese word segmentation
IKAnalyzer 2012_u6: IK word segmentation
HanLP portable-1.5.0: Natural language processing
Ansj 5.1.6: Ansj segmentation
Word 1.3: APDPlat word segmentation
Jcseg 2.2.0: Jcseg segmentation
mmseg4j-core 1.10.0: MMSeg4J segmentation
lucene-analyzers-smartcn 7.4.0: Lucene SmartCN

Security Dependencies

jjwt-api/impl/jackson 0.11.2: JWT token handling
jbcrypt 0.4: Password hashing

Development Notes

When Modifying This Module

Understand the impact: Changes here affect hugegraph-pd, hugegraph-store, and hugegraph-server
Rebuild dependent modules: After modifying, rebuild PD and Store modules
Binary compatibility: Serialization changes require careful version migration
ID changes: Modifying ID generation can break existing data

Working with Schema Elements

When adding or modifying schema elements in struct/schema/:

Extend SchemaElement base class
Implement required interfaces (Namifiable, Typifiable)
Add corresponding HugeType enum value if needed
Update serialization logic in BinaryElementSerializer
Verify schema builder patterns in struct/schema/builder/

Working with Binary Serialization

When modifying serialization:

Changes to BytesBuffer format require version migration
Test with all backends (RocksDB, HStore)
Ensure backward compatibility or provide migration path
Update both write and read paths consistently

Adding Text Analyzers

To add a new text analyzer:

Implement the Analyzer interface in analyzer/
Register in AnalyzerFactory
Add dependency to pom.xml
Test with Chinese text queries

Common Patterns

Creating Schema Elements

// Schema elements use builders
PropertyKey propertyKey = schema.propertyKey("name")
                                .asText()
                                .valueSingle()
                                .create();

ID Generation

// Generate IDs based on strategy
Id id = IdGenerator.of(value, IdType.LONG);
Id edgeId = EdgeId.parse(sourceId, direction, label, sortValues, targetId);

Binary Serialization

// Write to buffer
BytesBuffer buffer = BytesBuffer.allocate(size);
buffer.writeId(id);
buffer.writeString(name);

// Read from buffer
Id id = buffer.readId();
String name = buffer.readString();

Cross-Module References

This module is referenced by:

hugegraph-pd: Uses schema definitions for metadata management
hugegraph-store: Uses serialization for storage and RPC
hugegraph-server/hugegraph-core: Uses all abstractions for graph operations
hugegraph-server/hugegraph-api: Uses structures for REST API serialization

License and Compliance

This module follows Apache Software Foundation guidelines:

All files must have Apache 2.0 license headers
Third-party dependencies require license documentation in install-dist/release-docs/licenses/
Excluded from Apache RAT: None (all source files checked)