| # AGENTS.md |
| |
| This file provides guidance to an AI coding tool when working with code in this repository. |
| |
| ## Module Overview |
| |
| **hugegraph-struct** is a foundational data structures module that defines the core abstractions shared across HugeGraph distributed components. This module **must be built before hugegraph-pd and hugegraph-store** as they depend on its structure definitions. |
| |
| **Key Responsibilities**: |
| - Schema element definitions (VertexLabel, EdgeLabel, PropertyKey, IndexLabel) |
| - Graph element structures (BaseVertex, BaseEdge, BaseProperty) |
| - Binary serialization/deserialization for efficient storage and RPC |
| - Type system definitions (HugeType enum, data types, ID strategies) |
| - Query abstractions (Query, ConditionQuery, IdQuery, Aggregate) |
| - Chinese text analyzers (multiple implementations: Jieba, IK, HanLP, etc.) |
| - Authentication utilities (JWT token generation, constants) |
| |
| ## Build Commands |
| |
| ### Building This Module |
| |
| ```bash |
| # From hugegraph-struct directory |
| mvn clean install -DskipTests |
| |
| # Build and run tests (if any exist in the future) |
| mvn clean install |
| |
| # From parent directory (hugegraph root) |
| mvn install -pl hugegraph-struct -am -DskipTests |
| ``` |
| |
| ### Dependency Chain |
| |
| This module is a **critical dependency** for distributed components: |
| |
| ```bash |
| # Correct build order for distributed components: |
| # 1. Build hugegraph-struct first |
| mvn install -pl hugegraph-struct -am -DskipTests |
| |
| # 2. Then build PD |
| mvn clean package -pl hugegraph-pd -am -DskipTests |
| |
| # 3. Then build Store |
| mvn clean package -pl hugegraph-store -am -DskipTests |
| ``` |
| |
| ## Code Architecture |
| |
| ### Package Structure |
| |
| ``` |
| org.apache.hugegraph/ |
| ├── struct/schema/ # Schema element definitions |
| │ ├── SchemaElement # Base class for all schema types |
| │ ├── VertexLabel # Vertex label definitions |
| │ ├── EdgeLabel # Edge label definitions |
| │ ├── PropertyKey # Property key definitions |
| │ ├── IndexLabel # Index label definitions |
| │ └── builder/ # Builder pattern implementations |
| ├── structure/ # Graph element structures |
| │ ├── BaseElement # Base class for vertices/edges |
| │ ├── BaseVertex # Vertex implementation |
| │ ├── BaseEdge # Edge implementation |
| │ ├── BaseProperty # Property implementation |
| │ └── builder/ # Element builders |
| ├── type/ # Type system |
| │ ├── HugeType # Enum for all graph types (VERTEX, EDGE, etc.) |
| │ ├── GraphType # Type interface |
| │ ├── Namifiable # Name-based types |
| │ ├── Idfiable # ID-based types |
| │ └── define/ # Type definitions (DataType, IdStrategy, etc.) |
| ├── id/ # ID generation and management |
| │ ├── Id # ID interface |
| │ ├── IdGenerator # ID generation utilities |
| │ ├── EdgeId # Edge-specific ID handling |
| │ └── IdUtil # ID utility methods |
| ├── serializer/ # Binary serialization |
| │ ├── BytesBuffer # Buffer for binary I/O |
| │ ├── BinaryElementSerializer # Element serialization |
| │ └── DirectBinarySerializer # Direct binary access |
| ├── query/ # Query abstractions |
| │ ├── Query # Base query interface |
| │ ├── ConditionQuery # Conditional queries |
| │ ├── IdQuery # ID-based queries |
| │ ├── Condition # Query conditions |
| │ └── Aggregate # Aggregation queries |
| ├── analyzer/ # Text analyzers (Chinese NLP) |
| │ ├── Analyzer # Base analyzer interface |
| │ ├── AnalyzerFactory # Factory for creating analyzers |
| │ ├── IKAnalyzer # IK Chinese word segmentation |
| │ ├── JiebaAnalyzer # Jieba segmentation |
| │ ├── HanLPAnalyzer # HanLP NLP |
| │ ├── AnsjAnalyzer # Ansj segmentation |
| │ ├── WordAnalyzer # Word-based analysis |
| │ ├── JcsegAnalyzer # Jcseg segmentation |
| │ ├── MMSeg4JAnalyzer # MMSeg4J segmentation |
| │ └── SmartCNAnalyzer # Lucene SmartCN |
| ├── auth/ # Authentication utilities |
| │ ├── TokenGenerator # JWT token generation |
| │ └── AuthConstant # Auth constants |
| ├── backend/ # Backend abstractions |
| │ ├── BinaryId # Binary ID representation |
| │ ├── BackendColumn # Column abstraction |
| │ └── Shard # Shard information |
| ├── options/ # Configuration options |
| │ ├── CoreOptions # Core configuration |
| │ └── AuthOptions # Auth configuration |
| ├── util/ # Utilities |
| │ ├── StringEncoding # String encoding utilities |
| │ ├── GraphUtils # Graph utility methods |
| │ ├── LZ4Util # LZ4 compression |
| │ ├── Blob # Binary blob handling |
| │ └── collection/ # Collection utilities (IdSet, CollectionFactory) |
| └── exception/ # Exception hierarchy |
| ├── HugeException # Base exception |
| ├── BackendException # Backend errors |
| ├── NotSupportException # Unsupported operations |
| ├── NotFoundException # Not found errors |
| └── NotAllowException # Permission errors |
| ``` |
| |
| ### Key Architectural Concepts |
| |
| #### 1. Two-Layer Schema System |
| |
| The module separates graph metadata from graph data in two package hierarchies: |
| |
| - **`struct.schema.*`**: Schema element definitions (VertexLabel, EdgeLabel, etc.) - these are *metadata* about the graph structure |
| - **`structure.*`**: Actual graph elements (BaseVertex, BaseEdge, etc.) - these are *data* instances |
| |
| The schema layer defines the "blueprint" while the structure layer implements the "instances". |
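
The blueprint/instance split can be illustrated with a minimal sketch. The classes below (`LabelSketch`, `VertexSketch`) are hypothetical stand-ins written for this note, not the module's actual `VertexLabel` or `BaseVertex`; they only show how a data instance might be validated against its schema metadata.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: NOT the real VertexLabel / BaseVertex classes.
final class SchemaLayerSketch {

    // "Blueprint": metadata describing which properties a vertex may carry.
    static final class LabelSketch {
        final String name;
        final Set<String> propertyKeys;

        LabelSketch(String name, String... propertyKeys) {
            this.name = name;
            this.propertyKeys = new HashSet<>(Arrays.asList(propertyKeys));
        }
    }

    // "Instance": actual data, checked against its label on every write.
    static final class VertexSketch {
        final LabelSketch label;
        final Map<String, Object> properties = new HashMap<>();

        VertexSketch(LabelSketch label) {
            this.label = label;
        }

        void property(String key, Object value) {
            if (!this.label.propertyKeys.contains(key)) {
                throw new IllegalArgumentException(
                        "Property '" + key + "' is not defined on label '" +
                        this.label.name + "'");
            }
            this.properties.put(key, value);
        }
    }

    public static void main(String[] args) {
        LabelSketch person = new LabelSketch("person", "name", "age");
        VertexSketch marko = new VertexSketch(person);
        marko.property("name", "marko");
        System.out.println(marko.properties);
    }
}
```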
| |
| #### 2. Type System |
| |
| The `HugeType` enum (type/HugeType.java) defines all possible types: |
| - Schema types: `VERTEX_LABEL`, `EDGE_LABEL`, `PROPERTY_KEY`, `INDEX_LABEL` |
| - Data types: `VERTEX`, `EDGE`, `PROPERTY`, `AGGR_PROPERTY_V`, `AGGR_PROPERTY_E` |
| - Special types: `META`, `COUNTER`, `TASK`, `OLAP`, `INDEX` |
| |
| #### 3. ID Management |
| |
| IDs are critical for distributed systems: |
| - `Id` interface provides abstraction over different ID types |
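| - `IdGenerator` creates `Id` instances from raw values; which kind of ID a vertex gets depends on its ID strategy (automatic, primary-key, or customized; see `IdStrategy` in `type/define/` for the exact constants) |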
| - `EdgeId` uses special encoding: source vertex ID + edge label ID + sort values + target vertex ID |
| - Binary serialization optimizes ID storage |
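
To make the composite edge ID idea concrete, here is a minimal sketch that joins the four parts listed above into one string. The `EdgeIdSketch` class, the `>` separator, and the string form are assumptions for illustration only; the real `EdgeId` encoding (including any escaping of separator characters) should be checked in the source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT the module's EdgeId implementation.
final class EdgeIdSketch {

    // Assumed separator; IDs containing it would need escaping (not handled here).
    private static final String SEPARATOR = ">";

    // Compose: source vertex ID + edge label ID + sort values + target vertex ID.
    static String compose(String sourceId, long labelId,
                          String sortValues, String targetId) {
        return String.join(SEPARATOR, sourceId, Long.toString(labelId),
                           sortValues, targetId);
    }

    // Split a composed ID back into its four parts.
    static List<String> split(String edgeId) {
        // Limit -1 keeps empty parts, e.g. an empty sort-values field.
        return Arrays.asList(edgeId.split(SEPARATOR, -1));
    }

    public static void main(String[] args) {
        String id = compose("1:marko", 3L, "", "2:josh");
        System.out.println(id + " -> " + split(id));
    }
}
```

Because the source vertex ID comes first in this layout, all edges of one vertex share a common prefix, which is one plausible reason for such an ordering in a sorted key-value store.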
| |
| #### 4. Binary Serialization |
| |
| `BytesBuffer` and serializers enable: |
| - Efficient storage in RocksDB and other backends |
| - Fast gRPC message passing between PD/Store/Server |
| - Compact on-disk and in-memory representation |
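
The general technique behind a compact binary representation can be sketched with the JDK's `java.nio.ByteBuffer`. This is not the module's `BytesBuffer`; the `BufferSketch` class and its length-prefixed string layout are assumptions chosen to show a write/read round trip.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: NOT the module's BytesBuffer, only the same general idea.
final class BufferSketch {

    // Write a string as [int length][UTF-8 bytes].
    static void writeString(ByteBuffer buffer, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(bytes.length);
        buffer.put(bytes);
    }

    // Read a string previously written by writeString().
    static String readString(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.getInt()];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode a (numeric id, name) pair into an exactly sized byte array.
    static byte[] encode(long id, String name) {
        int nameLength = name.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + nameLength);
        buffer.putLong(id);
        writeString(buffer, name);
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] encoded = encode(42L, "marko");
        ByteBuffer reader = ByteBuffer.wrap(encoded);
        System.out.println(reader.getLong() + " " + readString(reader));
    }
}
```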
| |
| #### 5. Query Abstraction |
| |
| Query classes provide backend-agnostic query building: |
| - `Query`: Base interface with limit, offset, ordering |
| - `ConditionQuery`: Supports conditions (EQ, GT, LT, IN, CONTAINS, etc.) |
| - `IdQuery`: Direct ID-based lookups |
| - `Aggregate`: Aggregation operations (SUM, MAX, MIN, AVG) |
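
A backend-agnostic query can be thought of as a list of conditions plus paging options that any backend may evaluate. The sketch below uses a hypothetical `ConditionQuerySketch` class evaluated over in-memory rows; its method names (`eq`, `gt`, `limit`, `execute`) are illustrative assumptions and do not mirror the real `ConditionQuery` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: NOT the module's ConditionQuery class.
final class ConditionQuerySketch {

    private final List<Predicate<Map<String, Object>>> conditions = new ArrayList<>();
    private long limit = Long.MAX_VALUE;

    // EQ condition: the row's value for `key` must equal `value`.
    ConditionQuerySketch eq(String key, Object value) {
        this.conditions.add(row -> value.equals(row.get(key)));
        return this;
    }

    // GT condition: the row's numeric value for `key` must exceed `bound`.
    ConditionQuerySketch gt(String key, long bound) {
        this.conditions.add(row -> {
            Object v = row.get(key);
            return v instanceof Number && ((Number) v).longValue() > bound;
        });
        return this;
    }

    ConditionQuerySketch limit(long limit) {
        this.limit = limit;
        return this;
    }

    // Evaluate against in-memory rows; a real backend would translate instead.
    List<Map<String, Object>> execute(List<Map<String, Object>> rows) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (results.size() >= this.limit) {
                break;
            }
            boolean matches = true;
            for (Predicate<Map<String, Object>> condition : this.conditions) {
                if (!condition.test(row)) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                results.add(row);
            }
        }
        return results;
    }

    // Helper to build a sample row.
    static Map<String, Object> row(String label, long age) {
        Map<String, Object> row = new HashMap<>();
        row.put("label", label);
        row.put("age", age);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(row("person", 29L), row("person", 35L));
        System.out.println(new ConditionQuerySketch().eq("label", "person")
                                                     .gt("age", 30L)
                                                     .execute(rows));
    }
}
```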
| |
| ## Dependencies |
| |
| ### Critical Dependencies |
| |
| - **hg-pd-client** (${project.version}): PD client for metadata coordination |
| - **hugegraph-common** (${project.version}): Shared utilities |
| - **Apache TinkerPop 3.5.1**: Graph computing framework |
| - **Guava 25.1-jre**: Google utilities |
| - **Eclipse Collections 10.4.0**: High-performance collections |
| - **fastutil 8.1.0**: Fast primitive collections |
| |
| ### Text Analysis Dependencies |
| |
| Multiple Chinese NLP libraries for different use cases: |
| - **jieba-analysis 1.0.2**: Popular Chinese word segmentation |
| - **IKAnalyzer 2012_u6**: IK word segmentation |
| - **HanLP portable-1.5.0**: Natural language processing |
| - **Ansj 5.1.6**: Ansj segmentation |
| - **Word 1.3**: APDPlat word segmentation |
| - **Jcseg 2.2.0**: Jcseg segmentation |
| - **mmseg4j-core 1.10.0**: MMSeg4J segmentation |
| - **lucene-analyzers-smartcn 7.4.0**: Lucene SmartCN |
| |
| ### Security Dependencies |
| |
| - **jjwt-api/impl/jackson 0.11.2**: JWT token handling |
| - **jbcrypt 0.4**: Password hashing |
| |
| ## Development Notes |
| |
| ### When Modifying This Module |
| |
| 1. **Understand the impact**: Changes here affect hugegraph-pd, hugegraph-store, and hugegraph-server |
| 2. **Rebuild dependent modules**: After modifying, rebuild PD and Store modules |
| 3. **Binary compatibility**: Serialization changes require careful version migration |
| 4. **ID changes**: Modifying ID generation can break existing data |
| |
| ### Working with Schema Elements |
| |
| When adding or modifying schema elements in `struct/schema/`: |
| - Extend `SchemaElement` base class |
| - Implement required interfaces (`Namifiable`, `Typifiable`) |
| - Add corresponding `HugeType` enum value if needed |
| - Update serialization logic in `BinaryElementSerializer` |
| - Verify schema builder patterns in `struct/schema/builder/` |
| |
| ### Working with Binary Serialization |
| |
| When modifying serialization: |
| - Changes to `BytesBuffer` format require version migration |
| - Test with all backends (RocksDB, HStore) |
| - Ensure backward compatibility or provide migration path |
| - Update both write and read paths consistently |
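
One common way to keep old data readable after a format change is to prefix each record with a version byte and keep a read path per version. The sketch below assumes two hypothetical layouts (`V1`, `V2`) invented for this example; it does not describe how `BytesBuffer` or `BinaryElementSerializer` actually version their data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layouts for illustration; the module's real format may differ.
final class VersionedFormatSketch {

    static final byte V1 = 1; // [version][long id]
    static final byte V2 = 2; // [version][long id][int nameLength][name bytes]

    // Old write path, kept only to produce legacy records in this sketch.
    static byte[] writeV1(long id) {
        return ByteBuffer.allocate(1 + Long.BYTES).put(V1).putLong(id).array();
    }

    // New write path: same prefix as V1, plus a length-prefixed name.
    static byte[] writeV2(long id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + Long.BYTES + Integer.BYTES + nameBytes.length)
                         .put(V2).putLong(id).putInt(nameBytes.length).put(nameBytes)
                         .array();
    }

    // Returns "id:name"; records in the old V1 layout get an empty name.
    static String read(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        byte version = buffer.get();
        long id = buffer.getLong();
        switch (version) {
            case V1:
                return id + ":";
            case V2:
                byte[] nameBytes = new byte[buffer.getInt()];
                buffer.get(nameBytes);
                return id + ":" + new String(nameBytes, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unknown format version: " + version);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(writeV1(7L)));
        System.out.println(read(writeV2(7L, "josh")));
    }
}
```

The reader accepts both layouts while the writer emits only the newest one, so the write and read paths change together and existing records stay decodable.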
| |
| ### Adding Text Analyzers |
| |
| To add a new text analyzer: |
| 1. Implement the `Analyzer` interface in `analyzer/` |
| 2. Register in `AnalyzerFactory` |
| 3. Add dependency to pom.xml |
| 4. Test with Chinese text queries |
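
The first two steps above can be sketched as an interface plus a name-based registry. Everything in this example (`TextAnalyzer`, `WhitespaceAnalyzer`, `register`, `create`) is hypothetical and self-contained; the real `Analyzer` interface and `AnalyzerFactory` may use different method signatures and a different registration mechanism.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: NOT the module's Analyzer / AnalyzerFactory.
final class AnalyzerSketch {

    // Step 1: the contract every analyzer implements.
    interface TextAnalyzer {
        Set<String> segment(String text);
    }

    // A trivial implementation that splits on whitespace.
    static final class WhitespaceAnalyzer implements TextAnalyzer {
        @Override
        public Set<String> segment(String text) {
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return new LinkedHashSet<>();
            }
            return new LinkedHashSet<>(Arrays.asList(trimmed.split("\\s+")));
        }
    }

    // Step 2: a registry mapping analyzer names to constructors taking a mode.
    private static final Map<String, Function<String, TextAnalyzer>> REGISTRY =
            new ConcurrentHashMap<>();

    static void register(String name, Function<String, TextAnalyzer> creator) {
        REGISTRY.put(name.toLowerCase(Locale.ROOT), creator);
    }

    static TextAnalyzer create(String name, String mode) {
        Function<String, TextAnalyzer> creator =
                REGISTRY.get(name.toLowerCase(Locale.ROOT));
        if (creator == null) {
            throw new IllegalArgumentException("Unsupported analyzer: " + name);
        }
        return creator.apply(mode);
    }

    public static void main(String[] args) {
        register("whitespace", mode -> new WhitespaceAnalyzer());
        System.out.println(create("Whitespace", "default").segment("hello graph hello"));
    }
}
```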
| |
| ## Common Patterns |
| |
| ### Creating Schema Elements |
| |
| ```java |
| // Schema elements are created through builders; `schema` is assumed to be a schema manager instance supplied by the caller |
| PropertyKey propertyKey = schema.propertyKey("name") |
| .asText() |
| .valueSingle() |
| .create(); |
| ``` |
| |
| ### ID Generation |
| |
| ```java |
| // Illustrative only: check IdGenerator and EdgeId for the exact factory methods and argument lists |
| Id id = IdGenerator.of(value, IdType.LONG); |
| Id edgeId = EdgeId.parse(sourceId, direction, label, sortValues, targetId); |
| ``` |
| |
| ### Binary Serialization |
| |
| ```java |
| // Write to buffer |
| BytesBuffer buffer = BytesBuffer.allocate(size); |
| buffer.writeId(id); |
| buffer.writeString(name); |
| |
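| // Read from buffer: switch it to read mode first (forReadWritten() is an assumption; verify against BytesBuffer) |
| buffer.forReadWritten(); |
| Id decodedId = buffer.readId(); |
| String decodedName = buffer.readString(); |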
| ``` |
| |
| ## Cross-Module References |
| |
| This module is referenced by: |
| - **hugegraph-pd**: Uses schema definitions for metadata management |
| - **hugegraph-store**: Uses serialization for storage and RPC |
| - **hugegraph-server/hugegraph-core**: Uses all abstractions for graph operations |
| - **hugegraph-server/hugegraph-api**: Uses structures for REST API serialization |
| |
| ## License and Compliance |
| |
| This module follows Apache Software Foundation guidelines: |
| - All files must have Apache 2.0 license headers |
| - Third-party dependencies require license documentation in `install-dist/release-docs/licenses/` |
| - Excluded from Apache RAT: None (all source files checked) |