hugegraph-ml is a tool that integrates HugeGraph with popular graph learning libraries. It implements many common graph learning algorithms, so users can run end-to-end graph learning workflows directly on data stored in HugeGraph. Graph data can be read directly from HugeGraph and used for tasks such as node embedding, node classification, and graph classification. The implemented models can be found in the models folder.
Start the HugeGraph database, either via Docker or from the binary packages. Refer to docker-link and deploy-doc for guidance.
Clone this project
git clone https://github.com/apache/incubator-hugegraph-ai.git
Install hugegraph-python-client and hugegraph-ml
uv venv && source .venv/bin/activate  # create and activate virtual environment
cd ./hugegraph-ml/                    # navigate to the hugegraph-ml directory
uv pip install .                      # install dependencies using uv
Enter the project directory
cd ./hugegraph-ml/src
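Before running the examples, it can help to confirm that the installation succeeded. The snippet below is a minimal sketch, not part of hugegraph-ml: `is_installed` is a hypothetical helper, and the module names checked are the ones imported by the examples in this document (`hugegraph_ml`) plus `dgl`, which is assumed to be installed as a dependency.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names are assumptions based on the imports used in the examples.
for name in ("hugegraph_ml", "dgl"):
    status = "ok" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If either module is reported as missing, re-check that the virtual environment is activated and that `uv pip install .` was run inside the hugegraph-ml directory.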
Cora dataset using the DGI model
Make sure that the Cora dataset is already in your HugeGraph database. If not, run the import_graph_from_dgl function to import the Cora dataset from DGL into HugeGraph.
from hugegraph_ml.utils.dgl2hugegraph_utils import import_graph_from_dgl

import_graph_from_dgl("cora")
Run dgi_example.py to view the example.
python ./hugegraph_ml/examples/dgi_example.py
The specific process is as follows:
1. Convert the graph data
Convert the graph from HugeGraph to DGL format.
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.dgi import DGI
from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify
from hugegraph_ml.tasks.node_embed import NodeEmbed

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
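To give a feel for what such a conversion involves, here is a minimal sketch in plain Python. It is not the HugeGraph2DGL implementation: `reindex_edges` and the sample vertex IDs are hypothetical. It only illustrates the general idea that a graph database uses arbitrary vertex IDs, while DGL expects nodes to be numbered 0..N-1 and edges to be given as parallel source and destination index lists.

```python
def reindex_edges(vertex_ids, edges):
    """Map arbitrary vertex IDs to contiguous indices 0..N-1.

    Returns the ID-to-index mapping plus parallel source and
    destination index lists describing the same edges.
    """
    id_to_index = {vid: i for i, vid in enumerate(vertex_ids)}
    src = [id_to_index[s] for s, _ in edges]
    dst = [id_to_index[d] for _, d in edges]
    return id_to_index, src, dst


# Hypothetical vertex IDs and edges, standing in for data read from a database.
vertex_ids = ["1:paper_a", "1:paper_b", "1:paper_c"]
edges = [("1:paper_a", "1:paper_b"), ("1:paper_c", "1:paper_a")]

id_to_index, src, dst = reindex_edges(vertex_ids, edges)
print(src, dst)  # [0, 2] [1, 0]
```

Index lists of this shape are what `dgl.graph((src, dst))` accepts when building a graph by hand.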
2. Select a model instance
model = DGI(n_in_feats=graph.ndata["feat"].shape[1])
3. Train the model and generate node embeddings
node_embed_task = NodeEmbed(graph=graph, model=model)
embedded_graph = node_embed_task.train_and_embed(add_self_loop=True, n_epochs=300, patience=30)
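The add_self_loop=True argument suggests that an edge from each node to itself is added before training, so that a node's own features take part in neighborhood aggregation. The sketch below illustrates that idea on plain index lists. It is an assumption about the intent of the flag, not the hugegraph-ml code path, and `add_self_loops` is a hypothetical helper. Unlike `dgl.add_self_loop`, it skips nodes that already have a self-loop.

```python
def add_self_loops(num_nodes, src, dst):
    """Return new edge lists in which every node has exactly one added
    self-loop, unless it already had one."""
    existing = set(zip(src, dst))
    new_src, new_dst = list(src), list(dst)
    for v in range(num_nodes):
        if (v, v) not in existing:
            new_src.append(v)
            new_dst.append(v)
    return new_src, new_dst


# Three nodes; node 1 already has a self-loop.
new_src, new_dst = add_self_loops(3, src=[0, 1], dst=[1, 1])
print(list(zip(new_src, new_dst)))  # [(0, 1), (1, 1), (0, 0), (2, 2)]
```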
4. Run the downstream task: node classification using an MLP
model = MLPClassifier(
    n_in_feat=embedded_graph.ndata["feat"].shape[1],
    n_out_feat=embedded_graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph=embedded_graph, model=model)
node_clf_task.train(lr=1e-3, n_epochs=400, patience=40)
print(node_clf_task.evaluate())
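Both training calls above take n_epochs and patience. A common reading of patience is early stopping: training ends once the monitored metric has not improved for that many consecutive epochs. The sketch below shows this logic on a precomputed list of losses. It assumes the monitored metric is a validation loss, which this document does not state, and `train_with_patience` is a hypothetical function rather than the library's training loop.

```python
def train_with_patience(val_losses, patience):
    """Simulate early stopping over a sequence of per-epoch losses.

    Returns (stop_epoch, best_epoch), both 0-based.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stop early
    return len(val_losses) - 1, best_epoch  # ran all epochs


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
stop_epoch, best_epoch = train_with_patience(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Note that the last loss (0.5) is never reached: a small patience saves time but can stop before a late improvement, which is one reason to scale patience with n_epochs.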
5. Obtain the metrics
{'accuracy': 0.82, 'loss': 0.5714246034622192}
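For reference, accuracy here is presumably the fraction of evaluated nodes whose predicted class matches the true label; which nodes are used for evaluation is not described in this document. The sketch below computes that fraction in plain Python, along with the class count that `ndata["label"].unique().shape[0]` yields in step 4. The `accuracy` helper and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for five nodes.
labels = [0, 2, 1, 0, 0]
preds = [0, 2, 1, 1, 0]

n_classes = len(set(labels))  # same idea as labels.unique().shape[0]
print(n_classes, accuracy(preds, labels))  # 3 0.8
```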
Cora dataset using the GRAND model
You can refer to the example in grand_example.py.
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.grand import GRAND
from hugegraph_ml.tasks.node_classify import NodeClassify

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
model = GRAND(
    n_in_feats=graph.ndata["feat"].shape[1],
    n_out_feats=graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph, model)
node_clf_task.train(lr=1e-2, weight_decay=5e-4, n_epochs=2000, patience=100)
print(node_clf_task.evaluate())
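As background on the model itself, the GRAND paper augments the input during training with DropNode: the whole feature vector of a randomly chosen node is zeroed, and the remaining rows are rescaled so the expected value is unchanged. The sketch below illustrates that step on plain lists. It is not taken from hugegraph_ml.models.grand, whose internals are not shown here, and `drop_node` is a hypothetical helper.

```python
import random


def drop_node(features, drop_rate, rng):
    """Zero entire feature rows with probability drop_rate and scale
    the surviving rows by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    augmented = []
    for row in features:
        if rng.random() < drop_rate:
            augmented.append([0.0] * len(row))  # drop the whole node
        else:
            augmented.append([x * scale for x in row])
    return augmented


features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
augmented = drop_node(features, drop_rate=0.5, rng=random.Random(0))
print(augmented)
```

Each row of the result is either all zeros or the original row multiplied by 2, since 1 / (1 - 0.5) = 2.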