hugegraph-ml is a tool that integrates HugeGraph with popular graph learning libraries. It implements most graph learning algorithms, enabling users to run end-to-end graph learning workflows directly on data stored in HugeGraph. Graph data can be read directly from HugeGraph and used for tasks such as node embedding, node classification, and graph classification. The implemented models can be found in the models folder.
Start the HugeGraph database, either via Docker or from the binary packages. Refer to docker-link and deploy-doc for guidance.
Clone this project
git clone https://github.com/apache/incubator-hugegraph-ai.git
Install hugegraph-python-client and hugegraph-ml
cd ./incubator-hugegraph-ai
# better to use virtualenv (source venv/bin/activate)
pip install ./hugegraph-python-client
cd ./hugegraph-ml/
pip install -e .
Enter the project directory
cd ./hugegraph-ml/src
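After the installation steps above, a quick check that both packages are importable can save debugging time later. The following is a minimal sketch using only the standard library. The module name `hugegraph_ml` comes from the imports used in this README; `pyhugegraph` is an assumed import name for hugegraph-python-client, so adjust it if your installation differs. `check_installed` is a hypothetical helper, not part of hugegraph-ml.

```python
import importlib.util


def check_installed(module_names):
    """Return {module_name: bool}, True when the top-level module can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # "hugegraph_ml" is the package imported throughout this README.
    # "pyhugegraph" is an ASSUMED import name for hugegraph-python-client.
    for name, found in check_installed(["hugegraph_ml", "pyhugegraph"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a module is reported as missing, re-run the corresponding `pip install` step inside the same virtual environment.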
Cora dataset using the DGI model
Make sure that the Cora dataset is already in your HugeGraph database. If not, you can run the import_graph_from_dgl function to import the Cora dataset from DGL into the HugeGraph database.
from hugegraph_ml.utils.dgl2hugegraph_utils import import_graph_from_dgl
import_graph_from_dgl("cora")
Run dgi_example.py to view the example.
python ./hugegraph_ml/examples/dgi_example.py
The specific process is as follows:
1. Convert graph data
Convert the graph from HugeGraph to DGL format.
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.dgi import DGI
from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify
from hugegraph_ml.tasks.node_embed import NodeEmbed

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
2. Select model instance
model = DGI(n_in_feats=graph.ndata["feat"].shape[1])
3. Train the model and generate node embeddings
node_embed_task = NodeEmbed(graph=graph, model=model)
embedded_graph = node_embed_task.train_and_embed(add_self_loop=True, n_epochs=300, patience=30)
4. Downstream task: node classification using MLP
model = MLPClassifier(
    n_in_feat=embedded_graph.ndata["feat"].shape[1],
    n_out_feat=embedded_graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph=embedded_graph, model=model)
node_clf_task.train(lr=1e-3, n_epochs=400, patience=40)
print(node_clf_task.evaluate())
5. Obtain the metrics
{'accuracy': 0.82, 'loss': 0.5714246034622192}
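The training calls above take both `n_epochs` and `patience`. The source does not spell out how hugegraph-ml applies `patience`, so the following is only a minimal sketch of the common early-stopping interpretation: training stops once the validation loss has failed to improve for `patience` consecutive epochs, or when `n_epochs` is reached. `train_with_patience` and `validate_epoch` are hypothetical names for illustration, and the library may monitor a different metric.

```python
def train_with_patience(validate_epoch, n_epochs, patience):
    """Run up to n_epochs, stopping early when validation loss stalls.

    validate_epoch is a hypothetical callable: it trains for one epoch
    and returns the validation loss for that epoch.
    Returns (best_loss, epochs_run).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(n_epochs):
        loss = validate_epoch(epoch)
        epochs_run += 1
        if loss < best_loss:
            # New best: remember it and reset the stall counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # No improvement for `patience` epochs in a row.
    return best_loss, epochs_run
```

Under this interpretation, `n_epochs=400, patience=40` means the run can end well before epoch 400 if the monitored metric plateaus.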
Cora dataset using the GRAND model
You can refer to the example in grand_example.py.
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.grand import GRAND
from hugegraph_ml.tasks.node_classify import NodeClassify

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
model = GRAND(
    n_in_feats=graph.ndata["feat"].shape[1],
    n_out_feats=graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph, model)
node_clf_task.train(lr=1e-2, weight_decay=5e-4, n_epochs=2000, patience=100)
print(node_clf_task.evaluate())
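Results from a single training run can vary with random initialization, so it may help to repeat a run several times and aggregate the metrics. The sketch below assumes that `evaluate()` returns a dict of numeric metrics, such as the `accuracy` and `loss` keys in the sample output shown earlier. `summarize_runs` is a hypothetical helper, not part of hugegraph-ml.

```python
from statistics import mean, stdev


def summarize_runs(results):
    """Aggregate a list of metric dicts into per-metric mean and standard deviation.

    Each entry is assumed to look like {'accuracy': ..., 'loss': ...}.
    """
    if not results:
        raise ValueError("results must contain at least one metrics dict")
    summary = {}
    for key in results[0]:
        values = [run[key] for run in results]
        summary[key] = {
            "mean": mean(values),
            # Sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

To use it, collect the dict returned by each `node_clf_task.evaluate()` call into a list and pass that list to `summarize_runs`.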