| # hugegraph-llm |
| |
| ## Summary |
| |
| The `hugegraph-llm` is a tool for the implementation and research related to large language models. |
| This project includes runnable demos, it can also be used as a third-party library. |
| |
| As we know, graph systems can help large models address challenges like timeliness and hallucination, |
| while large models can help graph systems with cost-related issues. |
| |
| With this project, we aim to reduce the cost of using graph systems, and decrease the complexity of |
| building knowledge graphs. This project will offer more applications and integration solutions for |
| graph systems and large language models. |
| 1. Construct knowledge graph by LLM + HugeGraph |
| 2. Use natural language to operate graph databases (gremlin) |
| 3. Knowledge graph supplements answer context (RAG) |
| |
| ## Environment Requirements |
| |
| - python 3.8+ |
| - hugegraph 1.0.0+ |
| |
| ## Preparation |
| |
| - Start the HugeGraph database, you can do it via Docker. Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance |
| - Start the gradio interactive demo, you can start with the following command, and open http://127.0.0.1:8001 after starting |
| ```bash |
| # 0. clone the hugegraph-ai project & enter the root dir |
| # 1. configure the environment path |
| PROJECT_ROOT_DIR = "/path/to/hugegraph-ai" # root directory of hugegraph-ai |
| export PYTHONPATH=${PROJECT_ROOT_DIR}/hugegraph-llm/src:${PROJECT_ROOT_DIR}/hugegraph-python-client/src |
| |
| # 2. install the required packages/deps (better to use virtualenv(venv) to manage the environment) |
| cd hugegraph-llm |
| pip install -r requirements.txt # ensure the python/pip version is satisfied |
| # 2.1 set basic configs in the hugegraph-llm/config/config.ini (Optional, you can also set it in gradio) |
| |
| # 3. start the gradio server, wait for some time to initialize |
| python3 ./src/hugegraph_llm/utils/gradio_demo.py |
| ``` |
| - Configure HugeGraph database connection information & LLM information in the gradio interface, |
| click on `Initialize configs`, the complete and initialized configuration file will be overwritten. |
| - offline download NLTK stopwords |
| ```bash |
| python3 ./src/hugegraph_llm/operators/common_op/nltk_helper.py |
| ``` |
| |
| ## Examples |
| |
| ### 1.Build a knowledge graph in HugeGraph through LLM |
| |
| Run example like `python3 ./hugegraph-llm/examples/build_kg_test.py` |
| |
| The `KgBuilder` class is used to construct a knowledge graph. Here is a brief usage guide: |
| |
| 1. **Initialization**: The `KgBuilder` class is initialized with an instance of a language model. This can be obtained from the `LLMs` class. |
| |
| ```python |
| from hugegraph_llm.llms.init_llm import LLMs |
| from hugegraph_llm.operators.kg_construction_task import KgBuilder |
| |
| TEXT = "" |
| builder = KgBuilder(LLMs().get_llm()) |
| ( |
| builder |
| .import_schema(from_hugegraph="talent_graph").print_result() |
| .extract_triples(TEXT).print_result() |
| .disambiguate_word_sense().print_result() |
| .commit_to_hugegraph() |
| .run() |
| ) |
| ``` |
| |
| 2. **Import Schema**: The `import_schema` method is used to import a schema from a source. The source can be a HugeGraph instance, a user-defined schema or an extraction result. The method `print_result` can be chained to print the result. |
| |
| ```python |
| # Import schema from a HugeGraph instance |
| import_schema(from_hugegraph="xxx").print_result() |
| # Import schema from an extraction result |
| import_schema(from_extraction="xxx").print_result() |
| # Import schema from user-defined schema |
| import_schema(from_user_defined="xxx").print_result() |
| ``` |
| |
| 3. **Extract Triples**: The `extract_triples` method is used to extract triples from a text. The text should be passed as a string argument to the method. |
| |
| ```python |
| TEXT = "Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010." |
| extract_triples(TEXT).print_result() |
| ``` |
| |
| 4. **Disambiguate Word Sense**: The `disambiguate_word_sense` method is used to disambiguate the sense of words in the extracted triples. |
| |
| ```python |
| disambiguate_word_sense().print_result() |
| ``` |
| |
| 5. **Commit to HugeGraph**: The `commit_to_hugegraph` method is used to commit the constructed knowledge graph to a HugeGraph instance. |
| |
| ```python |
| commit_to_hugegraph().print_result() |
| ``` |
| |
| 6. **Run**: The `run` method is used to execute the chained operations. |
| |
| ```python |
| run() |
| ``` |
| |
| The methods of the `KgBuilder` class can be chained together to perform a sequence of operations. |
| |
| ### 2. Retrieval augmented generation (RAG) based on HugeGraph |
| |
| Run example like `python3 ./hugegraph-llm/examples/graph_rag_test.py` |
| |
| The `GraphRAG` class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities. |
| Here is a brief usage guide: |
| |
| 1. **Extract Keyword:**: Extract keywords and expand synonyms. |
| |
| ```python |
| graph_rag.extract_keyword(text="Tell me about Al Pacino.").print_result() |
| ``` |
| |
| 2. **Query Graph for Rag**: Retrieve the corresponding keywords and their multi-degree associated relationships from HugeGraph. |
| |
| ```python |
| graph_rag.query_graph_for_rag( |
| max_deep=2, |
| max_items=30 |
| ).print_result() |
| ``` |
| 3. **Synthesize Answer**: Summarize the results and organize the language to answer the question. |
| |
| ```python |
| graph_rag.synthesize_answer().print_result() |
| ``` |
| |
| 4. **Run**: The `run` method is used to execute the above operations. |
| |
| ```python |
| graph_rag.run(verbose=True) |
| ``` |