# hugegraph-llm
## Summary
`hugegraph-llm` is a tool for implementation and research related to large language models (LLMs).
The project includes runnable demos and can also be used as a third-party library.
As we know, graph systems can help LLMs address challenges such as timeliness and hallucination,
while LLMs can help graph systems with cost-related issues.
With this project, we aim to reduce the cost of using graph systems and decrease the complexity of
building knowledge graphs. This project will offer more applications and integration solutions for
graph systems and large language models.
1. Construct knowledge graphs with LLMs + HugeGraph
2. Use natural language to operate graph databases (Gremlin)
3. Use knowledge graphs to supplement answer context (RAG)
## Environment Requirements
- Python 3.8+
- HugeGraph 1.0.0+
## Preparation
- Start the HugeGraph database; you can run it via Docker. Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance
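A minimal sketch of the Docker command (the container name and port mapping below are illustrative; see the linked deploy doc for the authoritative options):
```bash
# Run HugeGraph Server in the background and expose the REST API on port 8080
docker run -itd --name=graph -p 8080:8080 hugegraph/hugegraph
```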
- Start the Gradio interactive demo with the following command, then open http://127.0.0.1:8001 after it starts
```bash
# 0. clone the hugegraph-ai project & enter the root dir
# 1. configure the environment path
PROJECT_ROOT_DIR="/path/to/hugegraph-ai" # root directory of hugegraph-ai
export PYTHONPATH=${PROJECT_ROOT_DIR}/hugegraph-llm/src:${PROJECT_ROOT_DIR}/hugegraph-python-client/src
# 2. install the required packages/deps (better to use virtualenv(venv) to manage the environment)
cd hugegraph-llm
pip install -r requirements.txt # ensure the python/pip version is satisfied
# 2.1 set basic configs in the hugegraph-llm/config/config.ini (Optional, you can also set it in gradio)
# 3. start the gradio server, wait for some time to initialize
python3 ./src/hugegraph_llm/utils/gradio_demo.py
```
- Configure the HugeGraph database connection and LLM information in the Gradio interface, then click
`Initialize configs`; the complete, initialized configuration will be written back to the config file.
- Download NLTK stopwords for offline use
```bash
python3 ./src/hugegraph_llm/operators/common_op/nltk_helper.py
```
## Examples
### 1. Build a knowledge graph in HugeGraph through LLM
Run the example with `python3 ./hugegraph-llm/examples/build_kg_test.py`
The `KgBuilder` class is used to construct a knowledge graph. Here is a brief usage guide:
1. **Initialization**: The `KgBuilder` class is initialized with an instance of a language model. This can be obtained from the `LLMs` class.
```python
from hugegraph_llm.llms.init_llm import LLMs
from hugegraph_llm.operators.kg_construction_task import KgBuilder
TEXT = ""
builder = KgBuilder(LLMs().get_llm())
(
    builder
    .import_schema(from_hugegraph="talent_graph").print_result()
    .extract_triples(TEXT).print_result()
    .disambiguate_word_sense().print_result()
    .commit_to_hugegraph()
    .run()
)
```
2. **Import Schema**: The `import_schema` method is used to import a schema from a source. The source can be a HugeGraph instance, a user-defined schema or an extraction result. The method `print_result` can be chained to print the result.
```python
# Import schema from a HugeGraph instance
builder.import_schema(from_hugegraph="xxx").print_result()
# Import schema from an extraction result
builder.import_schema(from_extraction="xxx").print_result()
# Import schema from a user-defined schema
builder.import_schema(from_user_defined="xxx").print_result()
```
3. **Extract Triples**: The `extract_triples` method is used to extract triples from a text. The text should be passed as a string argument to the method.
```python
TEXT = "Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010."
builder.extract_triples(TEXT).print_result()
```
4. **Disambiguate Word Sense**: The `disambiguate_word_sense` method is used to disambiguate the sense of words in the extracted triples.
```python
builder.disambiguate_word_sense().print_result()
```
5. **Commit to HugeGraph**: The `commit_to_hugegraph` method is used to commit the constructed knowledge graph to a HugeGraph instance.
```python
builder.commit_to_hugegraph().print_result()
```
6. **Run**: The `run` method is used to execute the chained operations.
```python
builder.run()
```
The methods of the `KgBuilder` class can be chained together to perform a sequence of operations.
### 2. Retrieval augmented generation (RAG) based on HugeGraph
Run the example with `python3 ./hugegraph-llm/examples/graph_rag_test.py`
The `GraphRAG` class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities.
Here is a brief usage guide:
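The snippets below assume a `graph_rag` instance has already been created; a minimal sketch (the import path and the no-argument constructor are assumptions based on the example script) could be:
```python
# Hypothetical setup; the import path and constructor arguments may differ in your version
from hugegraph_llm.operators.graph_rag_task import GraphRAG

graph_rag = GraphRAG()
```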
1. **Extract Keywords**: Extract keywords from the question and expand synonyms.
```python
graph_rag.extract_keyword(text="Tell me about Al Pacino.").print_result()
```
2. **Query Graph for RAG**: Retrieve the corresponding keywords and their multi-hop associated relationships from HugeGraph.
```python
graph_rag.query_graph_for_rag(
max_deep=2,
max_items=30
).print_result()
```
3. **Synthesize Answer**: Summarize the results and organize the language to answer the question.
```python
graph_rag.synthesize_answer().print_result()
```
4. **Run**: The `run` method is used to execute the above operations.
```python
graph_rag.run(verbose=True)
```
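Putting the steps together, a minimal end-to-end sketch (reusing the `graph_rag` instance and the illustrative question and parameters from above) could be:
```python
# Combined flow: keyword extraction -> graph retrieval -> answer synthesis
graph_rag.extract_keyword(text="Tell me about Al Pacino.").print_result()
graph_rag.query_graph_for_rag(max_deep=2, max_items=30).print_result()
graph_rag.synthesize_answer().print_result()
graph_rag.run(verbose=True)
```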