This application lets you search arXiv for PDFs, or import arbitrary PDF files, and search over them using LLMs. For each file, the text is divided into chunks that are embedded with OpenAI and stored in Weaviate. When you query the system, the most relevant chunks are retrieved and a summary answer is generated with OpenAI.
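The chunking step can be sketched as follows. The function name and the `chunk_size`/`overlap` parameters are illustrative assumptions, not the exact code used in this example:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document's text into overlapping, fixed-size chunks.

    chunk_size and overlap are hypothetical defaults; the real pipeline
    may size chunks by tokens rather than characters.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded (e.g. via OpenAI's embeddings API)
# and stored in Weaviate alongside its source-file metadata.
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.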
The ingestion and retrieval steps are implemented as Hamilton dataflows and exposed via FastAPI endpoints. The frontend is built with Streamlit and exposes the different functionalities through a simple web UI. Everything is packaged as containers with Docker Compose.
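A Hamilton dataflow is just a module of plain Python functions: each function name defines a node, and its parameter names declare which nodes or inputs it depends on. A minimal sketch of what the ingestion dataflow might look like (node and parameter names here are illustrative, not this example's actual code):

```python
# Hedged sketch of a Hamilton dataflow module; node and parameter
# names are illustrative assumptions, not this example's actual code.

def raw_text(pdf_text: str) -> str:
    """Node 'raw_text': depends on the external input 'pdf_text'."""
    return pdf_text.strip()

def text_chunks(raw_text: str, chunk_size: int = 200) -> list[str]:
    """Node 'text_chunks': depends on the 'raw_text' node above."""
    return [raw_text[i:i + chunk_size] for i in range(0, len(raw_text), chunk_size)]
```

Hamilton's driver builds the dependency graph from such a module and executes the requested nodes, e.g. `driver.Driver({}, ingestion_module).execute(["text_chunks"], inputs={"pdf_text": ...})`; the FastAPI endpoints then simply invoke the driver.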
This example draws from previous simpler examples (Knowledge Retrieval, Modular LLM Stack, PDF Summarizer).
Below is a list of references for the technical concepts found in this example:
To run the application:

1. Clone this repository: `git clone https://github.com/dagworks-inc/hamilton.git`
2. Move to the example directory: `cd hamilton/examples/LLM_Workflows/retrieval_augmented_generation`
3. Create a `.env` file from the template with `cp .env.template .env`
4. Edit `.env` with your OpenAI API key such that `OPENAI_API_KEY=YOUR_API_KEY`
5. Build and launch the containers with `docker compose up -d --build`

Alternatively, make the build script executable with `chmod +x build_app.sh`, then run `./build_app.sh DOWNLOAD_DIRECTORY YOUR_OPENAI_API_KEY`.

- Use `docker compose down` to stop the containers.
- Use `docker compose logs -f` to tail the logs (ctrl+c to stop tailing the logs).
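The `docker compose` commands above bring up the whole stack. A minimal sketch of what such a compose file could look like — service names, ports, build paths, and image tags here are assumptions, not this example's actual `docker-compose.yml`:

```yaml
services:
  weaviate:            # vector store holding the embedded chunks
    image: semitechnologies/weaviate:latest
    ports:
      - "8083:8080"
  fastapi:             # backend exposing the Hamilton dataflows
    build: ./backend   # hypothetical build context
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - weaviate
  streamlit:           # web UI calling the FastAPI backend
    build: ./frontend  # hypothetical build context
    ports:
      - "8080:8080"
    depends_on:
      - fastapi
```

The `depends_on` entries only order container startup; the containers share the default compose network, so each service can reach the others by service name.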