Here's an extensible, production-ready PDF summarizer that you can run anywhere! The frontend uses Streamlit, which communicates with a FastAPI backend powered by Hamilton. You give it a PDF file via the browser app, and it returns a text summary generated with the OpenAI API. If you prefer, you can skip the browser interface and access the /summarize endpoint directly with your document! Everything is containerized using Docker, so you should be able to run it wherever you please 🏃.
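If you want to script against the endpoint rather than use the browser, a stdlib-only Python client could look something like the sketch below. Note the port (8080), the multipart field name (`pdf_file`), and the response format are assumptions for illustration; check the FastAPI app code for the actual contract.

```python
# Hypothetical client for the /summarize endpoint. The URL, the "pdf_file"
# field name, and the plain-text response are assumptions -- verify them
# against the FastAPI backend before relying on this.
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, payload: bytes):
    """Encode a single file as a multipart/form-data body by hand."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def summarize(path: str, url: str = "http://localhost:8080/summarize") -> str:
    """POST a PDF to the backend and return the response body as text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("pdf_file", path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

A third-party HTTP library (e.g. `requests`) would shorten the multipart encoding considerably; the hand-rolled version just keeps the sketch dependency-free.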
This project shows how easy it is to productionize Hamilton. Its function-centric, declarative approach makes the code easy to read and extend. We invite you to clone the repo and customize it to your needs! We are happy to help you via Slack and are excited to see what you build 😁
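To give a flavor of what "function-centric declarative" means here, below is a toy sketch in plain Python (this is not Hamilton's actual API): each function's name defines a node, and its parameter names declare which other nodes it depends on. Hamilton builds and executes a DAG from signatures in a similar spirit; the function bodies here are stand-ins, not the project's real logic.

```python
# Toy emulation of the idea behind Hamilton: nodes are functions, and
# dependencies are declared by parameter names matching other function names.
import inspect


def raw_text(pdf_path: str) -> str:
    return f"contents of {pdf_path}"  # stand-in for real PDF parsing


def chunked_text(raw_text: str) -> list:
    return raw_text.split()  # stand-in for real text chunking


def summary(chunked_text: list) -> str:
    return f"summary of {len(chunked_text)} chunks"  # stand-in for an LLM call


def execute(funcs, inputs, targets):
    """Resolve each requested node by recursively computing its dependencies."""
    by_name = {f.__name__: f for f in funcs}
    cache = dict(inputs)  # seed the cache with externally provided inputs

    def resolve(name):
        if name not in cache:
            fn = by_name[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return {t: resolve(t) for t in targets}


result = execute([raw_text, chunked_text, summary], {"pdf_path": "paper.pdf"}, ["summary"])
# result == {"summary": "summary of 3 chunks"}
```

Because dependencies live in the signatures, adding a step means adding a function; nothing else has to be rewired, which is what makes this style easy to extend.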
Here are a few ideas:
- use `file_uploader` to allow sending batches of files through the UI
- extend the `SummaryResponse` returned by the API
- use the `@config.when()` decorator to add alternatives to the `raw_text()` function for PDFs

*The Hamilton execution DAG powering the backend*
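For intuition on that last idea, here's roughly what `@config.when()` boils down to, written as hand-rolled dispatch in plain Python: several interchangeable implementations of `raw_text()`, selected by configuration. This is only an illustration of the concept; in Hamilton you would decorate each variant and let the framework pick one, and the function names and bodies below are made up for the example.

```python
# Plain-Python illustration of config-based dispatch. In Hamilton, each
# variant would instead carry a @config.when(...) decorator and the
# framework would select the matching implementation at DAG-build time.

def raw_text_from_pdf(file_path: str) -> str:
    return f"text extracted from PDF {file_path}"  # stand-in for a PDF parser


def raw_text_from_txt(file_path: str) -> str:
    return f"text read from plain file {file_path}"  # stand-in for plain reads


IMPLEMENTATIONS = {"pdf": raw_text_from_pdf, "txt": raw_text_from_txt}


def raw_text(file_path: str, config: dict) -> str:
    """Select the raw_text implementation matching config['file_type']."""
    return IMPLEMENTATIONS[config["file_type"]](file_path)
```

Adding support for a new file type then means registering one more implementation, without touching the rest of the dataflow.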
To run the app:

1. `git clone https://github.com/dagworks-inc/hamilton.git`
2. `cd hamilton/examples/LLM_Workflows/pdf_summarizer`
3. Create a `.env` file (next to README.md and docker-compose.yaml) and add your OpenAI API key such that `OPENAI_API_KEY=YOUR_API_KEY`
4. `docker compose build`
5. `docker compose up -d`

A few useful commands:

- To rebuild and relaunch after changing the code, use `docker compose up -d --build`.
- To stop the app, use `docker compose down`.
- Use `docker compose logs -f` to tail the logs (ctrl+c to stop tailing the logs).

To use the DAGWorks platform:

1. Add your DAGWorks API key to the `.env` file. E.g. `DAGWORKS_API_KEY=YOUR_API_KEY`
2. Update `requirements.txt`.
3. Replace `sync_dr` with the DAGWorks Driver.
4. Rebuild and relaunch with `docker compose up -d --build`.

Yes, that's right, you can also run the exact same code on Spark! It's just a one-line code change. See the run_on_spark README for more details.
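As a reference, a minimal `.env` file could look like the fragment below. The key names come from the steps above; the values are placeholders you replace with your own keys.

```
# .env -- placed next to README.md and docker-compose.yaml
OPENAI_API_KEY=YOUR_API_KEY
# Optional, only needed when using the DAGWorks platform:
DAGWORKS_API_KEY=YOUR_API_KEY
```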