| # Streamlit + Hamilton |
| |
| > This example accompanies the documentation page for [Streamlit](https://hamilton.dagworks.io/en/latest/integrations/streamlit/) integration. |
| |
| Streamlit is an open-source Python library to create web applications with minimal effort. It's an effective solution to create simple dashboards, interactive data visualizations, and proof-of-concepts for data science, machine learning, and LLM applications. |
| |
| In this example, We will build a simple financial dashboard based on the Kaggle [Bank Marketing Dataset](https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset). |
| |
| ## How to run |
| 1. Create virtual environment: `python -m venv ./venv` |
| 2. Activate virtual environment: `. venv/bin/activate` (or `source venv/bin/Scripts` on Windows) |
| 3. Install requirements: `pip install -r requirements.txt` |
| 4. Launch Streamlit application: `streamlit run app.py` |
| |
| |
| ## File organization |
| Adding Hamilton to your Streamlit application can provide a better separation between the dataflow and the UI logic. They pair nicely together because Hamilton is also stateless. Once defined, each call to `Driver.execute()` is independent. Therefore, on each Streamlit rerun, you use `Driver.execute()` to complete computations. Using Hamilton this way allows you to write your dataflow into Python modules and outside of the Streamlit. |
| |
| ### logic.py |
| Hamilton transformations are defined in the module `logic.py`. This includes downloading the data from the web, getting unique values for `job`, conducting groupby aggregates, and creating `plotly` figures. |
| |
| ### app.py |
| The Streamlit UI is defined in `app.py`. Notice a few things: |
| - `app.py` doesn't have to depend on `pandas` and `plotly`. |
| - `@cache_resource` allows to create the `Driver` only once. |
| - `@cache_data` on `_execute()` will automatically cache any Hamilton result based on the combination of arguments (`final_vars`, `inputs`, and `overrides`) |
| - `get_state_inputs()` and `get_state_overrides()` will collect values from user inputs. |
| - `execute()` parses the inputs and overrides from the state and call `_execute()`. |
| |
| |
| ## Benefits |
| - **Clearer scope**: the decoupling between `app.py` and `logic.py` makes it easier to add data transformations or extend UI, and debug errors associated with either. |
| - **Reusable code**: the module `logic.py` can be reused elsewhere with Hamilton. |
| - If you are building a proof-of-concept with Streamlit, your Hamilton module will be able to grow with your project and be useful for your production pipelines. |
| - If you are already building dataflows with Hamilton, using it with Streamlit ensures your dashboard metrics have the same implementation with your production pipeline (i.e., prevent [implementation skew](https://building.nubank.com.br/dealing-with-train-serve-skew-in-real-time-ml-models-a-short-guide/)) |
| - **Performance boost**: by caching the Hamilton Driver and its execution call, we are able to effectively cache all data operations in a few lines of code. Furthermore, Hamilton can scale further by using a remote task executor on a separate machine from the Streamlit application. |