Learn how to use the HamiltonTracker and the Hamilton UI to track a simple machine learning pipeline.
It also illustrates the following notions:
- DataLoader and DataSaver objects to load & save data and collect extra metadata in the UI
- @subdag to fit different ML models with the same model training code in the same DAG run.
First, you need to have the Hamilton UI running. You can either pip install the Hamilton UI (recommended) or run it as a Docker container.
Install the Python dependencies:
pip install "sf-hamilton[ui,sdk]"
Then launch the Hamilton UI server:
hamilton ui
# python -m hamilton.cli.__main__ ui # on windows
To run it as a Docker container instead, see https://hamilton.dagworks.io/en/latest/concepts/ui/ for details. Here are the cliff notes:
git clone https://github.com/dagworks-inc/hamilton
cd hamilton/ui/deployment
./run.sh
Then go to http://localhost:8242 to create (1) a username and (2) a project. See this video for a walkthrough.
Now that you have the Hamilton UI running, open another terminal tab to:
Install the example's Python dependencies:
cd hamilton/examples/hamilton_ui
pip install -r requirements.txt
Run the run.py script, providing the username and project ID so that it can log to the Hamilton UI:
python run.py --username <username> --project_id <project_id>
Once you've run that, run this:
python run.py --username <username> --project_id <project_id> --load-from-parquet
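For orientation, here is a minimal sketch of how a script could accept the flags used in the commands above and attach a HamiltonTracker to a Hamilton driver. It is not the example's actual run.py: the function names parse_args and build_tracked_driver, the dag_name, and the tags are assumptions made for illustration, and the real script may parse its options differently.

```python
import argparse
from types import ModuleType


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the three flags that appear in the commands above."""
    parser = argparse.ArgumentParser(description="Run a pipeline tracked in the Hamilton UI.")
    parser.add_argument("--username", required=True, help="Username created in the Hamilton UI")
    parser.add_argument("--project_id", required=True, type=int, help="Project ID from the Hamilton UI")
    parser.add_argument(
        "--load-from-parquet",
        action="store_true",
        help="Same flag as in the second command above",
    )
    return parser.parse_args(argv)


def build_tracked_driver(username: str, project_id: int, *modules: ModuleType):
    """Build a Hamilton driver whose runs are logged to the Hamilton UI."""
    # Imported lazily so this sketch can be loaded before sf-hamilton[ui,sdk] is installed.
    from hamilton import driver
    from hamilton_sdk import adapters

    # Creating the tracker needs a running Hamilton UI server and an existing project.
    tracker = adapters.HamiltonTracker(
        project_id=project_id,
        username=username,
        dag_name="hamilton_ui_example",  # hypothetical name shown in the UI
        tags={"environment": "DEV"},  # hypothetical tags
    )
    return (
        driver.Builder()
        .with_modules(*modules)  # the Python modules that define the DAG
        .with_adapters(tracker)
        .build()
    )
```

With the UI running, you could call parse_args(), pass args.username and args.project_id together with your pipeline modules to build_tracked_driver, and then execute the returned driver as usual; each execution should then appear as a run under the chosen project.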
Some things to try:
- Place an error in the code, e.g. raise ValueError("I'm an error"), and see how it shows up in the Hamilton UI.
- In models.py, change "data_set": source("data_set_v1"), to "data_set": source("data_set_v2"), along with what is requested in run.py (i.e. change/add saving data_set_v2) and see how the lineage changes in the Hamilton UI.
- Add a new feature to features.py and then to a dataset. Execute it and then compare the data observed in the Hamilton UI against a prior run.