In this example, were going to show how to run a simple data preprocessing -> model training -> model evaluation workflow using Hamilton within Prefect tasks.
Prefect is a heavy weight system that includes useful features for production system such as job queues, caching, retries, logging, and more. However, each additional “task” you create comes with execution overhead and needs to be manually added to the DAG structure. Opting for less granular tasks ultimately reduces visibility and maintainability, while opting for more creates more overhead.
In contrast, Hamilton automatically builds its DAG from the function definition itself, encouraging clean, small and single-purposed functions. By using Hamilton in a Prefect task, you gain this greater insight into lineage, reduced Prefect execution overhead and a simpler DAG to manage, and you get portable Python code you can run outside Prefect too.
This example illustrates a typical data science / machine learning workflow where we:
The code to support these steps is written as modular functions in prepare_data.py (steps 1, 2, 3), train_model.py (step 4), and evaluate_model.py (step 5). The modules can be loaded with the Hamilton Driver to automatically generate the execution DAG from the functions. In our scenario, we use Prefect because we want to be able to schedule and run our workflow daily. The file run.py contains the Prefect workflow definition, where each task has its own Hamilton Driver. (Learn more about Prefect scheduling).
You‘ll notice that the Prefect workflow has 2 tasks: prepare_data_task and train_and_evaluate_model_task even though there are 3 function modules. While each function module best describes a logical grouping of functions, the Prefect tasks can help define how/where they should be executed. For example, the data processing task might require CPU resources while the model training and evaluation would benefit from having access to a GPU. This split allows us to benefit from Prefect’s monitoring and tracking features, which make it easier to debug errors than having a single large Prefect task.
For another workflow, a large Hamilton DAG in a single Prefect task, or smaller Prefect tasks and Hamilton modules might have been preferred. This shows how the Hamilton Driver brings flexibility to where and how you run your code.
The easiest way to get this example running is to sign up for Prefect‘s free tier and follow the Prefect Cloud Quickstart section. Don’t want to sign up? Don't worry, as Prefect is open source, can instead opt to host and run everything yourself as described here. The steps to get started are:
python -m venv ./venv && . ./venv/bin/activate &&& pip install -r requirements.txt
prefect cloud loginpython run.py. You should see a new run appear on your dashboard at https://app.prefect.cloud/produce a local visualization file
dr = hamilton.driver.Driver(...)
visualization_path = "hamilton_dag"
# visualize_execution shares a similar signature to Driver.execute()
dr.visualize_execution(
    final_vars=...,
    inputs=...,
    output_file_path=visualization_path,
    render_kwargs={"format": "png"},
)
encode file as bytes
import b64
with open(f"{visualization_path}.png", "rb") as f:
    encoded_string = b64.b64encode(f.read())
push the byte string to a Prefect remote storage
from prefect.filesystems import RemoteFileSystem
remote_basepath =...
fs = RemoteFileSystem(basepath=remote_basepath)
remote_dag_file = ...
with fs.open(remote_dag_file, "wb") as f:
    f.write(encoded_string)
link the byte string artifact to the Prefect run using create_link_artifact
from prefect.artifacts import create_link_artifact
create_link_artifact(
    key=...,
    link=f"{remote_basepath}/{remote_dag_file},
    description="Hamilton execution DAG"
)