In this example, we're going to show you how easy it is to run Hamilton inside a dbt task. Making use of DBT's exciting new Python API, we can blend the two frameworks seamlessly.
While the two frameworks might look similar at first glance, DBT and Hamilton are actually quite complementary.
At a high level, DBT can help you get the data and run large-scale operations in your warehouse, while Hamilton can help you make a model out of it.
To demonstrate this, we've taken one of our favorite examples of writing data science code, xLaszlo's code quality for DS tutorial, and rewritten it using a combination of DBT + Hamilton. This models the classic Titanic problem.
In this case we're using FAL to help run python in dbt -- it enables us to manage environments, import packages happily, etc...
While the initial example is very simple, it should be enough for you to get started on your own!
To run the example, you'll need to do two things:
    # Using pypi
    $ cd examples/dbt
    $ pip install -r requirements.txt
    # Currently this has to be run from within the directory
    $ dbt run
    00:53:20 Running with dbt=1.3.1
    00:53:20 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 292 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
    00:53:20
    00:53:20 Concurrency: 1 threads (target='dev')
    00:53:20
    00:53:20 1 of 2 START sql table model main.raw_passengers ............................... [RUN]
    00:53:20 1 of 2 OK created sql table model main.raw_passengers .......................... [OK in 0.06s]
    00:53:20 2 of 2 START python table model main.predict ................................... [RUN]
    00:53:21 2 of 2 OK created python table model main.predict .............................. [OK in 0.73s]
    00:53:21
    00:53:21 Finished running 2 table models in 0 hours 0 minutes and 0.84 seconds (0.84s).
    00:53:21
    00:53:21 Completed successfully
    00:53:21
    00:53:21 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
This will modify a duckdb file. You can inspect the results using python or your favorite duckdb interface.
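As one way to inspect the results from Python, here is a minimal sketch. It assumes the `duckdb` package is installed and that you pass the path to the database file on the command line (the path depends on your dbt profile, so it is not hard-coded here). The table names come from the `dbt run` log above; `preview_table` and `main` are helper names made up for this illustration.

```python
"""Preview the tables written by `dbt run` (illustrative sketch)."""
import sys


def preview_table(conn, table, limit=5):
    """Return (column_names, rows) for the first `limit` rows of `table`.

    `conn` only needs `execute()` returning an object with `description`
    and `fetchall()`, which a duckdb connection provides.
    """
    # The table name is interpolated directly, so only pass trusted names.
    result = conn.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
    columns = [col[0] for col in result.description]
    return columns, result.fetchall()


def main(db_path):
    # Imported here so the helper above has no hard dependency on duckdb.
    import duckdb

    conn = duckdb.connect(db_path, read_only=True)
    try:
        # Table names as they appear in the `dbt run` log.
        for table in ("main.raw_passengers", "main.predict"):
            columns, rows = preview_table(conn, table)
            print(table, columns)
            for row in rows:
                print("  ", row)
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python inspect_results.py <path-to-duckdb-file>
    main(sys.argv[1])
```

Opening the file read-only avoids accidentally changing what dbt produced.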
We've organized the code into two separate DBT models:
1. raw_passengers: This is a simple select and join using duckdb and DBT. Thanks to the simplicity of DBT, it's just the SQL you would write if it were embedded within a Python program, or if you were executing SQL on your own! It does, however, automatically get materialized.
2. train_and_infer: This uses the data output by (1) to do quite a few things:
It outputs the inference set. Note that it only runs a subset of the DAG -- we could easily add more tasks that output metrics, etc. We just wanted to keep it simple. DBT in Python is still in beta, and we'll be opening issues/contributing to get it more advanced! We're especially excited about FAL, as it helps solve some of the uglier Python problems we hit along the way.
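To make the shape of such a model concrete, here is a minimal sketch of a dbt Python model that reads `raw_passengers`, hands it to a Hamilton driver, and asks for a single output so that only the relevant part of the DAG runs. It is not this example's actual code. The module name `titanic_transforms`, the input name `raw_passengers_df`, and the output name `predictions` are all hypothetical, and it assumes the adapter returns the upstream model as a pandas DataFrame.

```python
"""Sketch of a dbt Python model that delegates to Hamilton.

Hypothetical names (not taken from the example's code): the module
`titanic_transforms`, the input `raw_passengers_df`, the output `predictions`.
"""
import importlib

# Hypothetical: modules holding the Hamilton functions (one function per DAG node).
HAMILTON_MODULES = ["titanic_transforms"]
# Hypothetical: the only output we request; Hamilton runs just what it depends on.
FINAL_VARS = ["predictions"]


def run_hamilton(raw_df, module_names=HAMILTON_MODULES, final_vars=FINAL_VARS):
    """Build a Hamilton driver over the given modules and compute `final_vars`."""
    # Imported lazily so this file can be loaded where Hamilton is not installed.
    from hamilton import driver

    modules = [importlib.import_module(name) for name in module_names]
    dr = driver.Driver({}, *modules)  # empty config dict, then the function modules
    # Only the upstream dependencies of `final_vars` are executed.
    return dr.execute(final_vars, inputs={"raw_passengers_df": raw_df})


def model(dbt, session):
    """Entry point that dbt calls for a Python model."""
    dbt.config(materialized="table")
    # Assumption: the adapter hands back the upstream model as a pandas DataFrame.
    raw_df = dbt.ref("raw_passengers")
    # Whatever is returned here is materialized by dbt as this model's table.
    return run_hamilton(raw_df)
```

Adding metrics later would then mostly be a matter of appending more names to `FINAL_VARS`, provided the corresponding functions exist in the Hamilton modules.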
Here is the DAG generated by Hamilton for the above example:
This is just a start, and we think that Hamilton + DBT have a long and exciting future together. In particular, we could:
If you're excited by any of this, drop on by! Some resources for getting help: