Here you'll find documentation for the sf-hamilton-contrib package, which curates a collection of Hamilton dataflows that are ready to use in your own projects. The dataflows are user-contributed and maintained, with the goal of making it easier for you to get started with Hamilton.
We expect this collection to grow over time, so check back often! As dataflows mature, we will move them into the official sub-package of the repository, where they will be maintained by the Hamilton team.
There are two ways to access the dataflows in this package. For either approach, the assumption is that you have the requisite Python dependencies installed on your system; you'll get import errors if you don't. Don't know what you need? We have convenience functions to help!
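For example, here's a sketch of using one such helper. This assumes the install_dependencies_string function exists in hamilton.dataflows with an argument order mirroring import_module, so verify against your installed version's API:

```python
from hamilton import dataflows

# assumption: install_dependencies_string returns a pip-installable string of
# a dataflow's requirements -- verify this against your installed version
print(dataflows.install_dependencies_string("NAME_OF_DATAFLOW", "NAME_OF_USER"))
```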
This approach relies on you installing the package on your system. This is the recommended path for production purposes, as you can version-lock your dependencies.
To install the package, run:
pip install sf-hamilton-contrib --upgrade
Once installed, you can import the dataflows as follows. Here's what you need to know:
```python
from hamilton import driver

# from hamilton.contrib.dagworks import NAME_OF_DATAFLOW
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)

# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
To find an example, go to the hub.
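For instance, here's the same template with the placeholders filled in. The names are invented for illustration (a hypothetical some_user with a text_cleaning dataflow exposing a cleaned_text function); browse the hub for real ones:

```python
from hamilton import driver
from hamilton.contrib.user.some_user import text_cleaning  # hypothetical dataflow

dr = (
    driver.Builder()
    .with_config({})
    .with_modules(text_cleaning)
    .build()
)

# request the (hypothetical) cleaned_text node, passing the input it needs
result = dr.execute(["cleaned_text"], inputs={"raw_text": "Hello, World!"})
```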
Here we dynamically download the dataflow from the internet and execute it. This is useful for quickly iterating in a notebook and pulling in just the dataflow you need.
```python
from hamilton import dataflows, driver

# downloads into ~/.hamilton/dataflows and loads the module --
# WARNING: ensure you know what code you're importing!

# if using an official DAGWorks dataflow:
# NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW")
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)

# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
To find an example, go to the hub.
Getting started is one thing, but modifying a dataflow to your needs is another. So we have a prescribed flow that enables you to take a dataflow and copy its code to a place of your choosing. This allows you to easily modify the dataflow as you see fit.
Run this in a notebook or Python script to copy the dataflow to a directory of your choosing.
```python
from hamilton import dataflows

# dynamically pull and then copy
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")

# or, copy from the installed library
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")
```
You can then modify/import the code as you see fit. See copy() for more details.
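For instance, one way to import the copied code afterwards (a sketch, assuming the copy lands as an importable module directly under PATH_TO_DIRECTORY):

```python
import sys

sys.path.insert(0, "PATH_TO_DIRECTORY")  # make the copied code importable

import NAME_OF_DATAFLOW  # now a local module that you own and can edit freely
```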
If you have a dataflow that you would like to share with the community, please submit a pull request to this repository. We will review your dataflow, and if it meets our standards, we will add it to the package. To submit a pull request, please use this template. To access it, create a new pull request, hit the preview tab, and click the link to append template=HAMILTON_CONTRIB_PR_TEMPLATE.md to the URL.
We want to ensure that the dataflows in this package are of high quality and are easy to use. To that end, we have a set of standards that we expect all dataflows to meet. If you have any questions, please reach out.
Standards:
To get started with development, you'll first want to fork the hamilton repository from the GitHub UI.
Then, clone it locally and install the package in editable mode, ensuring you install any dependencies required for the initialization script:
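If you haven't cloned your fork yet, something like the following should work (substitute your GitHub username):

```bash
git clone https://github.com/<your_github_username>/hamilton.git
```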
```bash
cd hamilton  # your fork
pip install -e "./contrib[contribute]"  # note that this package lives under the `contrib` folder
```
Next, you need to initialize your dataflow. This will create the necessary files and directories for you to get started.
init-dataflow -u <your_github_username> -n <name_of_dataflow>
This will do the following:
- Create contrib/hamilton/contrib/user/<your_github_username> with the appropriate files to describe you:
  - author.md -- this will describe you, with links out to GitHub/socials
  - __init__.py -- this will be an empty file that allows you to import your dataflow
- Create contrib/hamilton/contrib/user/<your_github_username>/<name_of_dataflow> with the appropriate files to describe your dataflow:
  - README.md to describe the dataflow with the standard headings
  - __init__.py to contain the Hamilton code
  - requirements.txt to contain the required packages outside of Hamilton
  - tags.json to curate your dataflow
  - valid_configs.jsonl to specify the valid configurations for it to be run
  - dag.png to show one possible configuration of your dataflow

These are all required. You do not have to use the initialization script -- you can always copy the files over directly. That said, it is idempotent (it will fill out any missing files) and will ensure that you have the correct structure.
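For reference, the resulting directory layout (reconstructed from the file list above) looks like this:

```
contrib/hamilton/contrib/user/<your_github_username>/
├── author.md
├── __init__.py
└── <name_of_dataflow>/
    ├── README.md
    ├── __init__.py
    ├── requirements.txt
    ├── tags.json
    ├── valid_configs.jsonl
    └── dag.png
```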
To get started, you'll want to do the following:
- Fill in __init__.py with the appropriate code -- see this issue if you want some inspiration for where to get started (there's also a sketch after this list).
- Fill in README.md with the appropriate documentation -- follow one of the approved dataflows.
- Fill in tags.json with the appropriate tags -- follow one of the approved dataflows.
- Fill in valid_configs.jsonl with the appropriate configurations -- this is not necessary if you have no configurations that can change the shape of your DAG.
- Create dag.png -- you can add an `if __name__ == '__main__'` block to your dataflow's __init__.py to do so:

```python
if __name__ == "__main__":
    # import the dataflow module itself (run this from within its directory)
    import __init__ as my_module

    from hamilton import base, driver

    dr = driver.Driver(
        {},
        my_module,
        adapter=base.DefaultAdapter(),
    )
    # create the DAG image
    dr.display_all_functions("dag", {"format": "png", "view": False})
```
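If you're unsure what goes in __init__.py, here's a minimal sketch with invented function names: in Hamilton, each function defines a node, and its parameter names declare the upstream nodes or inputs it depends on.

```python
def cleaned_text(raw_text: str) -> str:
    """Normalizes the raw input text."""
    return raw_text.lower().strip()


def word_count(cleaned_text: str) -> int:
    """Counts the words in the cleaned text."""
    return len(cleaned_text.split())
```

Requesting word_count from a driver with inputs={"raw_text": ...} would execute both functions in dependency order.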
As usernames map to packages, we need to ensure that they are valid. To that end, we have a few rules:
If any of the above apply, run the init-dataflow command with -s to specify a sanitized username.
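For example, since hyphens aren't valid in Python package names, a hyphenated GitHub username (hypothetical here) could be sanitized like so:

```bash
init-dataflow -u my-github-name -n my_dataflow -s my_github_name
```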
Join our Slack community to chat, ask questions, etc.