
Custom scikit-learn Transformer with Hamilton

This example shows a custom scikit-learn Transformer class. The class is intended to comply with scikit-learn's transformer specification and to be used as a step in broader scikit-learn pipelines. Scikit-learn estimators and pipelines allow for stateful objects, which is helpful when a transformation must be fitted on the training split and then applied unchanged to the test split. In addition, all pipeline, estimator, and transformer objects should be picklable, which enables reproducible pipelines.
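The stateful fit-then-transform pattern described above can be sketched with a deliberately small transformer. This is an illustration only, not the class shipped in this example: `MeanCenterer`, its `with_mean` parameter, and the toy arrays are hypothetical names, and the sketch assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer used only to illustrate stateful fitting."""

    def __init__(self, with_mean=True):
        # Hyperparameters are stored unchanged so get_params() can read them.
        self.with_mean = with_mean

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state carries a trailing underscore and exists only after fit().
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Reuse the state learned in fit(); nothing is re-estimated here.
        return X - self.mean_ if self.with_mean else X


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[5.0, 20.0]])

centerer = MeanCenterer().fit(X_train)  # state comes from the training split only
test_out = centerer.transform(X_test)   # the test split is shifted by the training means
print(test_out)
```

Because the column means are stored on the object during `fit()`, the test split is transformed with statistics from the training split alone, which is the behavior the train-test remark above refers to.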

File organization:

  • my_functions_a.py and my_functions_b.py house the logic that we want to compute.
  • run.py runs the DAG and asserts the properties of the output for basic use cases.

To run things:

> python run.py

DAG Visualization:

The following visualization shows the execution that the transformer currently performs when you run run.py:

[Image: scikit_transformer.png]

Limitations and TODOs

  • The current implementation relies on Hamilton's defaults, base.HamiltonGraphAdapter and base.PandasDataFrameResult, which limits compatibility with the other computation engines supported by Hamilton.
  • The current implementation could be improved to support deeper object inspection. A particular challenge is that the Hamilton driver changes the number of columns / features of the input array, and does so through the output columns it is asked to produce. In contrast, scikit-learn typically reads the columns / features of the input array and passes them down. It would be worth looking at scikit-learn's output feature naming convention and at the ColumnTransformer class, which aims to fulfill an objective similar to Hamilton's.
  • The current implementation allows little direct access to the Hamilton driver. Currently, the driver is accessible via the .driver_ attribute after calling .fit() or .fit_transform(), which seems coherent with typical scikit-learn behavior.
  • The current implementation could be slightly modified to have no dependency on scikit-learn itself. However, inheriting from BaseEstimator and TransformerMixin provides some clarity about the class's purpose.
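Several of the points above (output columns that differ from the input columns, a fitted attribute that appears only after .fit(), and dropping the scikit-learn dependency) can be sketched in plain Python. Everything here is hypothetical and is not the implementation in this example: `ColumnSelectingTransformer`, the `engine_` attribute, and the dictionary of functions are stand-ins, with `engine_` playing the role that `.driver_` plays in the real class. No Hamilton or scikit-learn API is used, and whether a given scikit-learn version accepts such a duck-typed object inside a Pipeline is not verified here.

```python
class ColumnSelectingTransformer:
    """Hypothetical duck-typed transformer with no scikit-learn dependency."""

    def __init__(self, functions, output_columns):
        # functions: mapping of output name -> callable taking a dict of input columns
        self.functions = functions
        self.output_columns = output_columns

    def get_params(self, deep=True):
        return {"functions": self.functions, "output_columns": self.output_columns}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Trailing-underscore attributes are created only here, mirroring .driver_.
        self.feature_names_in_ = list(X)
        self.engine_ = {name: self.functions[name] for name in self.output_columns}
        return self

    def transform(self, X):
        # The output columns are the requested ones, not the input ones.
        return {name: fn(X) for name, fn in self.engine_.items()}

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def get_feature_names_out(self, input_features=None):
        return list(self.output_columns)


X = {"spend": [10.0, 20.0], "signups": [1.0, 4.0]}
functions = {
    "spend_per_signup": lambda cols: [s / n for s, n in zip(cols["spend"], cols["signups"])],
    "double_spend": lambda cols: [2 * s for s in cols["spend"]],
}

transformer = ColumnSelectingTransformer(functions, output_columns=["spend_per_signup"])
fitted_before = hasattr(transformer, "engine_")  # False: fit() has not been called yet
result = transformer.fit_transform(X)
print(transformer.feature_names_in_, "->", transformer.get_feature_names_out())
```

The input has two columns while the output has one, chosen by `output_columns` rather than derived from the input. That is the mismatch with scikit-learn's pass-through convention noted in the second point, and `get_feature_names_out()` is one place where the two views could be reconciled.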