Classic Hamilton Hello World

This example shows a custom scikit-learn Transformer class that runs a Hamilton driver. The class is intended to comply with the scikit-learn transformer specification so that it can be used as a step in broader scikit-learn pipelines. Scikit-learn estimators and pipelines allow for stateful objects, which is notably helpful when applying the same transformation to both sides of a train-test split. In addition, all pipeline, estimator, and transformer objects should be picklable, which enables reproducible pipelines.
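To illustrate what "stateful" means here, the following is a minimal sketch of a generic scikit-learn transformer. It is not the class from this example and does not use Hamilton: the MeanCenterer name and the toy arrays are hypothetical, and only BaseEstimator, TransformerMixin, and NumPy are assumed to be available.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the column means learned in fit()."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore, per scikit-learn convention.
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Reuse the state learned on the training data.
        return X - self.mean_


# Toy train-test split (made-up values, for illustration only).
X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[2.0, 20.0]])

centerer = MeanCenterer().fit(X_train)      # state is learned from train only
test_centered = centerer.transform(X_test)  # and then applied to test
```

Because the means are stored on the object during fit(), the test data is centered with the training statistics rather than its own, which is the usual reason to prefer stateful transformers around a train-test split.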

File organization:

my_functions_a.py and my_functions_b.py house the logic that we want to compute.
run.py runs the DAG and asserts the properties of the output for basic use cases.

To run things:

> python run.py

DAG Visualization:

Here is the visualization of the execution that the transformer currently performs if you run run.py:

[Figure: scikit_transformer]

Limitations and TODOs

The current implementation relies on Hamilton's defaults, base.HamiltonGraphAdapter and base.PandasDataFrameResult, which limits compatibility with the other computation engines supported by Hamilton.
The current implementation could be improved with deeper object inspection. A particular challenge is that the Hamilton driver alters the number of columns / features of the input array, and does so by specifying the output columns. In contrast, scikit-learn typically reads the columns / features of the input array and passes them down. It would be worth looking at scikit-learn's output feature naming convention and at the ColumnTransformer class, which aims to fulfill an objective similar to Hamilton's.
The current implementation allows little direct access to the Hamilton driver. Currently, the driver is accessible via the .driver_ attribute after calling .fit() or .fit_transform(), which seems coherent with typical scikit-learn behavior.
The current implementation could be slightly modified to have no dependency on scikit-learn itself. However, inheriting from BaseEstimator and TransformerMixin provides some clarity about the class's purpose.
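As a rough illustration of the last point, a class can expose the usual transformer methods without importing scikit-learn at all. The sketch below is an assumption-laden toy, not the class from this example: DuckTypedScaler, its factor parameter, and the n_features_in_ attribute set here are all hypothetical, and whether a given scikit-learn version accepts such a class inside a Pipeline is not verified here.

```python
class DuckTypedScaler:
    """Hypothetical transformer with no scikit-learn import."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def get_params(self, deep=True):
        # Mirrors the method that BaseEstimator would otherwise provide.
        return {"factor": self.factor}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Fitted state uses a trailing underscore, like .driver_ above.
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]

    def fit_transform(self, X, y=None):
        # Mirrors the method that TransformerMixin would otherwise provide.
        return self.fit(X, y).transform(X)


scaler = DuckTypedScaler(factor=2.0)
scaled = scaler.fit_transform([[1.0, 2.0], [3.0, 4.0]])
```

The trade-off is the one noted above: the hand-written get_params, set_params, and fit_transform remove the dependency, but the class no longer signals its purpose through its base classes.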