In this example we show you a custom scikit-learn Transformer class. This class should comply with the scikit-learn transformer specification and is meant to be used as part of broader scikit-learn pipelines. Scikit-learn estimators and pipelines allow for stateful objects. This is notably helpful when applying transformations to train/test splits, because parameters learned with fit() on the training split are reused unchanged on the test split. Also, all pipeline, estimator, and transformer objects should be picklable, enabling reproducible pipelines.
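As a rough illustration of those properties, here is a minimal sketch of a stateful custom transformer. It is not the transformer from this example. MeanCenterer is a hypothetical name, and the sketch assumes numpy and scikit-learn are installed. It learns per-column means in fit() on a training split, reuses them on a test split inside a scikit-learn Pipeline, and survives a pickle round trip.

```python
import pickle

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: subtracts per-column means learned in fit()."""

    def fit(self, X, y=None):
        # State is learned from the training split only; the trailing underscore
        # marks it as a fitted attribute, per scikit-learn convention.
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Reuse the training means so the test split is transformed consistently,
        # without leaking statistics computed on the test data.
        return np.asarray(X, dtype=float) - self.means_


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

pipe = Pipeline([("center", MeanCenterer()), ("model", LinearRegression())])
pipe.fit(X_train, y_train)

# The fitted pipeline, including the transformer's learned means, survives pickling.
restored = pickle.loads(pickle.dumps(pipe))
print(np.allclose(restored.predict(X_test), pipe.predict(X_test)))
```

Because the learned means live on the fitted object, pickling the pipeline also captures them, so the restored pipeline reproduces the same predictions.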
File organization:
- my_functions_a.py and my_functions_b.py house the logic that we want to compute.
- run.py runs the DAG and asserts the properties of the output for basic use cases.

To run things:
> python run.py
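Hamilton assembles its DAG from plain Python functions: each function's name defines a node, and its parameter names refer to other nodes or to inputs. The toy resolver below is not Hamilton's driver and skips validation such as cycle detection. It only sketches that wiring, using hypothetical function names rather than the contents of my_functions_a.py or my_functions_b.py.

```python
import inspect


# Hypothetical stand-ins for functions in a module such as my_functions_a.py:
# each function defines a node named after itself, and its parameter names
# name its dependencies (other nodes or external inputs).
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_per_signup_doubled(spend_per_signup: float) -> float:
    return 2 * spend_per_signup


def execute(functions, inputs, outputs):
    """Toy resolver (not Hamilton's driver): compute outputs by resolving parameters by name."""
    registry = {fn.__name__: fn for fn in functions}
    cache = dict(inputs)  # external inputs are already-known node values

    def resolve(name):
        if name not in cache:
            fn = registry[name]  # KeyError if name is neither an input nor a node
            params = inspect.signature(fn).parameters
            # Recursively compute each dependency, then call the function once.
            cache[name] = fn(**{p: resolve(p) for p in params})
        return cache[name]

    return {name: resolve(name) for name in outputs}


result = execute(
    [spend_per_signup, spend_per_signup_doubled],
    inputs={"spend": 100.0, "signups": 4},
    outputs=["spend_per_signup_doubled"],
)
print(result)  # {'spend_per_signup_doubled': 50.0}
```

Only the nodes needed for the requested outputs are computed, and each is computed once thanks to the cache.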
Here is the visualization of the execution that the transformer currently performs when you run run.py:
[Figure: visualization of the transformer's execution]
- The transformer relies on base.HamiltonGraphAdapter and base.PandasDataFrameResult, which limits its compatibility with other computation engines supported by Hamilton.
- scikit-learn has a ColumnTransformer class, which aims to fulfill a similar objective to Hamilton.
- Hamilton driver: currently, the driver is accessible via the .driver_ attribute after calling .fit() or .fit_transform(). This seems consistent with typical scikit-learn behavior, where attributes learned during fitting end with a trailing underscore.
- Inheriting from BaseEstimator and TransformerMixin provides some clarity about the class's purpose.
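The .driver_ convention and the two base classes can be illustrated with a minimal sketch, assuming scikit-learn is installed. DriverBackedTransformer and ToyDriver are hypothetical names. ToyDriver only stands in for a Hamilton driver and does not use Hamilton's API. The sketch shows four things. First, the trailing-underscore attribute exists only after fitting, so check_is_fitted raises NotFittedError before that. Second, BaseEstimator supplies get_params() from the constructor arguments. Third, TransformerMixin supplies fit_transform(). Fourth, clone() copies parameters but not fitted state.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class ToyDriver:
    """Hypothetical stand-in for a Hamilton driver; not Hamilton's API."""

    def __init__(self, scale):
        self.scale = scale

    def execute(self, X):
        return np.asarray(X, dtype=float) * self.scale


class DriverBackedTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that builds its driver in fit() and exposes it as driver_."""

    def __init__(self, scale=1.0):
        # Store constructor arguments unchanged so BaseEstimator.get_params() finds them.
        self.scale = scale

    def fit(self, X, y=None):
        # Fitted attributes end with "_" and are only created here, not in __init__.
        self.driver_ = ToyDriver(self.scale)
        return self

    def transform(self, X):
        # With no attribute list, check_is_fitted looks for attributes ending in "_".
        check_is_fitted(self)
        return self.driver_.execute(X)


t = DriverBackedTransformer(scale=2.0)
try:
    t.transform([[1.0, 2.0]])
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

out = t.fit_transform([[1.0, 2.0]])  # fit_transform() is provided by TransformerMixin
print(raised_before_fit)  # True
print(out)  # [[2. 4.]]
print(t.get_params())  # {'scale': 2.0}
print(hasattr(clone(t), "driver_"))  # False: clone() copies parameters, not fitted state
```

Keeping the driver out of __init__ and creating it only in fit() is what lets clone() and hyperparameter tools rebuild a fresh, unfitted copy from the parameters alone.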