This example shows a custom scikit-learn transformer class. The class is meant to comply with the scikit-learn transformer specification and to be used as part of broader scikit-learn pipelines. Scikit-learn estimators and pipelines allow for stateful objects, which is notably helpful when applying transformations across train-test splits. All pipeline, estimator, and transformer objects should also be picklable, which enables reproducible pipelines.
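As a rough illustration of the stateful and picklable behavior described above, here is a minimal sketch. `MeanCenterer` is a hypothetical class invented for this illustration; it is not the transformer from this example and it does not use Hamilton. It only assumes numpy and scikit-learn are installed.

```python
import pickle

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split


# Hypothetical transformer, for illustration only.
# The mixin is listed before BaseEstimator, the order the scikit-learn
# developer guide recommends.
class MeanCenterer(TransformerMixin, BaseEstimator):
    """Subtracts the column means learned during fit()."""

    def fit(self, X, y=None):
        # State is learned from the data passed to fit() only. The trailing
        # underscore marks a fitted attribute by scikit-learn convention.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Applies the stored state; nothing is re-learned here.
        return np.asarray(X, dtype=float) - self.mean_


X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=3, random_state=0)

# Fit on the training split, then reuse the training means on the test split.
centerer = MeanCenterer().fit(X_train)
X_test_centered = centerer.transform(X_test)

# A pickle round trip keeps the fitted state, so the restored object
# produces the same output as the original.
restored = pickle.loads(pickle.dumps(centerer))
same_output = np.allclose(restored.transform(X_test), X_test_centered)
```

Because the means come from the training split alone, the test split is transformed without leaking its own statistics into the fitted state.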
File organization:
- my_functions_a.py and my_functions_b.py house the logic that we want to compute.
- run.py runs the DAG and asserts the properties of the output for basic use cases.

To run things:
> python run.py
Here is the visualization of the execution that the transformer currently performs if you run run.py:
Notes on the current implementation:
- The transformer relies on base.HamiltonGraphAdapter and base.PandasDataFrameResult, which limits compatibility with the other computation engines supported by Hamilton.
- scikit-learn's ColumnTransformer class aims to fulfill a similar objective to Hamilton.
- The Hamilton driver is currently accessible via the .driver_ attribute after calling .fit() or .fit_transform(), which seems coherent with typical scikit-learn behavior.
- Inheriting from BaseEstimator and TransformerMixin provides some clarity about the class's purpose.
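The fitted-attribute convention behind `.driver_` can be sketched as follows. This is an assumption-laden illustration, not the actual implementation: `DriverHoldingTransformer` and its `scale` parameter are hypothetical, and a plain dict stands in for the Hamilton driver so that the block runs with scikit-learn alone.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


# Hypothetical stand-in; it does not import or call Hamilton.
class DriverHoldingTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, scale=1.0):
        # Constructor arguments are stored unchanged so that get_params(),
        # inherited from BaseEstimator, can report them.
        self.scale = scale

    def fit(self, X, y=None):
        # Placeholder for the object built during fit(). A plain dict is
        # used here in place of a real Hamilton driver (assumption).
        self.driver_ = {"scale": self.scale}
        return self

    def transform(self, X):
        # Raises NotFittedError if driver_ has not been set by fit().
        check_is_fitted(self, "driver_")
        return [[value * self.driver_["scale"] for value in row] for row in X]


transformer = DriverHoldingTransformer(scale=2.0)

# Before fit(), the fitted attribute does not exist yet.
try:
    transformer.transform([[1.0, 2.0]])
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

# fit_transform() is provided by TransformerMixin: fit(), then transform().
result = transformer.fit_transform([[1.0, 2.0], [3.0, 4.0]])

# get_params() is provided by BaseEstimator.
params = transformer.get_params()
```

Attributes ending in an underscore exist only after fitting, which is why reading `.driver_` before `.fit()` or `.fit_transform()` would fail in a class that follows this convention.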