
Ibis + Hamilton

Ibis is a portable dataframe library for writing procedural data transformations in Python and executing them directly on various SQL backends (DuckDB, Snowflake, Postgres, and Flink, among others). Hamilton provides a declarative way to define testable, modular, self-documenting dataflows that encode lineage and metadata.

In this example, we'll show how to get started with creating feature transformations and training a machine learning model. You'll learn the basics of Ibis and IbisML and how they integrate with Hamilton.

column-level feature engineering

Running the example

Follow these steps to get the example working:

  1. create and activate a virtual environment

    python -m venv venv && . venv/bin/activate
    
  2. install requirements

    pip install -r requirements.txt
    
  3. execute the Hamilton feature engineering dataflow at the table or column level

    python run.py --level table    # or: --level column
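
The `--level` switch in step 3 could be parsed along these lines. This is a minimal sketch, not the contents of `run.py`: the function names `parse_args` and `dataflow_module_name` are hypothetical, and making `--level` required is an assumption, since this README does not state a default. Only the two choices and the module file names come from this directory.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --level selects the granularity of the dataflow."""
    parser = argparse.ArgumentParser(
        description="Run the Ibis + Hamilton feature engineering dataflow."
    )
    parser.add_argument(
        "--level",
        choices=["table", "column"],
        required=True,  # assumption: the README does not say whether a default exists
        help="granularity of the dataflow to execute",
    )
    return parser.parse_args(argv)


def dataflow_module_name(level):
    """Map a level to its module name: table -> table_dataflow, column -> column_dataflow."""
    return f"{level}_dataflow"


# Simulate `python run.py --level table` without touching sys.argv.
args = parse_args(["--level", "table"])
print(dataflow_module_name(args.level))
```

Restricting the argument with `choices` means an unsupported value such as `--level row` is rejected by `argparse` before any dataflow is loaded.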
    

Files

  • table_dataflow.py and column_dataflow.py implement the same Ibis feature engineering dataflow at different levels of granularity.
  • tables.png and columns.png were generated by Hamilton directly from the code.
  • ibis_feature_set.png was generated by Ibis. It describes the atomic data transformations executed by the expression.
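
To make the difference in granularity concrete, here is a minimal sketch in plain Python. It uses neither Ibis nor Hamilton: lists stand in for columns, dicts stand in for tables, and the feature names (`age_months`, `age_years`, `is_adult`) and the `resolve` helper are hypothetical. It only mimics the Hamilton convention that a function's name defines a node and its parameter names define that node's dependencies.

```python
import inspect


# Table level: a single node transforms the whole table and adds every feature at once.
def feature_table(raw_table: dict) -> dict:
    years = [m / 12 for m in raw_table["age_months"]]
    return {**raw_table, "age_years": years, "is_adult": [y >= 18 for y in years]}


# Column level: one node per column, so each feature is individually visible and testable.
def age_months(raw_table: dict) -> list:
    return raw_table["age_months"]


def age_years(age_months: list) -> list:
    return [m / 12 for m in age_months]


def is_adult(age_years: list) -> list:
    return [y >= 18 for y in age_years]


def resolve(name, functions, inputs):
    """Compute node `name` by recursively computing the parameters of its function.

    A toy stand-in for a dataflow driver (hypothetical helper, not Hamilton's API).
    """
    if name in inputs:
        return inputs[name]
    fn = functions[name]
    kwargs = {p: resolve(p, functions, inputs) for p in inspect.signature(fn).parameters}
    return fn(**kwargs)


raw = {"age_months": [24, 240]}
functions = {f.__name__: f for f in (feature_table, age_months, age_years, is_adult)}

table_result = resolve("feature_table", functions, {"raw_table": raw})
column_result = resolve("is_adult", functions, {"raw_table": raw})
print(table_result["is_adult"], column_result)
```

Both variants compute the same features. The column-level version exposes every intermediate column as its own node, which is what produces a finer-grained graph.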

Resources