Air Quality Analysis

This example is adapted from the NumPy air quality analysis tutorial: https://github.com/numpy/numpy-tutorials/blob/main/content/tutorial-air-quality-analysis.md

analysis_flow.py

This file is where the analysis steps are defined as Apache Hamilton functions.
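As a rough illustration of what "steps as functions" means, here is a minimal sketch in the Hamilton style. The function and parameter names are hypothetical and are not taken from analysis_flow.py. The convention it shows is that each function name is the name of an output, and each parameter name is the name of the input or upstream function it depends on. Because these are plain Python functions, they can also be called and tested directly, with no framework involved.

```python
# Hypothetical steps, written the way Hamilton expects them:
# the function name is the output, the parameter names are its dependencies.


def valid_readings(raw_readings: list) -> list:
    """Drop missing readings (represented here as None)."""
    return [r for r in raw_readings if r is not None]


def mean_concentration(valid_readings: list) -> float:
    """Average concentration over the valid readings."""
    return sum(valid_readings) / len(valid_readings)


def exceeds_threshold(mean_concentration: float, threshold: float) -> bool:
    """Whether the average is above a caller-supplied threshold."""
    return mean_concentration > threshold


# Each step can be exercised by hand, which is convenient for unit tests.
cleaned = valid_readings([12.0, None, 18.0])
average = mean_concentration(cleaned)
print(average, exceeds_threshold(average, threshold=10.0))
```

When these functions are handed to Hamilton, the parameter named valid_readings is matched to the function of the same name, which is how the DAG is assembled.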

Compared with doing this analysis in a notebook, the strength of Apache Hamilton here is that it forces concise definitions and consistent naming for each step in the analysis. As a result, the analysis becomes reusable and easy to augment. For example, you can add @config.when, or split steps into Python modules that can be swapped out, to extend the analysis to new data sets or new types of analyses.

Here is a simple visualization of the functions, and thus of the analysis: [image: Analysis DAG]

run_analysis.py

This file is where the driver code lives; it creates the DAG and exercises it.

To exercise it:

python run_analysis.py

You can also run this example in Google Colab: [badge: Open In Colab]

Caveat

The code here was copied from the tutorial and then tweaked to run with Apache Hamilton. If anything about the modeling is unclear, please read the original tutorial: https://github.com/numpy/numpy-tutorials/blob/main/content/tutorial-air-quality-analysis.md