In this tutorial, we will guide you through setting up Apache Liminal on your local machine and running a simple machine-learning workflow, based on the classic Iris dataset classification example.
More details in this link.
Note: Make sure a Kubernetes cluster is running in Docker Desktop.
We will define the following steps and services to implement the Iris classification example:
Train, Validate & Deploy - Training and validation execution is managed by the Liminal Airflow extension. The training task trains a regression model using a public dataset.
We then validate the model and deploy it to a model store on a mounted volume.
Inference - Online inference is done using a Python Flask service running on the local Kubernetes cluster in Docker Desktop. The service exposes the /predict
endpoint. It reads the model stored on the mounted volume and uses it to evaluate the request.
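To make the moving parts concrete, here is a minimal sketch of what such an inference service could look like. This is an illustration, not the code shipped with the example; it assumes the model was serialized with joblib onto the mounted volume under a hypothetical model.joblib filename, and that requests carry a petal_width field as in the curl example later in this tutorial.

# inference_sketch.py - a hypothetical sketch of the Flask inference service,
# not the actual service shipped with the example.
import os

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# assumption: the validate task deployed the model to the mounted volume
MODEL_PATH = os.path.join(
    os.environ.get("MOUNT_PATH", "/mnt/gettingstartedvol"), "model.joblib"
)

@app.route("/healthcheck")
def healthcheck():
    # liveness check, exercised by the curl command later in the tutorial
    return "Server is up!"

@app.route("/predict", methods=["POST"])
def predict():
    model = joblib.load(MODEL_PATH)          # read the deployed model from the volume
    payload = request.get_json(force=True)   # e.g. {"petal_width": "2.1"}
    features = [[float(payload["petal_width"])]]
    return jsonify({"prediction": str(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)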
In the dev folder, clone the example code from liminal:
git clone https://github.com/apache/incubator-liminal
Note: You just cloned the entire Liminal project; you actually only need the examples folder.
Create a python virtual environment to isolate your runs:
cd incubator-liminal/examples/aws-ml-app-demo
python3 -m venv env
Activate your virtual environment:
source env/bin/activate
Now we are ready to install liminal:
pip install apache-liminal
The build command will create Docker images based on the images section of the liminal.yml file and will create a local Kubernetes volume.
Note that all tasks use a mounted volume, as defined in the pipeline YAML.
In our case the mounted volume points to the Liminal Iris classification example. The training task trains a regression model using a public dataset; we then validate the model and deploy it to a model store on the mounted volume.
liminal build
The deploy command deploys a Liminal server and copies any liminal.yml files in your working directory, or any of its subdirectories, to your Liminal home directory.
liminal deploy --clean
Note: The Liminal home directory is located at the path defined by the LIMINAL_HOME environment variable. If LIMINAL_HOME is not defined, the home directory defaults to ~/liminal_home.
The start command spins up 3 containers that load the Apache Airflow stack. Liminal's Airflow extension is responsible for executing the workflows defined in the liminal.yml file as standard Airflow DAGs.
liminal start
It runs three containers (you can list them with docker ps).
Once the Liminal server has finished starting up, you can navigate to the admin UI in your browser: http://localhost:8080
Important: Set the off/on toggle to activate your pipeline (DAG); nothing will happen otherwise!
You can go to graph view to see all the tasks configured in the liminal.yml file: http://localhost:8080/admin/airflow/graph?dag_id=my_datascience_pipeline
Declaration of the mounted volume in your liminal YAML:
name: MyDataScienceApp
owner: Bosco Albert Baracus
volumes:
  - volume: gettingstartedvol
    claim_name: gettingstartedvol-pvc
    local:
      path: .
Declaration of the pipeline tasks flow in your liminal YAML:
pipelines:
  - pipeline: my_datascience_pipeline
    ...
    schedule: 0 * 1 * *
    tasks:
      - task: train
        type: python
        description: train model
        image: myorg/mydatascienceapp
        cmd: python -u training.py train
        ...
      - task: validate
        type: python
        description: validate model and deploy
        image: myorg/mydatascienceapp
        cmd: python -u training.py validate
        ...
pipelines:
  ...
  tasks:
    - task: train
      ...
      env:
        MOUNT_PATH: /mnt/gettingstartedvol
      mounts:
        - mount: mymount
          volume: gettingstartedvol
          path: /mnt/gettingstartedvol
The mount exposes the volume to the task at the path given by the MOUNT_PATH environment variable, in which we store the trained model.
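For illustration, a training/validation script along the lines of training.py might look like the sketch below. This is a hypothetical sketch, not the script shipped with the example; it assumes scikit-learn, a single petal_width feature, and hypothetical candidate.joblib/model.joblib filenames under MOUNT_PATH.

# training_sketch.py - a hypothetical sketch of the train/validate tasks,
# not the training.py shipped with the example.
import os
import sys

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

MOUNT_PATH = os.environ.get("MOUNT_PATH", "/mnt/gettingstartedvol")
CANDIDATE = os.path.join(MOUNT_PATH, "candidate.joblib")   # written by the train task
MODEL_STORE = os.path.join(MOUNT_PATH, "model.joblib")     # "deployed" by the validate task

def train():
    # train a simple classifier on the public Iris dataset (petal width only)
    iris = load_iris()
    X = iris.data[:, 3:4]  # petal width (cm)
    y = iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    joblib.dump((model, X_test, y_test), CANDIDATE)

def validate():
    # validate the candidate model and, if acceptable, deploy it to the model store
    model, X_test, y_test = joblib.load(CANDIDATE)
    accuracy = model.score(X_test, y_test)
    print(f"validation accuracy: {accuracy:.3f}")
    if accuracy < 0.8:
        sys.exit("model did not pass validation")
    joblib.dump(model, MODEL_STORE)

if __name__ == "__main__":
    {"train": train, "validate": validate}[sys.argv[1]]()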
Once the Iris classification model training is completed and the model is deployed (to the mounted volume), you can launch a pod of the pre-built image, which contains a Flask server, by applying the following Kubernetes manifest configuration:
kubectl apply -f manifests/aws-ml-app-demo.yaml
Alternatively, create a Kubernetes pod from stdin:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Pod
metadata:
  name: aws-ml-app-demo
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: gettingstartedvol-pvc
  containers:
    - name: task-pv-container
      imagePullPolicy: Never
      image: myorg/mydatascienceapp
      lifecycle:
        postStart:
          exec:
            command: ["/bin/bash", "-c", "apt update && apt install curl -y"]
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/mnt/gettingstartedvol"
          name: task-pv-storage
EOF
Check that the pod is running:
kubectl get pods --namespace=default
Check that the service is up:
kubectl exec -it --namespace=default aws-ml-app-demo -- /bin/bash -c "curl localhost/healthcheck"
Check the prediction:
kubectl exec -it --namespace=default aws-ml-app-demo -- /bin/bash -c "curl -X POST -d '{\"petal_width\": \"2.1\"}' localhost/predict"
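If you would rather call the service from your machine instead of from inside the pod, you could forward the pod's port first (for example: kubectl port-forward --namespace=default pod/aws-ml-app-demo 8080:80) and then send the same request from Python. This is an optional alternative to the curl commands above, assuming the port-forward is running:

# client_sketch.py - an optional client, assuming the pod's port 80 has been
# forwarded to localhost:8080 with kubectl port-forward (see above).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/predict",
    data=json.dumps({"petal_width": "2.1"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    print(response.read().decode("utf-8"))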
kubectl get pods will help you check your pod status:
kubectl get pods --namespace=default
kubectl logs will help you check your pod's logs:
kubectl logs --namespace=default aws-ml-app-demo
Use kubectl exec to get a shell to a running container:
kubectl exec -it --namespace=default aws-ml-app-demo -- bash
Then you can check the mounted volume with df -h and verify the result of the model.
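As an additional sanity check, you can list what the pipeline wrote to the volume from within the pod shell. The snippet below assumes python3 is available in the image and that the volume is mounted at /mnt/gettingstartedvol, as in the manifest above:

# list_artifacts.py - walk the mounted volume and print the files the
# pipeline wrote there (mount path assumed from the manifest above).
import os

ROOT = "/mnt/gettingstartedvol"
for directory, _, files in os.walk(ROOT):
    for name in files:
        path = os.path.join(directory, name)
        print(f"{path}\t{os.path.getsize(path)} bytes")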
To run the full flow from scratch (including cleaning up a previous installation), the complete sequence of commands is:
git clone https://github.com/apache/incubator-liminal
cd incubator-liminal/examples/aws-ml-app-demo
python3 -m venv env
source env/bin/activate
rm -rf ~/liminal_home
pip uninstall apache-liminal
pip install apache-liminal
liminal build
liminal create
liminal deploy --clean
liminal start
To make sure liminal containers are stopped use:
liminal stop
To deactivate the python virtual env use:
deactivate
To terminate the kubernetes pod:
kubectl delete pod --namespace=default aws-ml-app-demo