Apache Griffin Docker Guide

Griffin Docker images are pre-built on Docker Hub; you can pull them to try Griffin in Docker.

Preparation

Environment preparation

  1. Install Docker and Docker Compose.

  2. Increase vm.max_map_count on your local machine (Linux), as required by Elasticsearch.

    sysctl -w vm.max_map_count=262144
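
    To make this setting persist across reboots, you can also append it to /etc/sysctl.conf (a common approach; the exact file may vary by distribution):

    echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p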
    

    For macOS, either increase the memory available to Docker (for example, allow more than 4 GB under Docker -> Preferences -> Advanced) or decrease the memory of the Elasticsearch instance (for example, set -Xms512m -Xmx512m in jvm.options).
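
    For reference, the heap-size lines in the Elasticsearch jvm.options file would then look like this (a minimal sketch, assuming the stock jvm.options layout):

    -Xms512m
    -Xmx512m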

    For other platforms, please refer to the elastic.co documentation on the max_map_count kernel setting.

  3. Pull the pre-built Griffin Docker images, if you can reach the Docker registry directly (i.e. NOT in China).

    docker pull apachegriffin/griffin_spark2:0.3.0
    docker pull apachegriffin/elasticsearch
    docker pull apachegriffin/kafka
    docker pull zookeeper:3.5
    

    Users in China can pull the images from the following mirror instead.

    docker pull registry.docker-cn.com/apachegriffin/griffin_spark2:0.3.0
    docker pull registry.docker-cn.com/apachegriffin/elasticsearch
    docker pull registry.docker-cn.com/apachegriffin/kafka
    docker pull registry.docker-cn.com/zookeeper:3.5
    

    These Docker images make up the Griffin environment:

    • apachegriffin/griffin_spark2: contains MySQL, Hadoop, Hive, Spark, Livy, the Griffin service, the Griffin measure module, and some prepared demo data. It works as a single-node Spark cluster, providing the Spark engine and the Griffin service.
    • apachegriffin/elasticsearch: based on the official Elasticsearch image, with extra configuration to enable CORS requests; it provides the Elasticsearch service for metrics persistence.
    • apachegriffin/kafka: contains Kafka 0.8 and some demo streaming data, providing the streaming data source in streaming mode.
    • zookeeper:3.5: the official ZooKeeper image, providing the ZooKeeper service in streaming mode.
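
    You can check that all four images are present before continuing (a quick sanity check, not part of the original steps):

    docker images | grep -E 'apachegriffin|zookeeper'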

How to use Griffin Docker images in batch mode

  1. Copy docker-compose-batch.yml to your work path.
  2. In your work path, start the Docker containers with Docker Compose and wait for about one minute; then the Griffin service is ready.
    docker-compose -f docker-compose-batch.yml up -d
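
    Once the containers are up, you can confirm their status (standard Docker Compose usage, not specific to this guide):

    docker-compose -f docker-compose-batch.yml ps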
    
  3. Now you can try the Griffin APIs in Postman after importing the JSON files. You will need to change the environment value BASE_PATH to <your local IP address>:38080.
  4. Try the API Basic -> Get griffin version to make sure the Griffin service has started up.
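
    If you prefer the command line to Postman, the version check can also be done with curl (a sketch assuming the /api/v1/version endpoint of the Griffin REST API; the path may differ across Griffin versions):

    curl -XGET 'http://<your local IP address>:38080/api/v1/version'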
  5. Add an accuracy measure through the API Measures -> Add measure to create a measure in Griffin.
  6. Add a job through the API jobs -> Add job to schedule a job that executes the measure. In the example, the schedule interval is 5 minutes.
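
    Both the measure and the job can also be created with curl by posting the JSON bodies taken from the imported Postman collection (a sketch; measure.json and job.json are hypothetical names for files holding those bodies, and the /api/v1 paths may differ across Griffin versions):

    # create the measure from a saved JSON body
    curl -XPOST 'http://<your local IP address>:38080/api/v1/measures' -H 'Content-Type: application/json' -d @measure.json
    # schedule the job; a 5-minute interval in Quartz cron syntax is 0 0/5 * * * ?
    curl -XPOST 'http://<your local IP address>:38080/api/v1/jobs' -H 'Content-Type: application/json' -d @job.json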
  7. After a few minutes, you can get the metrics from Elasticsearch.
    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'
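
    If the query is rejected because of the request's content type, note that newer Elasticsearch versions require an explicit header (this should only matter if you swap in your own Elasticsearch image):

    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'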
    

How to use Griffin Docker images in streaming mode

  1. Copy docker-compose-streaming.yml to your work path.
  2. In your work path, start the Docker containers with Docker Compose and wait for about one minute; then the Griffin service is ready.
    docker-compose -f docker-compose-streaming.yml up -d
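
    To watch startup progress, you can follow the container logs (standard Docker Compose usage):

    docker-compose -f docker-compose-streaming.yml logs -f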
    
  3. Enter the Griffin Docker container.
    docker exec -it griffin bash
    
  4. Switch into the measure directory.
    cd ~/measure
    
  5. Execute the streaming-accu script to run the streaming accuracy measurement.
    ./streaming-accu.sh
    
    You can trace the log in streaming-accu.log.
    tail -f streaming-accu.log
    
  6. Limited by the Docker container's resources, you can only run the accuracy or profiling measurement separately. If you want to try the streaming profiling measurement, kill the streaming-accu process first.
    kill -9 `ps -ef | awk '/griffin-measure/{print $2}'`
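
    Alternatively, pkill performs the same match-and-kill in one step (equivalent, assuming pkill is available inside the container):

    pkill -9 -f griffin-measure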
    
    Then clear the checkpoint directory and the other directories related to the last streaming job.
    ./clear.sh
    
    Execute the streaming-prof script to run the streaming profiling measurement.
    ./streaming-prof.sh
    
    You can trace the log in streaming-prof.log.
    tail -f streaming-prof.log
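
When you are done, you can stop and remove the containers from your work path (standard Docker Compose usage; use docker-compose-batch.yml for the batch setup):

    docker-compose -f docker-compose-streaming.yml down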