Apache Griffin Docker Guide

Griffin Docker images are pre-built on Docker Hub; you can pull them to try Griffin in Docker.

Preparation

Environment preparation

  1. Install Docker and Docker Compose.

  2. Increase vm.max_map_count on your local machine (Linux), as required by Elasticsearch.

    sysctl -w vm.max_map_count=262144
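
    To make this setting persist across reboots, you can also append it to /etc/sysctl.conf (a common approach; the exact file may vary by distribution):

    echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p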
    

    For macOS, either increase the memory available to Docker (for example, allow more than 4 GB under Docker -> Preferences -> Advanced) or decrease the memory of the Elasticsearch instance (for example, set -Xms512m -Xmx512m in jvm.options).
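
    For reference, the heap-size lines in the Elasticsearch jvm.options file would then look like this (a minimal sketch, assuming the stock jvm.options layout):

    -Xms512m
    -Xmx512m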

    For other platforms, please refer to the elastic.co documentation on the max_map_count kernel setting.

  3. Pull the pre-built Griffin Docker images, if you can reach the Docker registry directly (i.e. NOT in China).

    docker pull apachegriffin/griffin_spark2:0.3.0
    docker pull apachegriffin/elasticsearch
    docker pull apachegriffin/kafka
    docker pull zookeeper:3.5
    

    Users in China can pull the images from the following mirror instead.

    docker pull registry.docker-cn.com/apachegriffin/griffin_spark2:0.3.0
    docker pull registry.docker-cn.com/apachegriffin/elasticsearch
    docker pull registry.docker-cn.com/apachegriffin/kafka
    docker pull registry.docker-cn.com/zookeeper:3.5
    

    These Docker images make up the Griffin environment:

    • apachegriffin/griffin_spark2: contains MySQL, Hadoop, Hive, Spark, Livy, the Griffin service, the Griffin measure module, and some prepared demo data. It works as a single-node Spark cluster, providing the Spark engine and the Griffin service.
    • apachegriffin/elasticsearch: based on the official Elasticsearch image, with extra configuration to enable CORS requests; it provides the Elasticsearch service for metrics persistence.
    • apachegriffin/kafka: contains Kafka 0.8 and some demo streaming data, providing the streaming data source in streaming mode.
    • zookeeper:3.5: the official ZooKeeper image, providing the ZooKeeper service in streaming mode.
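
    You can check that all four images are present before continuing (a quick sanity check, not part of the original steps):

    docker images | grep -E 'apachegriffin|zookeeper'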

How to use Griffin Docker images in batch mode

  1. Copy docker-compose-batch.yml to your work path.
  2. In your work path, start the Docker containers with Docker Compose and wait for about one minute; then the Griffin service is ready.
    docker-compose -f docker-compose-batch.yml up -d
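
    Once the containers are up, you can confirm their status (standard Docker Compose usage, not specific to this guide):

    docker-compose -f docker-compose-batch.yml ps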
    
  3. Now you can try the Griffin APIs in Postman after importing the JSON files. You will need to change the environment value BASE_PATH to <your local IP address>:38080.
  4. Try the API Basic -> Get griffin version to make sure the Griffin service has started up.
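
    If you prefer the command line to Postman, the version check can also be done with curl (a sketch assuming the /api/v1/version endpoint of the Griffin REST API; the path may differ across Griffin versions):

    curl -XGET 'http://<your local IP address>:38080/api/v1/version'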
  5. Add an accuracy measure through the API Measures -> Add measure to create a measure in Griffin.
  6. Add a job through the API jobs -> Add job to schedule a job that executes the measure. In the example, the schedule interval is 5 minutes.
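
    Both the measure and the job can also be created with curl by posting the JSON bodies taken from the imported Postman collection (a sketch; measure.json and job.json are hypothetical names for files holding those bodies, and the /api/v1 paths may differ across Griffin versions):

    # create the measure from a saved JSON body
    curl -XPOST 'http://<your local IP address>:38080/api/v1/measures' -H 'Content-Type: application/json' -d @measure.json
    # schedule the job; a 5-minute interval in Quartz cron syntax is 0 0/5 * * * ?
    curl -XPOST 'http://<your local IP address>:38080/api/v1/jobs' -H 'Content-Type: application/json' -d @job.json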
  7. After a few minutes, you can get the metrics from Elasticsearch.
    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'
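
    If the query is rejected because of the request's content type, note that newer Elasticsearch versions require an explicit header (this should only matter if you swap in your own Elasticsearch image):

    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'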
    

How to use Griffin Docker images in streaming mode

  1. Copy docker-compose-streaming.yml to your work path.
  2. In your work path, start the Docker containers with Docker Compose and wait for about one minute; then the Griffin service is ready.
    docker-compose -f docker-compose-streaming.yml up -d
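
    To watch startup progress, you can follow the container logs (standard Docker Compose usage):

    docker-compose -f docker-compose-streaming.yml logs -f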
    
  3. Enter the Griffin Docker container.
    docker exec -it griffin bash
    
  4. Switch into the measure directory.
    cd ~/measure
    
  5. Execute the streaming-accu script to run the streaming accuracy measurement.
    ./streaming-accu.sh
    
    You can trace the log in streaming-accu.log.
    tail -f streaming-accu.log
    
  6. Limited by the Docker container's resources, you can only run the accuracy or profiling measurement separately. If you want to try the streaming profiling measurement, kill the streaming-accu process first.
    kill -9 `ps -ef | awk '/griffin-measure/{print $2}'`
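
    Alternatively, pkill performs the same match-and-kill in one step (equivalent, assuming pkill is available inside the container):

    pkill -9 -f griffin-measure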
    
    Then clear the checkpoint directory and the other directories related to the last streaming job.
    ./clear.sh
    
    Execute the streaming-prof script to run the streaming profiling measurement.
    ./streaming-prof.sh
    
    You can trace the log in streaming-prof.log.
    tail -f streaming-prof.log
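
When you are done, you can stop and remove the containers from your work path (standard Docker Compose usage; use docker-compose-batch.yml for the batch setup):

    docker-compose -f docker-compose-streaming.yml down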