Apache Griffin Docker Guide

Griffin docker images are pre-built and published on docker hub; users can pull them to try griffin in docker.

Preparation

Environment preparation

  1. Install docker and docker compose.

  2. Increase vm.max_map_count on your local machine (Linux), which elasticsearch requires.

    sysctl -w vm.max_map_count=262144
    

    For macOS, please make enough memory available to docker (for example, set more than 4 GB in docker -> Preferences -> Advanced), or decrease the memory for the es instance (for example, set -Xms512m -Xmx512m in jvm.options).

    For other platforms, please refer to the elastic.co documentation on the max_map_count kernel setting.
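
    To make this setting persist across reboots, you can also append it to /etc/sysctl.conf (a minimal sketch; the exact configuration file may differ by distribution):

    # persist the kernel setting and reload it
    echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p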

  3. Pull the pre-built griffin docker images directly, if you can access the docker repository easily (i.e. NOT in China):

    docker pull apachegriffin/griffin_spark2:0.3.0
    docker pull apachegriffin/elasticsearch
    docker pull apachegriffin/kafka
    docker pull zookeeper:3.5
    

    For users in China, you can pull the images from the following mirror instead:

    docker pull registry.docker-cn.com/apachegriffin/griffin_spark2:0.3.0
    docker pull registry.docker-cn.com/apachegriffin/elasticsearch
    docker pull registry.docker-cn.com/apachegriffin/kafka
    docker pull zookeeper:3.5
    

    These docker images make up the griffin environment; you can verify that they were pulled successfully with the command shown after the list.

    • apachegriffin/griffin_spark2: This image contains mysql, hadoop, hive, spark, livy, griffin service, griffin measure, and some prepared demo data. It works as a single-node spark cluster, providing the spark engine and the griffin service.
    • apachegriffin/elasticsearch: This image is based on the official elasticsearch image, with some configuration added to enable CORS requests. It provides the elasticsearch service for metrics persistence.
    • apachegriffin/kafka: This image contains kafka 0.8 and some demo streaming data. It provides the streaming data source in streaming mode.
    • zookeeper:3.5: This is the official zookeeper image. It provides the zookeeper service in streaming mode.
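
    A quick way to confirm that all images were pulled (docker images is standard docker CLI):

    docker images | grep -E 'apachegriffin|zookeeper'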

How to use griffin docker images in batch mode

  1. Copy docker-compose-batch.yml to your work path.
  2. In your work path, start the docker containers using docker compose, and wait for the griffin service to become ready.
    $ docker-compose -f docker-compose-batch.yml up -d
    
    After approximately one minute, you can check the containers. If the output looks like the listing below, startup was successful.
    $ docker container ls
    CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                                                                                                                                                                                                                    NAMES
    bfec3192096d        apachegriffin/griffin_spark2:0.3.0   "/etc/bootstrap-al..."   5 hours ago         Up 5 hours          6066/tcp, 8030-8033/tcp, 8040/tcp, 9000/tcp, 10020/tcp, 19888/tcp, 27017/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp, 0.0.0.0:32122->2122/tcp, 0.0.0.0:33306->3306/tcp, 0.0.0.0:35432->5432/tcp, 0.0.0.0:38042->8042/tcp, 0.0.0.0:38080->8080/tcp, 0.0.0.0:38088->8088/tcp, 0.0.0.0:38998->8998/tcp, 0.0.0.0:39083->9083/tcp   griffin
    fb9d04285070        apachegriffin/elasticsearch          "/docker-entrypoin..."   5 hours ago         Up 5 hours          0.0.0.0:39200->9200/tcp, 0.0.0.0:39300->9300/tcp                                                                                                                                                                                                                                                                                                         es
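
    If startup seems slow, you can follow the logs of the griffin container while waiting (the container name griffin comes from the compose file):
    $ docker logs -f griffin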
    
  3. Now you can try the griffin APIs using any http client; here we use postman as an example. We have prepared two postman configuration files, which you can download as json files.

    To get started, first import the two files into postman.

    Then change the initial environment BASE_PATH value to <your local IP address>:38080.

  4. You can try the api Basic -> Get griffin version to make sure the griffin service has started up (a curl alternative is sketched below).

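    If you prefer the command line, the same check can be made with curl (a sketch, assuming the /api/v1/version endpoint defined in the postman collection):
    curl -XGET 'http://<your local IP address>:38080/api/v1/version'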

  5. Add an accuracy measure through the api Measures -> Add measure, to create a measure in griffin (see the curl sketch below).

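    The same call can be made with curl (a sketch; it assumes the /api/v1/measures endpoint from the postman collection, and that you have saved the accuracy measure definition as measure.json):
    # measure.json: the accuracy measure JSON exported from the postman collection
    curl -XPOST 'http://<your local IP address>:38080/api/v1/measures' -H 'Content-Type: application/json' -d @measure.json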

  6. Add a job through the api jobs -> Add job, to schedule a job that executes the measure. In this example, the schedule interval is 5 minutes (a curl sketch follows).

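    Similarly with curl (a sketch assuming the /api/v1/jobs endpoint; job.json is the job definition from the postman collection, with its 5-minute schedule):
    # job.json: the job JSON exported from the postman collection
    curl -XPOST 'http://<your local IP address>:38080/api/v1/jobs' -H 'Content-Type: application/json' -d @job.json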

  7. After a few minutes, you can fetch the metrics from elasticsearch.
    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'
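
    If elasticsearch rejects the query with a content-type error (elasticsearch 6.0 and later require an explicit header on requests with a body), add it to the request:
    curl -XGET '<your local IP address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}, "sort": [{"tmst": {"order": "asc"}}]}'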
    

How to use griffin docker images in streaming mode

  1. Copy docker-compose-streaming.yml to your work path.
  2. In your work path, start the docker containers using docker compose, and wait about one minute for the griffin service to be ready.
    docker-compose -f docker-compose-streaming.yml up -d
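
    As in batch mode, you can confirm that the containers have started:
    docker container ls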
    
  3. Enter the griffin docker container.
    docker exec -it griffin bash
    
  4. Switch into the measure directory.
    cd ~/measure
    
  5. Execute the streaming-accu script to run the streaming accuracy measurement.
    ./streaming-accu.sh
    
    You can trace the log in streaming-accu.log.
    tail -f streaming-accu.log
    
  6. Limited by the docker container's resources, you can only execute accuracy or profiling measurements one at a time. If you want to try the streaming profiling measurement, kill the streaming-accu process first.
    kill -9 `ps -ef | awk '/griffin-measure/{print $2}'`
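
    Equivalently, if pkill is available in the container, you can match the process by its command line:
    # -f matches against the full command line
    pkill -9 -f griffin-measure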
    
    Then clear the checkpoint directory and other related directories of the last streaming job.
    ./clear.sh
    
    Execute the streaming-prof script to run the streaming profiling measurement.
    ./streaming-prof.sh
    
    You can trace the log in streaming-prof.log.
    tail -f streaming-prof.log
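
When you are finished, you can stop and remove the containers using the same compose file you started with (for batch mode, substitute docker-compose-batch.yml):

    docker-compose -f docker-compose-streaming.yml down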