Apache Griffin Docker Guide

Griffin docker images are pre-built and published on docker hub, so users can pull them to try Apache Griffin in docker.

Preparation

Environment preparation

  1. Install docker and docker compose.

  2. Increase vm.max_map_count on your local machine (Linux), which is required by elasticsearch.

    sysctl -w vm.max_map_count=262144
    

    For macOS, please make enough memory available to docker (for example, set more than 4 GB in docker -> preferences -> Advanced) or decrease the memory for the es instance (for example, set -Xms512m -Xmx512m in jvm.options).

    For other platforms, please refer to the elastic.co documentation on the max_map_count kernel setting.
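
    The sysctl setting above lasts only until the next reboot. To make it persistent, you can also write it to /etc/sysctl.conf (a standard Linux mechanism, not specific to Apache Griffin):

    echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p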

  3. Pull the pre-built Apache Griffin docker images. Use the following commands if you can access the docker repository easily (i.e. you are NOT in China).

    docker pull apachegriffin/griffin_spark2:0.3.0
    docker pull apachegriffin/elasticsearch
    docker pull apachegriffin/kafka
    docker pull zookeeper:3.5
    

    For users in China, pull the images from the following mirror instead.

    docker pull registry.docker-cn.com/apachegriffin/griffin_spark2:0.3.0
    docker pull registry.docker-cn.com/apachegriffin/elasticsearch
    docker pull registry.docker-cn.com/apachegriffin/kafka
    docker pull zookeeper:3.5
    

    These docker images make up the Apache Griffin environment; a quick way to verify the pulled images is shown after the list.

    • apachegriffin/griffin_spark2: This image contains mysql, hadoop, hive, spark, livy, the Apache Griffin service, the Apache Griffin measure module, and some prepared demo data. It works as a single-node spark cluster, providing the spark engine and the Apache Griffin service.
    • apachegriffin/elasticsearch: This image is based on the official elasticsearch image, with some configuration added to enable CORS requests, and provides the elasticsearch service for metrics persistence.
    • apachegriffin/kafka: This image contains kafka 0.8 and some demo streaming data, and provides the streaming data source in streaming mode.
    • zookeeper:3.5: This is the official zookeeper image, which provides the zookeeper service in streaming mode.
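
    After pulling, you can quickly verify that all the images are available locally; the exact output depends on your local image list.

    docker images | grep -E 'apachegriffin|zookeeper'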

How to use Apache Griffin docker images in batch mode

  1. Copy docker-compose-batch.yml to your work path.
  2. In your work path, start the docker containers by using docker compose, and wait for about one minute until the Apache Griffin service is ready.
    $ docker-compose -f docker-compose-batch.yml up -d
    
    After approximately one minute, you can check the result. If it looks like the listing below, the startup was successful.
    $ docker container ls
    CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                                                                                                                                                                                                                    NAMES
    bfec3192096d        apachegriffin/griffin_spark2:0.3.0   "/etc/bootstrap-al..."   5 hours ago         Up 5 hours          6066/tcp, 8030-8033/tcp, 8040/tcp, 9000/tcp, 10020/tcp, 19888/tcp, 27017/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp, 0.0.0.0:32122->2122/tcp, 0.0.0.0:33306->3306/tcp, 0.0.0.0:35432->5432/tcp, 0.0.0.0:38042->8042/tcp, 0.0.0.0:38080->8080/tcp, 0.0.0.0:38088->8088/tcp, 0.0.0.0:38998->8998/tcp, 0.0.0.0:39083->9083/tcp   griffin
    fb9d04285070        apachegriffin/elasticsearch          "/docker-entrypoin..."   5 hours ago         Up 5 hours          0.0.0.0:39200->9200/tcp, 0.0.0.0:39300->9300/tcp                                                                                                                                                                                                                                                                                                         es
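
    If the containers are running but the service does not respond yet, you can follow the startup log of the griffin container (the container name shown in the listing above):

    docker logs -f griffin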
    
  3. Now you can try the Apache Griffin APIs by using any http client; here we use postman as an example. We have prepared two postman configuration files, which you can download as json files.

    First, import the two files into postman.


    Then change the initial environment BASE_PATH value to <your local IP address>:38080.


  4. Try the api Basic -> Get griffin version to make sure the Apache Griffin service has started up.

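    If you prefer the command line, a plain curl call works as well. This is a sketch that assumes the version endpoint is /api/v1/version, matching the Get griffin version request in the postman collection:

    curl http://<your local IP address>:38080/api/v1/version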

  5. Add an accuracy measure through the api Measures -> Add measure, to create a measure in Apache Griffin.

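    The same request can be sent from the command line. A minimal sketch, assuming the endpoint is /api/v1/measures and that measure.json contains the measure definition copied from the Add measure request in the postman collection:

    curl -X POST http://<your local IP address>:38080/api/v1/measures \
    -H 'Content-Type: application/json' \
    -d @measure.json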

  6. Add a job through the api jobs -> Add job, to schedule a job that executes the measure. In the example, the schedule interval is 5 minutes.

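    Likewise for the job, a sketch assuming the endpoint is /api/v1/jobs and that job.json contains the job definition copied from the Add job request in the postman collection:

    curl -X POST http://<your local IP address>:38080/api/v1/jobs \
    -H 'Content-Type: application/json' \
    -d @job.json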

  7. After a few minutes, you can get the metrics from elasticsearch.
    curl -XGET '<IP Address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' \
    -H 'Content-Type: application/json' \
    -d '{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'
    
    # json metric data returned
    {
      "hits" : {
        "hits" : [
          {
            "_source" : {
              "name" : "metricName",
              "tmst" : 1509599811123,
              "value" : {
                "__tmst" : 1509599811123,
                "miss" : 11,
                "total" : 125000,
                "matched" : 124989
              }
            }
          },
          {
            "_source" : {
              "name" : "metricName",
              "tmst" : 1509599811123,
              "value" : {
                "__tmst" : 1509599811123,
                "miss" : 11,
                "total" : 125000,
                "matched" : 124989
              }
            }
          }
        ]
      }
    }
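
    If several measures write into the same index, you can narrow the search to a single measure by name. A minimal sketch using the standard elasticsearch match query, assuming your measure is named metricName as in the sample output above:

    curl -XGET '<IP Address>:39200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' \
    -H 'Content-Type: application/json' \
    -d '{"query":{"match":{"name":"metricName"}}, "sort": [{"tmst": {"order": "asc"}}]}'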
    

How to use Apache Griffin docker images in streaming mode

  1. Copy docker-compose-streaming.yml to your work path.
  2. In your work path, start the docker containers by using docker compose, and wait for about one minute until the Apache Griffin service is ready.
    docker-compose -f docker-compose-streaming.yml up -d
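
    As in batch mode, you can check that the containers have started; in streaming mode the listing should also include the kafka and zookeeper containers (names as defined in docker-compose-streaming.yml).

    docker container ls --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'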
    
  3. Enter the Apache Griffin docker container.
    docker exec -it griffin bash
    
  4. Switch into the measure directory.
    cd ~/measure
    
  5. Execute the streaming-accu script to run the streaming accuracy measurement.
    ./streaming-accu.sh
    
    You can trace the log in streaming-accu.log.
    tail -f streaming-accu.log
    
  6. Because the docker container resources are limited, you can only run the accuracy or profiling measurement separately. If you want to try the streaming profiling measurement, please kill the streaming-accu process first.
    kill -9 `ps -ef | awk '/griffin-measure/{print $2}'`
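
    You can confirm that the measure process is gone before continuing; the brackets in the pattern keep grep from matching itself, and no output means nothing is running.

    ps -ef | grep '[g]riffin-measure'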
    
    Then clear the checkpoint directory and the other directories related to the last streaming job.
    ./clear.sh
    
    Execute the streaming-prof script to run the streaming profiling measurement.
    ./streaming-prof.sh
    
    You can trace the log in streaming-prof.log.
    tail -f streaming-prof.log