Apache Amaterasu (incubating)

Apache Amaterasu

Apache Amaterasu is an open-source framework that provides configuration management and deployment for containerized data pipelines. It lets developers and data scientists write, collaborate on, and deploy data pipelines to different cluster environments, and manage the configuration and dependencies for each environment.

Main concepts

Repo

Amaterasu jobs are defined within an Amaterasu repository. A repository is a filesystem structure stored in a git repository that contains definitions for the following components:

Actions

Put simply, an action is a process managed by Amaterasu. To deploy and manage an action, Amaterasu creates a container with the action, its dependencies, and its configuration, and deploys it on a cluster (currently only Apache Mesos and YARN clusters are supported, with Kubernetes planned for a later version).

Frameworks

Apache Amaterasu can configure and interact with different data processing frameworks. Supported frameworks can be easily configured for deployment, and also integrate seamlessly with custom APIs. For more information about supported frameworks and how to support additional frameworks, see our Frameworks section.

Configuration and Environments

One of the main objectives of Amaterasu is to manage configuration for data pipelines. Amaterasu configurations are stored per environment, allowing the same pipeline to be deployed with a configuration that fits its environment.

Deployments

Amaterasu deployments are stored in a maki.yml or maki.yaml file in the root of the Amaterasu repository. The deployment definition contains the different actions and their order of deployment and execution.
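Since the deployment file can carry either name, a quick check before submitting a job can save a failed run. The following is a minimal sketch, not part of Amaterasu: `find_maki` is a hypothetical helper, and preferring maki.yml when both files exist is an assumption made only for this example.

```shell
#!/bin/sh
# find_maki DIR: print the path of the deployment file in DIR's root.
# Hypothetical helper for illustration; Amaterasu does not ship it.
find_maki() {
    for name in maki.yml maki.yaml; do
        if [ -f "$1/$name" ]; then
            # Assumption: maki.yml wins if both names are present.
            printf '%s\n' "$1/$name"
            return 0
        fi
    done
    echo "no maki.yml or maki.yaml in $1" >&2
    return 1
}
```

For example, after cloning a job repository into ./my-job, `find_maki ./my-job` prints the deployment file path or exits with a non-zero status.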

Setting up Amaterasu

Download

Amaterasu is available from the download page. Download it and extract it onto a node in the cluster. Once you do that, you are just a couple of easy steps away from running your first job.

Configuration

Amaterasu is configured by editing the amaterasu.properties file in the top-level amaterasu directory.

Because Amaterasu supports several cluster environments (currently Apache Mesos and Apache YARN), the properties you need to set depend on the cluster manager in use.

Apache Mesos

| Property | Description | Value |
| --- | --- | --- |
| Mode | The cluster manager to be used | mesos |
| zk | The ZooKeeper connection string to be used by amaterasu | The address of a zookeeper node |
| master | The clusters' Mesos master | The address of the Mesos Master |
| user | The user that will be used to run amaterasu | root |
| pythonPath | The path to the Python3 executable | python3 (/usr/bin/python3) |
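As an illustration of the Mesos settings, the sketch below writes a sample amaterasu.properties. Several things here are assumptions: the key=value syntax is inferred from the .properties file name, the key spelling is copied from the table as written (compare it with the file shipped in your download), the host names are placeholders, and `AMA_HOME` is a hypothetical variable that defaults to a scratch directory so a real installation is not overwritten.

```shell
#!/bin/sh
# AMA_HOME (hypothetical): the top-level amaterasu directory.
# Defaults to a temporary directory so this sketch is safe to run.
AMA_HOME="${AMA_HOME:-$(mktemp -d)}"

# Write one line per property from the Mesos table.
cat > "$AMA_HOME/amaterasu.properties" <<'EOF'
# Sample Mesos settings; host names below are placeholders.
Mode=mesos
zk=zk-node.example.com
master=mesos-master.example.com
user=root
pythonPath=/usr/bin/python3
EOF

echo "wrote $AMA_HOME/amaterasu.properties"
```

Replace the placeholder hosts with the addresses of your own ZooKeeper node and Mesos master before using the file.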

Apache YARN

Note: Different Hadoop distributions need different variations of the YARN configuration. Amaterasu is currently tested regularly with HDP and Amazon EMR.

| Property | Description | Value |
| --- | --- | --- |
| Mode | The cluster manager to be used | yarn |
| zk | The ZooKeeper connection string to be used by amaterasu | The address of a zookeeper node |
| pythonPath | The path to the Python3 executable | python3 (/usr/bin/python3) |

Running a Job

To run an amaterasu job, run the following command in the top-level amaterasu directory:

ama-start.sh --repo="https://github.com/shintoio/amaterasu-job-sample.git" --branch="master" --env="test" --report="code" 

We recommend you either fork or clone the job sample repo and use that as a starting point for creating your first job.
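Once you have a fork, only the --repo value (and possibly the branch and environment) changes between runs. A minimal sketch of a wrapper that assembles the command from variables is shown below. The names `run_job`, `AMA_START`, `REPO`, `BRANCH`, `ENV_NAME`, `REPORT`, and `DRY_RUN` are hypothetical and exist only in this example; the four flags are the ones used in the command above. The `./` prefix on the script is also an assumption for shells where the current directory is not on the PATH.

```shell
#!/bin/sh
# Defaults mirror the sample command; override them via the environment.
AMA_START="${AMA_START:-./ama-start.sh}"
REPO="${REPO:-https://github.com/shintoio/amaterasu-job-sample.git}"
BRANCH="${BRANCH:-master}"
ENV_NAME="${ENV_NAME:-test}"
REPORT="${REPORT:-code}"

# run_job: print the command when DRY_RUN=1 (the default), else execute it.
run_job() {
    set -- "$AMA_START" --repo="$REPO" --branch="$BRANCH" \
        --env="$ENV_NAME" --report="$REPORT"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run_job
```

To submit a job from your own fork, set REPO to the fork's URL and DRY_RUN=0, then run the script from the top-level amaterasu directory.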