Apache Amaterasu is an open-source framework providing configuration management and deployment of containerized data pipelines. Amaterasu allows developers and data scientists to write, collaborate on, and easily deploy data pipelines to different cluster environments. It also lets them manage configuration and dependencies for each environment.
Amaterasu jobs are defined within an Amaterasu repository. A repository is a filesystem structure stored in a git repository that contains definitions for the following components:
Put simply, an action is a process that is managed by Amaterasu. To deploy and manage an action, Amaterasu creates a container with the action, its dependencies, and its configuration, and deploys it on a cluster (currently only Apache Mesos and YARN clusters are supported, with Kubernetes planned for a later version).
Apache Amaterasu can configure and interact with different data processing frameworks. Supported frameworks can be easily configured for deployment, and also integrate seamlessly with custom APIs. For more information about supported frameworks and how to support additional frameworks, see our Frameworks section.
One of the main objectives of Amaterasu is to manage configuration for data pipelines. Amaterasu configurations are stored per environment, allowing the same pipeline to be deployed with a configuration that fits its environment.
Amaterasu deployments are stored in a maki.yml or maki.yaml file in the root of the Amaterasu repository. The deployment definition contains the different actions and their order of deployment and execution.
Amaterasu is available from the download page. Download Amaterasu and extract it onto a node in the cluster. Once you have done that, you are just a couple of easy steps away from running your first job.
Amaterasu is configured by editing the amaterasu.properties file in the top-level amaterasu directory.
Because Amaterasu supports several cluster environments (currently Apache Mesos and Apache YARN), the properties to set depend on the cluster manager in use.
Property | Description | Value |
---|---|---|
Mode | The cluster manager to be used | mesos |
zk | The ZooKeeper connection string to be used by Amaterasu | The address of a ZooKeeper node |
master | The cluster's Mesos master | The address of the Mesos master |
user | The user that will be used to run Amaterasu | root |
pythonPath | The path to the Python3 executable | python3 (/usr/bin/python3) |
Note: Different Hadoop distributions need different variations of the YARN configuration. Amaterasu is currently tested regularly with HDP and Amazon EMR.
Property | Description | Value |
---|---|---|
Mode | The cluster manager to be used | yarn |
zk | The ZooKeeper connection string to be used by Amaterasu | The address of a ZooKeeper node |
pythonPath | The path to the Python3 executable | python3 (/usr/bin/python3) |
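
As a rough illustration of the tables above, the following shell sketch checks that a properties file defines the keys each table lists. It assumes the file uses plain `key=value` lines and that the keys are spelled exactly as in the tables. The `check_props` function is hypothetical and not part of Amaterasu.

```shell
#!/bin/sh
# Sketch: verify that a properties file defines the keys listed in the tables.
# Assumption: the file uses `key=value` lines; key spellings are copied from the tables.

# check_props FILE KEY...
# Prints each missing key and returns non-zero if any key is missing.
check_props() {
    file=$1
    shift
    missing=0
    for key in "$@"; do
        # Match "key=" at the start of a line, allowing spaces around "=".
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
            echo "missing property: ${key}"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (hypothetical path):
# check_props ./amaterasu.properties Mode zk master user pythonPath   # Mesos
# check_props ./amaterasu.properties Mode zk pythonPath               # YARN
```

Running such a check before starting a job may catch a missing Mesos-only property such as master early, rather than at deployment time.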
To run an amaterasu job, run the following command in the top-level amaterasu directory:
ama-start.sh --repo="https://github.com/shintoio/amaterasu-job-sample.git" --branch="master" --env="test" --report="code"
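
The invocation above could be wrapped in a small script so that the repository, branch, and environment are easy to change. This is only a sketch: the variable names and the `build_ama_command` helper are hypothetical, and it uses only the flags shown in the command above.

```shell
#!/bin/sh
# Sketch: build the ama-start.sh invocation from variables.
# Only the flags shown in the command above (--repo, --branch, --env, --report) are used.
# The variable names below are illustrative, not defined by Amaterasu.

REPO="${REPO:-https://github.com/shintoio/amaterasu-job-sample.git}"
BRANCH="${BRANCH:-master}"
ENV_NAME="${ENV_NAME:-test}"
REPORT="${REPORT:-code}"

# Print the command line instead of running it, so it can be reviewed first.
build_ama_command() {
    printf 'ama-start.sh --repo="%s" --branch="%s" --env="%s" --report="%s"\n' \
        "$REPO" "$BRANCH" "$ENV_NAME" "$REPORT"
}

build_ama_command
```

With the defaults, the script prints the same command shown above. To start the job, run the printed command from the top-level amaterasu directory.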
We recommend you either fork or clone the job sample repo and use that as a starting point for creating your first job.