This guide explains how to write Aurora configuration files, including feature descriptions and best practices. When writing a configuration file, make use of aurora job inspect. It takes the same job key and configuration file arguments as aurora job create or aurora update start. It first ensures the configuration parses, then outputs it in human-readable form.
You should read this after going through the general Aurora Tutorial.
To run a job on Aurora, you must specify a configuration file that tells Aurora what it needs to know to schedule the job, what Mesos needs to run the tasks the job is made up of, and what Thermos needs to run the processes that make up the tasks. This file must have a .aurora suffix.
A configuration file defines a collection of objects, along with parameter values for their attributes. An Aurora configuration file contains the following three types of objects:
- Job
- Task
- Process
A configuration also specifies a list of Job objects assigned to the variable jobs.
The .aurora file format is just Python. However, Job, Task, Process, and other classes are defined by a type-checked dictionary templating library called Pystachio, a powerful tool for configuration specification and reuse. Pystachio objects are tailored via {{}} surrounded templates.
When writing your .aurora file, you may use any Pystachio datatypes, as well as any objects shown in the Aurora Configuration Reference, without import statements; the Aurora config loader injects them automatically. Other than that, an .aurora file works like any other Python script.
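As a rough illustration of what automatic injection means, the sketch below runs configuration source with a few names already placed in its namespace and then reads back the jobs variable. The load_config function and the stand-in Job class are hypothetical names made up for this example; this is a minimal model of the idea, not Aurora's actual loader.

```python
# Toy model of a config loader that pre-injects names before running the file.
# load_config and this Job class are illustrative stand-ins, not Aurora APIs.

class Job:
    def __init__(self, **attrs):
        self.attrs = attrs


def load_config(source, injected):
    """Run config source with `injected` names available; return its jobs list."""
    namespace = dict(injected)      # copy so the caller's dict is not mutated
    exec(source, namespace)         # the config is ordinary Python
    return namespace.get("jobs", [])


CONFIG_SOURCE = """
# No import of Job here: the loader supplies it.
hello = Job(name='hello_world', cluster='cluster1')
jobs = [hello]
"""

loaded_jobs = load_config(CONFIG_SOURCE, {"Job": Job})
print([job.attrs["name"] for job in loaded_jobs])
```

Because the file is plain Python, anything the loader does not inject (such as os) still needs an ordinary import statement.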
Aurora Configuration Reference has a full reference of all Aurora/Thermos defined Pystachio objects.
A well-structured configuration starts with structural templates (if any). Structural templates encapsulate in their attributes all the differences between Jobs in the configuration that are not directly manipulated at the Job level, but typically at the Process or Task level. For example, certain processes may be invoked with slightly different settings or input.
After structural templates, define, in order, Processes, Tasks, and Jobs.
Structural template names should be UpperCamelCased and their instantiations are typically UPPER_SNAKE_CASED. Process, Task, and Job names are typically lower_snake_cased. Indentation is typically 2 spaces.
The following is a typical configuration file. Don't worry if there are parts you don't understand yet; you may want to refer back to this as you read about its individual parts. Note that names surrounded by double curly braces {{}} are template variables, which the system replaces with the values bound to those variables.
    # --- templates here ---
    class Profile(Struct):
      package_version = Default(String, 'live')
      java_binary = Default(String, '/usr/lib/jvm/java-1.7.0-openjdk/bin/java')
      extra_jvm_options = Default(String, '')
      parent_environment = Default(String, 'prod')
      parent_serverset = Default(String,
                                 '/foocorp/service/bird/{{parent_environment}}/bird')

    # --- processes here ---
    main = Process(
      name = 'application',
      cmdline = '{{profile.java_binary}} -server -Xmx1792m '
                '{{profile.extra_jvm_options}} '
                '-jar application.jar '
                '-upstreamService {{profile.parent_serverset}}'
    )

    # --- tasks ---
    base_task = SequentialTask(
      name = 'application',
      processes = [
        Process(
          name = 'fetch',
          cmdline = 'curl -O https://packages.foocorp.com/{{profile.package_version}}/application.jar'),
      ]
    )

    # not always necessary but often useful to have separate task
    # resource classes
    staging_task = base_task(
      resources = Resources(cpu = 1.0, ram = 2048*MB, disk = 1*GB))
    production_task = base_task(
      resources = Resources(cpu = 4.0, ram = 2560*MB, disk = 10*GB))

    # --- job template ---
    job_template = Job(
      name = 'application',
      role = 'myteam',
      contact = 'myteam-team@foocorp.com',
      instances = 20,
      service = True,
      task = production_task
    )

    # -- profile instantiations (if any) ---
    PRODUCTION = Profile()
    STAGING = Profile(
      extra_jvm_options = '-Xloggc:gc.log',
      parent_environment = 'staging'
    )

    # -- job instantiations --
    jobs = [
      job_template(cluster = 'cluster1', environment = 'prod')
        .bind(profile = PRODUCTION),

      job_template(cluster = 'cluster2', environment = 'prod')
        .bind(profile = PRODUCTION),

      job_template(cluster = 'cluster1',
                   environment = 'staging',
                   service = False,
                   task = staging_task,
                   instances = 2)
        .bind(profile = STAGING),
    ]
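To make the template variables in the configuration above less abstract, here is a toy binder written in plain Python. It is not Pystachio and does not reproduce its scoping rules: the bind and resolve functions are hypothetical, and the profile values are flattened into dotted names purely for illustration. It only shows the general idea that bound names are substituted and unbound names are left in place.

```python
import re

# Toy mustache-style binder; NOT Pystachio, just a model of {{name}} substitution.
_VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def bind(template, values):
    """Replace each {{name}} that has a bound value; leave unbound names untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return _VARIABLE.sub(substitute, template)


def resolve(template, values, max_passes=10):
    """Bind repeatedly so values that themselves contain {{...}} get resolved."""
    for _ in range(max_passes):
        bound = bind(template, values)
        if bound == template:
            break
        template = bound
    return template


# Flattened stand-in for the STAGING profile (an assumption of this sketch).
staging = {
    "profile.parent_environment": "staging",
    "profile.parent_serverset": "/foocorp/service/bird/{{profile.parent_environment}}/bird",
}
cmdline = "-upstreamService {{profile.parent_serverset}} {{profile.extra_jvm_options}}"
print(resolve(cmdline, staging))
```

In this sketch the serverset path resolves to the staging value, while {{profile.extra_jvm_options}} stays as written because nothing was bound to it.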
Processes are handled by the Thermos system. A process is a single executable step run as a part of an Aurora task, which consists of a bash-executable statement.
The key (and required) Process attributes are:
- name: Any string which is a valid Unix filename (no slashes, NULLs, or leading periods). The name value must be unique relative to other Processes in a Task.
- cmdline: A command line run in a bash subshell, so you can use bash scripts. Nothing is supplied for command-line arguments, so $* is unspecified.
Many tiny processes make managing configurations more difficult. For example, the following is a bad way to define processes.
    copy = Process(
      name = 'copy',
      cmdline = 'curl -O https://packages.foocorp.com/app.zip'
    )
    unpack = Process(
      name = 'unpack',
      cmdline = 'unzip app.zip'
    )
    remove = Process(
      name = 'remove',
      cmdline = 'rm -f app.zip'
    )
    run = Process(
      name = 'app',
      cmdline = 'java -jar app.jar'
    )
    run_task = Task(
      processes = [copy, unpack, remove, run],
      constraints = order(copy, unpack, remove, run)
    )
Since cmdline runs in a bash subshell, you can chain commands with && or ||.
When defining a Task that is just a list of Processes run in a particular order, use SequentialTask, as described in the Defining Task Objects section. The following simplifies and combines the above multiple Process definitions into just two.
    stage = Process(
      name = 'stage',
      cmdline = 'curl -O https://packages.foocorp.com/app.zip && '
                'unzip app.zip && rm -f app.zip')

    run = Process(name = 'app', cmdline = 'java -jar app.jar')

    run_task = SequentialTask(processes = [stage, run])
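If the short-circuit behaviour of && and || is unfamiliar, the following sketch runs two chained command lines through a shell from Python. The run_cmdline helper is a hypothetical name used only here, and the sketch falls back to sh when bash is not installed; it demonstrates ordinary shell semantics rather than anything specific to Thermos.

```python
import shutil
import subprocess

# Prefer bash, as the text describes; fall back to sh if bash is not installed.
SHELL = shutil.which("bash") or shutil.which("sh")


def run_cmdline(cmdline):
    """Run a command line in a shell and return (exit_code, stdout)."""
    result = subprocess.run([SHELL, "-c", cmdline], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


# With &&, a failing step stops the chain, so 'unpack' never runs.
chained = run_cmdline("echo fetch && false && echo unpack")
# With ||, the right-hand side runs only when the left-hand side fails.
fallback = run_cmdline("false || echo fallback")

print(chained)
print(fallback)
```

This is why the combined stage process stops at the first failing step: a failed download means unzip and rm are never attempted, and the process exits with a non-zero status.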
Process also has optional attributes to customize its behaviour. Details can be found in the Aurora Configuration Reference.
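The naming rules given for the name attribute earlier in this section (a valid Unix filename, unique within a Task) can be checked mechanically. The function below is a hypothetical helper sketched for illustration; Aurora and Thermos perform their own validation, which may differ in detail.

```python
def check_process_names(names):
    """Return a list of problems found in a Task's Process names."""
    problems = []
    seen = set()
    for name in names:
        if not name:
            problems.append("empty name")
        elif "/" in name or "\0" in name or name.startswith("."):
            problems.append(f"invalid filename: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems


print(check_process_names(["stage", "app"]))
print(check_process_names(["stage", ".hidden", "a/b", "stage"]))
```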
When using Aurora, you need to get your executable code into its “sandbox”, specifically the Task sandbox where the code executes for the Processes that make up that Task.
Each Task has a sandbox created when the Task starts and garbage collected when it finishes. All of a Task's processes run in its sandbox, so processes can share state by using a shared current working directory.
Typically, you save this code somewhere. You then need to define a Process in your .aurora configuration file that fetches the code from that somewhere to where the agent can see it. For a public cloud, that can be anywhere public on the Internet, such as S3. For a private cloud, you need to put it on internal storage such as an accessible HDFS cluster or similar.
The template for this Process is:

    <name> = Process(
      name = '<name>',
      cmdline = '<command to copy and extract code archive into current working directory>'
    )
Note: Be sure the extracted code archive has an executable.
Every time a process is forked, the Thermos executor checks for the existence of the .thermos_profile file; if the file exists, it is sourced. You can use this mechanism to pass environment variables to the sandbox.
An example for this Process is:
    setup_env = Process(
      name = 'setup',
      cmdline = (
        'cat <<EOF > .thermos_profile\n'
        'export RESULT=hello\n'
        'EOF\n'
      )
    )
    read_env = Process(
      name = 'read',
      cmdline = 'echo $RESULT'
    )
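The effect of the .thermos_profile mechanism can be emulated outside Aurora. The sketch below is an approximation under stated assumptions: run_in_sandbox is a hypothetical helper, a temporary directory stands in for the Task sandbox, and the sourcing step is written by hand rather than taken from the Thermos executor.

```python
import os
import shutil
import subprocess
import tempfile

# Prefer bash, as the text describes; fall back to sh if bash is not installed.
SHELL = shutil.which("bash") or shutil.which("sh")


def run_in_sandbox(sandbox, cmdline):
    """Run cmdline in `sandbox`, sourcing .thermos_profile first if it exists."""
    wrapped = "if [ -f .thermos_profile ]; then . ./.thermos_profile; fi; " + cmdline
    # Drop any RESULT inherited from the caller so the demo is reproducible.
    clean_env = {k: v for k, v in os.environ.items() if k != "RESULT"}
    result = subprocess.run([SHELL, "-c", wrapped], cwd=sandbox, env=clean_env,
                            capture_output=True, text=True)
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as sandbox:
    before = run_in_sandbox(sandbox, 'echo "RESULT=$RESULT"')
    # Equivalent of the 'setup' process: write the profile into the sandbox.
    with open(os.path.join(sandbox, ".thermos_profile"), "w") as profile:
        profile.write("export RESULT=hello\n")
    after = run_in_sandbox(sandbox, 'echo "RESULT=$RESULT"')

print(before)
print(after)
```

The variable is empty on the first run and set on the second, which is why the process that writes the profile has to finish before any process that reads the variable starts.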
Tasks are handled by Mesos. A task is a collection of processes that runs in a shared sandbox. It's the fundamental unit Aurora uses to schedule the datacenter; essentially what Aurora does is find places in the cluster to run tasks.
The key (and required) parts of a Task are:
- name: A string giving the Task's name. By default, if a Task is not given a name, it inherits the first name in its Process list.
- processes: An unordered list of Process objects bound to the Task. The value of the optional constraints attribute affects the contents as a whole. Currently, the only constraint, order, determines if the processes run in parallel or sequentially.
- resources: A Resources object defining the Task's resource footprint. A Resources object has three attributes:
  - cpu: A Float, the fractional number of cores the Task requires.
  - ram: An Integer, RAM bytes the Task requires.
  - disk: An Integer, disk bytes the Task requires.
A basic Task definition looks like:
    Task(
      name="hello_world",
      processes=[Process(name = "hello_world", cmdline = "echo hello world")],
      resources=Resources(cpu = 1.0,
                          ram = 1*GB,
                          disk = 1*GB))
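Since ram and disk are plain byte counts, expressions such as 1*GB are ordinary integer arithmetic. The sketch below assumes MB and GB are binary multiples (1 MB = 1024 * 1024 bytes); that is an assumption of this example, so check the Aurora Configuration Reference for the authoritative definitions. The describe helper is hypothetical.

```python
# Assumed binary multiples; the Aurora Configuration Reference is authoritative.
MB = 1024 * 1024
GB = 1024 * MB


def describe(cpu, ram, disk):
    """Summarise a resource footprint given in cores and bytes."""
    return f"{cpu} cores, {ram // MB} MB RAM, {disk // MB} MB disk"


production_ram = 2560 * MB             # a RAM request expressed in bytes
print(production_ram)
print(describe(1.0, 1 * GB, 1 * GB))   # the hello_world footprint
```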
A Task has optional attributes to customize its behaviour. Details can be found in the Aurora Configuration Reference.
By default, a Task with several Processes runs them in parallel. There are two ways to run Processes sequentially:
Include an order constraint in the Task definition's constraints attribute whose arguments specify the processes' run order:

    Task( ...
      processes=[process1, process2, process3],
      constraints = order(process1, process2, process3),
      ...)

Use SequentialTask instead of Task; it automatically runs processes in the order specified in the processes attribute. No constraints parameter is needed:

    SequentialTask( ...
      processes=[process1, process2, process3]
      ...)
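One way to think about an order constraint is as a set of "runs before" pairs between adjacent processes. The sketch below models that idea with plain strings; the order and respects functions here are toy stand-ins written for this example and are not the Aurora helpers of the same name.

```python
def order(*names):
    """Toy version of order(): adjacent (first, second) pairs."""
    return list(zip(names, names[1:]))


def respects(run_sequence, constraints):
    """True if every constrained pair appears in the required order."""
    position = {name: index for index, name in enumerate(run_sequence)}
    return all(position[first] < position[second] for first, second in constraints)


constraints = order("process1", "process2", "process3")
print(constraints)
print(respects(["process1", "process2", "process3"], constraints))
print(respects(["process2", "process1", "process3"], constraints))
```

With no constraints at all, every sequence passes the check, which corresponds to the default behaviour of running processes in parallel.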
For quickly creating simple tasks, use the SimpleTask helper. It creates a basic task from a provided name and command line using a default set of resources. For example, in a .aurora configuration file:
SimpleTask(name="hello_world", command="echo hello world")
is equivalent to
    Task(name="hello_world",
         processes=[Process(name = "hello_world", cmdline = "echo hello world")],
         resources=Resources(cpu = 1.0,
                             ram = 1*GB,
                             disk = 1*GB))
The simplest idiomatic Job configuration thus becomes:
    import os
    hello_world_job = Job(
      task=SimpleTask(name="hello_world", command="echo hello world"),
      role=os.getenv('USER'),
      cluster="cluster1")
When written to hello_world.aurora, you invoke it with a simple aurora job create cluster1/$USER/test/hello_world hello_world.aurora.
Tasks.concat (synonym, concat_tasks) and Tasks.combine (synonym, combine_tasks) merge multiple Task definitions into a single Task. It may be easier to define complex Jobs as smaller constituent Tasks. But since a Job only includes a single Task, the subtasks must be combined before using them in a Job. Smaller Tasks can also be reused between Jobs, instead of having to repeat their definition for multiple Jobs.
With both methods, the merged Task takes the first Task's name. The difference between the two is the result Task's process ordering.
Tasks.combine runs its subtasks' processes in no particular order. The new Task's resource consumption is the sum of all its subtasks' consumption.
Tasks.concat runs its subtasks in the order supplied, with each subtask's processes run serially between tasks. It is analogous to the order constraint helper, except at the Task level instead of the Process level. The new Task's resource consumption is the maximum value specified by any subtask for each Resource attribute (cpu, ram and disk).
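The two resource rules described above (sum for Tasks.combine, per-attribute maximum for Tasks.concat) can be written out as a small calculation. In this sketch plain dictionaries stand in for Resources objects, the function names are hypothetical, and MB is assumed to be a binary multiple.

```python
MB = 1024 * 1024   # assumed binary multiple
FIELDS = ("cpu", "ram", "disk")


def combined_resources(*subtasks):
    """Tasks.combine as described above: processes may overlap, so sum each field."""
    return {field: sum(task[field] for task in subtasks) for field in FIELDS}


def concatenated_resources(*subtasks):
    """Tasks.concat as described above: subtasks run in turn, so take the maximum."""
    return {field: max(task[field] for task in subtasks) for field in FIELDS}


setup = {"cpu": 0.5, "ram": 128 * MB, "disk": 512 * MB}
run = {"cpu": 2.0, "ram": 1024 * MB, "disk": 256 * MB}

print(combined_resources(setup, run))
print(concatenated_resources(setup, run))
```

The intuition is that subtasks run one after another never need their resources at the same time, so the largest request for each attribute is enough.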
For example, given the following:
    setup_task = Task(
      ...
      processes=[download_interpreter, update_zookeeper],
      # It is important to note that Tasks.concat has
      # no effect on the ordering of the processes within a task;
      # hence the necessity of the order statement below
      # (otherwise, the order in which download_interpreter
      # and update_zookeeper run will be non-deterministic)
      constraints=order(download_interpreter, update_zookeeper),
      ...
    )

    run_task = SequentialTask(
      ...
      processes=[download_application, start_application],
      ...
    )

    combined_task = Tasks.concat(setup_task, run_task)
The Tasks.concat command merges the two Tasks into a single Task and ensures all processes in setup_task run before the processes in run_task. Conceptually, the task is reduced to:
    task = Task(
      ...
      processes=[download_interpreter, update_zookeeper,
                 download_application, start_application],
      constraints=order(download_interpreter, update_zookeeper,
                        download_application, start_application),
      ...
    )
In the case of Tasks.combine, the two schedules run in parallel:
    task = Task(
      ...
      processes=[download_interpreter, update_zookeeper,
                 download_application, start_application],
      constraints=order(download_interpreter, update_zookeeper) +
                  order(download_application, start_application),
      ...
    )
In the latter case, each of the two sequences may operate in parallel. Of course, this may not be the intended behavior (for example, if the start_application Process implicitly relies upon download_interpreter). Make sure you understand the difference between using one or the other.
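To see the difference concretely, the sketch below groups the four processes into "waves", where everything in one wave is free to run at the same time. The order and waves functions are toy models written for this example using process names as strings; they do not reproduce how Thermos actually schedules processes.

```python
def order(*names):
    """Toy ordering constraint: adjacent (first, second) pairs."""
    return list(zip(names, names[1:]))


def waves(processes, constraints):
    """Group processes into waves; everything in a wave may run in parallel."""
    remaining = list(processes)
    done = set()
    schedule = []
    while remaining:
        # A process is ready once every process constrained to precede it is done.
        ready = [p for p in remaining
                 if all(first in done for first, second in constraints if second == p)]
        if not ready:
            raise ValueError("constraints contain a cycle")
        schedule.append(ready)
        done.update(ready)
        remaining = [p for p in remaining if p not in ready]
    return schedule


processes = ["download_interpreter", "update_zookeeper",
             "download_application", "start_application"]

concat_constraints = order(*processes)
combine_constraints = (order("download_interpreter", "update_zookeeper")
                       + order("download_application", "start_application"))

concat_schedule = waves(processes, concat_constraints)
combine_schedule = waves(processes, combine_constraints)
print(concat_schedule)
print(combine_schedule)
```

Under the concat constraints each wave holds one process, while under the combine constraints start_application can share a wave with update_zookeeper, so nothing guarantees the setup sequence has finished before the application starts.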
A job is a group of identical tasks that Aurora can run in a Mesos cluster.
A Job object is defined by the values of several attributes, some required and some optional. The required attributes are:
- task: Task object to bind to this job. Note that a Job can only take a single Task.
- role: Job's role account; in other words, the user account to run the job as on a Mesos cluster machine. A common value is os.getenv('USER'), which uses a Python call to get the user who submits the job request. The other common value is the service account that runs the job, e.g. www-data.
- environment: Job's environment; typical values are devel, test, or prod.
- cluster: Aurora cluster to schedule the job in, defined in /etc/aurora/clusters.json or ~/.clusters.json. You can specify jobs where the only difference is the cluster, then at run time only run the Job whose job key includes your desired cluster's name.
You usually also see a name parameter. By default, name inherits its value from the Job's associated Task object, but you can override this default. With name and the four required parameters, a Job definition might look like:
    foo_job = Job(
      name = 'foo',
      cluster = 'cluster1',
      role = os.getenv('USER'),
      environment = 'prod',
      task = foo_task)
In addition to the required attributes, there are several optional attributes. Details can be found in the Aurora Configuration Reference.
At the end of your .aurora file, you need to specify a list of the file's defined Jobs. For example, the following exports the jobs job1, job2, and job3.
jobs = [job1, job2, job3]
This allows the aurora client to invoke commands on those jobs, such as starting, updating, or killing them.
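The job keys used on the command line in this document have the form cluster/role/environment/name. The sketch below shows how such a key could pick one entry out of an exported jobs list; dictionaries stand in for Job objects, and job_key and select_job are hypothetical helpers, not the client's actual lookup code.

```python
def job_key(job):
    """Build a cluster/role/environment/name key from a job's attributes."""
    return "/".join(job[part] for part in ("cluster", "role", "environment", "name"))


def select_job(jobs, key):
    """Return the single job in `jobs` whose key equals `key`."""
    matches = [job for job in jobs if job_key(job) == key]
    if len(matches) != 1:
        raise LookupError(f"expected one job for {key!r}, found {len(matches)}")
    return matches[0]


jobs = [
    {"cluster": "cluster1", "role": "myteam", "environment": "prod", "name": "application"},
    {"cluster": "cluster2", "role": "myteam", "environment": "prod", "name": "application"},
    {"cluster": "cluster1", "role": "myteam", "environment": "staging", "name": "application"},
]

selected = select_job(jobs, "cluster2/myteam/prod/application")
print(selected["cluster"])
```

This also illustrates why jobs that differ only in cluster can live in the same file: the key given on the command line disambiguates them.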
The following examples are provided to give a basic understanding of simple Aurora jobs.
Put the following in a file named hello_world.aurora, substituting your own values for values such as clusters.
    import os
    hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')

    hello_world_task = Task(
      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
      processes = [hello_world_process])

    hello_world_job = Job(
      cluster = 'cluster1',
      role = os.getenv('USER'),
      # environment is set so that it matches the job key used in the commands
      environment = 'test',
      task = hello_world_task)

    jobs = [hello_world_job]
Then issue the following commands to create and kill the job, using your own values for the job key.
    aurora job create cluster1/$USER/test/hello_world hello_world.aurora
    aurora job kill cluster1/$USER/test/hello_world
Put the following in a file named hello_world_productionized.aurora, substituting your own values for values such as clusters.
    include('hello_world.aurora')

    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)

    # hello_world_job and hello_world_task are the names defined in hello_world.aurora
    hello_world_template = hello_world_job(
      name = "hello_world-{{cluster}}",
      task = hello_world_task(resources = production_resources))

    jobs = [
      # production jobs
      hello_world_template(cluster = 'cluster1', instances = 25),
      hello_world_template(cluster = 'cluster2', instances = 15),

      # staging jobs
      hello_world_template(
        cluster = 'local',
        instances = 1,
        task = hello_world_task(resources = staging_resources)),
    ]
Then issue the following commands to create and kill the job, using your own values for the job key.
    aurora job create cluster1/$USER/test/hello_world-cluster1 hello_world_productionized.aurora
    aurora job kill cluster1/$USER/test/hello_world-cluster1