| Aurora + Thermos Configuration Reference |
| ======================================== |
| |
- [Aurora + Thermos Configuration Reference](#aurora--thermos-configuration-reference)
- [Introduction](#introduction)
- [Process Schema](#process-schema)
    - [Process Objects](#process-objects)
        - [name](#name)
        - [cmdline](#cmdline)
        - [max_failures](#max_failures)
        - [daemon](#daemon)
        - [ephemeral](#ephemeral)
        - [min_duration](#min_duration)
        - [final](#final)
        - [logger](#logger)
- [Task Schema](#task-schema)
    - [Task Object](#task-object)
        - [name](#name-1)
        - [processes](#processes)
            - [constraints](#constraints)
        - [resources](#resources)
        - [max_failures](#max_failures-1)
        - [max_concurrency](#max_concurrency)
        - [finalization_wait](#finalization_wait)
    - [Constraint Object](#constraint-object)
    - [Resource Object](#resource-object)
- [Job Schema](#job-schema)
    - [Job Objects](#job-objects)
    - [Services](#services)
    - [Revocable Jobs](#revocable-jobs)
    - [UpdateConfig Objects](#updateconfig-objects)
    - [HealthCheckConfig Objects](#healthcheckconfig-objects)
    - [HealthCheckerConfig Objects](#healthcheckerconfig-objects)
    - [HttpHealthChecker Objects](#httphealthchecker-objects)
    - [ShellHealthChecker Objects](#shellhealthchecker-objects)
    - [Announcer Objects](#announcer-objects)
    - [Port aliasing with the Announcer portmap](#port-aliasing-with-the-announcer-portmap)
    - [Container Object](#container-object)
    - [Docker Object](#docker-object)
    - [Docker Parameter Object](#docker-parameter-object)
    - [LifecycleConfig Objects](#lifecycleconfig-objects)
    - [HTTPLifecycleConfig Objects](#httplifecycleconfig-objects)
        - [graceful_shutdown_endpoint](#graceful_shutdown_endpoint)
        - [shutdown_endpoint](#shutdown_endpoint)
- [Specifying Scheduling Constraints](#specifying-scheduling-constraints)
- [Template Namespaces](#template-namespaces)
    - [mesos Namespace](#mesos-namespace)
    - [thermos Namespace](#thermos-namespace)
- [Basic Examples](#basic-examples)
    - [hello_world.aurora](#hello_worldaurora)
    - [Environment Tailoring](#environment-tailoring)
        - [hello_world_productionized.aurora](#hello_world_productionizedaurora)
| |
| Introduction |
| ============ |
| |
| Don't know where to start? The Aurora configuration schema is very |
| powerful, and configurations can become quite complex for advanced use |
| cases. |
| |
| For examples of simple configurations to get something up and running |
| quickly, check out the [Tutorial](/documentation/0.11.0/tutorial/). When you feel comfortable with the basics, move |
| on to the [Configuration Tutorial](/documentation/0.11.0/configuration-tutorial/) for more in-depth coverage of |
| configuration design. |
| |
For additional basic configuration examples, see [the end of this document](#basic-examples).
| |
| Process Schema |
| ============== |
| |
| Process objects consist of required `name` and `cmdline` attributes. You can customize Process |
| behavior with its optional attributes. Remember, Processes are handled by Thermos. |
| |
| ### Process Objects |
| |
| **Attribute Name** | **Type** | **Description** |
| ------------------- | :---------: | --------------------------------- |
| **name** | String | Process name (Required) |
| **cmdline** | String | Command line (Required) |
| **max_failures** | Integer | Maximum process failures (Default: 1) |
| **daemon** | Boolean | When True, this is a daemon process. (Default: False) |
| **ephemeral** | Boolean | When True, this is an ephemeral process. (Default: False) |
| **min_duration** | Integer | Minimum duration between process restarts in seconds. (Default: 15) |
| **final** | Boolean | When True, this process is a finalizing one that should run last. (Default: False) |
| **logger** | Logger | Struct defining the log behavior for the process. (Default: Empty) |
| |
| #### name |
| |
| The name is any valid UNIX filename string (specifically no |
| slashes, NULLs or leading periods). Within a Task object, each Process name |
| must be unique. |
| |
| #### cmdline |
| |
The command line run by the process. The command line is invoked in a bash
subshell, so it can involve full-blown bash scripts. However, nothing is
supplied for command-line arguments, so `$*` is unspecified.
| |
| #### max_failures |
| |
| The maximum number of failures (non-zero exit statuses) this process can |
| have before being marked permanently failed and not retried. If a |
| process permanently fails, Thermos looks at the failure limit of the task |
| containing the process (usually 1) to determine if the task has |
| failed as well. |
| |
| Setting `max_failures` to 0 makes the process retry |
| indefinitely until it achieves a successful (zero) exit status. |
| It retries at most once every `min_duration` seconds to prevent |
| an effective denial of service attack on the coordinating Thermos scheduler. |
| |
| #### daemon |
| |
| By default, Thermos processes are non-daemon. If `daemon` is set to True, a |
| successful (zero) exit status does not prevent future process runs. |
Instead, the process is reinvoked after `min_duration` seconds.
| However, the maximum failure limit still applies. A combination of |
| `daemon=True` and `max_failures=0` causes a process to retry |
| indefinitely regardless of exit status. This should be avoided |
| for very short-lived processes because of the accumulation of |
| checkpointed state for each process run. When running in Mesos |
| specifically, `max_failures` is capped at 100. |
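
For example, a minimal sketch of a daemon process that is re-run every
minute regardless of its exit status (the command line is illustrative):

    heartbeat = Process(
      name = 'heartbeat',
      cmdline = 'date >> heartbeat.log',  # placeholder command
      daemon = True,
      max_failures = 0,
      min_duration = 60)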
| |
| #### ephemeral |
| |
| By default, Thermos processes are non-ephemeral. If `ephemeral` is set to |
| True, the process' status is not used to determine if its containing task |
| has completed. For example, consider a task with a non-ephemeral |
| webserver process and an ephemeral logsaver process |
| that periodically checkpoints its log files to a centralized data store. |
| The task is considered finished once the webserver process has |
| completed, regardless of the logsaver's current status. |
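
A sketch of that example, where the task completes when `webserver` exits
regardless of what `logsaver` is doing (both command lines are placeholders):

    webserver = Process(name = 'webserver', cmdline = './run_server.sh')
    logsaver = Process(
      name = 'logsaver',
      cmdline = './checkpoint_logs.sh',
      ephemeral = True)

    task = Task(processes = [webserver, logsaver])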
| |
| #### min_duration |
| |
| Processes may succeed or fail multiple times during a single task's |
| duration. Each of these is called a *process run*. `min_duration` is |
| the minimum number of seconds the scheduler waits before running the |
| same process. |
| |
| #### final |
| |
| Processes can be grouped into two classes: ordinary processes and |
| finalizing processes. By default, Thermos processes are ordinary. They |
| run as long as the task is considered healthy (i.e., no failure |
| limits have been reached.) But once all regular Thermos processes |
| finish or the task reaches a certain failure threshold, it |
| moves into a "finalization" stage and runs all finalizing |
| processes. These are typically processes necessary for cleaning up the |
| task, such as log checkpointers, or perhaps e-mail notifications that |
| the task completed. |
| |
Finalizing processes may not depend upon ordinary processes, nor vice
versa; however, finalizing processes may depend upon other finalizing
processes and otherwise run under a typical process schedule.
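
For example, a sketch of a finalizing process that cleans up scratch files
(the command line is illustrative):

    cleanup = Process(
      name = 'cleanup',
      cmdline = 'rm -f /tmp/scratch_files.*',  # placeholder cleanup step
      final = True)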
| |
| #### logger |
| |
| The default behavior of Thermos is to allow stderr/stdout logs to grow unbounded. In the event |
| that you have large log volume, you may want to configure Thermos to automatically rotate logs |
| after they grow to a certain size, which can prevent your job from using more than its allocated |
| disk space. |
| |
| A Logger union consists of a mode enum and a rotation policy. Rotation policies only apply to |
| loggers whose mode is `rotate`. The acceptable values for the LoggerMode enum are `standard` |
| and `rotate`. The rotation policy applies to both stderr and stdout. |
| |
| By default, all processes use the `standard` LoggerMode. |
| |
| **Attribute Name** | **Type** | **Description** |
| ------------------- | :----------: | --------------------------------- |
| **mode** | LoggerMode | Mode of the logger. (Required) |
| **rotate** | RotatePolicy | An optional rotation policy. |
| |
| A RotatePolicy describes log rotation behavior for when `mode` is set to `rotate`. It is ignored |
| otherwise. |
| |
| **Attribute Name** | **Type** | **Description** |
| ------------------- | :----------: | --------------------------------- |
| **log_size** | Integer | Maximum size (in bytes) of an individual log file. (Default: 100 MiB) |
| **backups** | Integer | The maximum number of backups to retain. (Default: 5) |
| |
| An example process configuration is as follows: |
| |
    process = Process(
      name = 'process',
      logger = Logger(
        mode = LoggerMode('rotate'),
        rotate = RotatePolicy(log_size = 5*MB, backups = 5)))
| |
| Task Schema |
| =========== |
| |
| Tasks fundamentally consist of a `name` and a list of Process objects stored as the |
| value of the `processes` attribute. Processes can be further constrained with |
| `constraints`. By default, `name`'s value inherits from the first Process in the |
| `processes` list, so for simple `Task` objects with one Process, `name` |
| can be omitted. In Mesos, `resources` is also required. |
| |
| ### Task Object |
| |
| **param** | **type** | **description** |
| --------- | :---------: | --------------- |
```name``` | String | Process name (Required) (Default: ```processes[0].name```)
| ```processes``` | List of ```Process``` objects | List of ```Process``` objects bound to this task. (Required) |
| ```constraints``` | List of ```Constraint``` objects | List of ```Constraint``` objects constraining processes. |
| ```resources``` | ```Resource``` object | Resource footprint. (Required) |
| ```max_failures``` | Integer | Maximum process failures before being considered failed (Default: 1) |
| ```max_concurrency``` | Integer | Maximum number of concurrent processes (Default: 0, unlimited concurrency.) |
| ```finalization_wait``` | Integer | Amount of time allocated for finalizing processes, in seconds. (Default: 30) |
| |
| #### name |
| `name` is a string denoting the name of this task. It defaults to the name of the first Process in |
| the list of Processes associated with the `processes` attribute. |
| |
| #### processes |
| |
| `processes` is an unordered list of `Process` objects. To constrain the order |
| in which they run, use `constraints`. |
| |
| ##### constraints |
| |
| A list of `Constraint` objects. Currently it supports only one type, |
| the `order` constraint. `order` is a list of process names |
| that should run in the order given. For example, |
| |
    process = Process(cmdline = "echo hello {{name}}")
    task = Task(name = "echoes",
                processes = [process(name = "jim"), process(name = "bob")],
                constraints = [Constraint(order = ["jim", "bob"])])
| |
Constraints can be supplied ad-hoc and in duplicate. Not all
Processes need be constrained; however, Tasks with cyclic dependencies
are rejected by the Thermos scheduler.
| |
| Use the `order` function as shorthand to generate `Constraint` lists. |
| The following: |
| |
    order(process1, process2)
| |
| is shorthand for |
| |
    [Constraint(order = [process1.name(), process2.name()])]
| |
| The `order` function accepts Process name strings `('foo', 'bar')` or the processes |
| themselves, e.g. `foo=Process(name='foo', ...)`, `bar=Process(name='bar', ...)`, |
| `constraints=order(foo, bar)`. |
| |
| |
| #### resources |
| |
| Takes a `Resource` object, which specifies the amounts of CPU, memory, and disk space resources |
| to allocate to the Task. |
| |
| #### max_failures |
| |
| `max_failures` is the number of failed processes needed for the `Task` to be |
| marked as failed. |
| |
| For example, assume a Task has two Processes and a `max_failures` value of `2`: |
| |
    template = Process(max_failures=10)
    task = Task(
      name = "fail",
      processes = [
        template(name = "failing", cmdline = "exit 1"),
        template(name = "succeeding", cmdline = "exit 0")
      ],
      max_failures=2)
| |
The `failing` Process could fail 10 times before being marked as permanently
failed, and the `succeeding` Process could succeed on the first run. The Task
would still succeed, because only one of its two Processes permanently failed:
there were 10 failed process *runs*, but only 1 failed *process*. Both
Processes would have to fail for the Task to fail.
| |
| |
| |
| #### max_concurrency |
| |
| For Tasks with a number of expensive but otherwise independent |
| processes, you may want to limit the amount of concurrency |
| the Thermos scheduler provides rather than artificially constraining |
it via `order` constraints. For example, a test framework may
generate a task with 100 test run processes, but need to run it on
a machine with only 4 cores. You can limit the amount of parallelism to
4 by setting `max_concurrency=4` in your task configuration.
| |
| For example, the following task spawns 180 Processes ("mappers") |
| to compute individual elements of a 180 degree sine table, all dependent |
| upon one final Process ("reducer") to tabulate the results: |
| |
    def make_mapper(id):
      return Process(
        name = "mapper%03d" % id,
        cmdline = "echo 'scale=50;s(%d*4*a(1)/180)' | bc -l > temp.sine_table.%03d" % (id, id))

    def make_reducer():
      return Process(
        name = "reducer",
        cmdline = "cat temp.* | nl > sine_table.txt && rm -f temp.*")

    processes = map(make_mapper, range(180))

    task = Task(
      name = "mapreduce",
      processes = processes + [make_reducer()],
      constraints = [Constraint(order = [mapper.name(), 'reducer'])
                     for mapper in processes],
      max_concurrency = 8)
| |
| #### finalization_wait |
| |
| Tasks have three active stages: `ACTIVE`, `CLEANING`, and `FINALIZING`. The |
| `ACTIVE` stage is when ordinary processes run. This stage lasts as |
| long as Processes are running and the Task is healthy. The moment either |
| all Processes have finished successfully or the Task has reached a |
maximum Process failure limit, it goes into the `CLEANING` stage and sends
SIGTERMs to all currently running Processes and their process trees.
Once all Processes have terminated, the Task goes into the `FINALIZING` stage
and invokes the schedule of all Processes with the `final` attribute set to True.
| |
| This whole process from the end of `ACTIVE` stage to the end of `FINALIZING` |
| must happen within `finalization_wait` seconds. If it does not |
| finish during that time, all remaining Processes are sent SIGKILLs |
| (or if they depend upon uncompleted Processes, are |
| never invoked.) |
| |
| Client applications with higher priority may force a shorter |
| finalization wait (e.g. through parameters to `thermos kill`), so this |
| is mostly a best-effort signal. |
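
For example, a sketch of a task that grants its finalizers extra time,
reusing the `webserver` and `cleanup` Processes sketched earlier:

    task = Task(
      name = 'webservice',
      processes = [webserver, cleanup],
      finalization_wait = 120)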
| |
| |
| ### Constraint Object |
| |
Current constraint objects only support a single ordering constraint, `order`,
which specifies that its processes run sequentially in the order given. By
default, all processes run in parallel when bound to a `Task` without
ordering constraints.
| |
| param | type | description |
| ----- | :----: | ----------- |
| order | List of String | List of processes by name (String) that should be run serially. |
| |
| ### Resource Object |
| |
Specifies the amount of CPU, RAM, and disk resources the task needs. See the
| [Resource Isolation document](/documentation/0.11.0/resources/) for suggested values and to understand how |
| resources are allocated. |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```cpu``` | Float | Fractional number of cores required by the task. |
| ```ram``` | Integer | Bytes of RAM required by the task. |
| ```disk``` | Integer | Bytes of disk required by the task. |
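
For example, a footprint of two cores, 1 GiB of RAM, and 4 GiB of disk:

    resources = Resources(cpu = 2.0, ram = 1 * GB, disk = 4 * GB)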
| |
| |
| Job Schema |
| ========== |
| |
| ### Job Objects |
| |
| name | type | description |
| ------ | :-------: | ------- |
| ```task``` | Task | The Task object to bind to this job. Required. |
| ```name``` | String | Job name. (Default: inherited from the task attribute's name) |
| ```role``` | String | Job role account. Required. |
| ```cluster``` | String | Cluster in which this job is scheduled. Required. |
| ```environment``` | String | Job environment, default ```devel```. Must be one of ```prod```, ```devel```, ```test``` or ```staging<number>```. |
| ```contact``` | String | Best email address to reach the owner of the job. For production jobs, this is usually a team mailing list. |
| ```instances```| Integer | Number of instances (sometimes referred to as replicas or shards) of the task to create. (Default: 1) |
| ```cron_schedule``` | String | Cron schedule in cron format. May only be used with non-service jobs. See [Cron Jobs](/documentation/0.11.0/cron-jobs/) for more information. Default: None (not a cron job.) |
```cron_collision_policy``` | String | Policy to use when a cron job is triggered while a previous run is still active: ```KILL_EXISTING``` kills the previous run and schedules the new run; ```CANCEL_NEW``` lets the previous run continue and cancels the new run. (Default: KILL_EXISTING)
| ```update_config``` | ```UpdateConfig``` object | Parameters for controlling the rate and policy of rolling updates. |
```constraints``` | dict | Scheduling constraints for the tasks. See the section on the [constraint specification language](#specifying-scheduling-constraints)
| ```service``` | Boolean | If True, restart tasks regardless of success or failure. (Default: False) |
| ```max_task_failures``` | Integer | Maximum number of failures after which the task is considered to have failed (Default: 1) Set to -1 to allow for infinite failures |
| ```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities. |
| ```production``` | Boolean | Whether or not this is a production task that may [preempt](/documentation/0.11.0/resources/#task-preemption) other tasks (Default: False). Production job role must have the appropriate [quota](/documentation/0.11.0/resources/#resource-quota). |
| ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks. HTTP health check is only used if a health port was assigned with a command line wildcard. |
| ```container``` | ```Container``` object | An optional container to run all processes inside of. |
| ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown. HTTP lifecycle is enabled by default if the "health" port is requested. See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information. |
```tier``` | String | Task tier type. When set to `revocable`, requires the task to run with Mesos revocable resources. This is [work in progress](https://issues.apache.org/jira/browse/AURORA-1343) and is currently only supported for revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details.
| |
| ### Services |
| |
| Jobs with the `service` flag set to True are called Services. The `Service` |
| alias can be used as shorthand for `Job` with `service=True`. |
| Services are differentiated from non-service Jobs in that tasks |
| always restart on completion, whether successful or unsuccessful. |
| Jobs without the service bit set only restart up to |
| `max_task_failures` times and only if they terminated unsuccessfully |
| either due to human error or machine failure. |
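
For example, a minimal sketch of a service, reusing `hello_world_task`
(and the `import os`) from the [basic examples](#basic-examples) below:

    hello_world_service = Service(
      cluster = 'cluster1',
      role = os.getenv('USER'),
      environment = 'devel',
      task = hello_world_task)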
| |
| ### Revocable Jobs |
| |
| **WARNING**: This feature is currently in alpha status. Do not use it in production clusters! |
| |
| Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/) |
| by oversubscribing machine resources by the amount deemed safe to not affect the existing |
non-revocable tasks. Aurora supports revocable jobs via the `tier` attribute,
which must be set to the value `revocable`.
| |
| More implementation details in this [ticket](https://issues.apache.org/jira/browse/AURORA-1343). |
| |
The scheduler must be [configured](/documentation/0.11.0/deploying-aurora-scheduler/#configuring-resource-oversubscription)
to receive revocable offers from Mesos and accept revocable jobs. If not configured properly,
revocable tasks will never be assigned to hosts and will stay in `PENDING`.
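
For example, starting from the `hello_world_service` sketch above, a
revocable variant is just a tier override:

    revocable_hello_world = hello_world_service(tier = 'revocable')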
| |
| ### UpdateConfig Objects |
| |
| Parameters for controlling the rate and policy of rolling updates. |
| |
| | object | type | description |
| | ---------------------------- | :------: | ------------ |
| | ```batch_size``` | Integer | Maximum number of shards to be updated in one iteration (Default: 1) |
| ```restart_threshold``` | Integer | Maximum number of seconds a shard has to move into the ```RUNNING``` state before it is considered a failure (Default: 60)
| ```watch_secs``` | Integer | Minimum number of seconds a shard must remain in the ```RUNNING``` state before it is considered a success (Default: 45)
| | ```max_per_shard_failures``` | Integer | Maximum number of restarts per shard during update. Increments total failure count when this limit is exceeded. (Default: 0) |
| | ```max_total_failures``` | Integer | Maximum number of shard failures to be tolerated in total during an update. Cannot be greater than or equal to the total number of tasks in a job. (Default: 0) |
| | ```rollback_on_failure``` | boolean | When False, prevents auto rollback of a failed update (Default: True) |
| | ```wait_for_batch_completion```| boolean | When True, all threads from a given batch will be blocked from picking up new instances until the entire batch is updated. This essentially simulates the legacy sequential updater algorithm. (Default: False) |
| | ```pulse_interval_secs``` | Integer | Indicates a [coordinated update](/documentation/0.11.0/client-commands/#coordinated-job-updates). If no pulses are received within the provided interval the update will be blocked. Beta-updater only. Will fail on submission when used with client updater. (Default: None) |
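
For example, a sketch of an update policy that rolls out five shards at a
time and tolerates up to two restarts per shard (the values are illustrative):

    update_config = UpdateConfig(
      batch_size = 5,
      watch_secs = 30,
      max_per_shard_failures = 2,
      max_total_failures = 1)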
| |
| ### HealthCheckConfig Objects |
| |
*Note: ```endpoint```, ```expected_response``` and ```expected_response_code``` are deprecated from ```HealthCheckConfig``` and must be defined in ```HttpHealthChecker```.*
| |
| Parameters for controlling a task's health checks via HTTP or a shell command. |
| |
| | param | type | description |
| | ------- | :-------: | -------- |
| | *```endpoint```* | String | HTTP endpoint to check (Default: /health) **Deprecated.** |
| | *```expected_response```* | String | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok) **Deprecated.** |
| | *```expected_response_code```* | Integer | If not zero, fail the HTTP health check if the response code differs. (Default: 0) **Deprecated.** |
| | ```health_checker``` | HealthCheckerConfig | Configure what kind of health check to use. |
| | ```initial_interval_secs``` | Integer | Initial delay for performing a health check. (Default: 15) |
| | ```interval_secs``` | Integer | Interval on which to check the task's health. (Default: 10) |
| | ```max_consecutive_failures``` | Integer | Maximum number of consecutive failures that will be tolerated before considering a task unhealthy (Default: 0) |
| | ```timeout_secs``` | Integer | Health check timeout. (Default: 1) |
| |
| ### HealthCheckerConfig Objects |
| | param | type | description |
| | ------- | :-------: | -------- |
| | ```http``` | HttpHealthChecker | Configure health check to use HTTP. (Default) |
| | ```shell``` | ShellHealthChecker | Configure health check via a shell command. |
| |
| |
| ### HttpHealthChecker Objects |
| | param | type | description |
| | ------- | :-------: | -------- |
| | ```endpoint``` | String | HTTP endpoint to check (Default: /health) |
| | ```expected_response``` | String | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok) |
| | ```expected_response_code``` | Integer | If not zero, fail the HTTP health check if the response code differs. (Default: 0) |
| |
| ### ShellHealthChecker Objects |
| | param | type | description |
| | ------- | :-------: | -------- |
| | ```shell_command``` | String | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure. |
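
For example, a sketch wiring a shell-based health check into a
```HealthCheckConfig``` (the command itself is illustrative):

    health_check_config = HealthCheckConfig(
      health_checker = HealthCheckerConfig(
        shell = ShellHealthChecker(shell_command = 'test -f /tmp/ready')),
      interval_secs = 15,
      max_consecutive_failures = 3)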
| |
| |
| ### Announcer Objects |
| |
If the `announce` field in the Job configuration is set, each task will be
registered in the ServerSet `/aurora/role/environment/jobname` in the
ZooKeeper ensemble configured by the executor. If no Announcer object is specified,
no announcement will take place. For more information about ServerSets, see the [User Guide](/documentation/0.11.0/user-guide/).
| |
| | object | type | description |
| | ------- | :-------: | -------- |
| | ```primary_port``` | String | Which named port to register as the primary endpoint in the ServerSet (Default: `http`) |
| ```portmap``` | dict | A mapping of additional endpoints to be announced in the ServerSet (Default: `{ 'aurora': '{{primary_port}}' }`)
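
For example, a sketch that spells out the default behavior, registering the
`http` port as primary and aliasing `aurora` to it:

    announce = Announcer(
      primary_port = 'http',
      portmap = {'aurora': 'http'})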
| |
| ### Port aliasing with the Announcer `portmap` |
| |
| The primary endpoint registered in the ServerSet is the one allocated to the port |
| specified by the `primary_port` in the `Announcer` object, by default |
| the `http` port. This port can be referenced from anywhere within a configuration |
| as `{{thermos.ports[http]}}`. |
| |
| Without the port map, each named port would be allocated a unique port number. |
| The `portmap` allows two different named ports to be aliased together. The default |
| `portmap` aliases the `aurora` port (i.e. `{{thermos.ports[aurora]}}`) to |
| the `http` port. Even though the two ports can be referenced independently, |
only one port is allocated by Mesos. Any port referenced in a `Process` object
but not present in the portmap will be allocated dynamically by Mesos and announced as well.
| |
| It is possible to use the portmap to alias names to static port numbers, e.g. |
| `{'http': 80, 'https': 443, 'aurora': 'http'}`. In this case, referencing |
| `{{thermos.ports[aurora]}}` would look up `{{thermos.ports[http]}}` then |
| find a static port 80. No port would be requested of or allocated by Mesos. |
| |
| Static ports should be used cautiously as Aurora does nothing to prevent two |
| tasks with the same static port allocations from being co-scheduled. |
| External constraints such as slave attributes should be used to enforce such |
| guarantees should they be needed. |
| |
| ### Container Object |
| |
*Note: The only container type currently supported is "docker". Docker support is currently EXPERIMENTAL.*
*Note: In order to correctly execute processes inside a job, the Docker container must have Python 2.7 installed.*
| |
| Describes the container the job's processes will run inside. |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```docker``` | Docker | A docker container to use. |
| |
| ### Docker Object |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```image``` | String | The name of the docker image to execute. If the image does not exist locally it will be pulled with ```docker pull```. |
| ```parameters``` | List(Parameter) | Additional parameters to pass to the docker containerizer. |
| |
| ### Docker Parameter Object |
| |
Docker CLI parameters. This needs to be enabled via the scheduler's `enable_docker_parameters` option.
| See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters. |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```name``` | String | The name of the docker parameter. E.g. volume |
| ```value``` | String | The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw |
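
For example, a sketch of a container specification that mounts a volume via
the parameter shown above (the image and paths are illustrative):

    container = Container(
      docker = Docker(
        image = 'python:2.7',
        parameters = [Parameter(name = 'volume',
                                value = '/usr/local/bin:/usr/bin:rw')]))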
| |
| ### LifecycleConfig Objects |
| |
| *Note: The only lifecycle configuration supported is the HTTP lifecycle via the HTTPLifecycleConfig.* |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```http``` | HTTPLifecycleConfig | Configure the lifecycle manager to send lifecycle commands to the task via HTTP. |
| |
| ### HTTPLifecycleConfig Objects |
| |
| param | type | description |
| ----- | :----: | ----------- |
| ```port``` | String | The named port to send POST commands (Default: health) |
| ```graceful_shutdown_endpoint``` | String | Endpoint to hit to indicate that a task should gracefully shutdown. (Default: /quitquitquit) |
| ```shutdown_endpoint``` | String | Endpoint to hit to give a task its final warning before being killed. (Default: /abortabortabort) |
| |
| #### graceful_shutdown_endpoint |
| |
If the Job is listening on the port as specified by the HTTPLifecycleConfig
(default: `health`), an HTTP POST request will be sent over localhost to this
endpoint to request that the task gracefully shut itself down. This is a
courtesy call before the `shutdown_endpoint` is invoked a fixed amount of
time later.
| |
| #### shutdown_endpoint |
| |
If the Job is listening on the port as specified by the HTTPLifecycleConfig
(default: `health`), an HTTP POST request will be sent over localhost to this
endpoint as a final warning before the task is shut down. If the task does
not shut down on its own after this, it will be forcefully killed.
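
A sketch of a Job-level `lifecycle` setting that spells out the defaults
from the table above:

    lifecycle = LifecycleConfig(
      http = HTTPLifecycleConfig(
        port = 'health',
        graceful_shutdown_endpoint = '/quitquitquit',
        shutdown_endpoint = '/abortabortabort'))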
| |
| |
| Specifying Scheduling Constraints |
| ================================= |
| |
| In the `Job` object there is a map `constraints` from String to String |
| allowing the user to tailor the schedulability of tasks within the job. |
| |
| Each slave in the cluster is assigned a set of string-valued |
| key/value pairs called attributes. For example, consider the host |
| `cluster1-aaa-03-sr2` and its following attributes (given in key:value |
| format): `host:cluster1-aaa-03-sr2` and `rack:aaa`. |
| |
The constraint map's key is the attribute name on which we constrain
Tasks within our Job. The value is how we constrain them.
There are two types of constraints: *limit constraints* and *value
constraints*.
| |
| | constraint | description |
| | ------------- | -------------- |
| | Limit | A string that specifies a limit for a constraint. Starts with <code>'limit:</code> followed by an Integer and closing single quote, such as ```'limit:1'```. |
| Value | A string that specifies a value for a constraint. To include a list of values, separate the values using commas. To negate the values of a constraint, start with a ```!```.
| |
| You can also control machine diversity using constraints. The below |
| constraint ensures that no more than two instances of your job may run |
| on a single host. Think of this as a "group by" limit. |
| |
    constraints = {
      'host': 'limit:2',
    }
| |
| Likewise, you can use constraints to control rack diversity, e.g. at |
| most one task per rack: |
| |
    constraints = {
      'rack': 'limit:1',
    }
| |
| Use these constraints sparingly as they can dramatically reduce Tasks' schedulability. |
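
Value constraints work similarly. For example, to restrict a job's tasks to
two racks and keep them off one particular host (the attribute values are
illustrative):

    constraints = {
      'rack': 'aaa,bbb',
      'host': '!cluster1-aaa-03-sr2',
    }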
| |
| Template Namespaces |
| =================== |
| |
Currently, a few Pystachio namespaces have special semantics. Using them
in your configuration allows you to tailor application behavior
through environment introspection or interact in special ways with the
Aurora client or Aurora-provided services.
| |
| ### mesos Namespace |
| |
The `mesos` namespace contains variables that relate to the Mesos slave
which launched the task. The `instance` variable can be used
to distinguish between Task replicas.
| |
| | variable name | type | description |
| | --------------- | :--------: | ------------- |
| | ```instance``` | Integer | The instance number of the created task. A job with 5 replicas has instance numbers 0, 1, 2, 3, and 4. |
| | ```hostname``` | String | The instance hostname that the task was launched on. |
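
For example, a sketch of a Process that reports which replica and host it
is running on:

    identify = Process(
      name = 'identify',
      cmdline = 'echo "instance {{mesos.instance}} on {{mesos.hostname}}"')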
| |
| ### thermos Namespace |
| |
| The `thermos` namespace contains variables that work directly on the |
| Thermos platform in addition to Aurora. This namespace is fully |
| compatible with Tasks invoked via the `thermos` CLI. |
| |
| | variable | type | description | |
| | :----------: | --------- | ------------ | |
| | ```ports``` | map of string to Integer | A map of names to port numbers | |
| | ```task_id``` | string | The task ID assigned to this task. | |
| |
| The `thermos.ports` namespace is automatically populated by Aurora when |
| invoking tasks on Mesos. When running the `thermos` command directly, |
| these ports must be explicitly mapped with the `-P` option. |
| |
For example, if `{{thermos.ports[http]}}` is specified in a `Process`
configuration, it is automatically extracted and auto-populated by
Aurora, but must be specified with, for example, `thermos -P http:12345`
to map `http` to port 12345 when running via the CLI.
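
For example, a sketch of a Process that binds a server to the named `http`
port (the server command is a placeholder):

    http_server = Process(
      name = 'http_server',
      cmdline = './run_server.sh --port {{thermos.ports[http]}}')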
| |
| Basic Examples |
| ============== |
| |
| These are provided to give a basic understanding of simple Aurora jobs. |
| |
| ### hello_world.aurora |
| |
Put the following in a file named `hello_world.aurora`, substituting your own
values where appropriate (such as the `cluster` name).
| |
    import os

    hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')

    hello_world_task = Task(
      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
      processes = [hello_world_process])

    hello_world_job = Job(
      cluster = 'cluster1',
      role = os.getenv('USER'),
      task = hello_world_task)

    jobs = [hello_world_job]
| |
| Then issue the following commands to create and kill the job, using your own values for the job key. |
| |
    aurora job create cluster1/$USER/test/hello_world hello_world.aurora

    aurora job kill cluster1/$USER/test/hello_world
| |
| ### Environment Tailoring |
| |
| #### hello_world_productionized.aurora |
| |
Put the following in a file named `hello_world_productionized.aurora`, substituting
your own values where appropriate (such as the `cluster` names).
| |
    include('hello_world.aurora')

    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)
    hello_world_template = hello_world_job(
      name = "hello_world-{{cluster}}",
      task = hello_world_task(resources = production_resources))

    jobs = [
      # production jobs
      hello_world_template(cluster = 'cluster1', instances = 25),
      hello_world_template(cluster = 'cluster2', instances = 15),

      # staging jobs
      hello_world_template(
        cluster = 'local',
        instances = 1,
        task = hello_world_task(resources = staging_resources)),
    ]
| |
Then issue the following commands to create and kill the job, using your own values for the job key.
| |
    aurora job create cluster1/$USER/test/hello_world-cluster1 hello_world_productionized.aurora

    aurora job kill cluster1/$USER/test/hello_world-cluster1