-partition_aware
that is disabled by default. When Mesos support is improved and the new behavior is vetted in production clusters, we'll enable this by default.
-offer_set_module
scheduler flag. To take advantage of this feature, you will need to implement the OfferSet
interface.
executor_config
field to the Job object of the DSL which will populate JobConfiguration.TaskConfig.ExecutorConfig
. This allows for using custom executors defined through the --custom_executor_config
scheduler flag. See our custom-executors documentation for more information.
--enable_mesos_disk_collector
flag, in which case Observer will use the agent's containers HTTP API to query the amount of used bytes for each container. Note that disk isolation should be enabled on the Mesos agent. This feature is not compatible with authentication-enabled agents.
numCpus
, ramMb
, diskMb
, requestedPorts
).
-offer_order_modules
scheduler flag related to custom injectable offer orderings, since this will now be subsumed under custom OfferSet
implementations (see the comment above):
graceful_shutdown_wait_secs
and shutdown_wait_secs
fields in HttpLifecycleConfig
respectively. Previously, the executor would only wait 5 seconds between steps (adding up to a total of 10 seconds as there are 2 steps). The overall waiting period is bounded by the executor's stop timeout, which can be configured using the executor's stop_timeout_in_secs
flag.
thrift_method_interceptor_modules
scheduler flag that lets cluster operators inject custom Thrift method interceptors.
-zk_connection_timeout
to control the connection timeout of ZooKeeper connections.
-hold_offers_forever
, suitable for use in clusters where Aurora is the only framework. This setting disables other options such as -min_offer_hold_time
, and allows the scheduler to more efficiently cache scheduling attempts.
-zk_use_curator
, removing the choice to use the legacy ZooKeeper client.
rewriteConfigs
thrift API call in the scheduler. This was a last-ditch mechanism to modify scheduler state on the fly. It has been considered extremely risky to use since its inception, and is safer to abandon given its lack of use and likelihood of code rot.
allowed_job_environments
option. By default allowing any of devel
, test
, production
, and any value matching the regular expression staging[0-9]*
.
-use_beta_db_task_store
-enable_db_metrics
-slow_query_log_threshold
-db_row_gc_interval
-db_lock_timeout
-db_max_active_connection_count
-db_max_idle_connection_count
-snapshot_hydrate_stores
-enable_h2_console
killTasks
RPC.
prune_tasks
endpoint to aurora_admin
. See aurora_admin prune_tasks -h
for usage information.
-mesos_driver
flag to the scheduler with three possible options: SCHEDULER_DRIVER
, V0_MESOS
, V1_MESOS
. The first uses the original driver and the latter two use two new drivers from libmesos
. V0_MESOS
uses the SCHEDULER_DRIVER
under the hood and V1_MESOS
uses a new HTTP API aware driver. Users that want to use the HTTP API should use V1_MESOS
. Performance sensitive users should stick with the SCHEDULER_DRIVER
or V0_MESOS
drivers.
task scp
command to the CLI client for easy transferring of files to/from/between task instances. See here for details. Currently only fully supported for Mesos containers (you can copy files from the Docker container sandbox but you cannot send files to it).
task_assigner_modules
and preemption_slot_finder_modules
options.
CPU
, MEMORY
, DISK
, RANDOM
or REVOCABLE_CPU
. You can also compose secondary sorts by combining orders together: e.g. to bin-pack by CPU and MEMORY you could supply ‘CPU,MEMORY’. The current default is RANDOM
, which has the strong advantage that users can (usually) relocate their tasks away from noisy neighbors or machine issues with a task restart. With deterministic bin-packing, tasks may always end up on the same agent, so be careful enabling this without proper monitoring and remediation of host failures.
--snapshot_hydrate_stores
that controls which H2-backed stores to write fully hydrated into the Scheduler snapshot. Can lead to significantly lower snapshot times for large clusters if you set this flag to an empty list. Old behavior is preserved by default, but see org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl for the stores that currently have duplicate writes.
TaskInfo
proto.
--ip
option to bind the Thermos observer to a specific interface rather than all interfaces.
-enable_revocable_ram
.
-allow_container_volumes
flag.
{{docker.image[name][tag]}}
binder that can be used in the Aurora job configuration to resolve a docker image specified by its name:tag
to a concrete identifier specified by its registry/name@digest
. It requires version 2 of the Docker Registry.
RUNNING
state to indicate that the task is healthy and behaving as expected. Job updates can now rely purely on health checks rather than watch_secs
timeout when deciding an individual instance update state, by setting watch_secs
to 0. A service will remain in STARTING
state until min_consecutive_successes
consecutive health checks have passed.
-serverset_endpoint_name=https
you can ensure the Aurora client will correctly discover HTTPS support via the ZooKeeper-based discovery mechanism.
production
attribute in Job
thrift struct. The scheduler is queried for tier configurations and the user's choice of tier
and production
attributes is revised, if necessary. If tier
is already set, the production
attribute might be adjusted to match the tier
selection. Otherwise, tier
is selected based on the value of production
attribute. If a matching tier is not found, the default
tier from tier configuration file (tiers.json
) is used.
/offers
endpoint has been modified to display attributes of resource offers as received from Mesos. This has affected rendering of some of the existing attributes. Furthermore, it now dumps additional offer attributes including reservations and persistent volumes.
Content-Type
header, or a Content-Type
header of application/x-thrift
or application/json
or application/vnd.apache.thrift.json
the request is treated as thrift JSON. If a request is sent with a Content-Type
header of application/vnd.apache.thrift.binary
the request is treated as binary thrift. If the Accept
header of the request is application/vnd.apache.thrift.binary
then the response will be binary thrift. Any other value for Accept
will result in thrift JSON.
-custom_executor_config
flag must point to a JSON file which contains at least one valid executor configuration as detailed in the configuration documentation.
-zk_use_curator
now defaults to true
and care should be taken when upgrading from a configuration that does not pass the flag. The scheduler upgrade should be performed by bringing all schedulers down, and then bringing upgraded schedulers up. A rolling upgrade would result in no leading scheduler for the duration of the roll, which could be confusing to monitor and debug.
aurora_admin reconcile_tasks
is now available on the Aurora admin client to trigger implicit and explicit task reconciliations.
-enable_revocable_ram
.
-framework_name
to ‘Aurora’.
production
is now deprecated. To achieve the same scheduling behavior that production=true
used to provide, users should elect a tier
for the job with attributes preemptible=false
and revocable=false
. For example, the preferred
tier in the default tier configuration file (tiers.json
) matches the above criteria.
ExecutorInfo.source
field is deprecated and has been replaced with a label named source
. It will be removed from Mesos in a future release.
-zk_use_curator
has been deprecated. If you have never set the flag and are upgrading, you should take care as described in the note above.
key
argument of getJobUpdateDetails
has been deprecated. Use the query
argument instead.
aurora job restart
has been removed.
Upgraded Mesos to 0.27.2
Added a new optional Apache Curator backend for performing scheduler leader election. You can enable this with the new -zk_use_curator
scheduler argument.
Added the --nosetuid-health-checks flag to control whether the executor runs health checks as the job's role's user.
New scheduler command line argument -offer_filter_duration
to control the time after which we expect Mesos to re-offer unused resources. A short duration improves scheduling performance in smaller clusters, but might lead to resource starvation for other frameworks if you run multiple ones in your cluster. Uses the Mesos default of 5s.
New scheduler command line option -framework_name
to change the name used for registering the Aurora framework with Mesos. The current default value is ‘TwitterScheduler’.
Added experimental support for launching tasks using filesystem images and the Mesos unified containerizer. See that linked documentation for details on configuring Mesos to use the unified containerizer. Note that earlier versions of Mesos do not fully support the unified containerizer. Mesos 0.28.x or later is recommended for anyone adopting task images via the Mesos containerizer.
Upgraded to pystachio 0.8.1 to pick up support for the new Choice type.
The container
property of a Job
is now a Choice of either a Container
holder, or a direct reference to either a Docker
or Mesos
container.
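The Choice described above can be pictured with a small sketch. The classes below are illustrative stand-ins written in plain Python, not the real Aurora DSL or pystachio types; the field names and the `unwrap` helper are assumptions made for the example. It shows how code consuming a job could accept either the holder form or a direct container reference.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Docker:
    image: str


@dataclass
class Mesos:
    image: Optional[str] = None


@dataclass
class Container:
    """Holder form: wraps exactly one concrete container."""
    docker: Optional[Docker] = None
    mesos: Optional[Mesos] = None


def unwrap(container: Union[Container, Docker, Mesos]) -> Union[Docker, Mesos]:
    """Return the concrete container for either accepted form (hypothetical helper)."""
    if isinstance(container, (Docker, Mesos)):
        return container  # direct reference, nothing to unwrap
    if isinstance(container, Container):
        chosen = [c for c in (container.docker, container.mesos) if c is not None]
        if len(chosen) != 1:
            raise ValueError("holder must set exactly one of docker or mesos")
        return chosen[0]
    raise TypeError("unsupported container value: %r" % (container,))
```

Under these assumptions, `unwrap(Docker("img"))` and `unwrap(Container(docker=Docker("img")))` yield the same value, which is the equivalence the Choice is meant to express.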
New scheduler command line argument -ip
to control which IP address the scheduler's HTTP server binds to.
Added experimental support for Mesos GPU resource. This feature will be available in Mesos 1.0 and is disabled by default. Use -allow_gpu_resource
flag to enable it.
IMPORTANT: once this feature is enabled, creating jobs with a GPU resource will make the scheduler snapshot backwards incompatible. The scheduler will be unable to read the snapshot if rolled back to a previous version. If rollback is absolutely necessary, perform the following steps:
-allow_gpu_resource
to false
-history_prune_threshold=1mins
and -history_max_per_job_threshold=0
/h2console
endpoint or reduce job update pruning thresholds, e.g.: -job_update_history_pruning_threshold=1mins
and -job_update_history_per_job_threshold=0
aurora_admin scheduler_snapshot <cluster>
Experimental support for a webhook feature which POSTs all task state changes to a user defined endpoint.
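A receiving endpoint for such a webhook could look like the following minimal sketch. It assumes the POST body is a JSON document; the payload fields, the port, and the names `record_state_change`, `StateChangeHandler`, and `serve` are hypothetical and chosen only for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # decoded state changes, in arrival order


def record_state_change(body):
    """Decode one POST body; return the HTTP status code to answer with."""
    try:
        received.append(json.loads(body))
    except ValueError:  # malformed JSON or undecodable bytes
        return 400
    return 200


class StateChangeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = record_state_change(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve(port=8080):
    """Block forever, accepting POSTs on localhost (the port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), StateChangeHandler).serve_forever()
```

Keeping the decoding in `record_state_change` separates it from the HTTP plumbing, so the logic can be exercised without opening a socket.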
Added support for specifying the default tier name in tier configuration file (tiers.json
). The default
property is required and is initialized with the preemptible
tier (preemptible
tier tasks can be preempted but their resources cannot be revoked).
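To make the required default property concrete, here is a small sketch that loads a tier configuration and falls back to the default tier when a job names none. The JSON layout is an assumption modeled on the tier names and attributes mentioned in these notes, and `load_tiers` and `resolve_tier` are hypothetical helpers, not scheduler code.

```python
import json

# Assumed tiers.json layout, for illustration only.
TIERS_JSON = """
{
  "default": "preemptible",
  "tiers": {
    "preferred":   {"revocable": false, "preemptible": false},
    "preemptible": {"revocable": false, "preemptible": true},
    "revocable":   {"revocable": true,  "preemptible": true}
  }
}
"""


def load_tiers(text):
    """Parse a tier configuration and check that the required default exists."""
    config = json.loads(text)
    default = config.get("default")
    if default is None:
        raise ValueError("tier configuration must define a 'default' tier")
    if default not in config.get("tiers", {}):
        raise ValueError("default tier %r is not defined" % default)
    return config


def resolve_tier(config, name=None):
    """Return (tier name, attributes), using the default tier when name is None."""
    chosen = name if name is not None else config["default"]
    return chosen, config["tiers"][chosen]


config = load_tiers(TIERS_JSON)
print(resolve_tier(config))
print(resolve_tier(config, "preferred"))
```

In this sketch a job with no tier resolves to the preemptible tier, whose tasks can be preempted but whose resources are not revocable.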
--restart-threshold
option in the aurora job restart
command to match the job updater behavior. This option has no effect now and will be removed in a future release.
-framework_name
default argument ‘TwitterScheduler’. In a future release this will change to ‘aurora’. Please be aware that depending on your usage of Mesos, this will be a backward incompatible change. For details, see MESOS-703.
-thermos_observer_root
command line arg has been removed from the scheduler. This was a relic from the time when executor checkpoints were written globally, rather than into a task's sandbox.
container
property of a Job
to a Container
holder is deprecated in favor of setting it directly to the appropriate (i.e. Docker
or Mesos
) container type.
numCpus
, ramMb
and diskMb
fields in TaskConfig
and ResourceAggregate
thrift structs. Use set<Resource> resources
to specify task resources or quota values.
/slaves
is deprecated. Please use /agents
instead.
production
field in TaskConfig
thrift struct. Use tier
field to specify task scheduling and resource handling behavior.
resources_*_ram_gb
and resources_*_disk_gb
metrics have been renamed to resources_*_ram_mb
and resources_*_disk_mb
respectively. Note the unit change: GB -> MB.
aurora job add
client command to scale out an existing job.
--announcer-hostname
to thermos executor to override hostname in service registry endpoint. See here for details.
-thermos_home_in_sandbox
to the scheduler for optionally changing HOME to the sandbox during thermos executor/runner execution. This is useful in cases where the root filesystem inside of the container is read-only, as it moves PEX extraction into the sandbox. See here for more detail.
-require_docker_use_executor
that indicates whether the scheduler should accept tasks that use the Docker containerizer without an executor (experimental).
--populate_discovery_info
. If set to true, Aurora will populate the DiscoveryInfo field on the Mesos TaskInfo. This can be used by alternative service discovery solutions such as Mesos-DNS.
Identity.role
TaskConfig.environment
TaskConfig.jobName
TaskQuery.owner
AddInstancesConfig
parameter to addInstances
RPC.
-announcer-enable
, which was a no-op in 0.12.0.
acquireLock
releaseLock
getLocks
Lock
parameters to RPCs
createJob
scheduleCronJob
descheduleCronJob
restartShards
killTasks
addInstances
replaceCronTemplate
-mesos_role
to the Aurora scheduler at start time. This enables resource reservation for Aurora when running in a shared Mesos cluster.
org.apache.aurora.metadata.
to prevent clashes with other, external label sources.
-default_docker_parameters
to allow a cluster operator to specify a universal set of parameters that should be used for every container that does not have parameters explicitly configured at the job level.
--preserve_env
to thermos.
--read-json
, aurora can now load multiple jobs from one json file, similar to the usual pystachio structure: {"jobs": [job1, job2, ...]}
. The older single-job JSON format is also still supported.
aurora config list
command now supports --read-json
-shiro_after_auth_filter
. Optionally specify a class implementing javax.servlet.Filter that will be included in the Filter chain following the Shiro auth filters.
addInstances
thrift RPC now increases the job instance count (scale out) based on the task template pointed to by the instance key
.
AddInstancesConfig
argument in addInstances
thrift RPC.
TaskQuery
argument in killTasks
thrift RPC to disallow killing tasks across multiple roles. The new safer approach is using JobKey
with instances
instead.
HealthCheckConfig
client-side configuration fields: endpoint
, expected_response
, expected_response_code
. These are now set exclusively in like-named fields of HttpHealthChecker.
aurora job restart --restart-threshold=[seconds]
.
--announcer-enable
. Enabling the announcer previously required both flags --announcer-enable
and --announcer-ensemble
, but now only --announcer-ensemble
must be set. --announcer-enable
is a no-op flag now and will be removed in a future version.
-enable_cors_support
. Enabling CORS is now implicit by setting the argument -enable_cors_for
.
-deduplicate_snapshots
and -deflate_snapshots
. These features are good to always enable.
-enable_job_updates
and -enable_job_creation
-extra_modules
-logtostderr
, -alsologtostderr
, -vlog
, -vmodule
, and use_glog_formatter
. Removed in favor of the new logback configuration.
HealthCheckConfig
schema has been restructured to more cleanly allow configuring varied health checkers.
--custom_executor_config
which will override all other command line arguments and default values pertaining to the executor.
HealthCheckConfig
are now deprecated: endpoint
, expected_response
, expected_response_code
in favor of setting them as part of an HttpHealthChecker.
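The move of these fields can be illustrated with a small migration sketch over plain dictionaries. The nested `health_checker` / `http` keys are an assumption standing in for the HttpHealthChecker location, and `migrate_health_check` is a hypothetical helper, not part of the Aurora client.

```python
LEGACY_HTTP_FIELDS = ("endpoint", "expected_response", "expected_response_code")


def migrate_health_check(config):
    """Return a copy with legacy top-level HTTP fields moved under health_checker.http."""
    migrated = {k: v for k, v in config.items() if k not in LEGACY_HTTP_FIELDS}
    legacy = {k: config[k] for k in LEGACY_HTTP_FIELDS if k in config}
    if legacy:
        checker = dict(migrated.get("health_checker", {}))
        http = dict(checker.get("http", {}))
        for key, value in legacy.items():
            http.setdefault(key, value)  # values already set in the new location win
        checker["http"] = http
        migrated["health_checker"] = checker
    return migrated
```

For example, a configuration with a top-level `endpoint` and an `interval_secs` value keeps `interval_secs` where it is and gains a nested `http` entry carrying the endpoint. The input dictionary is left unmodified.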
apache.thermos
package has been removed.
apache.gen.aurora
package has been renamed to apache.aurora.thrift
.
apache.gen.thermos
package has been renamed to apache.thermos.thrift
.
apache.thermos.runner
package has been introduced, providing the thermos_runner
binary.
apache.aurora.kerberos
package has been introduced, containing the Kerberos-supporting versions of aurora
and aurora_admin
(kaurora
and kaurora_admin
).
src/main
have been removed, see here for details.
--root
option from the observer.
ConfigGroup.instanceIds
field has been deprecated. Use ConfigGroup.instances instead.
SessionValidator
and CapabilityValidator
interfaces have been removed. All SessionKey
-typed arguments are now nullable and ignored by the scheduler Thrift API.
-enable_legacy_constraints
has been removed, and the scheduler no longer automatically injects host
and rack
constraints for production services. (AURORA-1074)
...nonprod_ms
to ...ms_nonprod
(AURORA-1350).
--mesos-root
This must point to the same path as --work_dir
on the Mesos slave.
src/main/python/apache/aurora/tools:thermos
src/main/python/apache/aurora/tools:thermos_observer