Release Notes - Mesos - Version 1.11.0
-------------------------------------------
This release contains the following highlights:
* The Mesos containerizer now supports pre-provisioned external CSI storage
volumes through the new `volume/csi` isolator. This significantly extends the
range of compatible third-party CSI plugins compared to the existing
SLRP-based solution (MESOS-10141).
* The Scheduler API adds an interface that allows frameworks to put
constraints on agent attributes in resource offers. This helps "picky"
frameworks significantly reduce scheduling latency when they are close to
their quota limit (MESOS-10161).
* The CMake build is now usable for production deployments (MESOS-898).
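To make the attribute constraints in the offer-filtering highlight more concrete, here is a minimal Python sketch of how such a predicate might be evaluated against an agent's attributes. It is a simplified, hypothetical model, not the Mesos implementation. The `Predicate` class, the `matches` and `offer_passes` helpers, and the predicate kind strings are illustrative names only. The sketch also assumes that every constraint must hold (the real API may group constraints differently) and uses Python's `re` as a stand-in for RE2, whose syntax differs in places.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Predicate:
    """Hypothetical, simplified attribute predicate (illustrative only)."""
    kind: str                    # "exists", "not_exists", "text_equals" or "text_matches"
    value: Optional[str] = None  # operand for the text_* kinds


def matches(attributes: Dict[str, str], name: str, pred: Predicate) -> bool:
    """Return True if an agent with `attributes` satisfies `pred` on attribute `name`."""
    present = name in attributes
    if pred.kind == "exists":
        return present
    if pred.kind == "not_exists":
        return not present
    if not present:
        # Assumption: text predicates never match a missing attribute.
        return False
    if pred.kind == "text_equals":
        return attributes[name] == pred.value
    if pred.kind == "text_matches":
        # Python's `re` stands in for RE2; the whole value must match.
        return re.fullmatch(pred.value, attributes[name]) is not None
    raise ValueError(f"unknown predicate kind: {pred.kind}")


def offer_passes(attributes: Dict[str, str],
                 constraints: List[Tuple[str, Predicate]]) -> bool:
    """Assumption: an agent qualifies only if every (name, predicate) pair holds."""
    return all(matches(attributes, name, pred) for name, pred in constraints)


if __name__ == "__main__":
    agent = {"os": "linux", "zone": "us-east-1a"}
    wanted = [
        ("os", Predicate("text_equals", "linux")),
        ("gpu", Predicate("not_exists")),
        ("zone", Predicate("text_matches", r"us-east-.*")),
    ]
    print(offer_passes(agent, wanted))  # True
```

The point of filtering on the master side is that a framework stating such constraints up front never receives offers from agents it would decline anyway. This matters most when the role is near its quota limit and offers are scarce.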
Additional API Changes:
* **Breaking change**: Support for the text format of authentication credentials has been deprecated.
Unresolved Critical Issues:
* [MESOS-10194] - Mesos master failure "Check failed: 'get_(role)' Must be SOME"
* [MESOS-10186] - Segmentation fault while running mesos in SSL mode
* [MESOS-10146] - Removing task from slave when framework is disconnected causes master to crash
* [MESOS-10066] - mesos-docker-executor process dies when agent stops. Recovery fails when agent returns
* [MESOS-10011] - Operation feedback with stale agent ID crashes the master
* [MESOS-9967] - Authorization header is missing when using a default registry
* [MESOS-9579] - ExecutorHttpApiTest.HeartbeatCalls is flaky.
* [MESOS-9536] - Nested container launched with non-root user may not be able to write to its sandbox via the environment variable `MESOS_SANDBOX`
* [MESOS-9500] - spark submit with docker image on mesos cluster fails.
* [MESOS-9426] - ZK master detection can become forever pending.
* [MESOS-9393] - Fetcher crashes extracting archives with non-ASCII filenames.
* [MESOS-9365] - Windows - GET_CONTAINERS API call causes the Mesos agent to fail
* [MESOS-9355] - Persistence volume does not unmount correctly with wrong artifact URI
* [MESOS-9352] - Data in persistent volume deleted accidentally when using Docker container and Persistent volume
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8679] - If the first KILL stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8608] - RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8096] - Enqueueing events in MockHTTPScheduler can lead to segfaults.
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-7971] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-6285] - Agents may OOM during recovery if there are too many tasks or executors
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
All Resolved Issues:
** Bug
* [MESOS-7485] - Add verbose logging for curl commands used in fetcher/puller
* [MESOS-7834] - CMake does not set default --launcher_dir correctly
* [MESOS-9609] - Master check failure when marking agent unreachable.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation
* [MESOS-10134] - Race between concurrent `javah` runs trying to create `java/jni` output directory.
* [MESOS-10137] - Mesos failed to build due to error C2668 on windows with MSVC
* [MESOS-10169] - Reintroduce image fetch deduplication while keeping it possible to destroy UCR containers in PROVISIONING state.
* [MESOS-10192] - Recent Nvidia CUDA changes break Mesos GPU support
** Epic
* [MESOS-898] - Introduce CMake as an alternative build system.
* [MESOS-10141] - CSI External Volume Support
* [MESOS-10161] - Constraints-based offer filtering
** Improvement
* [MESOS-6692] - Install module dependencies during build
* [MESOS-6771] - Add and vet `install` target
** Task
* [MESOS-10142] - CSI External Volumes MVP Design Doc
* [MESOS-10147] - Introduce a new volume type `CSI` into the `Volume` protobuf message
* [MESOS-10148] - Update the `CSIPluginInfo` protobuf message for supporting 3rd party CSI plugins
* [MESOS-10149] - Improve CSI service manager to support unmanaged CSI plugins
* [MESOS-10150] - Refactor CSI volume manager to support pre-provisioned CSI volumes
* [MESOS-10151] - Introduce a new agent flag `--csi_plugin_config_dir`
* [MESOS-10152] - Implement the `create` method of the `volume/csi` isolator
* [MESOS-10153] - Implement the `prepare` method of the `volume/csi` isolator
* [MESOS-10154] - Implement the `cleanup` method of the `volume/csi` isolator
* [MESOS-10155] - Implement the `recover` method of the `volume/csi` isolator
* [MESOS-10156] - Enable the `volume/csi` isolator in UCR
* [MESOS-10157] - Add documentation for the `volume/csi` isolator
* [MESOS-10162] - Constraints-based offer filtering design doc
* [MESOS-10163] - Implement a new component to launch CSI plugins as standalone containers and make CSI gRPC calls
* [MESOS-10166] - Avoid sending framework updates to agents and subscribers when frameworkInfo/pid didn't change.
* [MESOS-10168] - Add secrets support to the CSI volume managers
* [MESOS-10170] - Bundle RE2 into Mesos
* [MESOS-10171] - Groundwork for constraints-based filtering using `Exists/NotExists` attribute constraint as an example.
* [MESOS-10172] - Add offer constraints on (pseudo)attribute value equality
* [MESOS-10173] - Add offer constraints on (pseudo)attribute (not) matching RE2 regex
* [MESOS-10175] - Improve CSI service manager to set node ID for managed CSI plugins
* [MESOS-10177] - Add an endpoint for offer constraints debug
* [MESOS-10179] - Expose framework's OfferConstraints via master API endpoints
* [MESOS-10189] - Pass offer constraints through the V0 scheduler driver and its Java bindings.
** Documentation
* [MESOS-10193] - Add documentation for offer constraints.
Release Notes - Mesos - Version 1.10.1 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9609] - Master check failure when marking agent unreachable.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation
* [MESOS-10134] - Race between concurrent `javah` runs trying to create `java/jni` output directory.
* [MESOS-10169] - Reintroduce image fetch deduplication while keeping it possible to destroy UCR containers in PROVISIONING state.
Release Notes - Mesos - Version 1.10.0
--------------------------------------------
This release contains the following highlights:
* Container resource bursting is now supported on Linux. Frameworks can
now specify CPU and memory limits for tasks (separately from resource
requests), as well as the level of isolation they desire when launching
task groups: CPU and memory may be isolated at the executor container
level or at the task container level (MESOS-10001).
* Executors can now use a Unix domain socket to connect to an agent, instead
of connecting via TCP (MESOS-10034).
* Existing reservations can now be modified via the RESERVE_RESOURCES
master API call (MESOS-9981).
* Performance of read-only V1 operator API calls has been improved by
introducing direct serialization into JSON/protobuf and by extending the
batching mechanism so that the master processes these calls in parallel
(similar to the `/state` endpoint). This brings V1 operator API performance
on par with the older HTTP endpoints (MESOS-10026, MESOS-9497).
* **Breaking change** for authorizer modules: authorizers are now required
to implement a method for returning `ObjectApprover`s that are valid
throughout their entire lifetime. For framework and operator API subscriber
principals, the set of `ObjectApprover`s is now requested from the authorizer
only once per subscription (MESOS-10056, MESOS-10057).
Additional API Changes:
* Quota can now be set on the default `*` role.
* Quota consumption metrics are now exposed by the allocator.
Unresolved Critical Issues:
* [MESOS-10066] - mesos-docker-executor process dies when agent stops. Recovery fails when agent returns
* [MESOS-10011] - Operation feedback with stale agent ID crashes the master
* [MESOS-9967] - Authorization header is missing when using a default registry
* [MESOS-9609] - Master check failure when marking agent unreachable
* [MESOS-9579] - ExecutorHttpApiTest.HeartbeatCalls is flaky.
* [MESOS-9536] - Nested container launched with non-root user may not be able to write to its sandbox via the environment variable `MESOS_SANDBOX`
* [MESOS-9500] - spark submit with docker image on mesos cluster fails.
* [MESOS-9426] - ZK master detection can become forever pending.
* [MESOS-9393] - Fetcher crashes extracting archives with non-ASCII filenames.
* [MESOS-9365] - Windows - GET_CONTAINERS API call causes the Mesos agent to fail
* [MESOS-9355] - Persistence volume does not unmount correctly with wrong artifact URI
* [MESOS-9352] - Data in persistent volume deleted accidentally when using Docker container and Persistent volume
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8679] - If the first KILL stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8608] - RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8096] - Enqueueing events in MockHTTPScheduler can lead to segfaults.
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-7971] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-6285] - Agents may OOM during recovery if there are too many tasks or executors
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
All Resolved Issues:
** Bug
* [MESOS-621] - `HierarchicalAllocatorProcess::removeSlave` doesn't properly handle framework allocations/resources
* [MESOS-4996] - 'containerizer->update' will always fail after killing a docker container.
* [MESOS-7217] - CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs is flaky.
* [MESOS-7639] - Oversubscription could crash the master due to CHECK failure in the allocator
* [MESOS-8537] - Default executor doesn't wait for status updates to be ack'd before shutting down
* [MESOS-8877] - Docker container's resources will be wrongly enlarged in cgroups after agent recovery
* [MESOS-9337] - Hook manager implementation is missing mutex acquisition in several places.
* [MESOS-9847] - Docker executor doesn't wait for status updates to be ack'd before shutting down.
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.
* [MESOS-9958] - New CLI is not included in distribution tarball
* [MESOS-9965] - agent should not send `TASK_GONE_BY_OPERATOR` if the framework is not partition aware.
* [MESOS-9968] - WWWAuthenticate header parsing fails when commas are in (quoted) realm
* [MESOS-9971] - 'dist' and 'distcheck' cmake targets are implemented as shell scripts, so fail on Windows/MSVC.
* [MESOS-9975] - Sorter may leak clients' allocations.
* [MESOS-9978] - Nvml isolator cannot be disabled which makes it impossible to exclude non-free code
* [MESOS-9980] - HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10008] - Very large quota values can crash master.
* [MESOS-10015] - updateAllocation() can stall the allocator with a huge number of reservations on an agent.
* [MESOS-10018] - Duplicate tasks if agent partitioned during maintenance down
* [MESOS-10023] - Allocator method dispatches can be reordered (relative to scheduler API calls which triggered them).
* [MESOS-10041] - Libprocess SSL verification can leak memory
* [MESOS-10083] - Authorizing invalid operation can result in declined authorization.
* [MESOS-10084] - Detecting whether executor is generated for command task should work when the launcher_dir changes
* [MESOS-10090] - Mesos build on Windows appears to be broken.
* [MESOS-10092] - Cannot pull image from docker registry which does not reply with 'scope'/'service' in WWW-Authenticate header
* [MESOS-10094] - Master's agent draining VLOG prints incorrect task counts.
* [MESOS-10096] - Reactivating a draining agent leaves the agent in draining state.
* [MESOS-10097] - After HTTP framework disconnects, heartbeater idle-loops instead of being deleted.
* [MESOS-10098] - Mesos agent fails to start on outdated systemd.
* [MESOS-10100] - Recently introduced PathTest.Relative and PathTest.PathIteration fail on windows.
* [MESOS-10102] - MasterAPITest.ReservationUpdate is flaky
* [MESOS-10103] - MSVC build can segfault when composing authorization Action for updating reservation.
* [MESOS-10107] - containeriser: failed to remove cgroup - EBUSY
* [MESOS-10109] - After failover, master crashes on re-adding an agent with maintenance schedule set.
* [MESOS-10110] - Libprocess ignores most protobuf (de)serialisation failure cases.
* [MESOS-10111] - Failed check in libevent_ssl_socket.cpp: 'self->bev' Must be non NULL
* [MESOS-10113] - OpenSSLSocketImpl with 'support_downgrade' waits for incoming bytes before accepting new connection.
* [MESOS-10114] - OpenSSLSocketImpl with 'support_downgrade' can silently stop accepting sockets.
* [MESOS-10116] - Attempt to reactivate disconnected agent crashes the master
* [MESOS-10118] - Agent incorrectly handles draining when empty
* [MESOS-10120] - Authorization for /logging/toggle and /metrics/snapshot is skipped on Windows.
* [MESOS-10123] - Windows overlapped IO discard handling can drop data.
* [MESOS-10124] - OpenSSLSocketImpl on Windows with 'support_downgrade' is incorrectly polling for read readiness.
* [MESOS-10125] - Web UI roles tree files are missing from automake install.
* [MESOS-10128] - Performance regression in HierarchicalAllocations_BENCHMARK_Test.PersistentVolumes
** Epic
* [MESOS-9981] - Introduce a Mesos API to update reservations
* [MESOS-10001] - Resource Limits and Requests
* [MESOS-10034] - Agent/executor domain socket communication
** Improvement
* [MESOS-7245] - Add a Windows segfault handler for stacktraces
* [MESOS-9123] - Expose quota consumption metrics.
* [MESOS-9497] - Parallel reads for expensive master v1 read-only calls.
* [MESOS-9914] - Refactor `MesosTest::StartSlave` in favour of builder style interface
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-9964] - Support destroying UCR containers in provisioning state
* [MESOS-9972] - Update Names for TLS-related environment variables in libprocess.
* [MESOS-10016] - Add a benchmark for HierarchicalAllocatorProcess::updateAllocation()
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
* [MESOS-10026] - Improve v1 operator API read performance.
* [MESOS-10056] - Perform synchronous authorization for scheduler calls.
* [MESOS-10057] - Perform synchronous authorization for outgoing events on event stream.
* [MESOS-10095] - Agent draining logging makes it hard to tell which tasks did not terminate.
* [MESOS-10112] - Log peer address during TLS handshake failures.
** Wish
* [MESOS-9630] - Consider moving linter setup to pre-commit
** Task
* [MESOS-3938] - Consider allowing setting quotas for the default '*' role.
* [MESOS-6084] - Deprecate and remove the included MPI framework
* [MESOS-8503] - Improve UI when displaying frameworks with many roles.
* [MESOS-9843] - Implement tests for the `containerizer/debug` endpoint.
* [MESOS-9949] - Track allocated/offered in the allocator's role tree.
* [MESOS-9974] - Remove support/mesos-style.py transition script
* [MESOS-9982] - Add a 'source' field to operator API ReserveResources protobuf
* [MESOS-9983] - Intermediate rejection of Reserve operations with source set
* [MESOS-9984] - Provide a function to compute a common "reservation ancestor" between two 'Resources'
* [MESOS-9985] - Update validation of 'ReserveResources' for 'source'
* [MESOS-9986] - Update 'getConsumedResources' and 'getResourceConversions' for 'source' in reservations
* [MESOS-9987] - Update 'Master::Http::_reserve' to also require 'source' resources
* [MESOS-9988] - Add 'source' field to scheduler reservation API
* [MESOS-9989] - Update 'Master::Http::_reserve' to pass 'source' into generated operation
* [MESOS-9990] - Consolidate 'Master::authorizeReserveResources' overloads
* [MESOS-9991] - Update 'Master::authorizeReserveResources' for re-reservations
* [MESOS-9992] - Add end-to-end test exercising re-reservation operator API
* [MESOS-9993] - Update operator API documentation for re-reservations
* [MESOS-10002] - Design doc for container bursting
* [MESOS-10009] - Implement glue code for the Windows event loop and OpenSSL's basic I/O abstraction
* [MESOS-10010] - Implement an SSL socket for Windows, using OpenSSL directly
* [MESOS-10033] - Design per-task cgroup isolation
* [MESOS-10035] - Implement `enable_http_executor_domain_sockets` agent flag
* [MESOS-10036] - Implement agent code to create a domain socket on startup
* [MESOS-10037] - Create code to bind-mount domain sockets into mesos-type executor containers
* [MESOS-10038] - Implement agent code to listen on a domain socket
* [MESOS-10039] - Let the default executor connect through a domain socket when available
* [MESOS-10043] - Add resource limits into the protobuf message `TaskInfo`
* [MESOS-10044] - Add a new capability `TASK_RESOURCE_LIMITS` into Mesos agent
* [MESOS-10045] - Validate task's resource limits and the `share_cgroups` field
* [MESOS-10046] - Launch executor container with resource limits
* [MESOS-10047] - Update the CPU subsystem in the cgroup isolator to set container's CPU resource limits
* [MESOS-10048] - Update the memory subsystem in the cgroup isolator to set container's memory resource limits and `oom_score_adj`
* [MESOS-10049] - Add a new reason in `TaskStatus::Reason` for the case that a task is OOM-killed due to exceeding its memory request
* [MESOS-10050] - Update the `update()` method of containerizer to handle container resource limits
* [MESOS-10051] - Update the `LaunchContainer` agent API to support container resource limits
* [MESOS-10053] - Update Docker executor to set Docker container's resource limits and `oom_score_adj`
* [MESOS-10054] - Update Docker containerizer to set Docker container's resource limits and `oom_score_adj`
* [MESOS-10055] - Update Mesos UI to display the resource limits of tasks
* [MESOS-10061] - Implement chmod() support for stout
* [MESOS-10062] - Implement relative path computation for stout
* [MESOS-10063] - Update default executor to call `LAUNCH_CONTAINER` to launch nested containers
* [MESOS-10064] - Accommodate the "Infinity" value in JSON
* [MESOS-10065] - Update the `update()` method of isolator interface to handle container resource limits
* [MESOS-10067] - Update the `update()` method of cgroups subsystem interface to handle container resource limits
* [MESOS-10073] - Implement SSL downgrade on the native SSL socket
* [MESOS-10074] - Adapt design for executor domain sockets for agent restarts
* [MESOS-10075] - Add the `share_cgroups` field into the protobuf message `LinuxInfo`
* [MESOS-10076] - Cgroups isolator: create nested cgroups
* [MESOS-10077] - Cgroups isolator: allow updating and isolating resources for nested cgroups
* [MESOS-10079] - Cgroups isolator: recover nested cgroups
* [MESOS-10086] - Add support for systemd socket activation for mesos domain sockets
* [MESOS-10087] - Update master & agent's HTTP endpoints for showing resource limits
* [MESOS-10115] - Add documentation for task resource limits
* [MESOS-10117] - Update the `usage()` method of containerizer to set resource limits in the `ResourceStatistics` protobuf message
** Documentation
* [MESOS-9938] - Standalone container documentation
* [MESOS-9979] - Add docs for FrameworkInfo updates and the UPDATE_FRAMEWORK call.
Release Notes - Mesos - Version 1.9.1 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9609] - Master check failure when marking agent unreachable.
* [MESOS-9964] - Support destroying UCR containers in provisioning state.
* [MESOS-9965] - Agent should not send `TASK_GONE_BY_OPERATOR` if the framework is not partition aware.
* [MESOS-9966] - Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well.
* [MESOS-9968] - WWWAuthenticate header parsing fails when commas are in (quoted) realm
* [MESOS-9972] - Update Names for TLS-related environment variables in libprocess.
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10008] - Very large quota values can crash master.
* [MESOS-10015] - updateAllocation() can stall the allocator with a huge number of reservations on an agent.
* [MESOS-10041] - Libprocess SSL verification can leak memory.
* [MESOS-10094] - Master's agent draining VLOG prints incorrect task counts.
* [MESOS-10096] - Reactivating a draining agent leaves the agent in draining state.
* [MESOS-10118] - Agent incorrectly handles draining when empty.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation
* [MESOS-10134] - Race between concurrent `javah` runs trying to create `java/jni` output directory.
* [MESOS-10169] - Reintroduce image fetch deduplication while keeping it possible to destroy UCR containers in PROVISIONING state.
** Improvement
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
* [MESOS-10095] - Agent draining logging makes it hard to tell which tasks did not terminate.
* [MESOS-10112] - Log peer address during TLS handshake failures.
Release Notes - Mesos - Version 1.9.0
-------------------------------------
This release contains the following highlights:
* Maintenance:
  * Added new APIs to support automatic node draining via operator APIs.
    This serves as an alternative to framework-assisted draining using
    maintenance primitives. (MESOS-9753)
* Resource Management:
  * Support for quota limits has been added. The existing quota guarantees
    are deprecated in favor of using limits (and, in the future, priorities).
* Security:
  * A new libprocess flag `--hostname_validation_scheme` has been added.
    It allows users to enable a new RFC 6125-compliant hostname verification
    scheme based on primitives provided by OpenSSL. This also improves
    performance by eliminating all reverse DNS lookups. (MESOS-9784)
  * The use of anonymous cipher suites is now disallowed when TLS certificate
    verification is enabled. (MESOS-9810)
* Containerization:
  * A new `--docker_ignore_runtime` flag has been added. It causes the agent
    to ignore any runtime configuration present in Docker images. (MESOS-9760)
  * A new no-new-privileges Linux isolator has been added to support
    enabling the `no_new_privs` process control flag. (MESOS-9770)
  * The Mesos containerizer now masks sensitive paths in `/proc` for
    containers that do not share the host's PID namespace. (MESOS-9771)
  * The Mesos containerizer now supports a configurable IPC namespace and
    /dev/shm. A container can be configured to have a private IPC namespace
    and /dev/shm, or to share them with its parent; the size of its private
    /dev/shm is also configurable. (MESOS-9795)
  * The Mesos containerizer now counts ephemeral overlayfs storage, in
    addition to sandbox storage, toward the task disk quota. (MESOS-9900)
  * A new `/containerizer/debug` HTTP endpoint has been added. This endpoint
    exposes debug information for the Mesos containerizer. At the moment, it
    returns a list of pending operations related to isolators and launchers.
    (MESOS-9756)
Additional API Changes:
* Mesos components will now forgo TLS certificate validation for incoming
connections unless `LIBPROCESS_SSL_REQUIRE_CERT` is set to true.
* The `Socket::connect(const Address&)` member function will now abort the
program when called on a `LibeventSSLSocket`. Instead, the new overload
`Socket::connect(const Address&, const TLSClientConfig&)` must be used.
NOTE: This new overload is only available when libprocess is compiled
with `--enable-ssl`.
Unresolved Critical Issues:
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave
* [MESOS-9697] - Release RPMs are not uploaded to bintray
* [MESOS-9579] - ExecutorHttpApiTest.HeartbeatCalls is flaky.
* [MESOS-9536] - Nested container launched with non-root user may not be able to write to its sandbox via the environment variable `MESOS_SANDBOX`
* [MESOS-9520] - IOTest.Read hangs on Windows
* [MESOS-9500] - spark submit with docker image on mesos cluster fails.
* [MESOS-9426] - ZK master detection can become forever pending.
* [MESOS-9393] - Fetcher crashes extracting archives with non-ASCII filenames.
* [MESOS-9365] - Windows - GET_CONTAINERS API call causes the Mesos agent to fail
* [MESOS-9355] - Persistence volume does not unmount correctly with wrong artifact URI
* [MESOS-9352] - Data in persistent volume deleted accidentally when using Docker container and Persistent volume
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-8877] - Docker container's resources will be wrongly enlarged in cgroups after agent recovery
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8679] - If the first KILL stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8608] - RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8096] - Enqueueing events in MockHTTPScheduler can lead to segfaults.
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-7971] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-6285] - Agents may OOM during recovery if there are too many tasks or executors
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
All Resolved Issues:
** Bug
* [MESOS-2842] - Master crashes when framework changes principal on re-registration
* [MESOS-5804] - ExamplesTest.DynamicReservationFramework is flaky
* [MESOS-6382] - Add option to enable parallel test runner for cmake builds
* [MESOS-6605] - configure looks for wrong header file for elfio
* [MESOS-8968] - Wire `UPDATE_QUOTA` call.
* [MESOS-9353] - libprocess triggers deprecation warnings when built against openssl 1.1.
* [MESOS-9395] - Check failure on `StorageLocalResourceProviderProcess::applyCreateDisk`.
* [MESOS-9482] - Resource provider manager can crash on invalid data from resource providers
* [MESOS-9560] - ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
* [MESOS-9594] - Test `StorageLocalResourceProviderTest.RetryRpcWithExponentialBackoff` is flaky.
* [MESOS-9609] - Master check failure when marking agent unreachable
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9667] - Check failure when executor for task using resource provider resources subscribes before agent is registered
* [MESOS-9698] - DroppedOperationStatusUpdate test is flaky
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9711] - Avoid shutting down executors registering before a required resource provider.
* [MESOS-9712] - StorageLocalResourceProviderTest.CsiPluginRpcMetrics is flaky.
* [MESOS-9719] - Test `AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.
* [MESOS-9727] - Heartbeat calls from executor to agent are reported as errors
* [MESOS-9733] - Random sorter generates non-uniform result for hierarchical roles.
* [MESOS-9750] - Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown
* [MESOS-9765] - Test `ROOT_CreateDestroyPersistentMountVolumeWithReboot` is flaky.
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9779] - `UPDATE_RESOURCE_PROVIDER_CONFIG` agent call returns 404 ambiguously.
* [MESOS-9782] - Random sorter fails to clear removed clients.
* [MESOS-9785] - Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9803] - Memory leak caused by an infinite chain of futures in `UriDiskProfileAdaptor`.
* [MESOS-9808] - libprocess can deadlock on termination (cleanup() vs use() + terminate())
* [MESOS-9811] - Don't use reverse DNS for hostname validation
* [MESOS-9831] - Master should not report disconnected resource providers.
* [MESOS-9835] - `QuotaRoleAllocateNonQuotaResource` is failing.
* [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9854] - /roles endpoint should return both guarantees and limits.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9861] - Make PushGauges support floating point stats.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
* [MESOS-9875] - Mesos did not respond correctly when operations should fail
* [MESOS-9881] - StorageLocalResourceProviderTest.RetryOperationStatusUpdateAfterRecovery is flaky.
* [MESOS-9882] - Mesos.UpdateFrameworkV0Test.SuppressedRoles is flaky.
* [MESOS-9886] - RoleTest.RolesEndpointContainsConsumedQuota is flaky.
* [MESOS-9887] - Race condition between two terminal task status updates for Docker/Command executor.
* [MESOS-9888] - /roles and GET_ROLES do not expose roles with only static reservations
* [MESOS-9890] - /roles and GET_ROLES do not always expose parent roles.
* [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret from runtime directory when the container is destroyed
* [MESOS-9894] - Mesos failed to build due to fatal error C1083 on Windows using MSVC.
* [MESOS-9895] - SlaveTest.DrainingAgentRejectLaunch is flaky
* [MESOS-9901] - jsonify uses non-standard mapping for protobuf map fields.
* [MESOS-9902] - Mesos failed to build due to error C2280 on Windows with MSVC
* [MESOS-9906] - Libprocess tests hang on ARM
* [MESOS-9909] - Mesos agent crashes after recovery when a nested container joins a CNI network
* [MESOS-9922] - MasterQuotaTest.RescindOffersEnforcingLimits is flaky
* [MESOS-9925] - Default executor takes a couple of seconds to start and subscribe to the Mesos agent
* [MESOS-9930] - DRF sorter may omit clients in sorting after removing an inactive leaf node.
* [MESOS-9934] - Master does not handle returning unreachable agents as draining/deactivated
* [MESOS-9935] - The agent crashes after the `disk/du` isolator started supporting rootfs checks.
* [MESOS-9952] - ExampleTest.DiskFullFramework is slow
* [MESOS-9956] - CSI plugins reporting duplicated volumes will crash the agent.
** Epic
* [MESOS-9534] - CSI Spec v1.0 Support.
* [MESOS-9756] - Introduce a container debug endpoint.
* [MESOS-9784] - Client side SSL certificate verification in Libprocess.
* [MESOS-9795] - Support configurable /dev/shm and IPC namespace.
** Improvement
* [MESOS-7258] - Provide scheduler calls to subscribe to additional roles and unsubscribe from roles.
* [MESOS-8456] - Allocator should allow roles to burst above guarantees but below limits.
* [MESOS-8789] - /roles and webui roles table should display distinct offered and allocated resources.
* [MESOS-9254] - Make SLRP be able to update its volumes and storage pools.
* [MESOS-9545] - Marking an unreachable agent as gone should transition the tasks to terminal state
* [MESOS-9618] - Display quota consumption in the webui.
* [MESOS-9640] - Add authorization support for `UPDATE_QUOTA` call.
* [MESOS-9668] - Add authorization support for the new `GET_QUOTA` call.
* [MESOS-9669] - Deprecate v0 quota calls.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9701] - Allocator's roles map should track reservations.
* [MESOS-9724] - Flatten the weighted shuffling in the random sorter.
* [MESOS-9758] - Take ports out of the GET_ROLES endpoints.
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
* [MESOS-9760] - Decouple Docker runtime isolator manifest configuration from image provider
* [MESOS-9769] - Add direct containerized support for filesystem operations.
* [MESOS-9770] - Add no-new-privileges isolator.
* [MESOS-9771] - Mask sensitive procfs paths.
* [MESOS-9778] - Randomize the agents in the second allocation stage.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9791] - Libprocess does not support server only SSL certificate verification.
* [MESOS-9799] - Adopt container file operations in secrets volumes.
* [MESOS-9802] - Remove quota role sorter in the allocator.
* [MESOS-9805] - Run cgroup subsystems before moving the target PID.
* [MESOS-9806] - Address allocator performance regression due to the addition of quota limits.
* [MESOS-9807] - Introduce a `struct Quota` wrapper.
* [MESOS-9812] - Add achievability validation for update quota call.
* [MESOS-9820] - Add `updateQuota()` method to the allocator.
* [MESOS-9833] - Introduce an agent flag for the default `/dev/shm` size
* [MESOS-9876] - Use geteuid to determine subprocess' user when launching task.
* [MESOS-9878] - Enable libprocess users to pass a custom SSL context when using Socket
* [MESOS-9900] - Include overlayfs upperdir in disk quota accounting.
* [MESOS-9908] - Introduce a new agent flag and support docker volume chown to task user.
* [MESOS-9917] - Store a role tree in the allocator.
* [MESOS-9932] - Removal of a role from the suppression list should be equivalent to REVIVE.
** Task
* [MESOS-8486] - Webui should display role limits.
* [MESOS-9485] - Unit test for master operation authorization.
* [MESOS-9565] - Unit tests for creating and destroying persistent volumes in SLRP.
* [MESOS-9598] - Update GET `/quota` to return both guarantees and limits.
* [MESOS-9599] - Update `GET_QUOTA` to return both guarantees and limits.
* [MESOS-9600] - Deprecate `SET_QUOTA` and `REMOVE_QUOTA` calls in favor of `UPDATE_QUOTA`.
* [MESOS-9601] - Persist `QuotaConfig`s in the registry.
* [MESOS-9602] - Provide backward compatibility for old quota configurations.
* [MESOS-9603] - Add quota limits metrics.
* [MESOS-9627] - Test CSI v1 in SLRP unit tests.
* [MESOS-9699] - Pull in glog 0.4.0
* [MESOS-9710] - Add tests to ensure random sorter performs correct weighted sorting.
* [MESOS-9715] - Support specifying output file name for curl fetcher plugin
* [MESOS-9754] - Design doc for agent draining
* [MESOS-9757] - Design doc for container debug endpoint.
* [MESOS-9775] - Design doc for UCR shared memory.
* [MESOS-9788] - Configurable IPC namespace and shared memory in `namespaces/ipc` isolator
* [MESOS-9793] - Implement UPDATE_FRAMEWORK call in V0 API for C++/Java
* [MESOS-9809] - Use OpenSSL built-in functions for hostname validation
* [MESOS-9810] - Reject certificate-less ciphers when certificate verification is enabled
* [MESOS-9814] - Implement DrainAgent master/operator call with associated registry actions
* [MESOS-9816] - Add draining state information to master state endpoints
* [MESOS-9817] - Add minimum master capability for draining and deactivation states
* [MESOS-9818] - Implement minimal agent-side draining handler
* [MESOS-9821] - Agent kills all tasks when draining
* [MESOS-9822] - Agent recovery code for task draining
* [MESOS-9823] - Agent should modify status updates while draining
* [MESOS-9825] - Introduce an agent flag to disallow sharing the IPC namespace from the host.
* [MESOS-9826] - Set up `/dev/shm` in `filesystem/linux` isolator only when `namespaces/ipc` isolator is not enabled
* [MESOS-9827] - Introduce the configurable shm protobuf API.
* [MESOS-9828] - Document the IPC namespace and shm on UCR.
* [MESOS-9829] - Implement the container debug endpoint on slave/http.cpp
* [MESOS-9837] - Implement `FutureTracker` class along with helper functions.
* [MESOS-9839] - Implement `IsolatorTracker` class.
* [MESOS-9840] - Implement `LauncherTracker` class.
* [MESOS-9841] - Integrate `IsolatorTracker` and `LinuxLauncher` with Mesos containerizer.
* [MESOS-9842] - Implement tests for the `FutureTracker` class and for its helper functions.
* [MESOS-9845] - Add docs for automatic agent draining
* [MESOS-9846] - Update UI for agent draining
* [MESOS-9849] - Add support for per-role REVIVE / SUPPRESS to V0 scheduler driver.
* [MESOS-9853] - Update Docker executor to allow kill policy overrides
* [MESOS-9860] - Agent should erase DrainInfo when draining complete
* [MESOS-9862] - Agent should fail task launches while draining
* [MESOS-9871] - Expose quota consumption in /roles endpoint.
* [MESOS-9874] - Add environment variable `MESOS_ALLOCATION_ROLE` to the task/container.
* [MESOS-9892] - Test various agent state transitions involving agent draining
* [MESOS-9907] - Retain agent draining start time in master
** Documentation
* [MESOS-9427] - Revisit quota documentation.
Release Notes - Mesos - Version 1.8.2 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9609] - Master check failure when marking agent unreachable.
* [MESOS-9785] - Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
* [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
* [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
* [MESOS-9887] - Race condition between two terminal task status updates for Docker/Command executor.
* [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret from runtime directory when the container is destroyed.
* [MESOS-9925] - Default executor takes a couple of seconds to start and subscribe to the Mesos agent.
* [MESOS-9964] - Support destroying UCR containers in provisioning state.
* [MESOS-9966] - Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well.
* [MESOS-9968] - WWWAuthenticate header parsing fails when commas are in (quoted) realm
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10015] - updateAllocation() can stall the allocator with a huge number of reservations on an agent.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation
* [MESOS-10134] - Race between concurrent `javah` runs trying to create `java/jni` output directory.
* [MESOS-10169] - Reintroduce image fetch deduplication while keeping it possible to destroy UCR containers in PROVISIONING state.
** Improvement
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
Release Notes - Mesos - Version 1.8.1
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9395] - Check failure on `StorageLocalResourceProviderProcess::applyCreateDisk`.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9730] - Executors cannot reconnect with agents using TLS1.3
* [MESOS-9750] - Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown.
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9779] - `UPDATE_RESOURCE_PROVIDER_CONFIG` agent call returns 404 ambiguously.
* [MESOS-9782] - Random sorter fails to clear removed clients.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9803] - Memory leak caused by an infinite chain of futures in `UriDiskProfileAdaptor`.
* [MESOS-9831] - Master should not report disconnected resource providers.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
** Improvement
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
Release Notes - Mesos - Version 1.8.0
-------------------------------------
This release contains the following highlights:
* Performance Improvements:
* Frameworks can now specify the minimum resource quantities needed
in an offer, which acts as an override of the global
`--min_allocatable_resources` master flag. Updating schedulers to
specify this field improves multi-scheduler scalability as it
reduces the number of offers declined for having insufficient
resource quantities. Note that this feature currently requires that
the scheduler re-subscribes each time it wants to mutate the
minimum resource quantity offer filter information, see MESOS-7258.
* The batching mechanism used for requests to the master's `/state`
endpoint has been extended to other read-only master endpoints like
`/state-summary`, `/frameworks`, `/roles`, etc. (see MESOS-9158)
In addition, responses for multiple concurrent requests to read-only master
endpoints are now only computed once in cases where it can be guaranteed
that all responses would be equal. (see MESOS-9224)
This should significantly increase master responsiveness under
heavy load.
* Allocator cycle time is significantly decreased (around 40% for a
small size cluster and up to 70% for larger clusters) when quota is
used. This greatly narrows the allocator performance gap between
quota and non-quota usage scenarios.
* CLI
* The new Mesos CLI now offers the `task` subcommand. Its first
command, `attach`, allows you to attach your terminal to a running
task launched with a TTY. The second command, `exec`, launches a
new nested container inside a running task. To build the CLI,
use the flag `--enable-new-cli` with Autotools and
`-DENABLE_NEW_CLI=1` with CMake on macOS or Linux.
* Operation Feedback:
* V1 schedulers can now receive operation feedback for operations on agent
default resources, i.e. normal CPU, memory, and disk. This means that the
v1 scheduler API's operation feedback feature can now be used for all
non-task-launch operations (any offer operations except for LAUNCH and
LAUNCH_GROUP) on any type of resources.
* The experimental operation feedback API for v1 schedulers made a breaking
change: the RECONCILE_OPERATIONS call no longer returns a 200 OK response
with a body containing the full reconciliation results. Instead, a
successful request now returns 202 Accepted, and a series of operation
status updates are sent on the scheduler's event stream to satisfy the
reconciliation request. This is similar to the way in which the master
replies to requests for task status reconciliation.
* Containerization:
* [MESOS-9029] - New `linux/seccomp` isolator: Containers launched
by Mesos containerizer can be sandboxed by enabling filtering of
system calls using a configurable policy.
* [MESOS-9675] - Support pulling docker images with docker manifest
V2 Schema2 on Mesos Containerizer.
* [MESOS-9133] - Support custom port range option to the `network/ports`
isolator. Added the `--container_ports_isolated_range` flag to the
`network/ports` isolator. This allows the operator to specify a custom
port range to be protected by the isolator.
* [MESOS-5158] - Support XFS quota for persistent volumes. Added
persistent volume support to the `disk/xfs` isolator.
* [MESOS-9009] - Support an option to create non-existing host
paths for host path volume in Mesos Containerizer. Added a new
agent flag `--host_path_volume_force_creation` for the
`volume/host_path` isolator.
* Container Storage Interface (CSI):
* **Experimental** Supported the new CSI v1 API. Operators can deploy
plugins that are compatible with either CSI v0 or v1 to create persistent
volumes through storage local resource providers, and Mesos will
automatically detect which CSI versions are supported by the plugins.
Additional API Changes:
* [MESOS-9540] - Improved the experimental `DESTROY_DISK` operations so
frameworks can now deprovision any unwanted pre-provisioned CSI volume
directly, if they are authorized to perform `DESTROY_RAW_DISK` actions.
Unresolved Critical Issues:
* [MESOS-9697] - Release RPMs are not uploaded to bintray
* [MESOS-9672] - Docker containerizer should ignore pids of executors that do not pass the connection check.
* [MESOS-9654] - `PUBLISH_RESOURCES` should fail if the resource version changes.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9609] - Master check failure when marking agent unreachable
* [MESOS-9579] - ExecutorHttpApiTest.HeartbeatCalls is flaky.
* [MESOS-9560] - ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
* [MESOS-9536] - Nested container launched with non-root user may not be able to write to its sandbox via the environment variable
* [MESOS-9520] - IOTest.Read hangs on Windows
* [MESOS-9500] - spark submit with docker image on mesos cluster fails.
* [MESOS-9426] - ZK master detection can become forever pending.
* [MESOS-9393] - Fetcher crashes extracting archives with non-ASCII filenames.
* [MESOS-9365] - Windows - GET_CONTAINERS API call causes the Mesos agent to fail
* [MESOS-9355] - Persistent volume does not unmount correctly with wrong artifact URI
* [MESOS-9352] - Data in persistent volume deleted accidentally when using Docker container and Persistent volume
* [MESOS-9306] - Mesos containerizer can get stuck during cgroup cleanup
* [MESOS-9180] - tasks get stuck in TASK_KILLING on the default executor
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-8946] - CURL 7.58 causes Mesos to fail decoding raw responses.
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8679] - If the first KILL gets stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8608] - RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8096] - Enqueueing events in MockHTTPScheduler can lead to segfaults.
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-7971] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die or are killed
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5754] - CommandInfo.user not honored in docker containerizer
* [MESOS-2842] - Master crashes when framework changes principal on re-registration
All Resolved Issues:
** Bug
* [MESOS-5048] - MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
* [MESOS-5189] - SSLTest.ProtocolMismatch is slow
* [MESOS-6874] - Agent silently ignores FS isolation when protobuf is malformed
* [MESOS-6949] - SchedulerTest.MasterFailover is flaky
* [MESOS-6990] - PartitionTest.TaskCompletedOnPartitionedAgent is flaky.
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7076] - libprocess tests fail when using libevent 2.1.8
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-7564] - Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
* [MESOS-7883] - Quota heuristic check not accounting for mount volumes
* [MESOS-8156] - Add a socketpair helper to the stout net API
* [MESOS-8343] - SchedulerHttpApiTest.UpdatePidToHttpScheduler is flaky.
* [MESOS-8467] - Destroyed executors might be used after `Slave::publishResource()`.
* [MESOS-8470] - CHECK failure in DRFSorter due to invalid framework id.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8547] - Mount devpts with compatible defaults.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8782] - Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent gone.
* [MESOS-8783] - Transition pending operations to OPERATION_UNREACHABLE when an agent is removed.
* [MESOS-8797] - Check failed in the default executor while running `MesosContainerizer/DefaultExecutorTest.TaskUsesExecutor/0` test.
* [MESOS-8835] - mesos-tests takes a long time to execute no tests
* [MESOS-8872] - OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky.
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9056] - mesos-style.py messaging is poor
* [MESOS-9074] - Pylint is too noisy when using mesos-style.py
* [MESOS-9079] - Test MasterTestPrePostReservationRefinement.LaunchGroup is flaky.
* [MESOS-9089] - Test `PartitionTest.PartitionAwareTaskCompletedOnPartitionedAgent` is flaky.
* [MESOS-9112] - mesos-style reports violations on a clean checkout
* [MESOS-9124] - Agent reconfiguration can cause master to REVIVE on scheduler's behalf
* [MESOS-9130] - Test `StorageLocalResourceProviderTest.ROOT_ContainerTerminationMetric` is flaky.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9143] - MasterQuotaTest.RemoveSingleQuota is flaky.
* [MESOS-9168] - Libprocess' http client does not encode the outgoing query.
* [MESOS-9172] - Fetcher deadlock with duplicated URIs.
* [MESOS-9179] - ./support/python3/mesos-gtest-runner.py --help crashes
* [MESOS-9186] - Failed to build Mesos with Python 3.7 and new CLI enabled
* [MESOS-9187] - Add allocator benchmark to allow multiple framework/agent profiles.
* [MESOS-9190] - Test `StorageLocalResourceProviderTest.ROOT_CreateDestroyDiskRecovery` is flaky.
* [MESOS-9193] - Mesos build fails with Clang 3.5.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries
* [MESOS-9212] - Disable SIGCHLD handling in libev.
* [MESOS-9214] - Stout.FsTest.Used fails on macOS
* [MESOS-9217] - LongLivedDefaultExecutorRestart is flaky.
* [MESOS-9222] - Linking libevent should be avoided.
* [MESOS-9225] - GitHub's mesos/modules does not build.
* [MESOS-9228] - SLRP does not clean up plugin containers after it is removed.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9232] - verify-reviews.py broken after enabling python3 support scripts
* [MESOS-9240] - CSI protobuf build fails when dependency tracking is disabled.
* [MESOS-9253] - Reviewbot is failing when posting a review
* [MESOS-9266] - Whenever our packaging tasks trigger errors we run into permission problems.
* [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9281] - SLRP gets a stale checkpoint after system crash.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9293] - If a framework loses operation information it cannot reconcile to acknowledge updates.
* [MESOS-9295] - Nested container launch could fail if the agent is upgraded with new cgroup subsystems.
* [MESOS-9300] - XFS isolator can mislabel project IDs on persistence volumes.
* [MESOS-9302] - Mesos fails to build on Fedora 28
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9316] - FsTest.Used is flaky
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9319] - Move root filesystem creation to the `filesystem/linux` isolator.
* [MESOS-9324] - Resource fragmentation: frameworks may be starved of port resources in the presence of a large number of frameworks with quota.
* [MESOS-9331] - Some library functions ignore failures from ::close which should probably be handled.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returns.
* [MESOS-9350] - CLI build step is broken with CMake due to missing file.
* [MESOS-9354] - Automatically remount read-only bind mounts.
* [MESOS-9357] - FetcherTest.DuplicateFileURI fails on macOS
* [MESOS-9358] - Test `SlaveRecoveryTest.AgentReconfigurationWithRunningTask` is flaky.
* [MESOS-9362] - Test `CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively` is flaky.
* [MESOS-9366] - Test `HealthCheckTest.HealthyTaskNonShell` can hang.
* [MESOS-9367] - GetContainers call crashes when using XFS disk isolation.
* [MESOS-9370] - Unable to build new Mesos CLI with PyInstaller and Python 3.7.
* [MESOS-9382] - mesos-gtest-runner doesn't work on systems without ulimit binary
* [MESOS-9390] - Warnings in AdaptedOperation prevent clang build
* [MESOS-9397] - PosixRLimitsIsolatorTest.UnsetLimits is broken on macOS 10.14.2 beta3.
* [MESOS-9398] - post-reviews.py fails to update an existing chain.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9417] - User mesosphere made lots of incorrect ticket updates
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9434] - Completed framework update streams may retry forever
* [MESOS-9459] - Reviewbot is not verifying reviews that need verification
* [MESOS-9462] - Devices in a container are inaccessible due to `nodev` on `/var/run`.
* [MESOS-9469] - Mesos does not validate framework-supplied FrameworkIDs
* [MESOS-9474] - Master does not respect authorization result for `CREATE_DISK` and `DESTROY_DISK`.
* [MESOS-9479] - SLRP does not set RP ID in produced OperationStatus.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9495] - Test `MasterTest.CreateVolumesV1AuthorizationFailure` is flaky.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9505] - `make check` failed with linking errors when c-ares is installed.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9508] - Official 1.7.0 tarball can't be built on Ubuntu 16.04 LTS.
* [MESOS-9514] - Reviewboard bot fails on verify-reviews.py.
* [MESOS-9517] - SLRP should treat gRPC timeouts as non-terminal errors, instead of reporting OPERATION_FAILED.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9519] - Unable to build Mesos with CMake on Ubuntu 14.04.
* [MESOS-9521] - MasterAPITest.OperationUpdatesUponAgentGone is flaky
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true
* [MESOS-9531] - chown error handling is incorrect in createSandboxDirectory.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9537] - SLRP sends inconsistent status updates for dropped operations.
* [MESOS-9542] - Hierarchical allocator check failure when an operation on a shutdown framework finishes
* [MESOS-9544] - SLRP does not clean up destroyed persistent volumes.
* [MESOS-9549] - nvidia/cuda 10 does not work on GPU isolator.
* [MESOS-9554] - Allocator might skip allocations because a single framework is incapable of receiving certain resources.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
* [MESOS-9557] - Operations are leaked in Framework struct when agents are removed
* [MESOS-9559] - OPERATION_UNREACHABLE and OPERATION_GONE_BY_OPERATOR updates don't include the agent/RP IDs
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace
* [MESOS-9568] - SLRP does not clean up mount directories for destroyed MOUNT disks.
* [MESOS-9573] - Agent should not try to recover operation status update streams that haven't been created yet.
* [MESOS-9574] - Operation status update streams are not properly garbage collected.
* [MESOS-9582] - Reviewbot Jenkins jobs stop validating any reviews as soon as they see a patch which does not apply
* [MESOS-9590] - Mesos CI sometimes, incorrectly, overwrites already-pushed mesos master nightly images with new images built from non-master branches.
* [MESOS-9592] - Mesos Websitebot is flaky
* [MESOS-9597] - Status update streams for operations affecting agent default resources should be stored under "meta/slaves/<slave_id>/operations/"
* [MESOS-9605] - mesos/mesos-centos nightly docker image has to include the SHA of the build.
* [MESOS-9607] - Removing a resource provider with consumers breaks resource publishing.
* [MESOS-9610] - Fetcher vulnerability - escaping from sandbox
* [MESOS-9612] - Resource provider manager assumes all operations are triggered by frameworks
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9621] - Mesos failed to build due to error LNK2019 on Windows using MSVC.
* [MESOS-9629] - Pylint reports cyclic dependencies in cli_new
* [MESOS-9635] - OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
* [MESOS-9637] - Impossible to CREATE a volume on resource provider resources over the operator API
* [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
* [MESOS-9667] - Check failure when executor for task using resource provider resources subscribes before agent is registered.
* [MESOS-9688] - Quota is not enforced properly when subroles have reservations.
* [MESOS-9691] - Quota headroom calculation is off when subroles are involved.
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9696] - Test MasterQuotaTest.AvailableResourcesSingleDisconnectedAgent is flaky
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9711] - Avoid shutting down executors registering before a required resource provider.
* [MESOS-9712] - StorageLocalResourceProviderTest.CsiPluginRpcMetrics is flaky.
* [MESOS-9727] - Heartbeat calls from executor to agent are reported as errors.
* [MESOS-9729] - Unpublishing a volume that is failed to publish crashes the agent with CSI v1.
* [MESOS-9733] - Random sorter generates non-uniform result for hierarchical roles.
* [MESOS-9740] - Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents from reregistering with 1.8+ masters
** Epic
* [MESOS-8054] - Feedback for operations
* [MESOS-8345] - Improve master responsiveness while serving state information.
* [MESOS-9029] - Seccomp syscall filtering in Mesos containerizer
* [MESOS-9211] - Make the new Mesos CLI production ready
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
** Story
* [MESOS-907] - Add Kerberos Authentication support
** Improvement
* [MESOS-4036] - Install instructions for CentOS 6.6 lead to errors running `perf`.
* [MESOS-4599] - ReviewBot should re-verify a review chain if any of the reviews is updated
* [MESOS-5158] - Provide XFS quota support for persistent volumes.
* [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
* [MESOS-6934] - Support pulling Docker images with V2 Schema 2 image manifest
* [MESOS-7124] - Replace monadic type get() functions with operator*
* [MESOS-7947] - Add GC capability to nested containers
* [MESOS-8025] - Update the master field in the new CLI config to accept a URL instead of an <ip:port>
* [MESOS-8206] - Add the pip-requirements from other modules to the pylint virtual environment
* [MESOS-8380] - Update WebUI to show local resource providers.
* [MESOS-8403] - Add agent HTTP API operator call to mark local resource providers as gone
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-8999] - Add default bodies for libprocess HTTP error responses.
* [MESOS-9133] - Make the range of ports protected by the network/ports isolator configurable.
* [MESOS-9158] - Parallel serving of state-related read-only requests in the Master.
* [MESOS-9194] - Extend request batching to '/roles' endpoint
* [MESOS-9223] - Storage local provider does not sufficiently handle container launch failures or errors
* [MESOS-9224] - De-duplicate read-only requests to master based on principal.
* [MESOS-9239] - Improve sorting performance in the DRF sorter.
* [MESOS-9249] - Avoid dirtying the DRF sorter when allocating resources.
* [MESOS-9255] - Use consistent "totals" across role / framework DRF.
* [MESOS-9258] - Prevent subscribers to the master's event stream from leaking connections
* [MESOS-9275] - Allow optional `profile` to be specified in `CREATE_DISK` offer operation.
* [MESOS-9292] - Rejected quota request error messages should specify which resources were overcommitted.
* [MESOS-9301] - Add flag to disable per-framework metrics.
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
* [MESOS-9315] - Adding support for implicit allocation of mandatory custom resources in Mesos
* [MESOS-9321] - Add an optional `vendor` field in `Resource.DiskInfo.Source`.
* [MESOS-9340] - Log all socket errors in libprocess.
* [MESOS-9384] - Resource providers reported by master should reflect connected resource providers
* [MESOS-9406] - Allow for optionally unbundled leveldb from CMake builds.
* [MESOS-9486] - Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
* [MESOS-9504] - Use ResourceQuantities in the allocator and sorter to improve performance.
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
* [MESOS-9523] - Add per-framework allocatable resources matcher/filter.
* [MESOS-9540] - Support `DESTROY_DISK` on preprovisioned CSI volumes.
* [MESOS-9608] - Refactor and Improve `class ResourceQuantity`.
* [MESOS-9613] - Support seccomp `unconfined` option for whitelisting.
* [MESOS-9628] - Consider running tox as part of test suite, not as part of style checking
* [MESOS-9642] - Avoid reading host mount table when allocating a gid in GIDManager.
* [MESOS-9643] - Make setting volume ownership asynchronous in volume gid manager
* [MESOS-9655] - Improving SLRP tests for preprovisioned volumes.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
** Task
* [MESOS-4509] - Remove deprecated .json endpoints.
* [MESOS-5827] - Add example framework for using inverse offers
* [MESOS-6551] - Add attach/exec commands to the Mesos CLI
* [MESOS-6630] - Add some benchmark test for quota allocation
* [MESOS-6840] - Tests for quota capacity heuristic.
* [MESOS-8241] - Add metrics for offer operation feedback
* [MESOS-8528] - Design Doc for Storage External Resource Provider (SERP) support.
* [MESOS-8770] - Use Python3 for Mesos support scripts
* [MESOS-8810] - Grant non-root task user the permissions to access the SANDBOX_PATH volume of PARENT type
* [MESOS-8813] - Support multiple tasks with different users accessing a persistent volume.
* [MESOS-8957] - Install Python 3 on Mesos CI instances
* [MESOS-8975] - Problem and solution overview for the slow API issue.
* [MESOS-9009] - Support for creating non-existing host paths in a whitelist as source paths
* [MESOS-9032] - Update build scripts to support `seccomp-isolator` flag and `libseccomp` library
* [MESOS-9033] - Add Seccomp-related protobufs
* [MESOS-9034] - Implement a wrapper class for `libseccomp` API
* [MESOS-9035] - Implement `linux/seccomp` isolator
* [MESOS-9099] - Add allocator quota tests regarding reserve/unreserve already allocated resources.
* [MESOS-9105] - Implement Docker Seccomp profile parser.
* [MESOS-9106] - Add seccomp filter into containerizer launcher.
* [MESOS-9229] - Install Python3 on ubuntu-16.04-arm docker image
* [MESOS-9265] - Analyse and pinpoint libprocess SSL failures when using libevent 2.1.8.
* [MESOS-9270] - Get rid of dependency on `net-tools` in network/cni isolator.
* [MESOS-9278] - Add an operation status update manager to the agent
* [MESOS-9318] - Consider providing better operation status updates while an RP is recovering
* [MESOS-9333] - Document usage and build of new Mesos CLI
* [MESOS-9356] - Make agent atomically checkpoint operations and resources
* [MESOS-9392] - Implement tests for Seccomp parser
* [MESOS-9396] - Use the built CLI binary when running new CLI integration tests in CI
* [MESOS-9399] - Update 'mesos task list' to only list running tasks
* [MESOS-9409] - Implement Seccomp isolator tests
* [MESOS-9471] - Master should track operations on agent default resources.
* [MESOS-9472] - Unblock operation feedback on agent default resources.
* [MESOS-9473] - Add end to end tests for operations on agent default resources.
* [MESOS-9477] - Documentation for operation feedback
* [MESOS-9525] - Agent capability for operation feedback on default resources
* [MESOS-9535] - Master should clean up operations from downgraded agents
* [MESOS-9538] - Agent `ReconcileOperations` handler should handle operation affecting default resources
* [MESOS-9578] - Document per framework minimal allocatable resources in framework development guides
* [MESOS-9596] - Add a new `UPDATE_QUOTA` operator call.
* [MESOS-9604] - Clean up `QuotaRequest` and `QuotaInfo`.
* [MESOS-9615] - Example framework for feedback on agent default resources
* [MESOS-9620] - Add metrics for volume gid manager
* [MESOS-9622] - Refactor SLRP with a CSI volume manager.
* [MESOS-9623] - Implement CSI volume manager with CSI v1.
* [MESOS-9624] - Bundle CSI spec v1.0 in Mesos.
* [MESOS-9625] - Make `DiskProfileAdaptor` agnostic to CSI spec version.
* [MESOS-9626] - Make SLRP pick the appropriate CSI versions for plugins.
* [MESOS-9632] - Refactor SLRP with a CSI service manager.
* [MESOS-9639] - Make CSI plugin RPC metrics agnostic to CSI versions.
* [MESOS-9648] - Make operation reconciliation send asynchronous updates
* [MESOS-9651] - Design for docker registry v2 schema2 basic support.
* [MESOS-9676] - Add prettyjws support for docker v2 s1 manifest.
* [MESOS-9694] - Refactor UCR docker store to construct 'Image' protobuf at Puller.
** Documentation
* [MESOS-9036] - Document `linux/seccomp` isolator
Release Notes - Mesos - Version 1.7.4 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of unmount operation
* [MESOS-10134] - Race between concurrent `javah` runs trying to create `java/jni` output directory.
* [MESOS-10169] - Reintroduce image fetch deduplication while keeping it possible to destroy UCR containers in PROVISIONING state.
Release Notes - Mesos - Version 1.7.3
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-8467] - Destroyed executors might be used after `Slave::publishResource()`.
* [MESOS-8537] - Default executor doesn't wait for status updates to be ack'd before shutting down.
* [MESOS-9124] - Agent reconfiguration can cause master to unsuppress on scheduler's behalf.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true.
* [MESOS-9549] - nvidia/cuda 10 does not work on GPU isolator.
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace.
* [MESOS-9568] - SLRP does not clean up mount directories for destroyed MOUNT disks.
* [MESOS-9581] - Mesos package naming appears to be nondeterministic.
* [MESOS-9590] - Mesos CI sometimes, incorrectly, overwrites already-pushed mesos master nightly images with new images built from non-master branches.
* [MESOS-9607] - Removing a resource provider with consumers breaks resource publishing.
* [MESOS-9610] - Fetcher vulnerability - escaping from sandbox.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9750] - Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown.
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9785] - Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9803] - Memory leak caused by an infinite chain of futures in `UriDiskProfileAdaptor`.
* [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
* [MESOS-9847] - Docker executor doesn't wait for status updates to be ack'd before shutting down.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
* [MESOS-9887] - Race condition between two terminal task status updates for Docker/Command executor.
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.
* [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret from runtime directory when the container is destroyed.
* [MESOS-9925] - Default executor takes a couple of seconds to start and subscribe Mesos agent.
* [MESOS-9964] - Support destroying UCR containers in provisioning state.
* [MESOS-9966] - Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well.
* [MESOS-9968] - WWWAuthenticate header parsing fails when commas are in (quoted) realm.
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10015] - updateAllocation() can stall the allocator with a huge number of reservations on an agent.
* [MESOS-10018] - Duplicate tasks if agent partitioned during maintenance down.
* [MESOS-10084] - Detecting whether executor is generated for command task should work when the launcher_dir changes.
* [MESOS-10092] - Cannot pull image from docker registry which does not reply with 'scope'/'service' in WWW-Authenticate header.
** Improvement
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9540] - Support `DESTROY_DISK` on preprovisioned CSI volumes.
* [MESOS-9545] - Marking an unreachable agent as gone should transition the tasks to terminal state.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
Release Notes - Mesos - Version 1.7.2
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries.
* [MESOS-9517] - SLRP should treat gRPC timeouts as non-terminal errors, instead of reporting OPERATION_FAILED.
* [MESOS-9531] - chown error handling is incorrect in createSandboxDirectory.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9537] - SLRP sends inconsistent status updates for dropped operations.
* [MESOS-9544] - SLRP does not clean up destroyed persistent volumes.
* [MESOS-9554] - Allocator might skip allocations because a single framework is incapable of receiving certain resources.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
** Improvement
* [MESOS-9340] - Log all socket errors in libprocess.
Release Notes - Mesos - Version 1.7.1
-------------------------------------
* This is a bug fix release. It also includes performance and API
improvements:
* **Allocator**: Improved allocation cycle time substantially
(see MESOS-9239 and MESOS-9249). These reduce the allocation
cycle time in some benchmarks by 80%.
* **Scheduler API**: Improved the experimental `CREATE_DISK` and
`DESTROY_DISK` operations for CSI volume recovery (see MESOS-9275
and MESOS-9321). Storage local resource providers now return disk
resources with the `source.vendor` field set, so frameworks need to
upgrade their `Resource` protobuf definitions.
* **Scheduler API**: Offer operation feedback now includes the agent
IDs and resource provider IDs (see MESOS-9293).
** Bug
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9152] - Close all file descriptors except whitelist_fds in posix/subprocess.
* [MESOS-9154] - MasterTest.TaskStateMetrics is flaky
* [MESOS-9164] - Subprocess should unset CLOEXEC on whitelisted file descriptors.
* [MESOS-9228] - SLRP does not clean up plugin containers after it is removed.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9266] - Whenever our packaging tasks trigger errors we run into permission problems.
* [MESOS-9267] - Mesos agent crashes when CNI network is not configured but used.
* [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9281] - SLRP gets a stale checkpoint after system crash.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9293] - If a framework loses operation information it cannot reconcile to acknowledge updates.
* [MESOS-9295] - Nested container launch could fail if the agent is upgraded with new cgroup subsystems.
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9324] - Resource fragmentation: frameworks may be starved of port resources in the presence of a large number of frameworks with quota.
* [MESOS-9332] - Nested container should run as the same user as its parent container by default.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returning.
* [MESOS-9362] - Test `CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively` is flaky.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9474] - Master does not respect authorization result for `CREATE_DISK` and `DESTROY_DISK`.
* [MESOS-9479] - SLRP does not set RP ID in produced OperationStatus.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9505] - `make check` failed with linking errors when c-ares is installed.
* [MESOS-9508] - Official 1.7.0 tarball can't be built on Ubuntu 16.04 LTS.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9519] - Unable to build Mesos with CMake on Ubuntu 14.04.
** Improvement
* [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
* [MESOS-9239] - Improve sorting performance in the DRF sorter.
* [MESOS-9249] - Avoid dirtying the DRF sorter when allocating resources.
* [MESOS-9255] - Use consistent "totals" across role / framework DRF.
* [MESOS-9275] - Allow optional `profile` to be specified in `CREATE_DISK` offer operation.
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
* [MESOS-9321] - Add an optional `vendor` field in `Resource.DiskInfo.Source`.
* [MESOS-9325] - Optimize `Resources::filter` operation.
* [MESOS-9486] - Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
Release Notes - Mesos - Version 1.7.0
-------------------------------------
This release contains the following highlights:
* Performance Improvements:
* **Master `/state` endpoint:** Adopted RapidJSON and reduced
copying for a ~130% throughput improvement due to a ~55%
decrease in latency (MESOS-9092). Also, added parallel
processing of `/state` requests to reduce master backlogging
/ interference under high request load (MESOS-9122).
* **Allocator:** Improved allocator cycle time significantly
(MESOS-9087). This, together with the reduced master
backlogging from `/state` improvements, reduces the
end-to-end offer cycling time between Mesos and schedulers.
* **Agent `/containers` endpoint:** Fixed a performance issue
that caused high latency / CPU consumption when there are
many containers on the agent (MESOS-8418).
* **Agent container launching performance improvements**:
The expensive `cgroups::verify()` calls were removed, which
significantly improves container launch / destroy
throughput (MESOS-9081).
* Containerization:
* [MESOS-8794] - **Experimental** Supported docker image tarball
fetching from HDFS through the `--docker_registry` agent flag.
* [MESOS-7691] - Added a new option `cgroups/all` to the agent
flag `--isolation`. This allows the cgroups isolator to
automatically load all locally enabled cgroups subsystems.
If this option is specified in the agent flag `--isolation`
along with other cgroups-related options
(e.g., `cgroups/cpu`), those options will be ignored.
* [MESOS-7947] - Added a new `--gc_non_executor_container_sandboxes`
option which tells the agent to garbage collect sandboxes created
via the LAUNCH_NESTED_CONTAINER API. The same flag will apply to
standalone container sandboxes in the future.
* [MESOS-8327] - Added container-specific cgroups mounts under
`/sys/fs/cgroup` to containers with image launched by Mesos
containerizer.
* [MESOS-5647] - Exposed network statistics for containers on
CNI network in the `network/cni` isolator.
* [MESOS-8792] - Added a new `linux/devices` isolator that
automatically populates containers with devices that have
been whitelisted with the `--allowed_devices` agent flag.
* [MESOS-8340] - Added a new `--enforce_container_ports`
option to toggle ports resource enforcement by the
`network/ports` isolator.
* [MESOS-6451] - Added timer and percentile metrics for docker
pull latency distribution.
* Windows:
* [MESOS-8668] - Added support to libprocess for the Windows
Thread Pool API, replacing libevent with the native Windows
event and thread pool library. This can be enabled with
`-DENABLE_LIBWINIO=ON` during CMake configuration. By
utilizing I/O Completion Ports, this enables non-blocking
asynchronous I/O on Windows for sockets, pipes, and files.
* Multi-Framework Workloads:
* [MESOS-8842] - **Experimental** Added per-framework metrics
to the master. These new metrics provide detailed information
about the behavior of each framework and can help with
scalability testing, debugging, and fine-grained monitoring.
Please refer to docs/monitoring.md for more details.
* [MESOS-8238] - Documentation was added in the framework
development guide to provide recommendations on how schedulers
can behave co-operatively in a multi-framework setting, as
well as how to operationally configure Mesos in such a setting.
* [MESOS-8936] - A new weighted random sorter was added as an
alternative to the existing DRF sorter; this allows users
who don't need DRF behavior to opt out.
Additional API Changes:
* [MESOS-9066] - Introduced `CREATE_DISK` and `DESTROY_DISK` offer
operations to replace `CREATE_VOLUME`, `CREATE_BLOCK`,
`DESTROY_VOLUME` and `DESTROY_BLOCK`.
* Container logger module interface has been changed. The `prepare()` method
now takes `ContainerID` and `ContainerConfig` instead.
* `Isolator::recover` interface has been changed to take an `std::vector`
instead of `std::list`.
* JSON endpoints now use rapidjson to improve performance. As a result,
clients whose JSON de-serializer does not conform to the ECMA-404
spec for JSON may break. For example, Mesos previously serialized
'/' as '\/', but the spec does not require this escaping and
rapidjson does not escape '/'.
Changes to Dependencies:
* [MESOS-8395] - Made gRPC a requirement for Mesos builds. The `--enable-grpc`
Autotools option and the `-DENABLE_GRPC=ON` CMake option have been removed.
* [MESOS-8064] - Mesos now requires libarchive to programmatically decode
.zip, .tar, .gzip, and other common file compression schemes. Version 3.3.2
is bundled in Mesos.
* [MESOS-9092] - Adopted rapidjson for improved JSON serialization performance.
Version 1.1.0 is bundled in Mesos.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode()
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formatted as strings
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-7076] - libprocess tests fail when using libevent 2.1.8
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7991] - fatal, check failed !framework->recovered()
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8522] - `prepareMounts` in Mesos containerizer is flaky.
* [MESOS-8623] - Crashed framework brings down the whole Mesos cluster
* [MESOS-8679] - If the first KILL gets stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8703] - Mesos master can't reconnect to zookeeper
* [MESOS-8731] - mesos master APIs become latent
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8927] - Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-9022] - Race condition in task updates could cause missing event in streaming
* [MESOS-9049] - Agent GC could unmount a dangling persistent volume multiple times.
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9109] - Windows agent uses reserved character ':' (colon) for file name and crashes when attempting to remove link
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks
* [MESOS-9157] - cannot pull docker image from dockerhub
* [MESOS-9169] - docker image fetching fails
All Resolved Issues:
** Bug
* [MESOS-2199] - Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
* [MESOS-3202] - Avoid role/framework offer starvation in DRF allocator.
* [MESOS-3475] - TestContainerizer should not modify global environment variables.
* [MESOS-3790] - ZooKeeper connection should retry on EAI_NONAME
* [MESOS-5371] - Implement `fcntl.hpp`
* [MESOS-5904] - Process routes implementation seems to drop routes on Windows.
* [MESOS-6092] - Docker containerizer launch command may access a "Container" struct after it has been destroyed
* [MESOS-6622] - NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky
* [MESOS-6823] - bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 is flaky
* [MESOS-6985] - os::getenv() can segfault
* [MESOS-7032] - Mesos fails NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
* [MESOS-7168] - Agent should validate that the nested container ID does not exceed certain length.
* [MESOS-7220] - 'EXPECT_SOME' and other asserts don't work with 'Try's that have a custom error state.
* [MESOS-7342] - Port Docker tests
* [MESOS-7397] - apply-reviews.py silently fails when using chain mode.
* [MESOS-7658] - apply-reviews.py fails with Unicode characters
* [MESOS-7966] - check for maintenance on agent causes fatal error
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8134] - SlaveTest.ContainersEndpoint is flaky due to getenv crash.
* [MESOS-8429] - Clean up endpoint socket if the container daemon is destroyed while waiting.
* [MESOS-8499] - Change docker health check image to the new nanoserver one
* [MESOS-8567] - Test UriDiskProfileTest.FetchFromHTTP is flaky.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8613] - Test `MasterAllocatorTest/*.TaskFinished` is flaky.
* [MESOS-8626] - The 'allocatable' check in the allocator is problematic with multi-role frameworks
* [MESOS-8686] - Mesos build failed with /permissive- + MSVC on windows
* [MESOS-8687] - Check failure in `ProcessBase::_consume()`.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8838] - Consider validating that resubscribing resource providers do not change their name or type
* [MESOS-8857] - Fix subprocess(flags) logic on Windows to handle arguments with quotes
* [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
* [MESOS-8873] - StorageLocalResourceProviderTest.ROOT_ZeroSizedDisk is flaky.
* [MESOS-8875] - `leveldb::PosixEnv::DeleteFile()` can segfault.
* [MESOS-8884] - Flaky `DockerContainerizerTest.ROOT_DOCKER_MaxCompletionTime`.
* [MESOS-8892] - MasterSlaveReconciliationTest.ReconcileDroppedOperation is flaky
* [MESOS-8897] - ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky
* [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile selectors.
* [MESOS-8913] - Resource provider manager registry leaks file descriptors into executors.
* [MESOS-8917] - Agent leaking file descriptors into forked processes
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions
* [MESOS-8932] - Quota guarantee metric does not handle removal correctly.
* [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and memory-only offers.
* [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
* [MESOS-8952] - process::await/collect n^2 performance issue
* [MESOS-8954] - python3/post-reviews.py errors due to TypeError.
* [MESOS-8958] - LinuxDevicesIsolatorTest.ROOT_PopulateWhitelistedDevices fails on some boxes.
* [MESOS-8963] - Executor crash trying to print container ID.
* [MESOS-8970] - Tests relying on metrics segfault on some Linux distros.
* [MESOS-8977] - BuildBot uses Docker with AUFS that has a max file length limit of 242 characters
* [MESOS-8979] - python3/push-commits.py fails due to TypeError
* [MESOS-8980] - mesos-slave can deadlock with docker pull
* [MESOS-8985] - Posting to the operator api with 'accept recordio' header can crash the agent
* [MESOS-8987] - Master asks agent to shutdown upon auth errors.
* [MESOS-9000] - Operator API event stream can miss task status updates.
* [MESOS-9007] - XFS disk isolator doesn't clean up project ID from symlinks
* [MESOS-9008] - Fetcher fails to extract some archives containing hardlinks
* [MESOS-9010] - `UPDATE_STATE` can race with `UPDATE_OPERATION_STATUS` for a resource provider.
* [MESOS-9014] - MasterAPITest.SubscribersReceiveHealthUpdates is flaky
* [MESOS-9025] - The container which joins CNI network and has checkpoint enabled will be mistakenly destroyed by agent
* [MESOS-9027] - GPU Isolator still depends on cgroups/devices agent flag given cgroups/all is supported.
* [MESOS-9037] - DefaultExecutorTest.SigkillExecutor is flaky
* [MESOS-9038] - Archiver utility extracts links within subdirectories incorrectly
* [MESOS-9039] - CNI isolator recovery should wait until unknown orphan cleanup is done
* [MESOS-9051] - Move agent call validation into common validation library.
* [MESOS-9065] - Apply the `override` keyword globally.
* [MESOS-9073] - Tox doesn't run in the support virtualenv when using Python 3 mesos-style.py
* [MESOS-9075] - Virtualenv management in support directory is buggy.
* [MESOS-9094] - On macOS libprocess_tests fail to link when compiling with gRPC
* [MESOS-9114] - cmake build is broken on macos
* [MESOS-9115] - Stout depends on missing rapidjson headers.
* [MESOS-9116] - Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.
* [MESOS-9125] - Port mapper CNI plugin might fail with "Resource temporarily unavailable"
* [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the agent.
* [MESOS-9137] - GRPC build fails to pass compiler flags
* [MESOS-9142] - CNI detach might fail due to missing network config file.
* [MESOS-9144] - Master authentication handling leads to request amplification.
* [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
* [MESOS-9146] - Agent has a fragile burned-in 5s authentication timeout.
* [MESOS-9147] - Agent and scheduler driver authentication retry backoff time could overflow.
* [MESOS-9149] - Failed to build gRPC on Linux without OpenSSL.
* [MESOS-9151] - Container stuck at ISOLATING due to FD leak
* [MESOS-9156] - StorageLocalResourceProviderProcess can deadlock
* [MESOS-9160] - Failed to compile gRPC when the build path contains symlinks.
* [MESOS-9163] - `UriDiskProfileAdaptor` should not update profiles when a poll returns a non-OK HTTP status.
* [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format error
* [MESOS-9171] - Mesos agent crashes in CNI isolator when usage is queried
* [MESOS-9177] - Mesos master segfaults when responding to /state requests.
* [MESOS-9185] - An attempt to remove or destroy container in composing containerizer leads to segfault.
* [MESOS-9193] - Mesos build fails with Clang 3.5.
* [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
** Epic
* [MESOS-8564] - Port libprocess-tests suites to Windows
* [MESOS-8668] - Transition libprocess on Windows to use the Thread Pool API
* [MESOS-8705] - Composing containerizer improvements
* [MESOS-8842] - Per Framework Metrics on Master
* [MESOS-8916] - Allocation logic cleanup.
* [MESOS-9013] - Support container Cgroup FS mount.
** Improvement
* [MESOS-6451] - Add timer and percentile for docker pull latency distribution.
* [MESOS-7691] - Support local enabled cgroups subsystems automatically.
* [MESOS-7947] - Add GC capability to nested containers
* [MESOS-8064] - Add capability so mesos can programmatically decode .zip, .tar, .gzip, and other common file compression schemes
* [MESOS-8106] - Docker fetcher plugin unsupported scheme failure message is not accurate.
* [MESOS-8340] - Add a no-enforce option to the `network/ports` isolator.
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads
* [MESOS-8680] - Rename variable names in slave.hpp to be more explicit.
* [MESOS-8788] - Add alg RS256 support for JWT generator and validator in libprocess
* [MESOS-8792] - Automatically create whitelisted devices.
* [MESOS-8798] - Build the "unsecure" gRPC libraries to remove SSL dependency.
* [MESOS-8829] - Get rid of extra `containerizer->wait()` calls in tests.
* [MESOS-8908] - Add -fno-omit-frame-pointer to improve debugging and profiling.
* [MESOS-8911] - Add framework metrics benchmark test.
* [MESOS-8919] - Per Framework SUBSCRIBE metrics.
* [MESOS-8920] - Support per-container container logger configuration.
* [MESOS-8924] - Refactor the libprocess gRPC wrapper.
* [MESOS-8955] - Manage Python2 and 3 in build steps
* [MESOS-8986] - `slave.available()` in the allocator is expensive and drags down allocation performance.
* [MESOS-8989] - Add a better benchmark for range type resources.
* [MESOS-8998] - Allow for unbundled libevent in CMake builds to work around 2.1.x SSL issues.
* [MESOS-9015] - Allow resources to be removed when updating the sorter.
* [MESOS-9055] - Make gRPC call deadline configurable.
* [MESOS-9067] - Improve performance of json parsing by avoiding conversion cost.
* [MESOS-9081] - cgroups::verify is expensive and is done implicitly during cgroups operations.
* [MESOS-9086] - Optimize range subtraction operation.
* [MESOS-9092] - Adopt rapidjson for improved json serialization performance.
* [MESOS-9104] - Refactor capability related logic in the allocator.
* [MESOS-9110] - Add move support to the Resources / Resource_ wrappers.
* [MESOS-9122] - Batch '/state' requests in the Master actor.
* [MESOS-9129] - Port mapper CNI plugin should use '-n' option with 'iptables --list'
* [MESOS-9213] - Avoid double copying of master->framework messages when incrementing metrics.
** Task
* [MESOS-2633] - Move implementations of Framework struct functions out of master.hpp.
* [MESOS-3442] - Port path_tests to Windows
* [MESOS-3444] - Port sendfile_tests
* [MESOS-5647] - Expose network statistics for containers on CNI network in the `network/cni` isolator.
* [MESOS-5814] - Port libprocess http_tests.cpp
* [MESOS-5817] - Port libprocess process_tests.cpp
* [MESOS-5941] - RemoteLink tests fail on Windows
* [MESOS-7329] - Authorize offer operations for converting disk resources
* [MESOS-7527] - Enable ProcessTest.THREADSAFE_Http2 on Windows.
* [MESOS-8314] - Add authorization to display of resource provider information in API calls and endpoints
* [MESOS-8327] - Add container-specific CGroup FS mounts under /sys/fs/cgroup/* to Mesos containers
* [MESOS-8383] - Add metrics for operations in Storage Local Resource Provider (SLRP).
* [MESOS-8395] - Made gRPC a requirement for Mesos builds.
* [MESOS-8473] - Authorize `GET_OPERATIONS` calls.
* [MESOS-8670] - Implement `process::io::read/write` using Thread Pool API
* [MESOS-8671] - Add EventLoop implementation using Thread Pool API
* [MESOS-8672] - Replace libprocess `PollSocketImpl` with IOCP and Thread Pool API
* [MESOS-8674] - Fix os::pipe to work in overlapped mode
* [MESOS-8681] - Clean up os::sendfile on Windows
* [MESOS-8712] - Remove `destroyed` promise from `Container` struct
* [MESOS-8713] - Synchronize the results of the composing containerizer's `wait` and `destroy` methods
* [MESOS-8714] - Cleanup `containers_` hashmap once container exits
* [MESOS-8732] - Use composing containerizer in some agent tests.
* [MESOS-8734] - Restore `WaitAfterDestroy` test to check termination status of a terminated nested container.
* [MESOS-8736] - Implement a test which ensures that `wait` and `destroy` return the same result for a terminated nested container.
* [MESOS-8737] - Update composing containerizer tests.
* [MESOS-8774] - Authenticate and authorize calls to the resource provider manager's API
* [MESOS-8794] - Support docker image tarball hdfs based fetching.
* [MESOS-8814] - Mount the volume based on `Volume.mode`.
* [MESOS-8825] - Remove storage pools associated with missing profiles.
* [MESOS-8837] - Add test of resource provider manager recovery
* [MESOS-8843] - Per Framework CALL metrics
* [MESOS-8844] - Per Framework EVENT metrics
* [MESOS-8845] - Per Framework Operation metrics
* [MESOS-8846] - Per Framework state metrics
* [MESOS-8847] - Per Framework task state metrics
* [MESOS-8848] - Per Framework Offer metrics
* [MESOS-8849] - Per Framework resource allocation metrics
* [MESOS-8903] - Update the Python CLI to use Python 3
* [MESOS-8912] - Per Framework terminal task state metrics
* [MESOS-8931] - Add os::shell back to Windows
* [MESOS-8934] - Update python.m4 to support Python 3
* [MESOS-8936] - Implement a Random Sorter for offer allocations.
* [MESOS-8940] - Per Framework Offer metrics with a specific resource type
* [MESOS-8942] - Master streaming API does not send (health) check updates for tasks.
* [MESOS-8943] - Add metrics about CSI calls.
* [MESOS-8961] - Output of tasks gets corrupted if task defines the same environment variables as the executor container
* [MESOS-8990] - Build failure of the google-test dependency on Windows using MSVC.
* [MESOS-8995] - Add SLRP unit tests for missing profiles.
* [MESOS-8997] - Consider dropping PATH disk support for CSI volumes.
* [MESOS-9002] - GCC 8.1 build failure in os::Fork::Tree.
* [MESOS-9043] - Move check validators to the common validation library.
* [MESOS-9066] - Changing `CREATE_VOLUME` and `CREATE_BLOCK` to `CREATE_DISK`.
* [MESOS-9068] - Add a metrics benchmark in libprocess.
* [MESOS-9070] - Support systemd and freezer cgroup subsystems bind mount for container with rootfs.
* [MESOS-9148] - Make cgroups destroy timeout configurable for Mesos containerizer
** Documentation
* [MESOS-8740] - Update description of a Containerizer interface.
* [MESOS-9020] - Seccomp design doc
Release Notes - Mesos - Version 1.6.3 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9124] - Agent reconfiguration can cause master to unsuppress on scheduler's behalf.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true.
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
* [MESOS-9887] - Race condition between two terminal task status updates for Docker/Command executor.
* [MESOS-9889] - Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.
* [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret from runtime directory when the container is destroyed.
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation
** Improvement
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
Release Notes - Mesos - Version 1.6.2
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8917] - Agent leaking file descriptors into forked processes
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9116] - Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.
* [MESOS-9125] - Port mapper CNI plugin might fail with "Resource temporarily unavailable"
* [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the agent.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9142] - CNI detach might fail due to missing network config file.
* [MESOS-9144] - Master authentication handling leads to request amplification.
* [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
* [MESOS-9146] - Agent has a fragile burned-in 5s authentication timeout.
* [MESOS-9147] - Agent and scheduler driver authentication retry backoff time could overflow.
* [MESOS-9151] - Container stuck at ISOLATING due to FD leak.
* [MESOS-9152] - Close all file descriptors except whitelist_fds in posix/subprocess.
* [MESOS-9164] - Subprocess should unset CLOEXEC on whitelisted file descriptors.
* [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format error.
* [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9267] - Mesos agent crashes when CNI network is not configured but used.
* [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9324] - Resource fragmentation: frameworks may be starved of port resources in the presence of a large number of frameworks with quota.
* [MESOS-9332] - Nested container should run as the same user of its parent container by default.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returns.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9531] - chown error handling is incorrect in createSandboxDirectory.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
** Improvement
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads
* [MESOS-9305] - Create cgroups recursively to work around systemd deleting cgroups_root.
* [MESOS-9340] - Log all socket errors in libprocess.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
Release Notes - Mesos - Version 1.6.1
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
* [MESOS-8106] - Docker fetcher plugin unsupported scheme failure message is not accurate.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
* [MESOS-8904] - Master crash when removing quota.
* [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile selectors.
* [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and memory-only offers.
* [MESOS-8936] - Implement a Random Sorter for offer allocations.
* [MESOS-8942] - Master streaming API does not send (health) check updates for tasks.
* [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
* [MESOS-8947] - Improve the container preparing logging in IOSwitchboard and volume/secret isolator.
* [MESOS-8952] - process::await/collect n^2 performance issue.
* [MESOS-8963] - Executor crash trying to print container ID.
* [MESOS-8980] - mesos-slave can deadlock with docker pull.
* [MESOS-8986] - `slave.available()` in the allocator is expensive and drags down allocation performance.
* [MESOS-8987] - Master asks agent to shutdown upon auth errors.
* [MESOS-9002] - GCC 8.1 build failure in os::Fork::Tree.
* [MESOS-9024] - Mesos master segfaults with stack overflow under load.
* [MESOS-9025] - The container which joins CNI network and has checkpoint enabled will be mistakenly destroyed by agent.
* [MESOS-9049] - Agent GC could unmount a dangling persistent volume multiple times.
** Improvement
* [MESOS-8934] - Update python.m4 to support Python 3.
Release Notes - Mesos - Version 1.6.0
-------------------------------------
This release contains the following new features:
* [MESOS-4965] - **Experimental** Persistent volumes can now be resized
  through new offer operations and the V1 operator API.
* [MESOS-6575] - Added a new `--xfs_kill_containers` flag to the
Mesos agent. This causes the `disk/xfs` isolator to terminate
containers that exceed their disk quota.
* [MESOS-7944] - **Experimental** Added a new `MemoryProfiler` class to
libprocess to aid in debugging memory issues.
* [MESOS-8054] - **Experimental** Schedulers can now receive feedback about
offer operations which operate on resources managed by resource providers.
In the future, this feature will be extended to operations on agent default
resources.
* [MESOS-8534] - **Experimental** A nested container is now allowed
  to join a CNI network separate from that of its parent container.
* [MESOS-8572] - Improvements to the Docker containerizer and executor
to more gracefully handle situations in which the Docker CLI is
unresponsive.
* [MESOS-8607] - The `mesos-execute` tool has been ported to Windows.
* [MESOS-8649] - **Experimental** Support for Container Storage Interface
(CSI) version 0.2 in Mesos.
* [MESOS-8659] - The Windows build now links the C runtime library
dynamically instead of statically. This requires the Visual Studio
redistributable to be available at runtime.
* [MESOS-8682] - The use of the C runtime library's POSIX wrappers on
Windows has been deprecated in favor of the native Windows APIs.
* [MESOS-8725] - Added a new `max_completion_time` field to `TaskInfo`.
  Tasks that do not complete within the specified duration will
  fail with a new reason `REASON_MAX_COMPLETION_TIME_REACHED`.
* [MESOS-8801] - **Experimental** On Linux, Mesos can now be
configured to use the jemalloc allocator by default via the
`--enable-jemalloc-allocator` configuration option.
* Agents now support the `--fetcher_stall_timeout` flag which allows container
image and artifact fetchers to abort after the timeout when downloads stall.
Deprecations/Removals:
* Support for CSI v0.1 is deprecated in favor of CSI v0.2.
Additional API Changes:
* [MESOS-8306] - Authorization of resource reservation has been updated
to allow the restriction of which agents can statically reserve
resources for which roles.
* [MESOS-8332] - Container sandbox permissions have been changed from
0755 to 0750.
* [MESOS-8388] - Local resource provider resources are now included in
the responses to the GET_AGENTS and GET_RESOURCE_PROVIDER calls.
* [MESOS-8534] - Nested containers within a task group can now specify
separate network namespaces.
Changes to Dependencies:
* Upgraded minimum required gRPC library to version 1.10+ for gRPC-enabled builds.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode()
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration
* [MESOS-3533] - Unable to find and run URIs files
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formatted as strings
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7966] - check for maintenance on agent causes fatal error
* [MESOS-7991] - fatal, check failed !framework->recovered()
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8522] - `prepareMounts` in Mesos containerizer is flaky.
* [MESOS-8623] - Crashed framework brings down the whole Mesos cluster
* [MESOS-8679] - If the first KILL gets stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8703] - Mesos master can't reconnect to zookeeper
* [MESOS-8731] - mesos master APIs become latent
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
Feature Graduations:
* [MESOS-4828] - XFS disk quota isolator.
* [MESOS-6906] - Introduce a general non-interpreting task check.
All Experimental Features:
* [MESOS-3094] - Mesos on Windows.
* [MESOS-3421] - Support sharing of resources across task instances.
* [MESOS-4312] - Porting Mesos on Power (ppc64le).
* [MESOS-4355] - Implement isolator for Docker volume.
* [MESOS-4965] - Persistent volume resizing.
* [MESOS-5344] - Partition-aware Mesos frameworks.
* [MESOS-5788] - Added JAVA API adapter for seamless transition to new scheduler API.
* [MESOS-5931] - Support auto backend in Mesos Containerizer.
* [MESOS-6014] - Added port mapping CNI plugin.
* [MESOS-7944] - Libprocess `MemoryProfiler`.
* [MESOS-8054] - Offer operation feedback.
* [MESOS-8534] - Separate CNI networks for nested containers.
* [MESOS-8649] - Support for Container Storage Interface version 0.2.
* [MESOS-8801] - Linux support for jemalloc.
All Resolved Issues:
** Bug
* [MESOS-1720] - Slave should send exited executor message when the executor is never launched.
* [MESOS-3915] - Upgrade vendored Boost
* [MESOS-4420] - Support read host physical link speed from virtio driver
* [MESOS-5333] - GET /master/maintenance/schedule/ produces 404.
* [MESOS-5820] - Port master to Windows
* [MESOS-5882] - `os::cloexec` does not exist on Windows
* [MESOS-5940] - `setPaths` doesn't work on Windows
* [MESOS-6555] - Namespace 'mnt' is not supported
* [MESOS-6713] - Port `slave_recovery_tests.cpp`
* [MESOS-6715] - Port `uri_fetcher_tests.cpp`
* [MESOS-6822] - CNI reports confusing error message for failed interface setup.
* [MESOS-6973] - Fix BOOST random generator initialization on Windows
* [MESOS-7028] - NetSocketTest.EOFBeforeRecv is flaky.
* [MESOS-7342] - Port Docker tests
* [MESOS-7506] - Multiple tests leave orphan containers.
* [MESOS-7604] - SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows
* [MESOS-7699] - "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)
* [MESOS-7742] - Race conditions in IOSwitchboard: listening on unix socket and premature closing of the connection.
* [MESOS-7803] - fs::list drops path components on Windows
* [MESOS-7944] - Implement jemalloc memory profiling support for Mesos
* [MESOS-7979] - reviewboard's GUESS_FIELDS setting leads to redundant information in commit messages
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused
* [MESOS-8140] - Executors should clear their auth tokens
* [MESOS-8232] - SlaveTest.RegisteredAgentReregisterAfterFailover is flaky.
* [MESOS-8258] - Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.
* [MESOS-8305] - DefaultExecutorTest.ROOT_MultiTaskgroupSharePidNamespace is flaky.
* [MESOS-8308] - CommandExecutorCheckTest.CommandCheckTimeout is flaky on Windows
* [MESOS-8334] - PartitionedSlaveReregistrationMasterFailover is flaky.
* [MESOS-8336] - MasterTest.RegistryUpdateAfterReconfiguration is flaky
* [MESOS-8348] - Enable function sections in the build.
* [MESOS-8350] - Resource provider-capable agents not correctly synchronizing checkpointed agent resources on reregistration
* [MESOS-8404] - Improve image puller error messages.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8413] - Zookeeper configuration passwords are shown in clear text
* [MESOS-8416] - CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
* [MESOS-8440] - `network/ports` isolator kills legitimate tasks on recovery.
* [MESOS-8444] - GC failure causes agent to fail to detach virtual paths for the executor's sandbox
* [MESOS-8446] - Agent fails to detach `virtualLatestPath` for the executor's sandbox during recovery
* [MESOS-8447] - Incomplete output of apply-reviews.py --dry-run
* [MESOS-8453] - ExecutorAuthorizationTest.RunTaskGroup segfaults.
* [MESOS-8463] - Test MasterAllocatorTest/1.SingleFramework is flaky
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
* [MESOS-8474] - Test StorageLocalResourceProviderTest.ROOT_ConvertPreExistingVolume is flaky
* [MESOS-8477] - Make clean fails without Python artifacts.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8482] - Signed/Unsigned comparisons in tests
* [MESOS-8483] - ExampleTests PythonFramework fails with sigabort.
* [MESOS-8484] - stout test NumifyTest.HexNumberTest fails.
* [MESOS-8485] - MasterTest.RegistryGcByCount is flaky
* [MESOS-8489] - LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags is flaky
* [MESOS-8490] - UpdateSlaveMessageWithPendingOffers is flaky.
* [MESOS-8497] - Docker parameter `name` does not work with Docker Containerizer.
* [MESOS-8508] - Missing map header when compiling against unbundled protobuf
* [MESOS-8510] - URI disk profile adaptor does not consider plugin type for a profile.
* [MESOS-8512] - Fetcher doesn't log its stdout/stderr properly to the log file
* [MESOS-8513] - Noisy "transport endpoint is not connected" logs on closing sockets.
* [MESOS-8519] - Fix recovery of job object isolated tasks
* [MESOS-8530] - Default executor tasks can get stuck in KILLING state
* [MESOS-8536] - Pending offer operations on resource provider resources not properly accounted for in allocator
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8546] - PythonFramework test fails with cache write failure.
* [MESOS-8548] - Test StorageLocalResourceProviderTest.ROOT_Metrics is flaky
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail
* [MESOS-8563] - Windows executors cannot re-register
* [MESOS-8565] - Persistent volumes are not visible in Mesos UI when launching a pod using default executor.
* [MESOS-8577] - Destroy nested container if `LAUNCH_NESTED_CONTAINER_SESSION` fails
* [MESOS-8578] - UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.
* [MESOS-8585] - Agent crashes when starting a task with an unknown user.
* [MESOS-8586] - apply-reviews.py silently does nothing when a review was submitted already.
* [MESOS-8594] - Mesos master stack overflow in libprocess socket send loop.
* [MESOS-8598] - Allow empty resource provider selector in `UriDiskProfileAdaptor`.
* [MESOS-8601] - Master crashes during slave reregistration after failover.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not be sent if 'docker inspect' is hung
* [MESOS-8610] - NsTest.SupportedNamespaces fails on CentOS7
* [MESOS-8611] - SlaveTest.RemoveExecutorUponFailedLaunch is flaky.
* [MESOS-8617] - Tests using default executor occasionally fail.
* [MESOS-8618] - ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.
* [MESOS-8619] - Docker on Windows uses USERPROFILE instead of HOME for credentials
* [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
* [MESOS-8624] - Valid tasks may be explicitly dropped by agent due to race conditions
* [MESOS-8631] - Agent should be able to start a task with every CPU on a Windows machine
* [MESOS-8641] - Event stream could send heartbeat before subscribed
* [MESOS-8642] - balloon-executor is hard to run as unprivileged user
* [MESOS-8643] - `os::system` and `os::spawn` return -1 on valid Windows commands
* [MESOS-8644] - W* macros wrong on Windows.
* [MESOS-8646] - Agent should be able to resolve file names on open files.
* [MESOS-8647] - Enable resource provider agent capability by default
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator
* [MESOS-8654] - The `/proc/sys` mount point in Mesos containers should also include `nosuid,noexec,nodev` mount options.
* [MESOS-8659] - Fix warning `cl : Command line warning D9025 : overriding '/MTd' with '/MDd'`
* [MESOS-8664] - Perf sampler doesn't handle extra fields and nameless counters
* [MESOS-8691] - Forward CXX_FLAGS to C++ projects and C_FLAGS to C projects in CMake
* [MESOS-8711] - SlaveTest.ChangeDomain is disabled.
* [MESOS-8719] - Mesos configured with `--enable-grpc` doesn't compile on non-Linux builds
* [MESOS-8724] - G++ Warning about libc system macros `major` and `minor` prevents Mesos build
* [MESOS-8733] - OversubscriptionTest.ForwardUpdateSlaveMessage is flaky
* [MESOS-8741] - `Add` to sequence will not run if it races with sequence destruction
* [MESOS-8742] - Agent resource provider config API calls should be idempotent.
* [MESOS-8749] - CSI proto is always included in the build when using CMake
* [MESOS-8761] - Default linker fails to link tests on FreeBSD
* [MESOS-8781] - Mesos master shouldn't silently drop operations
* [MESOS-8784] - OPERATION_DROPPED operation status updates should include the operation/framework IDs
* [MESOS-8787] - RP-related API should be experimental.
* [MESOS-8804] - Fix Ninja Release builds on Windows
* [MESOS-8818] - VolumeSandboxPathIsolatorTest.SharedParentTypeVolume fails on macOS
* [MESOS-8834] - Indirect recursion between `send` and `_send` in libprocess may cause stack overflow.
* [MESOS-8865] - Suspicious enum value comparisons in scheduler Java bindings
* [MESOS-8866] - CMake builds are missing byproduct declaration for jemalloc.
* [MESOS-8868] - Some 'FsTest' test cases fail on macOS
* [MESOS-8870] - Master does not correctly reconcile dropped operations after agent failover
* [MESOS-8874] - ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider is flaky.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
** Improvement
* [MESOS-2922] - Add move constructors / assignment to Future.
* [MESOS-3022] - export additional metrics from scheduler driver
* [MESOS-4965] - Support resizing of an existing persistent volume
* [MESOS-5362] - Add authentication to example frameworks
* [MESOS-6128] - Make "re-register" vs. "reregister" consistent in the master
* [MESOS-7016] - Make default AWAIT_* duration configurable
* [MESOS-7643] - The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
* [MESOS-7656] - Update the JSON <=> protobuf message conversion for map support
* [MESOS-7881] - Building gRPC with CMake
* [MESOS-7990] - Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
* [MESOS-8033] - Use more idiomatic CMake for compiler features
* [MESOS-8240] - Add an option to build the new CLI and run unit tests.
* [MESOS-8306] - Restrict which agents can statically reserve resources for which roles
* [MESOS-8332] - Narrow the container sandbox permissions.
* [MESOS-8357] - Example frameworks have an inconsistent UX.
* [MESOS-8361] - Example frameworks to support launching mesos-local.
* [MESOS-8389] - Notion of "removable" task in master code is inaccurate.
* [MESOS-8390] - Notion of "transitioning" agents in the master is now inaccurate.
* [MESOS-8402] - Resource provider manager should persist resource provider information
* [MESOS-8426] - Speed up SLRP tests
* [MESOS-8427] - Clean up residual CSI endpoints for SLRP tests.
* [MESOS-8434] - Cleanup Authorization logic in master and agent
* [MESOS-8454] - Add a download link for master and agent logs in WebUI
* [MESOS-8471] - Allow revocable_resources capability for mesos-execute
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8506] - Add test coverage for `Resources::find` on revocable resources
* [MESOS-8556] - Boost emits warning repeatedly
* [MESOS-8573] - Container stuck in PULLING when Docker daemon hangs
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'
* [MESOS-8591] - Add infra to test a hung Docker daemon
* [MESOS-8599] - Build with Ninja on Windows
* [MESOS-8607] - Port mesos-execute to Windows
* [MESOS-8609] - Create a metric to indicate how long agent takes to recover executors
* [MESOS-8640] - Validate `DockerInfo` exists when container's type is `DOCKER`
* [MESOS-8656] - Improve stout JSON -> protobuf message conversion to handle more valid JSONs
* [MESOS-8658] - CMake build should use same compiler warnings as Autotools
* [MESOS-8702] - Replace the manual parsing in Mesos code with the native protobuf map support
* [MESOS-8725] - Support max_duration for tasks
* [MESOS-8728] - Don't print full usage for invocation errors
* [MESOS-8772] - Add slave recovery test for default executor.
* [MESOS-8793] - Add more logging to agent recovery path.
* [MESOS-8801] - Add jemalloc as optional third-party memory allocator
* [MESOS-8851] - Introduce a push-based gauge.
** Task
* [MESOS-3441] - Port os_tests to Windows
* [MESOS-3445] - Port signals_tests to Windows
* [MESOS-3644] - Implement stout/os/windows/signals.hpp
* [MESOS-4176] - Support CMake build on FreeBSD
* [MESOS-5726] - Benchmark the v1 Operator API
* [MESOS-5850] - Add a test that runs the 'mesos-local' binary
* [MESOS-6575] - Change `disk/xfs` isolator to terminate executor when it exceeds quota
* [MESOS-7558] - Add resource provider validation
* [MESOS-8184] - Implement master's AcknowledgeOfferOperationMessage handler.
* [MESOS-8189] - Master's OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.
* [MESOS-8190] - Update the master to accept OfferOperationIDs from frameworks.
* [MESOS-8191] - Implement ReconcileOfferOperations handler in the master
* [MESOS-8192] - Update the scheduler library to support request/response API calls.
* [MESOS-8275] - Remove use of ::_stat on Windows
* [MESOS-8284] - Add a ns::supported convenience API.
* [MESOS-8362] - Verify end-to-end operation status update retry after RP failover
* [MESOS-8363] - Verify that the master acknowledges operation status updates correctly
* [MESOS-8373] - Test reconciliation after operation is dropped en route to agent
* [MESOS-8382] - Master should bookkeep local resource providers.
* [MESOS-8388] - Show LRP resources in master and agent endpoints.
* [MESOS-8407] - Add SLRP unit tests for profile updates and corner cases.
* [MESOS-8408] - Add an SLRP test for CSI plugin restart.
* [MESOS-8409] - Add an SLRP test for agent registered with a new ID.
* [MESOS-8415] - Add an SLRP test for agent reboot.
* [MESOS-8420] - Test that operation status updates are retried after being dropped en-route to the master.
* [MESOS-8424] - Test that operations are correctly reported following a master failover
* [MESOS-8442] - Source tree contains generated endpoint documentation
* [MESOS-8445] - Test that `UPDATE_STATE` of a resource provider doesn't have unwanted side-effects in master or agent
* [MESOS-8462] - Unit test for `Slave::detachFile` on removed frameworks.
* [MESOS-8492] - Checkpoint profiles in storage local resource provider.
* [MESOS-8527] - Add metrics about number of subscribed LRPs on the agent.
* [MESOS-8534] - Allow nested containers in TaskGroups to have separate network namespaces
* [MESOS-8539] - Add metrics about CSI plugin terminations.
* [MESOS-8551] - Port libprocess HTTPTest.QueryEncodeDecode
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8650] - Bump CSI bundle to v0.2.
* [MESOS-8653] - Make the CSI client to support CSI v0.2.
* [MESOS-8657] - Build CSI proto in CMake.
* [MESOS-8673] - Fix os::open to use HANDLEs
* [MESOS-8675] - Remove FD_CRT from WindowsFD
* [MESOS-8676] - Fix os::read and os::write to use HANDLEs
* [MESOS-8678] - Bump gRPC bundle to 1.10.0.
* [MESOS-8683] - Remove _close from Windows close.hpp
* [MESOS-8684] - Replace _dup with DuplicateHandle on Windows
* [MESOS-8685] - Replace _lseek with SetFilePointer
* [MESOS-8692] - Replace _chsize_s with SetEndOfFile on Windows
* [MESOS-8697] - Make gRPC-related tests cross-platform.
* [MESOS-8698] - Enable storage local resource provider in CMake.
* [MESOS-8706] - Unify return type of `wait` and `destroy` containerizer methods
* [MESOS-8710] - Update tests after changing return type of `wait` method
* [MESOS-8717] - Support CSI v0.2 in SLRP.
* [MESOS-8735] - Implement recovery for resource provider manager registrar
* [MESOS-8747] - Support resizing persistent volume through operator API
* [MESOS-8748] - Create ACL for grow and shrink volume
* [MESOS-8750] - Check failed: !slaves.registered.contains(task->slave_id)
* [MESOS-8777] - Support `STAGE_UNSTAGE_VOLUME` CSI capability in SLRP
* [MESOS-8819] - mesos.pom file hardcodes developers
* [MESOS-8833] - Port libprocess subprocess_tests.cpp
** Documentation
* [MESOS-8291] - Add documentation about fault domains
Release Notes - Mesos - Version 1.5.4 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9124] - Agent reconfiguration can cause master to unsuppress on scheduler's behalf.
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
* [MESOS-10007] - Command executor can miss exit status for short-lived commands due to double-reaping.
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of unmount operation
** Improvement
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
* [MESOS-9948] - master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
* [MESOS-10017] - Log all reverse DNS lookup failures in 'legacy' TLS (SSL) hostname validation scheme.
Release Notes - Mesos - Version 1.5.3
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9332] - Nested container should run as the same user of its parent container by default.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returning.
* [MESOS-9362] - Test `CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively` is flaky.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
* [MESOS-9581] - Mesos package naming appears to be nondeterministic.
** Improvement
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
Release Notes - Mesos - Version 1.5.2
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.
* [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data.
* [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
* [MESOS-8904] - Master crash when removing quota.
* [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile selectors.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8917] - Agent leaking file descriptors into forked processes.
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions.
* [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and memory-only offers.
* [MESOS-8936] - Implement a Random Sorter for offer allocations.
* [MESOS-8942] - Master streaming API does not send (health) check updates for tasks.
* [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
* [MESOS-8947] - Improve the container preparing logging in IOSwitchboard and volume/secret isolator.
* [MESOS-8952] - process::await/collect n^2 performance issue.
* [MESOS-8963] - Executor crash trying to print container ID.
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-8980] - mesos-slave can deadlock with docker pull.
* [MESOS-8986] - `slave.available()` in the allocator is expensive and drags down allocation performance.
* [MESOS-8987] - Master asks agent to shutdown upon auth errors.
* [MESOS-9024] - Mesos master segfaults with stack overflow under load.
* [MESOS-9049] - Agent GC could unmount a dangling persistent volume multiple times.
* [MESOS-9116] - Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.
* [MESOS-9125] - Port mapper CNI plugin might fail with "Resource temporarily unavailable".
* [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the agent.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9142] - CNI detach might fail due to missing network config file.
* [MESOS-9144] - Master authentication handling leads to request amplification.
* [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
* [MESOS-9146] - Agent has a fragile burned-in 5s authentication timeout.
* [MESOS-9147] - Agent and scheduler driver authentication retry backoff time could overflow.
* [MESOS-9151] - Container stuck at ISOLATING due to FD leak.
* [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format error.
* [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9267] - Mesos agent crashes when CNI network is not configured but used.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9332] - Nested container should run as the same user of its parent container by default.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returning.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOSwitchboard cleanup could get stuck due to FD leak from a race.
** Improvement
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
Release Notes - Mesos - Version 1.5.1
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-1720] - Slave should send exited executor message when the executor is never launched.
* [MESOS-7742] - Race conditions in IOSwitchboard: listening on unix socket and premature closing of the connection.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8416] - CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8510] - URI disk profile adaptor does not consider plugin type for a profile.
* [MESOS-8536] - Pending offer operations on resource provider resources not properly accounted for in allocator.
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail.
* [MESOS-8565] - Persistent volumes are not visible in Mesos UI when launching a pod using default executor.
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs.
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
* [MESOS-8577] - Destroy nested container if `LAUNCH_NESTED_CONTAINER_SESSION` fails.
* [MESOS-8594] - Mesos master stack overflow in libprocess socket send loop.
* [MESOS-8598] - Allow empty resource provider selector in `UriDiskProfileAdaptor`.
* [MESOS-8601] - Master crashes during slave reregistration after failover.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung.
* [MESOS-8619] - Docker on Windows uses `USERPROFILE` instead of `HOME` for credentials.
* [MESOS-8624] - Valid tasks may be explicitly dropped by agent due to race conditions.
* [MESOS-8631] - Agent should be able to start a task with every CPU on a Windows machine.
* [MESOS-8641] - Event stream could send heartbeat before subscribed.
* [MESOS-8646] - Agent should be able to resolve file names on open files.
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator.
* [MESOS-8741] - `Add` to sequence will not run if it races with sequence destruction.
* [MESOS-8742] - Agent resource provider config API calls should be idempotent.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8787] - RP-related API should be experimental.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
Release Notes - Mesos - Version 1.5.0
-------------------------------------
This release contains the following new features:
* [MESOS-1739] - **Experimental** Agents now support the
`--reconfiguration_policy` flag which allows them to recover
the agent ID and running tasks after configuration changes.
See docs/agent-recovery.md for more details.
* [MESOS-4945] - **Experimental** Agents now can automatically
garbage collect unused Docker image layers used by Mesos
Containerizer.
* [MESOS-7289, MESOS-7235] - **Experimental** Support for the
Container Storage Interface (CSI) to simplify storage management
in Mesos and allow 3rd party vendors to plug into Mesos
easily.
* [MESOS-7302] - Support launching standalone containers on the
agent using MesosContainerizer without a master or framework
running.
* [MESOS-7749] - **Experimental** Support for gRPC client in Mesos.
gRPC is bundled in Mesos and a gRPC client API is built into
libprocess.
* [MESOS-7973] - **Experimental** Non-leading replicas are now allowed
to catch up on missing log positions in the replicated log. This opens
the door for implementing hot standby (by offloading some reading
from a leader to standbys) and fast failover time (by keeping
in-memory storage represented by the log "hot").
* Several improvements and fixes to the enforcement of quota
guarantees have been made:
* [MESOS-4527]: Previously a role could "game" the quota system
by amassing reservations that it leaves unused. This is now
prevented by accounting for reservations when allocating
resources.
* [MESOS-7099]: Resources are now allocated in a fine-grained
manner to prevent roles from exceeding their quota.
* [MESOS-8293]: There was a bug where a role might not receive its
reservation when it did not have quota. This has been fixed.
* [MESOS-8339]: When a role had more reservations than quota,
an insufficient amount of quota headroom was previously held.
This has been fixed.
* [MESOS-8352]: When allocating to a role with quota, we
previously included all other resources on the agent that the
role does not have quota for. This made it possible to violate
the quota guarantees of a different role. This has been fixed
by taking into account the headroom that is needed when
allocating the resources.
Deprecations/Removals:
* [MESOS-7305] - Some nested container agent APIs `****_NESTED_CONTAINER`
are deprecated in favor of the new generically named agent APIs
`****_CONTAINER`.
* Agent flag `--executor_secret_key` has been deprecated. Operators
should use `--jwt_secret_key` instead.
Additional API Changes:
* [MESOS-6406, MESOS-7215, MESOS-8337] Now when an agent is partitioned,
the master tracks all noncompleted tasks regardless of partition-awareness,
so when the agent reregisters it can recover all of them and send their
latest statuses to the scheduler. NOTE: The master now sends updates for
tasks recovered from partitioned agents upon reregistration so the scheduler
can get them before reconciliation. We also fixed the buggy semantics that
exposed terminal unacknowledged tasks on partitioned agents as "completed" in
the HTTP endpoints and the operator API; they are now shown as "unreachable".
We plan to further improve the API on this in MESOS-8405.
* [MESOS-7550] The fields `Resource.disk.source.path.root` and
`Resource.disk.source.mount.root` can now be set to paths relative
to an agent's work directory.
* [MESOS-7660] `Filter::refuse_seconds` is now capped to 31536000
seconds (365 days).
* [MESOS-7941] Built-in executors will now send a TASK_STARTING
status update when a task is starting.
* [MESOS-7973] A new `catchup` method has been added to the
`Log.Reader` interface (including Java binding).
* [MESOS-8040] The `GET_CONTAINERS` API call now returns nested and
standalone containers.
* [MESOS-8165] Master will now send TASK_GONE status for unknown
tasks of PARTITION_AWARE frameworks belonging to registered agents
during explicit reconciliation.
Changes to Dependencies:
* Upgraded minimum required Protobuf library to version 3+.
Feature Graduations:
* [MESOS-4791] - v1 Operator API is now considered stable. The performance has
been improved so that when using protobuf it is faster than v0, and when
using JSON it is slightly slower than v0.
* [MESOS-5116] - Add support for accounting only mode in XFS isolator.
* [MESOS-5275, MESOS-7476, MESOS-7477, MESOS-7671] - Add file-based and
protobuf-based capabilities support for Mesos containerizer. This
includes support for effective and bounding capabilities.
* [MESOS-6077] - Added a default (task group) executor.
* [MESOS-6402] - rlimit support for Mesos containerizer.
* [MESOS-6460] - Container Attach/Exec.
* [MESOS-6758] - Support docker registry that requires basic auth.
* [MESOS-7088] - Support private registry credential per container.
* [MESOS-7418] - Add support for file-based secrets.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode().
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration.
* [MESOS-3533] - Unable to find and run URIs files.
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string.
* [MESOS-4996] - 'containerizer->update' will always fail after killing a docker container.
* [MESOS-5352] - Docker volume isolator cleanup can be blocked by first cleanup failure.
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formatted as strings.
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-6804] - Running 'tty' inside a debug container that has a tty reports "Not a tty".
* [MESOS-6986] - abort in DRFSorter::add.
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove.
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7966] - check for maintenance on agent causes fatal error.
* [MESOS-7991] - fatal, check failed !framework->recovered().
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
All Resolved Issues:
** Bug
* [MESOS-1216] - Attributes comparator operator should allow multiple attributes of same name and type.
* [MESOS-3576] - Audit CMake linking flags.
* [MESOS-5455] - Transition away from temporary build variables.
* [MESOS-5462] - Re-organize isolator hierarchy.
* [MESOS-5656] - Incomplete modelling of 3rdparty dependencies in cmake build.
* [MESOS-5881] - Semantics of `os::symlink` differ across POSIX and Windows.
* [MESOS-5905] - Zookeeper tests do not work on CMake builds as directory structure changed.
* [MESOS-6086] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove is flaky.
* [MESOS-6187] - "double free or corruption" with Java 8.
* [MESOS-6345] - ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04.
* [MESOS-6406] - Send latest status for partition-aware tasks when agent reregisters.
* [MESOS-6428] - Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe.
* [MESOS-6616] - Error: dereferencing type-punned pointer will break strict-aliasing rules.
* [MESOS-6671] - External 3rdparty deps are not built with the configured compiler in cmake build.
* [MESOS-6690] - Wire up resource control API to Windows Job objects API.
* [MESOS-6697] - Port `authentication_tests.cpp`.
* [MESOS-6703] - Port `credentials_tests.cpp`.
* [MESOS-6705] - Port `fetcher_tests.cpp`.
* [MESOS-6708] - Port `group_tests.cpp`.
* [MESOS-6735] - `os::realpath` semantics differ between Windows and POSIX.
* [MESOS-6784] - IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky.
* [MESOS-6790] - Wrong task started time in webui.
* [MESOS-6794] - Properly model header dependencies of cmake build components.
* [MESOS-6816] - Allows frameworks to overwrite system environment variables.
* [MESOS-6942] - CMake build with `-DENABLE_LIBEVENT=ON` requires system-installed `openssl`.
* [MESOS-6949] - SchedulerTest.MasterFailover is flaky.
* [MESOS-7007] - filesystem/shared and --default_container_info broken since 1.1.
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
* [MESOS-7130] - port_mapping isolator: executor hangs when running on EC2.
* [MESOS-7160] - Parsing of perf version segfaults.
* [MESOS-7215] - Race condition on re-registration of non-partition-aware frameworks.
* [MESOS-7223] - Linux filesystem isolator cannot mount host volume /dev/log.
* [MESOS-7296] - CMake 2.8.10 does not support TIMESTAMP.
* [MESOS-7312] - Update Resource proto for storage resource providers.
* [MESOS-7425] - ImageAlpine/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/3 is flaky in some OS.
* [MESOS-7440] - Various DefaultExecutorCheckTest* tests flaky on ASF CI.
* [MESOS-7500] - Command checks via agent lead to flaky tests.
* [MESOS-7504] - Parent's mount namespace cannot be determined when launching a nested container.
* [MESOS-7509] - CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.
* [MESOS-7511] - CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
* [MESOS-7519] - OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky.
* [MESOS-7541] - Cannot compile without pre-compiled headers on Windows.
* [MESOS-7586] - Make use of cout/cerr and glog consistent.
* [MESOS-7589] - CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled is flaky.
* [MESOS-7660] - HierarchicalAllocator uses the default filter instead of a very long one.
* [MESOS-7661] - Libprocess timers with long durations trigger immediately.
* [MESOS-7704] - Remove use of #pragma comment (lib, "IPHLPAPI.lib").
* [MESOS-7726] - MasterTest.IgnoreOldAgentReregistration test is flaky.
* [MESOS-7729] - ExamplesTest.DynamicReservationFramework is flaky.
* [MESOS-7741] - SlaveRecoveryTest/0.MultipleSlaves has double free corruption.
* [MESOS-7781] - Windows API GetVersionExW was declared deprecated.
* [MESOS-7784] - MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1 is flaky.
* [MESOS-7791] - subprocess' childMain using ABORT when encountering user errors.
* [MESOS-7811] - libprocess-tests depend on gtest but it's not setup.
* [MESOS-7828] - Current approach to parse protobuf enum from JSON does not support upgrades.
* [MESOS-7835] - CMake build does not support Marathon.
* [MESOS-7851] - Master stores old resource format in the registry.
* [MESOS-7867] - Master doesn't handle scheduler driver downgrade from HTTP based to PID based.
* [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos `state` endpoint.
* [MESOS-7877] - Audit test code for undefined behavior in accessing container elements.
* [MESOS-7917] - Docker statistics not reported on Windows.
* [MESOS-7921] - ProcessManager::resume sometimes crashes accessing EventQueue.
* [MESOS-7923] - Make args optional in mesos port mapper plugin.
* [MESOS-7927] - The composing containerizer leaks memory in some scenarios.
* [MESOS-7929] - `Metrics()` hangs on second call on Windows.
* [MESOS-7945] - MasterAPITest.EventAuthorizationFiltering is flaky.
* [MESOS-7963] - Task groups can lose the container limitation status.
* [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
* [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
* [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
* [MESOS-7972] - SlaveTest.HTTPSchedulerSlaveRestart test is flaky.
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-7978] - Lint javascript files to enable linting.
* [MESOS-7980] - Stout fails to compile with libc >= 2.26.
* [MESOS-7988] - Mesos attempts to open handle for the system idle process.
* [MESOS-7993] - Fix Windows header orderings.
* [MESOS-7996] - ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
* [MESOS-7997] - ContentType/MasterAPITest.CreateAndDestroyVolumes is flaky.
* [MESOS-7998] - PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.
* [MESOS-8000] - DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
* [MESOS-8001] - PersistentVolumeEndpointsTest.NoAuthentication is flaky.
* [MESOS-8003] - PersistentVolumeEndpointsTest.SlavesEndpointFullResources is flaky.
* [MESOS-8010] - AfterTest.Loop is flaky.
* [MESOS-8027] - os::open doesn't always atomically apply O_CLOEXEC.
* [MESOS-8035] - Correct mesos-tests CMake build dependencies.
* [MESOS-8039] - A broken connection during LaunchNestedContainer call might result in the nested container not being cleaned up.
* [MESOS-8046] - MasterTestPrePostReservationRefinement.ReserveAndUnreserveResourcesV1 is flaky.
* [MESOS-8048] - ReservationEndpointsTest.GoodReserveAndUnreserveACL is flaky.
* [MESOS-8051] - Killing TASK_GROUP fails to kill some tasks.
* [MESOS-8052] - "protoc" not found when running "make -j4 check" directly in stout.
* [MESOS-8057] - Apply security patches to AngularJS and JQuery in the Mesos UI.
* [MESOS-8058] - Agent and master can race when updating agent state.
* [MESOS-8066] - Pylint reports errors in apply-reviews.py on Ubuntu 14.04.
* [MESOS-8070] - Bundled gRPC build does not build on Debian 8.
* [MESOS-8076] - PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.
* [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
* [MESOS-8082] - updateAvailable races with a periodic allocation and leads to flaky tests.
* [MESOS-8084] - Double free corruption in tests due to parallel manipulation of signal and control handlers.
* [MESOS-8085] - No point in deallocate() for a framework for maintenance if it is deactivated.
* [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription.
* [MESOS-8093] - Some tests miss subscribed event because expectation is set after event fires.
* [MESOS-8095] - ResourceProviderRegistrarTest.AgentRegistrar is flaky.
* [MESOS-8116] - Fix off by-one error in Windows long path support.
* [MESOS-8119] - ROOT_DOCKER_DockerHealthyTask segfaults in debian 8.
* [MESOS-8121] - Unified Containerizer Auto backend should check xfs ftype for overlayfs backend.
* [MESOS-8123] - GPU tests are failing due to TASK_STARTING.
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
* [MESOS-8136] - Update XFS isolator tests to handle TASK_STARTING.
* [MESOS-8157] - Review #62775 broke the build.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
* [MESOS-8165] - TASK_UNKNOWN status is ambiguous.
* [MESOS-8169] - Incorrect master validation forces executor IDs to be globally unique.
* [MESOS-8171] - Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop.
* [MESOS-8173] - Improve fetcher exit status message.
* [MESOS-8178] - UnreachableAgentReregisterAfterFailover is flaky.
* [MESOS-8179] - Scheduler library has incorrect assumptions about connections.
* [MESOS-8180] - Port mesos-fetcher to Windows.
* [MESOS-8200] - Suppressed roles are not honoured for v1 scheduler subscribe requests.
* [MESOS-8217] - Don't run linters on every commit.
* [MESOS-8220] - Can't build with Visual Studio 15.5.
* [MESOS-8223] - Master crashes when suppressed on subscribe is enabled.
* [MESOS-8225] - Port os::which to Windows.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
* [MESOS-8245] - SlaveRecoveryTest/0.ReconnectExecutor is flaky.
* [MESOS-8249] - Support image prune in mesos containerizer and provisioner.
* [MESOS-8263] - ResourceProviderManagerHttpApiTest.ConvertResources is flaky.
* [MESOS-8267] - NestedMesosContainerizerTest.ROOT_CGROUPS_RecoverLauncherOrphans is flaky.
* [MESOS-8272] - Fall back to bind mounting container devices.
* [MESOS-8279] - Persistent volumes are not visible in Mesos UI using default executor on Linux.
* [MESOS-8280] - Mesos Containerizer GC should set 'layers' after checkpointing layer ids in provisioner.
* [MESOS-8282] - Take pending offer operations into account when calculating framework allocated resources.
* [MESOS-8288] - SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
* [MESOS-8289] - ReservationTest.MasterFailover is flaky when run with `RESOURCE_PROVIDER` capability.
* [MESOS-8293] - Reservation may not be allocated when the role has no quota.
* [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
* [MESOS-8312] - Pass resource provider information to master as part of UpdateSlaveMessage.
* [MESOS-8315] - ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider is flaky.
* [MESOS-8316] - Tests that fetch docker images might be flaky due to insufficient wait timeout.
* [MESOS-8318] - OfferOperationStatusUpdateManagerTest tests fail on Windows.
* [MESOS-8320] - Expose information about local resource providers in master.
* [MESOS-8325] - Mesos containerizer does not properly handle old running containers.
* [MESOS-8337] - Invalid state transition attempted when agent is lost.
* [MESOS-8339] - Quota headroom may be insufficiently held when role has more reservation than quota.
* [MESOS-8341] - Agent can become stuck in (re-)registering state during upgrades.
* [MESOS-8344] - Improve JSON v1 operator API performance.
* [MESOS-8346] - Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed.
* [MESOS-8349] - When a resource provider driver is disconnected, it fails to reconnect.
* [MESOS-8350] - Resource provider-capable agents not correctly synchronizing checkpointed agent resources on reregistration.
* [MESOS-8352] - Resources may get over-allocated to some roles while failing to meet the quota of other roles.
* [MESOS-8356] - Persistent volume ownership is set to root despite the sandbox owner (frameworkInfo.user) when the docker executor is used.
* [MESOS-8369] - CI build failure compiling volume_profile.proto.
* [MESOS-8376] - Bundled GRPC does not build on Debian 9.
* [MESOS-8377] - RecoverTest.CatchupTruncated is flaky.
* [MESOS-8391] - Mesos agent doesn't notice that a pod task exits or crashes after the agent restart.
* [MESOS-8393] - SLRP NewVolumeRecovery and LaunchTaskRecovery tests CHECK failures.
* [MESOS-8410] - Reconfiguration policy fails to handle mount disk resources.
* [MESOS-8417] - Mesos can get "stuck" when a Process throws an exception.
* [MESOS-8419] - RP manager incorrectly setting framework ID leads to CHECK failure.
* [MESOS-8422] - Master's UpdateSlave handler not correctly updating terminated operations.
* [MESOS-8443] - Fix Docker Containerizer PATH on Windows so Docker is usable.
* [MESOS-8444] - GC failure causes the agent to miss detaching virtual paths for the executor's sandbox.
* [MESOS-8446] - Agent misses detaching `virtualLatestPath` for the executor's sandbox during recovery.
* [MESOS-8460] - `Slave::detachFile` can segfault because it could use invalid Framework*.
* [MESOS-8461] - SLRP should not assume a CSI plugin always has GetNodeID implemented.
* [MESOS-8469] - Mesos master might drop some events in the operator API stream.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8481] - Agent reboot during checkpointing may result in empty checkpoints.
* [MESOS-8514] - SLRP failed to connect to CSI endpoint.
** Documentation
* [MESOS-5078] - Document TaskStatus reasons.
* [MESOS-7663] - Update the documentation to reflect the addition of reservation refinement.
* [MESOS-8007] - Add documentation for MARK_AGENT_GONE call.
* [MESOS-8303] - Add user doc for agent reconfiguration.
* [MESOS-8304] - Update CHANGELOG to call out agent reconfiguration feature.
* [MESOS-8310] - Document container image garbage collection.
** Epic
* [MESOS-1739] - Allow slave reconfiguration on restart.
* [MESOS-4945] - Garbage collect unused docker layers in the store.
* [MESOS-7235] - Improve Storage Support using Resource Provider and CSI.
* [MESOS-7289] - Support Container Storage Interface (CSI).
* [MESOS-7302] - Support launching standalone containers.
* [MESOS-7749] - Support gRPC client.
** Improvement
* [MESOS-564] - Update Contribution Documentation.
* [MESOS-5675] - Add support for master capabilities.
* [MESOS-5771] - Add benchmark test for shared resources.
* [MESOS-5902] - CMake should generate protobuf definitions for Java.
* [MESOS-6350] - Raise minimum required cmake version.
* [MESOS-6390] - Ensure Python support scripts are linted.
* [MESOS-6971] - Use arena allocation to improve protobuf message passing performance.
* [MESOS-7306] - Support mount propagation for host volumes.
* [MESOS-7330] - Add resource provider to offer.
* [MESOS-7361] - Command checks via agent pollute agent logs.
* [MESOS-7370] - Fix create symlink code to use flag which enables non-admins to make symlinks.
* [MESOS-7497] - Remove CMake anti-pattern of `set(x "${x} ..")`.
* [MESOS-7616] - Consider supporting changes to agent's domain without full drain.
* [MESOS-7675] - Isolate network ports.
* [MESOS-7695] - Add heartbeats to master stream API.
* [MESOS-7737] - Harden Mesos when building with cmake.
* [MESOS-7785] - Pass Operator API subscription events through authorizer.
* [MESOS-7795] - Remove "latest" symlink after agent reboot.
* [MESOS-7798] - Improve libprocess message passing performance.
* [MESOS-7837] - Propagate resource updates from local resource providers to master.
* [MESOS-7840] - Add Mesos CLI command to list active tasks.
* [MESOS-7842] - Basic sandbox GC metrics.
* [MESOS-7861] - Include check output in the DefaultExecutor log.
* [MESOS-7880] - Add an option to skip the Mesos style check when applying a review chain.
* [MESOS-7889] - Avoid Multiple PROTOC invocations when generating Protobuf & GRPC code in libprocess.
* [MESOS-7895] - ZK session timeout is unconfigurable in agent and scheduler drivers.
* [MESOS-7916] - Improve the test coverage of the DefaultExecutor.
* [MESOS-7924] - Add a javascript linter to the webui.
* [MESOS-7941] - Send TASK_STARTING status from built-in executors.
* [MESOS-7951] - Design Doc for Extended KillPolicy.
* [MESOS-7961] - Display task health in the webui.
* [MESOS-7962] - Display task state counters in the framework page of the webui.
* [MESOS-7973] - Non-leading VOTING replica catch-up.
* [MESOS-7987] - Initialize Google Mock rather than Google Test.
* [MESOS-8012] - Support Znode paths for masters in the new CLI.
* [MESOS-8015] - Design a scheduler (V1) HTTP API authenticatee mechanism.
* [MESOS-8016] - Introduce modularized HTTP authenticatee.
* [MESOS-8017] - Introduce a basic HTTP authenticatee.
* [MESOS-8021] - Update HTTP scheduler library to allow for modularized authenticatee.
* [MESOS-8034] - Remove LIBNAME_VERSION from EXTERNAL.
* [MESOS-8040] - Return nested/standalone containers in `GET_CONTAINERS` API call.
* [MESOS-8072] - Change Mesos common events verbose logs to use VLOG(2) instead of 1.
* [MESOS-8074] - Change Libprocess actor state transitions verbose logs to use VLOG(3) instead of 2.
* [MESOS-8078] - Some fields went missing with no replacement in api/v1.
* [MESOS-8115] - Add a master flag to disallow agents that are not configured with fault domain.
* [MESOS-8117] - Update Getting Started documentation.
* [MESOS-8221] - Use protobuf reflection to simplify downgrading of resources.
* [MESOS-8286] - Making bind mounts readonly fails with user namespaces.
* [MESOS-8294] - Support container image basic auto gc.
* [MESOS-8295] - Add excluded image parameter to containerizer::pruneImages() interface.
* [MESOS-8301] - Support moving into defer/dispatch/install handlers.
* [MESOS-8302] - Improve master failover performance.
* [MESOS-8328] - Improve logs displayed after a slave failed recovery.
* [MESOS-8358] - Create agent endpoints for pruning images.
* [MESOS-8365] - Create AuthN support for prune images API.
* [MESOS-8421] - Duration operators drop precision, even when used with integers.
* [MESOS-8455] - Avoid unnecessary copying of protobuf in the v1 API.
** Task
* [MESOS-3107] - Define CMake style guide.
* [MESOS-3110] - Harden the CMake system-dependency-locating routines.
* [MESOS-3384] - Include libsasl in Windows CMake build.
* [MESOS-3437] - Port flags_tests.
* [MESOS-4527] - Roles can exceed limit allocation via reservations.
* [MESOS-6193] - Make the docker/volume isolator nesting aware.
* [MESOS-6709] - Enable HTTP and TCP health checks on Windows.
* [MESOS-6714] - Port `slave_tests.cpp`.
* [MESOS-6733] - Windows: Enable authentication to the master.
* [MESOS-6894] - Checkpoint 'ContainerConfig' in Mesos Containerizer.
* [MESOS-7284] - Allow Mesos CLI to take the master's IP.
* [MESOS-7285] - Implement a plugin to list containers on a given agent.
* [MESOS-7303] - Support Isolator capabilities.
* [MESOS-7305] - Adjust the recover logic of MesosContainerizer to allow standalone containers.
* [MESOS-7328] - Validate offer operations for converting disk resources.
* [MESOS-7388] - Update allocator interfaces to support resource providers.
* [MESOS-7443] - Add the MARK_AGENT_GONE call to the Operator v1 API protos.
* [MESOS-7444] - Add support for storing gone agents to the master registry.
* [MESOS-7445] - Implement the API handler on the master for marking agents as gone.
* [MESOS-7446] - Add authorization for the MARK_AGENT_GONE call.
* [MESOS-7448] - Add support for pruning the list of gone agents in the registry.
* [MESOS-7469] - Add resource provider driver.
* [MESOS-7491] - Build a CSI client to talk to a CSI plugin.
* [MESOS-7533] - Add a function stub for resource provider re-registration.
* [MESOS-7534] - Notify resource providers if they've been reregistered.
* [MESOS-7535] - Distinguish between active and inactive resource providers in RP Manager.
* [MESOS-7550] - Publish Local Resource Provider resources in the agent before container launch or update.
* [MESOS-7555] - Add resource provider IDs to the registry.
* [MESOS-7557] - Test that resource providers can reregister after agent fails over.
* [MESOS-7561] - Add storage resource provider specific information in ResourceProviderInfo.
* [MESOS-7578] - Write a proposal to make the I/O Switchboards optional.
* [MESOS-7594] - Implement 'apply' for resource provider related operations.
* [MESOS-7757] - Update master to handle updates to agent total resources.
* [MESOS-7790] - Design hierarchical quota allocation.
* [MESOS-7807] - Docker executor needs to return multiple IP addresses for the container.
* [MESOS-7892] - Filter results of `/state` on agent by role.
* [MESOS-7899] - Expose sandboxes using virtual paths and hide the agent work directory.
* [MESOS-7936] - Move sandbox path volume logic to 'volume/sandbox_path' isolator.
* [MESOS-7982] - Create Centos 6/7 RPM package.
* [MESOS-7985] - Use ASF CI for automating RPM packaging and upload to bintray.
* [MESOS-7992] - Enable OpenSSL build on Windows.
* [MESOS-8013] - Add test for blkio statistics.
* [MESOS-8032] - Launch CSI plugins in storage local resource provider.
* [MESOS-8050] - Mesos HTTP/HTTPS health checks for IPv6 docker containers.
* [MESOS-8060] - Introduce first class 'profile' for disk resources.
* [MESOS-8071] - Add agent capability for resource provider.
* [MESOS-8075] - Add ReadWriteLock to libprocess.
* [MESOS-8079] - Checkpoint and recover layers used to provision rootfs in provisioner.
* [MESOS-8086] - Update ACCEPT call handler in master for new operations.
* [MESOS-8087] - Add operation status update handler in Master.
* [MESOS-8088] - Introduce Lamport timestamp for offer operations.
* [MESOS-8089] - Add messages to publish resources on a resource provider.
* [MESOS-8097] - Add filesystem layout for local resource providers.
* [MESOS-8098] - Benchmark Master failover performance.
* [MESOS-8099] - Add protobuf for checkpointing resource provider states.
* [MESOS-8100] - Authorize standalone container calls from local resource providers.
* [MESOS-8101] - Import resources from CSI plugins in storage local resource provider.
* [MESOS-8102] - Add a test CSI plugin for storage local resource provider.
* [MESOS-8107] - Add a call to update total resources in the resource provider API.
* [MESOS-8108] - Process offer operations in storage local resource provider.
* [MESOS-8130] - Add placeholder handlers for offer operation feedback.
* [MESOS-8131] - Add new protobuf messages for offer operation feedback.
* [MESOS-8132] - Design a library to send offer operation status updates.
* [MESOS-8139] - Upgrade protobuf to 3.5.x.
* [MESOS-8141] - Add filesystem layout for storage resource providers.
* [MESOS-8143] - Publish and unpublish storage local resources through CSI plugins.
* [MESOS-8181] - Add tests that a failed offer operation on resource provider resources leads to a clock update.
* [MESOS-8183] - Add a container daemon to monitor a long-running standalone container.
* [MESOS-8186] - Implement the agent's AcknowledgeOfferOperationMessage handler.
* [MESOS-8187] - Enable LRP to send operation status updates, checkpoint, and retry using the SUM.
* [MESOS-8193] - Update master's OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.
* [MESOS-8195] - Implement explicit offer operation reconciliation between the master, agent and RPs.
* [MESOS-8196] - Propagate failures from applying offer operations from resource providers.
* [MESOS-8197] - Implement a library to send offer operation status updates.
* [MESOS-8198] - Update the ReconcileOfferOperations protos.
* [MESOS-8199] - Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
* [MESOS-8207] - Reconcile offer operations between resource providers, agents, and master.
* [MESOS-8211] - Handle agent local resources in offer operation handler.
* [MESOS-8218] - Support `RESERVE`/`CREATE` operations with resource providers.
* [MESOS-8222] - Add resource versions to RunTaskMessage.
* [MESOS-8244] - Add operator API to reload local resource providers.
* [MESOS-8251] - Introduce a way to resolve the "profile" for disk resources.
* [MESOS-8265] - Add state recovery for storage local resource provider.
* [MESOS-8269] - Support resource provider re-subscription in the resource provider manager.
* [MESOS-8270] - Add an agent endpoint to list all active resource providers.
* [MESOS-8309] - Introduce a UUID message type.
* [MESOS-8375] - Use protobuf reflection to simplify upgrading of resources.
* [MESOS-8394] - Bump CSI to 0.1.0.
Release Notes - Mesos - Version 1.4.4 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-10126] - Docker volume isolator needs to clean up the `info` struct regardless of the result of the unmount operation.
** Improvement
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
Release Notes - Mesos - Version 1.4.3
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
* [MESOS-8917] - Agent leaking file descriptors into forked processes.
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions
* [MESOS-9144] - Master authentication handling leads to request amplification.
* [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
* [MESOS-9146] - Agent has a fragile burn-in 5s authentication timeout.
* [MESOS-9147] - Agent and scheduler driver authentication retry backoff time could overflow.
* [MESOS-9151] - Container stuck at ISOLATING due to FD leak.
* [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format error.
* [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
* [MESOS-9221] - If some image layers are large, image pulling may get stuck because the authorization token expires.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9304] - Tests `CGROUPS_ROOT_PidNamespaceForward` and `CGROUPS_ROOT_PidNamespaceBackward` fail on 1.4.x.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returns.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
** Improvement
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
Release Notes - Mesos - Version 1.4.2
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-4527] - Roles can exceed limit allocation via reservations.
* [MESOS-6616] - Error: dereferencing type-punned pointer will break strict-aliasing rules.
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
* [MESOS-7504] - Parent's mount namespace cannot be determined when launching a nested container.
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-8106] - Docker fetcher plugin unsupported scheme failure message is not accurate.
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
* [MESOS-8171] - Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
* [MESOS-8253] - Mesos CI docker rmi conflict.
* [MESOS-8293] - Reservation may not be allocated when the role has no quota.
* [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
* [MESOS-8339] - Quota headroom may be insufficiently held when role has more reservation than quota.
* [MESOS-8352] - Resources may get over-allocated to some roles while failing to meet the quota of other roles.
* [MESOS-8356] - Persistent volume ownership is set to root despite the sandbox owner (frameworkInfo.user) when the docker executor is used.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8550] - Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail.
* [MESOS-8569] - Allow newline characters when decoding base64 strings in stout.
* [MESOS-8573] - Container stuck in PULLING when Docker daemon hangs
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs.
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
* [MESOS-8604] - Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung.
* [MESOS-8626] - The 'allocatable' check in the allocator is problematic with multi-role frameworks.
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
* [MESOS-8904] - Master crash when removing quota.
* [MESOS-8934] - Update python.m4 to support Python 3.
* [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and memory-only offers.
* [MESOS-8936] - Implement a Random Sorter for offer allocations.
* [MESOS-8942] - Master streaming API does not send (health) check updates for tasks.
* [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
* [MESOS-8947] - Improve the container preparing logging in IOSwitchboard and volume/secret isolator.
* [MESOS-8952] - process::await/collect n^2 performance issue.
* [MESOS-8963] - Executor crash trying to print container ID.
* [MESOS-8980] - mesos-slave can deadlock with docker pull.
* [MESOS-8986] - `slave.available()` in the allocator is expensive and drags down allocation performance.
* [MESOS-8987] - Master asks agent to shutdown upon auth errors.
* [MESOS-9049] - Agent GC could unmount a dangling persistent volume multiple times.
* [MESOS-9088] - `createStrippedScalarQuantity()` should clear all metadata fields.
* [MESOS-9125] - Port mapper CNI plugin might fail with "Resource temporarily unavailable"
* [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the agent.
Release Notes - Mesos - Version 1.4.1
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos `state` endpoint.
* [MESOS-7921] - ProcessManager::resume sometimes crashes accessing EventQueue.
* [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
* [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
* [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
* [MESOS-7980] - Stout fails to compile with libc >= 2.26.
* [MESOS-8051] - Killing TASK_GROUP fails to kill some tasks.
* [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
* [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
* [MESOS-8169] - Incorrect master validation forces executor IDs to be globally unique.
Release Notes - Mesos - Version 1.4.0
-------------------------------------
This release contains the following new features:
* [MESOS-5116] - The `disk/xfs` isolator now supports the
`--enforce_container_disk_quota` flag to efficiently measure disk
usage without enforcing usage constraints.
* [MESOS-6223] - Agents are now allowed to recover the agent ID
after a host reboot. See docs/upgrades.md for details.
* [MESOS-6375] - **Experimental** Support for hierarchical resource
allocation roles. Hierarchical roles allow delegation of resource
allocation policies (i.e. fair sharing and quota) further down the
hierarchy. For example, the "engineering" organization gets a 75%
share of the resources, but it's up to the operators within the
"engineering" organization to figure out how to fairly share between
the "engineering/backend" team and the "engineering/frontend" team.
The same delegation applies for quota. NOTE: There are known issues
related to hierarchical roles (e.g. hierarchical quota allocation
is not implemented and quota will be over-allocated if used with
hierarchical roles, see: MESOS-7402) and thus it is not recommended
for production usage at this time.
* [MESOS-7418, MESOS-7088] - File-based secrets are now supported for Mesos
and Universal containerizer. Image-pull secrets are supported for Docker
registry credentials.
* [MESOS-7477] - Linux ambient capabilities are now supported, so
frameworks can run tasks that use ambient capabilities to grant
limited additional privileges to tasks.
* [MESOS-7476, MESOS-7671] - Support for frameworks and operators
specifying Linux bounding capabilities in order to limit the
maximum privileges that a task may acquire.
Deprecations/Removals:
* [MESOS-7671] - LinuxInfo.capabilities is deprecated in favor
of LinuxInfo.effective_capabilities.
* [MESOS-7477] - The agent `--allowed_capabilities` flag is
deprecated in favor of `--effective_capabilities`.
Unresolved Critical Issues:
* [MESOS-7643] - The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
* [MESOS-7402] - Quota is over-allocated when used with hierarchical roles.
Additional API Changes:
* [MESOS-7755] - The interpretation of the optional resource argument
passed in `Allocator::updateSlave` was changed from the total
amount of oversubscribed resources on the agent to the new total
resources (both revocable and non-revocable) on the agent. Custom
allocator implementations should be changed to interpret the
passed value as a total before updating.
Feature Graduations:
* [MESOS-2533] - Support HTTP checks in Mesos.
* [MESOS-3567] - Support TCP checks in Mesos.
All Resolved Issues:
** Bug
* [MESOS-1987] - Add support for SemVer build and prerelease labels to stout.
* [MESOS-4210] - Investigate increasing protobuf protocol message size limit.
* [MESOS-4331] - git commit-msg hook completely breaks fixup commits.
* [MESOS-4467] - Implement `sleep` in Windows
* [MESOS-4983] - Segfault in ProcessTest.Spawn with GCC 6
* [MESOS-4992] - sandbox uri does not work outside mesos http server
* [MESOS-5187] - The filesystem/linux isolator does not set the permissions of the host_path.
* [MESOS-5903] - `GTEST_IS_THREADSAFE` guards prevent many tests from being run on Windows.
* [MESOS-5937] - `flags::parse` assumes the filesystem is rooted at '/'
* [MESOS-5938] - `net::links` is not implemented on Windows.
* [MESOS-6115] - Source tree contains compiled protobuf source
* [MESOS-6539] - Compile warning in GMock: "binding dereferenced null pointer to reference"
* [MESOS-6743] - Docker executor hangs forever if `docker stop` fails.
* [MESOS-6814] - Make sure compilation configuration is propagated correctly to third party dependencies
* [MESOS-6817] - Audit the use of UNICODE-related code paths
* [MESOS-6916] - Improve health checks validation.
* [MESOS-6950] - Launching two tasks with the same Docker image simultaneously may cause a staging dir to never be cleaned up
* [MESOS-6961] - Executors don't use glog for logging.
* [MESOS-7017] - HTTP API responses can crash the master.
* [MESOS-7115] - Agent should prefer LOG(FATAL) over EXIT().
* [MESOS-7173] - CMake does not define `GIT_SHA` etc. in build.cpp
* [MESOS-7186] - Metrics about used/allocated shared resources are incorrectly accounted.
* [MESOS-7193] - Use of `GTEST_IS_THREADSAFE` in asserts is problematic.
* [MESOS-7252] - Need to fix resource check in long-lived framework
* [MESOS-7268] - CNI isolator should mount network related /etc/* files in readonly mode
* [MESOS-7351] - CMake < 3.8.0 cannot find VS2017 tools
* [MESOS-7373] - Remove thread_local workaround on OSX
* [MESOS-7374] - Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable
* [MESOS-7378] - Build failure with glibc 2.12.
* [MESOS-7381] - Flaky tests in NestedMesosContainerizerTest
* [MESOS-7389] - Mesos 1.2.0 crashes with pre-1.0 Mesos agents.
* [MESOS-7403] - Resources::apply(const Offer::Operation&) should fail when a shared persistent volume can't be removed
* [MESOS-7441] - RegisterSlaveValidationTest.DropInvalidRegistration is flaky
* [MESOS-7457] - HierarchicalAllocatorTest.NestedRoleQuota is flaky
* [MESOS-7458] - webui display of framework resources is confusing
* [MESOS-7459] - Fix the duration.hpp warning
* [MESOS-7462] - Flaky test HierarchicalAllocatorTest.NestedRoleDRF
* [MESOS-7464] - Recent Docker versions cannot be parsed by stout.
* [MESOS-7468] - Could not copy the sandbox path on WebUI
* [MESOS-7471] - Provisioner recover should not always assume 'rootfses' dir exists.
* [MESOS-7476] - Restrict capabilities to only the bounding set.
* [MESOS-7484] - VersionTest.ParseInvalid aborts on Windows.
* [MESOS-7496] - The /debug:fastlink linker option is not being respected
* [MESOS-7498] - Remove need to set environment variable `PreferredToolArchitecture`
* [MESOS-7502] - Build error on Windows when using "int" for a file descriptor
* [MESOS-7507] - Add a metric for the network size of replicas for the registry.
* [MESOS-7515] - MasterAllocatorTest/0.ResourcesUnused is flaky
* [MESOS-7524] - Basic fetcher success metrics
* [MESOS-7545] - Volume secret isolator breaks Windows build
* [MESOS-7552] - MasterAllocatorTest/0.FrameworkExited is flaky
* [MESOS-7569] - Allow "old" executors with half-open connections to be preserved during agent upgrade / restart.
* [MESOS-7581] - Specifying an unbundled dependency can cause build to pick up wrong Boost version
* [MESOS-7584] - ASF Jenkins build errors out on missing 'python-six' dependency
* [MESOS-7597] - libprocess build is broken
* [MESOS-7618] - CMake files incompatible with multi-configuration generators
* [MESOS-7627] - Mesos slave gets stuck
* [MESOS-7638] - The command `false` does not exist on Windows
* [MESOS-7640] - Docker containerizer fails to set sandbox logs ownership correctly.
* [MESOS-7652] - Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.
* [MESOS-7655] - Reservation Refinement: Update the resources logic.
* [MESOS-7662] - Documentation regarding TASK_LOST is misleading
* [MESOS-7666] - Update the agent to use the new resource format
* [MESOS-7667] - Update the master to use the new resource format.
* [MESOS-7669] - Update the test utilities to produce the resources in the new format
* [MESOS-7671] - Let frameworks specify the task bounding capabilities.
* [MESOS-7674] - Update the generic Protobuf to JSON facility to not output deprecated fields
* [MESOS-7679] - V1 Operator API update for reservation refinement.
* [MESOS-7689] - Libprocess can crash on malformed request paths for libprocess messages.
* [MESOS-7690] - The agent can crash when an unknown executor tries to register.
* [MESOS-7700] - Prevent reserve/create operations with refined reservations on non-capable agents.
* [MESOS-7703] - Mesos fails to exec a custom executor when no shell is used
* [MESOS-7711] - Master updates registry for reregistering agents even when they haven't been unreachable
* [MESOS-7714] - Fix agent downgrade for reservation refinement
* [MESOS-7716] - Mesos 1.2.0 agent crashes Mesos 1.4.0 master
* [MESOS-7725] - PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky
* [MESOS-7728] - Java HTTP adapter crashes JVM when leading master disconnects.
* [MESOS-7735] - The master crashes when state endpoint is hit during a task authorization.
* [MESOS-7744] - Mesos Agent Sends TASK_KILL status update to Master, and still launches task
* [MESOS-7751] - Mesos failed to build on Windows due to error C2039: 'parse': is not a member of 'mesos::internal::protobuf'
* [MESOS-7753] - `log.LearnedMessage` could be rejected due to being sent from '@0.0.0.0:0'
* [MESOS-7758] - Stout doesn't build standalone.
* [MESOS-7761] - Website ruby deps do not bundle on macOS
* [MESOS-7765] - MasterTest.KillUnknownTask is failing due to a bug in `net::IPv4::ANY()`
* [MESOS-7769] - libprocess initializes to bind to random port if --ip is not specified
* [MESOS-7770] - Persistent volume might not be mounted if there is a sandbox volume whose source is the same as the target of the persistent volume.
* [MESOS-7772] - Copy-n-paste error in slave/main.cpp
* [MESOS-7775] - Eliminate extra process abort in a subprocess watchdog
* [MESOS-7777] - Agent failed to recover due to mount namespace leakage in Docker 1.12/1.13
* [MESOS-7778] - Hide per-platform subprocess headers.
* [MESOS-7783] - Framework might not receive status update when a just launched task is killed immediately
* [MESOS-7794] - Mesos failed with error c2102 when build in conformance mode (/permissive-)
* [MESOS-7796] - LIBPROCESS_IP isn't passed on to the fetcher
* [MESOS-7797] - Hard-coded forward slash breaks windows docker container task in DC/OS
* [MESOS-7805] - mesos-execute has incorrect example TaskInfo in help string
* [MESOS-7817] - CreateProcess wrapper's error message is bad
* [MESOS-7821] - Resource refinement does downgrade task.executor.resources in LAUNCH_GROUP handler.
* [MESOS-7830] - Sandbox_path volume does not have ownership set correctly.
* [MESOS-7831] - Resource refinement is not applied to tasks in completed_frameworks.
* [MESOS-7849] - The rlimits and linux/capabilities isolators should support nested containers
* [MESOS-7858] - Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer
* [MESOS-7863] - Agent may drop pending kill task status updates.
* [MESOS-7865] - Agent may process a kill task and still launch the task.
* [MESOS-7869] - Build fails with `--disable-zlib` or `--with-zlib=DIR`
* [MESOS-7871] - Agent fails assertion during request to '/state'
* [MESOS-7872] - Scheduler hang when registration fails.
* [MESOS-7888] - Track fetcher task success and failures
* [MESOS-7909] - Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
* [MESOS-7912] - Master WebUI not working in Chrome.
* [MESOS-7921] - process::EventQueue sometimes crashes
* [MESOS-7922] - Fix communication between old masters and new agents.
* [MESOS-7926] - Abnormal termination of default executor can cause MesosContainerizer::destroy to fail.
* [MESOS-7934] - OOM due to LibeventSSLSocket send incorrectly returning 0 after shutdown.
** Documentation
* [MESOS-7246] - Add documentation for AGENT_ADDED/AGENT_REMOVED events.
* [MESOS-7349] - Document Mesos "check" feature.
* [MESOS-7501] - Change legacy --with-network-isolator to --with-port-mapping-isolator
** Epic
* [MESOS-6975] - Prevent pre-1.0 agents from registering with 1.3+ master.
* [MESOS-7088] - Support private registry credential per container.
* [MESOS-7623] - Automatically publish website through CI
** Improvement
* [MESOS-5116] - Add support for accounting only mode in XFS isolator.
* [MESOS-5417] - define WSTRINGIFY behaviour on Windows
* [MESOS-6053] - Combine test helpers into one single binary.
* [MESOS-6223] - Allow agents to reregister after a host reboot
* [MESOS-6535] - The default executor should support kill policies
* [MESOS-6549] - Asynchronous dir removal in agent GC
* [MESOS-6782] - Inherit Environment from parent container when launching DEBUG container.
* [MESOS-6905] - Task status updates caused by task health update do not set appropriate reason.
* [MESOS-6976] - Disallow (re-)registration attempts by old agents.
* [MESOS-6977] - Cleanup tech debt in master for old agents
* [MESOS-6978] - Update webui to remove orphan tasks
* [MESOS-7006] - Launch docker containers with --cpus instead of cpu-shares
* [MESOS-7015] - Frameworks should be able to (re)register in suppressed state
* [MESOS-7092] - Health checker duplicates a lot of checker's functionality.
* [MESOS-7228] - Upgrade Mesos to build with proto3.
* [MESOS-7327] - Add a test with multiple tasks and checks for the default executor.
* [MESOS-7343] - Add a ReviewBot for testing patches on Windows
* [MESOS-7355] - Set MESOS_SANDBOX in debug containers.
* [MESOS-7364] - Upgrade vendored GMock / GTest
* [MESOS-7401] - Optionally reject messages when UPIDs do not match IP.
* [MESOS-7418] - Add support for file-based secrets
* [MESOS-7429] - Allow isolators to inject task-specific environment variables.
* [MESOS-7451] - Expose MOUNT volumes of an agent in master's v0 HTTP API
* [MESOS-7477] - Support ambient capabilities.
* [MESOS-7540] - Add an agent flag for executor re-registration timeout.
* [MESOS-7542] - Add executor reconnection retry logic to the agent
* [MESOS-7572] - Attach latest symlink when executor is registered.
* [MESOS-7585] - Added 'mesos config show' command to the new Mesos CLI.
* [MESOS-7608] - Protobuf definitions for domains
* [MESOS-7609] - Protobuf definitions for region-aware framework capability
* [MESOS-7610] - Support domains in master and agent
* [MESOS-7611] - Prevent master from joining mixed-region cluster
* [MESOS-7612] - Prevent agent with misconfigured domain from registering
* [MESOS-7614] - Only offer resources on remote agents to region-aware frameworks
* [MESOS-7630] - Add simple filtering to unversioned operator API
* [MESOS-7644] - Add DomainInfo to offers
* [MESOS-7782] - Add fetcher cache size metrics.
* [MESOS-7792] - Add support for ECDH ciphers
* [MESOS-7808] - Bundling gRPC into 3rdparty
* [MESOS-7809] - Building gRPC with Autotools
* [MESOS-7810] - gRPC support in libprocess
* [MESOS-7814] - Improve the test frameworks.
* [MESOS-7862] - Get rid of timestamp and date in generated javadoc files
* [MESOS-7870] - Refactor libssl and libcrypto checks for building gRPC
* [MESOS-7881] - Building gRPC with CMake
** Task
* [MESOS-6101] - Add Framework events to master's operator API
* [MESOS-6162] - Add support for cgroups blkio subsystem blkio statistics.
* [MESOS-6441] - Display reservations in the agent page in the webui.
* [MESOS-7149] - Support reservations for role subtrees
* [MESOS-7283] - Add ability to initialize a test cluster for Mesos CLI unit-test infrastructure
* [MESOS-7304] - Fetcher should not depend on SlaveID.
* [MESOS-7315] - Design doc for resource provider and storage integration.
* [MESOS-7414] - Enable authorization for master's logging API calls: GET_LOGGING_LEVEL and SET_LOGGING_LEVEL
* [MESOS-7415] - Add authorization to master's operator maintenance API in v0 and v1
* [MESOS-7416] - Filter results of `/master/slaves` and the v1 call GET_AGENTS
* [MESOS-7417] - Design doc for file-based secrets.
* [MESOS-7433] - Set working directory in DEBUG containers.
* [MESOS-7449] - Refactor containerizers to not depend on TaskInfo or ExecutorInfo
* [MESOS-7488] - Add `--ip6` and `--ip6_discovery_command` flags to Mesos agent
* [MESOS-7505] - Enable hierarchical roles
* [MESOS-7560] - Add 'type' and 'name' to ResourceProviderInfo.
* [MESOS-7571] - Add `--resource_provider_config_dir` flag to the agent.
* [MESOS-7576] - Add master flag `--filter-gpu-resources={true|false}`
* [MESOS-7582] - Add Config class to manage the Mesos CLI config file.
* [MESOS-7591] - Update master to use resource provider IDs instead of agent ID in allocator calls.
* [MESOS-7593] - Update offer handling in the master to consider local resource providers
* [MESOS-7624] - Move website from svn to git
* [MESOS-7625] - Create script to automate publishing website
* [MESOS-7626] - Create a CI job to publish the website
* [MESOS-7631] - DefaultExecutor needs to inform tasks about IP addresses
* [MESOS-7632] - Add `HIERARCHICAL_ROLE` agent capability
* [MESOS-7633] - Prevent hierarchical roles from being allocated resources from non-HIERARCHICAL_ROLE agents.
* [MESOS-7665] - V0 Operator API update for reservation refinement.
* [MESOS-7668] - Update authorization to handle reservation refinement.
* [MESOS-7696] - Update resource provider design in the master
* [MESOS-7709] - Add --default_container_dns flag to the agent.
* [MESOS-7713] - Optimize number of copies made in dispatch/defer mechanism
* [MESOS-7755] - Update allocator to support updating agent total resources
* [MESOS-7757] - Update master to handle updates to agent total resources
* [MESOS-7767] - Make `net::IP` fields protected to allow for inheritance
* [MESOS-7780] - Add `SUBSCRIBE` call handling to the resource provider manager
* [MESOS-7806] - Add copy assignment operator to `net::IP::Network`
* [MESOS-7853] - Support shared PID namespace.
* [MESOS-7879] - The kill nested container call should provide ability to specify a signal.
Release Notes - Mesos - Version 1.3.3 (WIP) - cancelled
-------------------------------------------------------
* This is a bug fix release.
All Issues:
** Bug
* [MESOS-8125] - Agent should properly handle recovering an executor when its pid is reused.
* [MESOS-8171] - Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop.
* [MESOS-8411] - Killing a queued task can lead to the command executor never terminating.
* [MESOS-8480] - Mesos returns high resource usage when killing a Docker task.
* [MESOS-8488] - Docker bug can cause unkillable tasks.
* [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and CGROUPS_ROOT_PidNamespaceBackward tests fail.
* [MESOS-8574] - Docker executor makes no progress when 'docker inspect' hangs.
* [MESOS-8575] - Improve discard handling for 'Docker::stop' and 'Docker::pull'.
* [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
* [MESOS-8605] - Terminal task status update will not send if 'docker inspect' is hung.
* [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path` isolator.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8876] - Normal exit of Docker container using rexray volume results in TASK_FAILED.
* [MESOS-8881] - Enable epoll backend in libevent integration.
* [MESOS-8885] - Disable libevent debug mode.
* [MESOS-8904] - Master crash when removing quota.
Release Notes - Mesos - Version 1.3.2
-------------------------------------
* This is a bug fix release.
All Issues:
** Bug
* [MESOS-6743] - Docker executor hangs forever if `docker stop` fails.
* [MESOS-6950] - Launching two tasks with the same Docker image simultaneously may leave a staging dir that is never cleaned up.
* [MESOS-7652] - Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.
* [MESOS-7674] - Update the generic Protobuf to JSON facility to not output deprecated fields.
* [MESOS-7858] - Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer.
* [MESOS-7863] - Agent may drop pending kill task status updates.
* [MESOS-7865] - Agent may process a kill task and still launch the task.
* [MESOS-7872] - Scheduler hang when registration fails.
* [MESOS-7909] - Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
* [MESOS-7912] - Master WebUI not working in Chrome.
* [MESOS-7926] - Abnormal termination of default executor can cause MesosContainerizer::destroy to fail.
* [MESOS-7934] - OOM due to LibeventSSLSocket send incorrectly returning 0 after shutdown.
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
* [MESOS-8356] - Persistent volume ownership is set to root despite the sandbox owner (frameworkInfo.user) when docker executor is used.
Release Notes - Mesos - Version 1.3.1
-------------------------------------
* This is a bug fix release.
All Issues:
** Bug
* [MESOS-5187] - The filesystem/linux isolator does not set the permissions of the host_path.
* [MESOS-7252] - Need to fix resource check in long-lived framework.
* [MESOS-7429] - Allow isolators to inject task-specific environment variables.
* [MESOS-7540] - Add an agent flag for executor re-registration timeout.
* [MESOS-7546] - WAIT_NESTED_CONTAINER sometimes returns 404.
* [MESOS-7569] - Allow "old" executors with half-open connections to be preserved during agent upgrade / restart.
* [MESOS-7581] - Fix interference of external Boost installations when using some unbundled dependencies.
* [MESOS-7689] - Libprocess can crash on malformed request paths for libprocess messages.
* [MESOS-7690] - The agent can crash when an unknown executor tries to register.
* [MESOS-7692] - Default environment variables defined in Docker image are not available in Mesos containerizer.
* [MESOS-7703] - Mesos fails to exec a custom executor when no shell is used.
* [MESOS-7728] - Java HTTP adapter crashes JVM when leading master disconnects.
* [MESOS-7770] - Persistent volume might not be mounted if there is a sandbox volume whose source is the same as the target of the persistent volume.
* [MESOS-7777] - Agent failed to recover due to mount namespace leakage in Docker 1.12/1.13.
* [MESOS-7796] - LIBPROCESS_IP isn't passed on to the fetcher.
* [MESOS-7830] - Sandbox_path volume does not have ownership set correctly.
Release Notes - Mesos - Version 1.3.0
-------------------------------------
This release contains the following new features:
* [MESOS-1763] - Support for frameworks to receive resources for multiple
roles. This allows "multi-user" frameworks to leverage the role-based
resource allocation in Mesos. Prior to this support, one had to run
multiple instances of a single-user framework to achieve multi-user
resource allocation, or implement multi-user resource allocation in
the framework.
* [MESOS-6365] - Authentication and authorization support for HTTP executors.
A new `--authenticate_http_executors` agent flag enables required
authentication on the HTTP executor API. A new `--executor_secret_key` flag
sets a key file to be used when generating and authenticating default tokens
that are passed to HTTP executors. Note that enabling these flags after
upgrade is disruptive to HTTP executors that were launched before the
upgrade; see 'docs/authentication.md' for more information on these flags
and the recommended upgrade procedure. Implicit authorization rules have
been added which allow an authenticated executor to make executor API calls
as that executor and make operator API calls which affect that executor's
container. See 'docs/authorization.md' for more information on these
implicit authorization rules.
* [MESOS-6627] - Support for frameworks to modify the role(s) they are
subscribed to. This is essential to supporting "multi-user" frameworks
(see MESOS-1763) in that roles are expected to come and go over time
(e.g. new employees join, new teams are formed, employees leave, teams
are disbanded, etc).
**NOTE**: In Mesos 1.3.0, the master will no longer allow 0.x agents to
register. Interoperability between 1.1+ masters and 0.x agents has never
been supported; however, it was not explicitly disallowed, either.
Starting with this release of Mesos, registration attempts by 0.x Mesos
agents will be ignored.
Deprecations/Removals:
* [MESOS-7259] - Remove deprecated ACLs `SetQuota` and `RemoveQuota`.
This change is only applicable to the local authorizer since internally
these ACLs were being translated to the `UPDATE_QUOTA` action.
* [MESOS-7320] - Remove deprecated ACL `ShutdownFramework`.
This change is only applicable to the local authorizer since internally
these ACLs were being translated to the `TEARDOWN_FRAMEWORK` action.
Unresolved Critical Issues:
* [MESOS-1625] - Extra trailing CRLF being sent after the HTTP body in libprocess.
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode().
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration.
* [MESOS-3533] - Unable to find and run URIs files.
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string.
* [MESOS-3794] - Master should not store arbitrarily sized data in ExecutorInfo.
* [MESOS-4259] - Mesos HA can't delete the redundant container on a failed slave node.
* [MESOS-4297] - Executor does not shut down on framework teardown.
* [MESOS-4642] - Mesos Agent Json API can dump binary data from log files out as invalid JSON.
* [MESOS-4996] - 'containerizer->update' will always fail after killing a docker container.
* [MESOS-5352] - Docker volume isolator cleanup can be blocked by first cleanup failure.
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5849] - Agent sandboxes on Windows surpass the 260 character path length limit.
* [MESOS-5859] - Some tasks are always in staged state.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formatted as strings.
* [MESOS-6356] - ASF CI has interleaved logging.
* [MESOS-6615] - Running mesos-slave in Docker leaves many zombie processes.
* [MESOS-6623] - Re-enable tests impacted by request streaming support.
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-6780] - ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably.
* [MESOS-6784] - IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky.
* [MESOS-6804] - Running 'tty' inside a debug container that has a tty reports "Not a tty".
* [MESOS-6843] - Fetcher should not assume stdout/stderr in the sandbox.
* [MESOS-6913] - AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS.
* [MESOS-6974] - DefaultExecutorTest.CommitSuicideOnTaskFailure test is flaky.
* [MESOS-6986] - `abort` in `DRFSorter::add`.
* [MESOS-7017] - HTTP API responses can crash the master.
* [MESOS-7082] - ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is flaky.
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
* [MESOS-7215] - Race condition on re-registration of non-partition-aware frameworks.
* [MESOS-7298] - Fetcher caches files with world-readable permissions.
* [MESOS-7362] - GPU support does not work when running Spark.
* [MESOS-7374] - Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable.
* [MESOS-7381] - Flaky tests in NestedMesosContainerizerTest.
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed.
Feature Graduations:
* [MESOS-2449] - Support group of tasks (Pod) constructs and API in Mesos.
* [MESOS-4641] - Support Container Network Interface (CNI).
* [MESOS-6419] - Teardown unregistered frameworks.
All Experimental Features:
* [MESOS-2533] - Support HTTP checks in Mesos.
* [MESOS-3094] - Mesos on Windows.
* [MESOS-3421] - Support sharing of resources across task instances.
* [MESOS-3567] - Support TCP checks in Mesos.
* [MESOS-4312] - Porting Mesos on Power (ppc64le).
* [MESOS-4355] - Implement isolator for Docker volume.
* [MESOS-4791] - Operator API v1.
* [MESOS-4828] - XFS disk quota isolator.
* [MESOS-5275] - Add capabilities support for mesos containerizer.
* [MESOS-5344] - Partition-aware Mesos frameworks.
* [MESOS-5788] - Added JAVA API adapter for seamless transition to new scheduler API.
* [MESOS-5931] - Support auto backend in Mesos Containerizer.
* [MESOS-6014] - Added port mapping CNI plugin.
* [MESOS-6077] - Added a default (task group) executor.
* [MESOS-6402] - rlimit support for Mesos containerizer.
* [MESOS-6460] - Container Attach/Exec.
* [MESOS-6758] - Support docker registry that requires basic auth.
* [MESOS-6906] - Introduce a general non-interpreting task check.
All Resolved Issues:
** Bug
* [MESOS-1987] - Add support for SemVer build and prerelease labels to stout.
* [MESOS-4245] - Add `dist` target to CMake solution.
* [MESOS-4263] - Report volume usage through ResourceStatistics.
* [MESOS-5028] - Copy provisioner cannot replace directory with symlink.
* [MESOS-5172] - Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
* [MESOS-5288] - Update leveldb patch file to support s390x.
* [MESOS-5880] - Semantics of `environment` differ across Windows and POSIX.
* [MESOS-6134] - Port CFS quota support to Docker Containerizer using command executor.
* [MESOS-6138] - Add 'syntax=proto2' to all .proto files in Mesos.
* [MESOS-6327] - Large docker images causes container launch failures: Too many levels of symbolic links.
* [MESOS-6560] - The default stout stringify always copies its argument.
* [MESOS-6606] - Reject optimized builds with libcxx before 3.9.
* [MESOS-6720] - Check that `PreferredToolArchitecture` is set to `x64` on Windows before building.
* [MESOS-6730] - Reserve operation should validate reserved resource role against resource allocationInfo role.
* [MESOS-6731] - Create a test filter for stout tests that use `symlink` on Windows, as they will fail if not run as admin.
* [MESOS-6732] - XFS disk isolator should check whether quotas are enabled.
* [MESOS-6742] - Adding support for s390x architecture.
* [MESOS-6815] - Enable glog stack traces when we call things like `ABORT` on Windows.
* [MESOS-6858] - network/cni isolator generates incomplete resolv.conf.
* [MESOS-6868] - Transition Windows away from `os::killtree`.
* [MESOS-6892] - Reconsider process creation primitives on Windows.
* [MESOS-6907] - FutureTest.After3 is flaky.
* [MESOS-6951] - Docker containerizer: mangled environment when env value contains LF byte.
* [MESOS-6953] - A compromised mesos-master node can execute code as root on agents.
* [MESOS-6976] - Disallow (re-)registration attempts by old agents.
* [MESOS-6982] - PerfTest.Version fails on recent Arch Linux.
* [MESOS-7022] - Update framework authorization to support multiple roles.
* [MESOS-7029] - FaultToleranceTest.FrameworkReregister is flaky.
* [MESOS-7035] - Add test for framework upgrading to MULTI_ROLE with tasks running.
* [MESOS-7049] - CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest is broken on Fedora 25.
* [MESOS-7097] - Framework credentials can be used to register as an agent.
* [MESOS-7133] - mesos-fetcher fails with openssl-related output.
* [MESOS-7135] - Outstanding offers to a dropped framework role should be rescinded.
* [MESOS-7146] - OSX broken due to wrong configuration of LevelDB after update.
* [MESOS-7158] - Add `role` to task/executor to indicate allocation role of their resources.
* [MESOS-7165] - Agents should be able to upgrade to be MULTI_ROLE capable.
* [MESOS-7172] - CMake does not incrementally recompile.
* [MESOS-7182] - Couple of MULTI_ROLE related tests are flaky.
* [MESOS-7197] - Requesting tiny amount of CPU crashes master.
* [MESOS-7208] - Persistent volume ownership is set to root when task is running with non-root user.
* [MESOS-7210] - HTTP health check doesn't work when mesos runs with --docker_mesos_image.
* [MESOS-7225] - Tasks launched via the default executor cannot access disk resource volumes.
* [MESOS-7236] - Base64 encoding/decoding (via stout) behaves differently on Windows.
* [MESOS-7237] - Enabling cgroups_limit_swap can lead to "invalid argument" error.
* [MESOS-7248] - RemoveNestedContainer returns unsupported.
* [MESOS-7255] - New mesos-style.py linter behavior breaks committing when virtualenv is not installed.
* [MESOS-7259] - Remove deprecated ACLs `SetQuota` and `RemoveQuota`.
* [MESOS-7261] - maintenance.html is missing during packaging.
* [MESOS-7263] - User supplied task environment variables cause warnings in sandbox stdout.
* [MESOS-7264] - Possibly duplicate environment variables should not leak values to the sandbox.
* [MESOS-7265] - Containerizer startup may cause sensitive data to leak into sandbox logs.
* [MESOS-7270] - Java V1 Framework Test failed on macOS.
* [MESOS-7272] - Unified containerizer does not support docker registry version < 2.3.
* [MESOS-7280] - Unified containerizer provisions docker image error with COPY backend.
* [MESOS-7281] - Backwards incompatible UpdateFrameworkMessage handling.
* [MESOS-7287] - Fix post-reviews.py to find `rbt.cmd` on Windows.
* [MESOS-7300] - Mesos failed to build on Windows due to error C2440: 'return': cannot convert from 'Error' to 'bool'.
* [MESOS-7311] - CopyFetcherPluginTest.FetchExistingFile.
* [MESOS-7316] - Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
* [MESOS-7323] - Framework role tracking in allocator results in framework treated as active incorrectly.
* [MESOS-7340] - Log HTTP accesses to the /files endpoint.
* [MESOS-7346] - Agent crashes if the task name is too long.
* [MESOS-7348] - Network isolator crashes agent on startup when network interface cannot be found.
* [MESOS-7350] - Failed to pull image from Nexus Registry due to signature missing.
* [MESOS-7363] - Improve master robustness against duplicate UPIDs.
* [MESOS-7365] - Compile error with recent glibc.
* [MESOS-7372] - Improve agent re-registration robustness.
* [MESOS-7378] - Build failure with glibc 2.12.
* [MESOS-7389] - Mesos 1.2.0 crashes with pre-1.0 Mesos agents.
* [MESOS-7400] - The mesos master crashes due to an incorrect invariant check in the decoder.
* [MESOS-7427] - Registry puller cannot fetch manifests from Amazon ECR: 405 Unsupported.
* [MESOS-7430] - Per-role Suppress call implementation is broken.
* [MESOS-7431] - Registry puller cannot fetch manifests from Google GCR: 403 Forbidden.
* [MESOS-7453] - glyphicons-halflings-regular.woff2 is missing in WebUI.
* [MESOS-7456] - Compilation error on recent glibc in cgroups device subsystem.
* [MESOS-7464] - Recent Docker versions cannot be parsed by stout.
* [MESOS-7471] - Provisioner recover should not always assume 'rootfses' dir exists.
* [MESOS-7478] - Pre-1.2.x master does not work with 1.2.x agent.
* [MESOS-7484] - VersionTest.ParseInvalid aborts on Windows.
* [MESOS-7521] - Major performance regression in DRF sorter.
* [MESOS-7538] - Don't validate re-registrations that are going to be dropped.
** Documentation
* [MESOS-7005] - Add executor authentication documentation.
* [MESOS-7324] - Update documentation to reflect the addition of multi-role framework support.
** Epic
* [MESOS-1763] - Add support for frameworks to receive resources for multiple roles.
* [MESOS-6365] - Executor authentication.
* [MESOS-6627] - Allow frameworks to modify the role(s) they are subscribed to.
** Improvement
* [MESOS-970] - Upgrade bundled leveldb to 1.19.
* [MESOS-5186] - mesos.interface: Allow using protobuf 3.x.
* [MESOS-5992] - Complete the list of API Calls on the Operator HTTP API Doc.
* [MESOS-6280] - Task group executor should support command health checks.
* [MESOS-6304] - Add authentication support to the default executor.
* [MESOS-6523] - Agent cgroup assignment should precede agent initialization.
* [MESOS-6906] - Introduce a general non-interpreting task check.
* [MESOS-7021] - Consistent symlink behavior for os::stat accessors.
* [MESOS-7074] - port_mapping isolator: do not depend on /sys/class/net/<ifname>/speed.
* [MESOS-7101] - ExamplesTest.PersistentVolumeFramework failed on ASF CI.
* [MESOS-7120] - Add an Agent API call to cleanup nested container artifacts.
* [MESOS-7226] - Introduce precompiled headers (on Windows).
* [MESOS-7249] - Default executor does not support general checks.
* [MESOS-7256] - Replace Boost Type Traits leftovers with STL.
* [MESOS-7274] - Health checker does not support pause / resume.
* [MESOS-7275] - General checker does not support TCP checks.
* [MESOS-7276] - General checker does not support pause / resume.
* [MESOS-7277] - General checker does not support command checks via agent.
* [MESOS-7376] - Reduce copying of the Registry to improve Registrar performance.
* [MESOS-7387] - ZK master contender and detector don't respect zk_session_timeout option.
** Task
* [MESOS-3139] - Incorporate CMake into standard documentation.
* [MESOS-5418] - Test case: Escape containerizer command line on Windows.
* [MESOS-6022] - unit-test for port-mapper CNI plugin.
* [MESOS-6032] - Add infrastructure for unit tests in the new python-based CLI.
* [MESOS-6123] - Implement GET_AGENT call in v1 agent API.
* [MESOS-6447] - Display role weight / role quota information in the webui.
* [MESOS-6636] - Validate that tasks / executors / reservations / volumes do not mix Resource.allocation_info.roles.
* [MESOS-6637] - Validate that schedulers cannot perform operations on offers with different allocation roles.
* [MESOS-6657] - Update the webui to reflect that frameworks have multiple roles.
* [MESOS-6691] - Enable SSL in Mesos builds.
* [MESOS-6762] - Update release notes for multi-role changes.
* [MESOS-6791] - Allow specifying the device whitelist entries in cgroup devices subsystem.
* [MESOS-6808] - Refactor Docker::run to only take docker cli parameters.
* [MESOS-6855] - Add `role` section to response of /state endpoint.
* [MESOS-6886] - Add authorization tests for debug API handlers.
* [MESOS-6940] - Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE capability.
* [MESOS-6967] - Ensure offer operations can be applied for MULTI_ROLE and non-MULTI_ROLE frameworks.
* [MESOS-6992] - Remove validation against "/" characters in roles to support hierarchical roles.
* [MESOS-6995] - Update the webui to reflect hierarchical roles.
* [MESOS-6996] - Add a 'Secret' protobuf message.
* [MESOS-6997] - Add the SecretGenerator module interface.
* [MESOS-6998] - Add authentication support to agent's '/v1/executor' endpoint.
* [MESOS-6999] - Add agent support for generating and passing executor secrets.
* [MESOS-7000] - Implement a JWT SecretGenerator.
* [MESOS-7001] - Implement a JWT authenticator.
* [MESOS-7003] - Introduce a 'Principal' type.
* [MESOS-7004] - Enable multiple HTTP authenticator modules.
* [MESOS-7009] - Add a 'secret' field to the 'Environment' message.
* [MESOS-7011] - Add an '--executor_secret_key' flag to the agent.
* [MESOS-7013] - Update the authorizer interface for executor authentication.
* [MESOS-7014] - Add implicit executor authorization to local authorizer.
* [MESOS-7024] - Update the allocator to handle hierarchical roles.
* [MESOS-7026] - Update authorization / authorization-filtering to handle hierarchical roles.
* [MESOS-7037] - Prevent setting quota on nested roles not contained by parent role quota.
* [MESOS-7038] - Update quota cluster capacity heuristic for hierarchical roles.
* [MESOS-7039] - Prevent quota removal that violates parent role-child role quota containment.
* [MESOS-7047] - Update agent for hierarchical roles.
* [MESOS-7048] - Remove adjustment code within Resources::apply.
* [MESOS-7061] - Re-persist tasks/executors with allocation info during agent recovery.
* [MESOS-7063] - Add a test for a MULTI_ROLE master reregistering an old agent.
* [MESOS-7269] - Migrate settings in config.py to a TOML file.
* [MESOS-7282] - Create a table abstraction for the Mesos CLI.
* [MESOS-7320] - Remove deprecated ACL `ShutdownFramework`.
* [MESOS-7336] - Add resource provider API protobuf.
* [MESOS-7339] - Add authorization to agent executor API.
* [MESOS-7377] - Add authentication to the checker and health checker libraries.
* [MESOS-7391] - Add deprecation warning for Visual Studio 14 2015.
* [MESOS-7395] - Benchmark performance of hierarchical roles.
* [MESOS-7439] - Bump the default timeout value for docker volume driver unmount operation.
Release Notes - Mesos - Version 1.2.3
-------------------------------------
* This is a bug fix release.
All Issues:
** Bug
* [MESOS-6743] - Docker executor hangs forever if `docker stop` fails.
* [MESOS-6950] - Launching two tasks with the same Docker image simultaneously may leave a staging dir that is never cleaned up.
* [MESOS-7365] - Compile error with recent glibc.
* [MESOS-7378] - Build failure with glibc 2.12.
* [MESOS-7627] - Mesos slave gets stuck.
* [MESOS-7652] - Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.
* [MESOS-7744] - Mesos Agent Sends TASK_KILL status update to Master, and still launches task.
* [MESOS-7783] - Framework might not receive status update when a just launched task is killed immediately.
* [MESOS-7858] - Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer.
* [MESOS-7863] - Agent may drop pending kill task status updates.
* [MESOS-7865] - Agent may process a kill task and still launch the task.
* [MESOS-7872] - Scheduler hang when registration fails.
* [MESOS-7909] - Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
* [MESOS-7926] - Abnormal termination of default executor can cause MesosContainerizer::destroy to fail.
* [MESOS-7934] - OOM due to LibeventSSLSocket send incorrectly returning 0 after shutdown.
* [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
* [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-7980] - Stout fails to compile with libc >= 2.26.
* [MESOS-8051] - Killing TASK_GROUP fails to kill some tasks.
* [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
* [MESOS-8135] - Masters can lose track of tasks' executor IDs.
Release Notes - Mesos - Version 1.2.2
-------------------------------------
* This is a bug fix release.
All Issues:
** Bug
* [MESOS-5187] - The filesystem/linux isolator does not set the permissions of the host_path.
* [MESOS-7252] - Need to fix resource check in long-lived framework.
* [MESOS-7546] - WAIT_NESTED_CONTAINER sometimes returns 404.
* [MESOS-7569] - Allow "old" executors with half-open connections to be preserved during agent upgrade / restart.
* [MESOS-7581] - Fix interference of external Boost installations when using some unbundled dependencies.
* [MESOS-7689] - Libprocess can crash on malformed request paths for libprocess messages.
* [MESOS-7690] - The agent can crash when an unknown executor tries to register.
* [MESOS-7703] - Mesos fails to exec a custom executor when no shell is used.
* [MESOS-7728] - Java HTTP adapter crashes JVM when leading master disconnects.
* [MESOS-7770] - Persistent volume might not be mounted if there is a sandbox volume whose source is the same as the target of the persistent volume.
* [MESOS-7777] - Agent failed to recover due to mount namespace leakage in Docker 1.12/1.13.
* [MESOS-7796] - LIBPROCESS_IP isn't passed on to the fetcher.
* [MESOS-7830] - Sandbox_path volume does not have ownership set correctly.
** Improvement
* [MESOS-7540] - Add an agent flag for executor re-registration timeout.
Release Notes - Mesos - Version 1.2.1
-------------------------------------
* This is a bug fix release.
**NOTE**: In Mesos 1.2.1, the master will no longer allow 0.x agents to
register. Interoperability between 1.1+ masters and 0.x agents has never
been supported; however, it was not explicitly disallowed, either.
Starting with this release of Mesos, registration attempts by 0.x Mesos
agents will be ignored.
All Issues:
** Bug
* [MESOS-1987] - Add support for SemVer build and prerelease labels to stout.
* [MESOS-5028] - Copy provisioner cannot replace directory with symlink.
* [MESOS-5172] - Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
* [MESOS-6327] - Large docker images cause container launch failures: Too many levels of symbolic links.
* [MESOS-6951] - Docker containerizer: mangled environment when env value contains LF byte.
* [MESOS-6976] - Disallow (re-)registration attempts by old agents.
* [MESOS-7133] - mesos-fetcher fails with openssl-related output.
* [MESOS-7197] - Requesting tiny amount of CPU crashes master.
* [MESOS-7208] - Persistent volume ownership is set to root when task is running with non-root user.
* [MESOS-7210] - HTTP health check doesn't work when mesos runs with --docker_mesos_image.
* [MESOS-7232] - Add support to auto-load /dev/nvidia-uvm in the GPU isolator.
* [MESOS-7237] - Enabling cgroups_limit_swap can lead to "invalid argument" error.
* [MESOS-7261] - maintenance.html is missing during packaging.
* [MESOS-7263] - User supplied task environment variables cause warnings in sandbox stdout.
* [MESOS-7264] - Possibly duplicate environment variables should not leak values to the sandbox.
* [MESOS-7265] - Containerizer startup may cause sensitive data to leak into sandbox logs.
* [MESOS-7272] - Unified containerizer does not support docker registry version < 2.3.
* [MESOS-7280] - Unified containerizer provisions docker image error with COPY backend.
* [MESOS-7316] - Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
* [MESOS-7346] - Agent crashes if the task name is too long.
* [MESOS-7350] - Failed to pull image from Nexus Registry due to signature missing.
* [MESOS-7366] - Agent sandbox gc could accidentally delete the entire persistent volume content.
* [MESOS-7368] - Documentation of framework role(s) in proto definition is confusing.
* [MESOS-7383] - Docker executor logs possibly sensitive parameters.
* [MESOS-7389] - Mesos 1.2.0 crashes with pre-1.0 Mesos agents.
* [MESOS-7400] - The mesos master crashes due to an incorrect invariant check in the decoder.
* [MESOS-7427] - Registry puller cannot fetch manifests from Amazon ECR: 405 Unsupported.
* [MESOS-7429] - Allow isolators to inject task-specific environment variables.
* [MESOS-7453] - glyphicons-halflings-regular.woff2 is missing in WebUI.
* [MESOS-7464] - Recent Docker versions cannot be parsed by stout.
* [MESOS-7471] - Provisioner recover should not always assume 'rootfses' dir exists.
* [MESOS-7478] - Pre-1.2.x master does not work with 1.2.x agent.
* [MESOS-7484] - VersionTest.ParseInvalid aborts on Windows.
Release Notes - Mesos - Version 1.2.0
-------------------------------------
This release contains the following new features:
* [MESOS-5931] - **Experimental** Support auto backend in Mesos Containerizer,
preferring overlayfs, then aufs. Note that the bind backend must be
specified explicitly through the agent flag '--image_provisioner_backend',
since it requires the sandbox to already exist.
* [MESOS-6402] - **Experimental** Add rlimit support to Mesos containerizer.
The isolator adds support for setting POSIX resource limits (rlimits) for
containers launched using the Mesos containerizer. POSIX rlimits can be used
to control the resources a process can consume. See `docs/posix_rlimits.md`
for details.
* [MESOS-6419] - **Experimental** Teardown unregistered frameworks. The master
now treats recovered frameworks very similarly to frameworks that are registered
but currently disconnected. For example, recovered frameworks will be reported
via the normal "frameworks" key when querying HTTP endpoints. This means there
is no longer a concept of "orphan tasks": if the master knows about a task, the
task will be running under a framework. Similarly, "teardown" operations on
recovered frameworks will now work correctly.
* [MESOS-6460] - **Experimental** Container Attach and Exec. This feature adds
new Agent APIs for attaching a remote client to the stdin, stdout, and stderr
of a running Mesos task, as well as an API for launching new processes inside
the same container as a running Mesos task and attaching to their stdin,
stdout, and stderr. At a high level, these APIs mimic `docker attach` and
`docker exec`. The primary motivation for this functionality is to enable
users to debug their running Mesos tasks.
* [MESOS-6758] - **Experimental** Support 'Basic' auth for Docker private
registries in the Mesos Containerizer. Until now, the Mesos containerizer
always assumed Bearer auth; it now also supports Basic auth for private
registries. Note that AWS ECR uses Basic authorization, but it does not work
yet due to the redirect issue MESOS-5172.
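To illustrate what the rlimit support in MESOS-6402 controls, the following is a minimal Python sketch of the general POSIX technique: a launcher sets a resource limit in the child process after fork and before exec, so only the launched process is constrained. This is not the Mesos isolator's implementation; the helper name `run_with_nofile_limit` and the choice of `RLIMIT_NOFILE` are illustrative assumptions, and `preexec_fn` is POSIX-only.

```python
import resource
import subprocess
import sys


def run_with_nofile_limit(argv, soft_limit):
    """Run argv with a lowered soft RLIMIT_NOFILE (hypothetical helper).

    The limit is applied in the child between fork() and exec(), so the
    parent process keeps its own limits unchanged.
    """
    def apply_limit():
        # Keep the existing hard limit; an unprivileged process may set its
        # soft limit to any value up to the hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True, check=True)


# A child Python process that reports the soft open-files limit it sees.
PROBE = [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"]
```

Running `run_with_nofile_limit(PROBE, 64)` makes the child print the soft limit it inherited, while the parent's own limit is left as it was.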
Deprecations:
* [MESOS-6650] - Remove slavePreLaunchDockerEnvironmentDecorator and slavePreLaunchDockerHook.
Additional API Changes:
* [MESOS-3601] - Formalize all headers and metadata for HTTP API Event Stream
* [MESOS-6286] - If an agent restarts but fails to complete recovery
within `agent_reregister_timeout`, the master will now mark the
agent as unreachable. This mainly changes behavior in two
situations: (a) the master will now be more robust if agent recovery
hangs indefinitely (e.g., due to a container being in a bad state),
and (b) if agent recovery takes a very long time (e.g., because the
agent's work directory contains a large number of completed tasks),
the master might now mark an agent unreachable that would previously
have been able to eventually recover successfully.
* [MESOS-6419] - When a framework reregisters after master failover,
it is only allowed to change certain fields in its FrameworkInfo.
For example, changing "failover_timeout" is allowed, but changing
"role" is not. In previous Mesos releases, the same restrictions on
changes to FrameworkInfo were only enforced after framework
failover, not master failover.
* [MESOS-6670] - Authz for Agent v1 operator API
* [MESOS-6675] - Changed the allocator API to support adding inactive
frameworks. Custom allocator implementations will need to be updated.
* [MESOS-6865] - Removed the constraint that only 2-level nested containers
could be launched via the Agent API.
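The MESOS-6286 behavior change can be summarized as a deadline check. The following toy Python model is a sketch under stated assumptions, not master code: the function name and its parameters are hypothetical, and times are plain seconds. It shows why both situations (a) and (b) above end with the agent marked unreachable once `agent_reregister_timeout` elapses without a completed reregistration.

```python
from typing import Optional


def master_marks_unreachable(restart_seen_at: float,
                             reregistered_at: Optional[float],
                             now: float,
                             agent_reregister_timeout: float) -> bool:
    """Toy model (hypothetical): is a restarted agent unreachable at `now`?

    `reregistered_at` is None if the agent has not finished recovery and
    reregistered by `now`.
    """
    deadline = restart_seen_at + agent_reregister_timeout
    if reregistered_at is not None and reregistered_at <= deadline:
        return False  # Recovery completed within the timeout.
    # Case (a): recovery hangs indefinitely. Case (b): recovery is slow and
    # would finish after the deadline. Either way, the deadline decides.
    return now > deadline
```

In this model, an agent that would have recovered at t=30 with a 10-second timeout is still reported unreachable, matching case (b).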
Unresolved Critical Issues:
* [MESOS-1625] - Extra trailing CRLF being sent after the HTTP body in libprocess
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode()
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration
* [MESOS-3533] - Unable to find and run URIs files
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
* [MESOS-3794] - Master should not store arbitrarily sized data in ExecutorInfo.
* [MESOS-4259] - mesos HA can't delete the redundant container on a failed slave node.
* [MESOS-4297] - Executor does not shut down on framework teardown.
* [MESOS-4642] - Mesos Agent Json API can dump binary data from log files out as invalid JSON.
* [MESOS-4996] - 'containerizer->update' will always fail after killing a docker container.
* [MESOS-5352] - Docker volume isolator cleanup can be blocked by first cleanup failure.
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5849] - Agent sandboxes on Windows surpass the 260 character path length limit
* [MESOS-5859] - Some tasks are always in staged state.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-6327] - Large docker images cause container launch failures: Too many levels of symbolic links.
* [MESOS-6356] - ASF CI has interleaved logging.
* [MESOS-6615] - Running mesos-slave in Docker leaves many zombie processes
* [MESOS-6623] - Re-enable tests impacted by request streaming support
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-6780] - ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
* [MESOS-6784] - IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky
* [MESOS-6804] - Running 'tty' inside a debug container that has a tty reports "Not a tty"
* [MESOS-6815] - Enable glog stack traces when we call things like `ABORT` on Windows
* [MESOS-6843] - Fetcher should not assume stdout/stderr in the sandbox.
* [MESOS-6913] - AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS.
* [MESOS-6974] - DefaultExecutorTest.CommitSuicideOnTaskFailure test is flaky.
* [MESOS-6986] - abort in DRFSorter::add
* [MESOS-7017] - HTTP API responses can crash the master.
* [MESOS-7050] - IOSwitchboard FDs leaked when containerizer launch fails -- leads to deadlock
* [MESOS-7099] - Quota can be exceeded due to coarse-grained offer technique.
Feature Graduations:
* None
All Experimental Features:
* [MESOS-2449] - Support group of tasks (Pod) constructs and API in Mesos.
* [MESOS-2533] - Support HTTP checks in Mesos.
* [MESOS-3094] - Mesos on Windows.
* [MESOS-3421] - Support sharing of resources across task instances.
* [MESOS-3567] - Support TCP checks in Mesos.
* [MESOS-4312] - Porting Mesos on Power (ppc64le).
* [MESOS-4355] - Implement isolator for Docker volume.
* [MESOS-4641] - Support Container Network Interface (CNI).
* [MESOS-4791] - Operator API v1.
* [MESOS-4828] - XFS disk quota isolator.
* [MESOS-5275] - Add capabilities support for mesos containerizer.
* [MESOS-5344] - Partition-aware Mesos frameworks.
* [MESOS-5788] - Added JAVA API adapter for seamless transition to new scheduler API.
* [MESOS-5931] - **NEW** Support auto backend in Mesos Containerizer.
* [MESOS-6014] - Added port mapping CNI plugin.
* [MESOS-6077] - Added a default (task group) executor.
* [MESOS-6402] - **NEW** rlimit support for Mesos containerizer
* [MESOS-6419] - **NEW** Teardown unregistered frameworks
* [MESOS-6460] - **NEW** Container Attach/Exec
* [MESOS-6758] - **NEW** Support docker registry that requires basic auth.
All Issues:
** Bug
* [MESOS-1802] - HealthCheckTest.HealthStatusChange is flaky on jenkins.
* [MESOS-2537] - AC_ARG_ENABLED checks are broken
* [MESOS-2723] - The mesos-execute tool does not support zk:// master URLs
* [MESOS-3335] - FlagsBase copy-ctor leads to dangling pointer.
* [MESOS-3932] - Silence Boost compiler warnings with CMake
* [MESOS-4601] - Don't dump stack trace on failure to bind()
* [MESOS-4695] - SlaveTest.StateEndpoint is flaky
* [MESOS-4973] - Duplicates in 'unregistered_frameworks' in /state
* [MESOS-4975] - mesos::internal::master::Slave::tasks can grow unboundedly
* [MESOS-5218] - Fetcher should not chown the entire sandbox.
* [MESOS-5303] - Add capabilities support for mesos execute cli.
* [MESOS-5662] - Call parent class `SetUpTestCase` function in our test fixtures.
* [MESOS-5821] - Clean up the thousands of compiler warnings on MSVC
* [MESOS-5835] - Audit `PATCH_CMD`; make sure all patches are being applied on Windows.
* [MESOS-5856] - Logrotate ContainerLogger module does not rotate logs when run as root with `--switch_user`.
* [MESOS-5879] - cgroups/net_cls isolator causing agent recovery issues
* [MESOS-5963] - HealthChecker should not decide when to kill tasks and when to stop performing health checks.
* [MESOS-6001] - Aufs backend cannot support the image with numerous layers.
* [MESOS-6002] - The whiteout file cannot be removed correctly using aufs backend.
* [MESOS-6010] - Docker registry puller shows decode error "No response decoded".
* [MESOS-6119] - TCP health checks are not portable.
* [MESOS-6142] - Frameworks may RESERVE for an arbitrary role.
* [MESOS-6206] - Change reconciliation to return results for in-progress removals and reregistrations
* [MESOS-6286] - Master does not remove an agent if it is responsive but not registered
* [MESOS-6288] - The default executor should maintain launcher_dir.
* [MESOS-6293] - HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
* [MESOS-6316] - CREATE of shared volumes should not be allowed by frameworks not opted in to the capability.
* [MESOS-6320] - Implement clang-tidy check to catch incorrect flags hierarchies
* [MESOS-6349] - JSON Generation breaks if other locale than C is used.
* [MESOS-6360] - The handling of whiteout files in provisioner is not correct.
* [MESOS-6380] - mesos-local failed to start without sudo
* [MESOS-6388] - Report new PARTITION_AWARE task statuses in HTTP endpoints
* [MESOS-6389] - Update webui for PARTITION_AWARE changes
* [MESOS-6409] - mesos-ps - Invalid header value
* [MESOS-6414] - cgroups isolator cleanup failed when the hierarchy is cleaned up by the docker daemon
* [MESOS-6419] - The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.
* [MESOS-6420] - Mesos Agent leaking sockets when port mapping network isolator is ON
* [MESOS-6432] - Roles with quota assigned can "game" the system to receive excessive resources.
* [MESOS-6444] - Ensure single copy of shared count of total resources in role sorter.
* [MESOS-6446] - WebUI redirect doesn't work with stats from /metric/snapshot
* [MESOS-6448] - Show the leading master hostname in the webUI.
* [MESOS-6452] - Compile error in strerror.h on OSX
* [MESOS-6455] - DefaultExecutorTests fail when running on hosts without docker.
* [MESOS-6459] - PosixRLimitsIsolatorTest.TaskExceedingLimit fails on OS X
* [MESOS-6461] - Duplicate framework ids in /master/frameworks endpoint 'unregistered_frameworks'.
* [MESOS-6478] - "filesystem/linux" isolator leaks (phantom) mounts in `mount` output
* [MESOS-6483] - Check failure when a 1.1 master marks a 0.28 agent as unreachable
* [MESOS-6484] - Memory leak in `Future<T>::after()`
* [MESOS-6501] - Add a test for duplicate framework ids in "unregistered_frameworks"
* [MESOS-6504] - Use 'geteuid()' for the root privileges check.
* [MESOS-6508] - monitor/statistics error in webui when launch mesos via mesos-local
* [MESOS-6516] - Parallel test running does not respect GTEST_FILTER
* [MESOS-6519] - MasterTest.OrphanTasksMultipleAgents
* [MESOS-6520] - Make errno an explicit argument for ErrnoError.
* [MESOS-6526] - `mesos-containerizer launch --environment` exposes executor env vars in `ps`.
* [MESOS-6527] - Memory leak in the libprocess request decoder.
* [MESOS-6544] - MasterMaintenanceTest.InverseOffersFilters is flaky.
* [MESOS-6545] - TestContainerizer is not thread-safe.
* [MESOS-6566] - The Docker executor should not leak task env variables in the Docker command cmd line.
* [MESOS-6569] - MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI
* [MESOS-6576] - DefaultExecutorTest.KillTaskGroupOnTaskFailure sometimes fails in CI
* [MESOS-6588] - LinuxRootfs misses required files
* [MESOS-6597] - Include v1 Operator API protos in generated JAR and python packages.
* [MESOS-6598] - Broken Link Framework Development Page
* [MESOS-6602] - Shutdown completed frameworks when unreachable agent reregisters
* [MESOS-6604] - Uninitialized member ObjectApprover::weight_info.
* [MESOS-6606] - Reject optimized builds with libcxx before 3.9
* [MESOS-6618] - Some tests use hardcoded port numbers.
* [MESOS-6619] - Improve task management for unreachable tasks
* [MESOS-6621] - SSL downgrade path will CHECK-fail when using both temporary and persistent sockets
* [MESOS-6624] - Master WebUI does not work on Firefox 45
* [MESOS-6625] - Expose container id in ContainerStatus in DockerContainerizer.
* [MESOS-6640] - mesos-local doesn't handle --work_dir correctly.
* [MESOS-6646] - StreamingRequestDecoder incompletely initializes its http_parser_settings
* [MESOS-6647] - Cyclic header dependency between libprocess' defer.hpp and executor.hpp
* [MESOS-6652] - Perf version not correctly parsed on Fedora 24 (and probably others)
* [MESOS-6653] - Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.
* [MESOS-6654] - Duplicate image layer ids may cause the backend to fail to mount the rootfs.
* [MESOS-6658] - Mesos tests generated with cmake build fail to unload libraries properly
* [MESOS-6665] - io::redirect might cause stack overflow.
* [MESOS-6666] - HttpServeTest.Discard failed on OSX sierra
* [MESOS-6672] - Class DynamicLibrary's default copy constructor can lead to inconsistent state
* [MESOS-6676] - Always re-link with scheduler during re-registration.