This is the model of Slider and YARN for the rest of the specification.
A File System HDFS represents a Hadoop FileSystem, either HDFS itself or another filesystem which spans the cluster. There are also other filesystems that can act as sources of data which is then copied into HDFS; these will be marked as FS, or with the generic FileSystem type.
There's ongoing work in HADOOP-9361 to define the Hadoop Filesystem APIs using the same notation as used here; the latest version is available on GitHub.
The model and its predicates and invariants will be used in these specifications.
From the perspective of a YARN application, the YARN runtime is a state, YARN, comprising:

YARN: (Apps, Queues, Nodes)
Apps: Map[AppId, App]
An application has a name, an application report and a list of outstanding requests
App: (Name, report: ApplicationReport, Requests:List[AmRequest])
An application report contains a mixture of static and dynamic state of the application and the AM.
ApplicationReport: (AppId, Type, User, YarnApplicationState, AmContainer, RpcPort, TrackingURL, ...)
YARN applications have a number of states. These are ordered such that if state.ordinal() > RUNNING.ordinal(), then the application has entered an exit state.
YarnApplicationState : [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED ]
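As a hedged illustration (Python, not Slider code; the helper name has_exited is invented), the ordering test can be modelled with an integer enumeration whose declaration order matches the state ordinals:

from enum import IntEnum

class YarnApplicationState(IntEnum):
    # declaration order matches the YARN state ordinals
    NEW = 0
    NEW_SAVING = 1
    SUBMITTED = 2
    ACCEPTED = 3
    RUNNING = 4
    FINISHED = 5
    FAILED = 6
    KILLED = 7

def has_exited(state: YarnApplicationState) -> bool:
    # state.ordinal() > RUNNING.ordinal() in the specification's terms
    return state > YarnApplicationState.RUNNING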
AMs can request containers to be added or released
AmRequest = { add-container(priority, requirements), release(containerId)}
Job queues are named queues of job requests; there is always a queue called "default".

Queues: Map[String, Queue]
Queue: List[Request]
Request = { launch(app-name, app-type, requirements, context) }
Context: (localized-resources: Map[String,URL], command)
This doesn't completely model the cluster from the AM perspective: there's no notion of node operations (launching code in a container) or of events coming from YARN.
The Nodes structure models the nodes in a cluster:
Nodes: Map[nodeID,(name, containers:List[Container])]
A container contains some state
Container: (containerId, appId, context)
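As a hedged sketch, the whole state model can be written as Python dataclasses; the field names follow the specification, while the default values and the plain dict types for contexts and requests are illustrative assumptions:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Container:
    containerId: str
    appId: str
    context: dict

@dataclass
class ApplicationReport:
    appId: str
    type: str
    user: str
    state: str              # a YarnApplicationState name
    amContainer: str
    rpcPort: int
    trackingURL: str
    name: str = ""          # application name; assumed to be among the elided report fields

@dataclass
class App:
    name: str
    report: ApplicationReport
    requests: List[dict] = field(default_factory=list)   # outstanding AmRequests

@dataclass
class Node:
    name: str
    containers: List[Container] = field(default_factory=list)

@dataclass
class Yarn:
    apps: Dict[str, App]      # Apps: Map[AppId, App]
    queues: Dict[str, list]   # Queues: Map[String, Queue]
    nodes: Dict[str, Node]    # Nodes: Map[nodeID, Node]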
The containers in a cluster are the aggregate set of all containers across all nodes
def containers(YARN) = [c for n in keys(YARN.Nodes) for c in YARN.Nodes[n].containers]
The containers of an application are all containers that are considered owned by it:
def app-containers(YARN, appId: AppId) = [c in containers(YARN) where c.appId == appId ]
def applications(YARN, type) = [app.report for app in YARN.Apps.values where app.report.Type == type]
def user-applications(YARN, type, user) = [a in applications(YARN, type) where: a.User == user]
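Continuing the dataclass sketch above, these queries become plain Python comprehensions:

def containers(yarn: Yarn) -> List[Container]:
    # the aggregate of all containers across all nodes
    return [c for n in yarn.nodes.values() for c in n.containers]

def app_containers(yarn: Yarn, app_id: str) -> List[Container]:
    # the containers owned by a single application
    return [c for c in containers(yarn) if c.appId == app_id]

def applications(yarn: Yarn, app_type: str) -> List[ApplicationReport]:
    # the reports of all applications of a given type
    return [app.report for app in yarn.apps.values() if app.report.type == app_type]

def user_applications(yarn: Yarn, app_type: str, user: str) -> List[ApplicationReport]:
    # ...restricted to those belonging to one user
    return [a for a in applications(yarn, app_type) if a.user == user]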
Applications are launched and executed on host computers: either client machines or nodes in the cluster. These have their own state, which may need modeling:
HostState: Map[String, String]
A key part of the host state is the identity of the current user, which is used to define the location of the persistent state of the cluster (including its data), and the identity under which a deployed container executes.
In a secure cluster, this identity is accompanied by Kerberos tokens that grant the caller access to the filesystem and to parts of YARN itself.
This specification does not currently explicitly model the username and credentials. If it did, they would be used throughout the specification to bind to a YARN or HDFS instance.
UserGroupInformation.getCurrentUser(): UserGroupInformation
Returns the current user information. This information is immutable and fixed for the duration of the process.
A valid cluster name is a name of length > 1 which follows the internet hostname scheme: a letter followed by letters, digits or hyphens.

def valid-cluster-name(c) = len(c) > 1 and c[0] in ['a'..'z'] and forall ch in c[1:]: ch in (['a'..'z'] + ['-'] + ['0'..'9'])
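One way to encode this rule, sketched in Python with a regular expression (an illustration of the predicate, not Slider's implementation):

import re

# first character a lowercase letter, at least one more character,
# the rest lowercase letters, digits or hyphens
_CLUSTER_NAME = re.compile(r"^[a-z][a-z0-9-]+$")

def valid_cluster_name(name: str) -> bool:
    return _CLUSTER_NAME.match(name) is not None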
A Slider cluster's persistent state is stored in a set of paths:

def cluster-path(FS, clustername) = user-home(FS) + ["clusters", clustername]
def cluster-json-path(FS, clustername) = cluster-path(FS, clustername) + ["cluster.json"]
def original-conf-path(FS, clustername) = cluster-path(FS, clustername) + ["original"]
def generated-conf-path(FS, clustername) = cluster-path(FS, clustername) + ["generated"]
def data-path(FS, clustername) = cluster-path(FS, clustername) + ["data"]
When a cluster is built/created, the specified original configuration directory is copied to original-conf-path(FS, clustername); this is patched for the specific instance bindings and saved into generated-conf-path(FS, clustername).
A cluster exists if all of these paths are found:
def cluster-exists(FS, clustername) =
    is-dir(FS, cluster-path(FS, clustername))
    and is-file(FS, cluster-json-path(FS, clustername))
    and is-dir(FS, original-conf-path(FS, clustername))
    and is-dir(FS, generated-conf-path(FS, clustername))
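A minimal Python sketch of the path layout and the existence test; user_home and the fs object with is_dir/is_file probes are assumptions standing in for the filesystem operations:

from pathlib import PurePosixPath

def cluster_path(user_home: PurePosixPath, clustername: str) -> PurePosixPath:
    return user_home / "clusters" / clustername

def cluster_json_path(user_home, clustername):
    return cluster_path(user_home, clustername) / "cluster.json"

def original_conf_path(user_home, clustername):
    return cluster_path(user_home, clustername) / "original"

def generated_conf_path(user_home, clustername):
    return cluster_path(user_home, clustername) / "generated"

def data_path(user_home, clustername):
    return cluster_path(user_home, clustername) / "data"

def cluster_exists(fs, user_home, clustername) -> bool:
    # fs is assumed to expose is_dir/is_file probes over the cluster filesystem
    return (fs.is_dir(cluster_path(user_home, clustername))
            and fs.is_file(cluster_json_path(user_home, clustername))
            and fs.is_dir(original_conf_path(user_home, clustername))
            and fs.is_dir(generated_conf_path(user_home, clustername)))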
A cluster is considered running if there is a Slider application of that name belonging to the current user in one of the states {NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING}.
def final-yarn-states = {FINISHED, FAILED, KILLED}

def slider-app-instances(YARN, clustername, user) = [a in user-applications(YARN, "slider", user) where: a.Name == clustername]
def slider-app-running-instances(YARN, clustername, user) = [a in slider-app-instances(YARN, clustername, user) where: not a.YarnApplicationState in final-yarn-states]
def slider-app-running(YARN, clustername, user) = [] != slider-app-running-instances(YARN, clustername, user)
def slider-app-live-instances(YARN, clustername, user) = [a in slider-app-instances(YARN, clustername, user) where: a.YarnApplicationState == RUNNING]
def slider-app-live(YARN, clustername, user) = [] != slider-app-live-instances(YARN, clustername, user)
There must never be more than one instance of the same Slider cluster running:
forall a in user-applications(YARN, "slider", user): len(slider-app-running-instances(YARN, a.Name, user)) <= 1
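Continuing the Python sketch, the liveness predicates and the single-instance invariant (state names are compared as strings here, an illustrative simplification):

FINAL_YARN_STATES = {"FINISHED", "FAILED", "KILLED"}

def slider_app_instances(yarn, clustername, user):
    return [a for a in user_applications(yarn, "slider", user) if a.name == clustername]

def slider_app_running_instances(yarn, clustername, user):
    return [a for a in slider_app_instances(yarn, clustername, user)
            if a.state not in FINAL_YARN_STATES]

def slider_app_running(yarn, clustername, user) -> bool:
    return slider_app_running_instances(yarn, clustername, user) != []

def slider_app_live_instances(yarn, clustername, user):
    return [a for a in slider_app_instances(yarn, clustername, user)
            if a.state == "RUNNING"]

def slider_app_live(yarn, clustername, user) -> bool:
    return slider_app_live_instances(yarn, clustername, user) != []

def assert_single_running_instance(yarn, user):
    # the invariant: at most one running instance per cluster name
    for a in user_applications(yarn, "slider", user):
        assert len(slider_app_running_instances(yarn, a.name, user)) <= 1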
There may be multiple instances in a finished state, and one running instance alongside multiple finished instances; applications that work with Slider MUST select a running cluster ahead of any terminated clusters.
The containers of a Slider application are the set of containers of that application's running instance:

def slider-app-containers(YARN, clustername, user) = app-containers(YARN, slider-app-running-instances(YARN, clustername, user)[0].AppId)
An application is accepting RPC requests for a given protocol if there is a port binding defined and it is possible to authenticate a connection using the specified protocol:

def rpc-connection(appReport, protocol) = appReport.host != null and appReport.rpcPort != 0 and RPC.getProtocolProxy(appReport.host, appReport.rpcPort, protocol)
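A hedged approximation of this probe in Python: it checks only that the host/port binding is defined and that the port accepts a TCP connection, whereas the real test also authenticates and binds the protocol via RPC.getProtocolProxy():

import socket

def rpc_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    # host == null or port == 0 means no binding has been published yet
    if not host or port == 0:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False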
Being able to open an RPC port is the strongest definition of liveness possible to make: if the AM responds to RPC operations, it is doing useful work.
The cluster.json file of a cluster configures Slider to deploy the application.
A Cluster Description is well-defined if it is valid JSON and required properties are present
OBSOLETE
Irrespective of specific details for deploying the Slider AM or any provider-specific role instances, a Cluster Description defined in a cluster.json file at the path cluster-json-path(FS, clustername) is well-defined if:

1. The options and roles elements are present.
2. The version element matches a supported version.
3. One of options/cluster.application.home and options/cluster.application.image.path must exist.

This specification is very vague here to avoid duplication: the cluster description structure is currently implicitly defined in org.apache.slider.api.ClusterDescription.
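A minimal Python sketch of the well-defined test under those rules; the SUPPORTED_VERSIONS set is an assumption, and the real structure lives in org.apache.slider.api.ClusterDescription:

import json

SUPPORTED_VERSIONS = {"1.0"}   # assumption: the versions this build accepts

def well_defined(cluster_json_text: str) -> bool:
    # must parse as JSON
    try:
        spec = json.loads(cluster_json_text)
    except ValueError:
        return False
    # the required extensible elements must be present
    if "options" not in spec or "roles" not in spec:
        return False
    # the version element must match a supported version
    if spec.get("version") not in SUPPORTED_VERSIONS:
        return False
    # one of the application binding options must exist
    options = spec["options"]
    return ("cluster.application.home" in options
            or "cluster.application.image.path" in options)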
Currently Slider ignores unknown elements during parsing. This may be changed.
The test for this state does not refer to the cluster filesystem.
A Cluster Description defines a deployable cluster if it is a well-defined cluster description and its contents contain valid information to deploy a cluster. This extends the well-defined state with further requirements:
The entry name must match a supported provider.
Any elements that name the cluster match the cluster name as defined by the path to the cluster:

originConfigurationPath == original-conf-path(FS, clustername)
generatedConfigurationPath == generated-conf-path(FS, clustername)
dataPath == data-path(FS, clustername)
The paths defined in originConfigurationPath, generatedConfigurationPath and dataPath must all exist.
options/zookeeper.path must be defined and refer to a path in the ZK cluster defined by (options/zookeeper.hosts, zookeeper.port) to which the user has write access (required by HBase and Accumulo).
If options/cluster.application.image.path is defined, it must exist and be readable by the user.
It must declare a type that maps to a provider entry in the Slider client's XML configuration:
len(clusterspec["type"]) > 0
clientconfig["slider.provider." + clusterspec["type"]] != null
That entry must map to a class on the classpath which can be instantiated and cast to SliderProviderFactory.
let classname = clientconfig["slider.provider." + clusterspec["type"]]
(Class.forName(classname).newInstance()) instanceof SliderProviderFactory
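The Python analogue of this lookup and cast, as a hedged sketch; the stub class stands in for the Java SliderProviderFactory interface, and the module/class split follows Python conventions rather than Class.forName():

import importlib

class SliderProviderFactory:
    """Stub standing in for the Java SliderProviderFactory interface."""

def load_provider_factory(clientconfig: dict, clusterspec: dict) -> SliderProviderFactory:
    app_type = clusterspec["type"]
    assert len(app_type) > 0
    classname = clientconfig["slider.provider." + app_type]
    module_name, _, cls_name = classname.rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    factory = cls()                                    # must be instantiable
    assert isinstance(factory, SliderProviderFactory)  # the instanceof check
    return factory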
A provider considers a specification valid if its own validation logic is satisfied. This normally consists of rules about the number of instances of different roles; it may include other logic.