A Hoya Cluster Specification is a JSON file which describes a cluster to Hoya: what application is to be deployed, which archive file contains the application, specific cluster-wide options, and options for the individual roles in a cluster.
These are options read by the Hoya Application Master, the program deployed in the YARN cluster to start all the other roles.
When the AM is started, all entries in the cluster-wide options are loaded as Hadoop configuration values.
Only those related to Hadoop, the filesystem in use, YARN and Hoya will be used; others are likely to be ignored.
Cluster-wide options are used to configure the application itself.
These are specified at the command line with the -O key=value syntax.
All options beginning with the prefix “site.” are converted into site XML options for the specific application (assuming the application uses a site XML configuration file).
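The prefix handling can be illustrated with a short Python sketch. It is not Hoya's implementation: the function name extract_site_options and the example key are hypothetical, and the sketch assumes the “site.” prefix is stripped before the option is written to the site XML file.

```python
SITE_PREFIX = "site."

def extract_site_options(cluster_options):
    """Return the options starting with 'site.', with the prefix removed.

    Stripping the prefix is an assumption made for this illustration.
    """
    return {
        key[len(SITE_PREFIX):]: value
        for key, value in cluster_options.items()
        if key.startswith(SITE_PREFIX)
    }

# Hypothetical cluster-wide options, as they might be supplied with -O key=value
options = {
    "site.hbase.master.info.port": "8080",  # example key, chosen for illustration
    "hoya.test": "true",                    # not a site option, so it is skipped
}
site_options = extract_site_options(options)
print(site_options)
```

Options without the prefix, such as hoya.test, are left out of the result.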
Standard keys are defined in the class org.apache.hoya.api.OptionKeys.
hoya.test
A boolean value to indicate this is a test run, not a production run. In this mode Hoya opts to fail fast, rather than retry container deployments when they fail. It is primarily used for internal tests.
hoya.container.failure.shortlife
An integer stating the time in milliseconds before which a failed container is considered “short-lived”.
A failure of a short-lived container is treated as a sign of a problem with the role configuration or another aspect of the Hoya cluster, or of a problem with the specific node on which the attempt to run the container was made.
hoya.container.failure.threshold
An integer stating the number of failures tolerated in a single role before the cluster is considered to have failed.
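How the two failure options interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the AM's actual logic: the class FailureTracker and its methods are hypothetical, a container counts as short-lived when its lifetime is below the shortlife value, and the cluster is treated as failed once a role's failure count exceeds the threshold (whether the real comparison is strict is not stated in this document).

```python
from collections import Counter

class FailureTracker:
    """Hypothetical per-role failure bookkeeping."""

    def __init__(self, shortlife_ms, threshold):
        self.shortlife_ms = shortlife_ms   # hoya.container.failure.shortlife
        self.threshold = threshold         # hoya.container.failure.threshold
        self.failures = Counter()          # all failures, keyed by role
        self.short_lived = Counter()       # short-lived failures, keyed by role

    def record_failure(self, role, lifetime_ms):
        """Record one failed container of the given role."""
        self.failures[role] += 1
        if lifetime_ms < self.shortlife_ms:
            self.short_lived[role] += 1

    def cluster_failed(self):
        """Assumed rule: fail once any role has more failures than tolerated."""
        return any(count > self.threshold for count in self.failures.values())

tracker = FailureTracker(shortlife_ms=60_000, threshold=2)
for lifetime_ms in (500, 90_000, 1_200):
    tracker.record_failure("worker", lifetime_ms)
print(tracker.short_lived["worker"], tracker.cluster_failed())
```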
hoya.am.monitoring.enabled
A boolean flag to indicate whether application-specific health monitoring should take place. Until this monitoring feature is completed the option is ignored; it is added as false by default on newly created clusters.
A Hoya application consists of the Hoya Application Master, “the AM”, which manages the cluster, and a number of instances of the “roles” of the actual application.
For HBase the roles are master and worker; Accumulo has more.
For every role, the cluster specification can define the following options:
role.instances
role.additional.args
This option supplies static, configuration-based arguments that are passed to every instance of the given role. For example, it is useful for providing a binding address to Accumulo's Monitor process.
Users can override the option on the hoya command line using the --roleopt argument:
--roleopt monitor role.additional.args "--address 127.0.0.1"
yarn.memory
The amount of memory in MB for the YARN container hosting that role. Default “256”.
The special value max indicates that Hoya should request the maximum value that YARN allows containers to have, a value determined dynamically after the cluster starts.
Examples:
--roleopt master yarn.memory 2048 --roleopt worker yarn.memory max
If a YARN cluster is configured to set process memory limits via the OS, and the application tries to use more memory than allocated, it will fail with the exit code “143”.
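Exit codes above 128 conventionally encode 128 plus the number of the signal that terminated the process, so 143 corresponds to signal 15, SIGTERM. The sketch below decodes such a code; the helper name signal_for_exit_code is hypothetical, and the sketch illustrates the convention rather than anything specific to YARN.

```python
import signal

def signal_for_exit_code(exit_code):
    """Return the signal name for a 128+N style exit code, or None if not applicable."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # No signal with that number on this platform
        return None

name = signal_for_exit_code(143)
print(name)
```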
yarn.vcores
Number of “Cores” for the container hosting a role. Default value: “1”.
The special value max indicates that Hoya should request the maximum value that YARN allows containers to have, a value determined dynamically after the cluster starts.
As well as specifying numeric values for memory and cores in a role via the --roleopt argument, you can ask for the maximum allowed value by using the parameter max.
Examples:
--roleopt master yarn.vcores 2
--roleopt master yarn.vcores max
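The handling of max for yarn.memory and yarn.vcores can be sketched as below. This is an assumption-laden illustration: the function resolve_resource is hypothetical, and the maximum values passed in stand for whatever limits YARN reports once the cluster has started.

```python
def resolve_resource(requested, yarn_maximum):
    """Turn a role option value into a number.

    'max' resolves to the maximum that YARN reports; anything else is
    parsed as an integer. Both the name and the rule are illustrative.
    """
    if requested == "max":
        return yarn_maximum
    return int(requested)

# Hypothetical limits reported by YARN after the cluster starts
YARN_MAX_MEMORY_MB = 8192
YARN_MAX_VCORES = 4

master_memory = resolve_resource("2048", YARN_MAX_MEMORY_MB)
worker_memory = resolve_resource("max", YARN_MAX_MEMORY_MB)
master_vcores = resolve_resource("max", YARN_MAX_VCORES)
print(master_memory, worker_memory, master_vcores)
```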
app.infoport
The TCP socket port number to use for the master node web UI. This is translated into an application-specific site.xml property for both Accumulo and HBase.
If set to a number other than the default, “0”, then if the given port is in use, the role instance will not start. This will occur if YARN is already running a master node on that server, or if another application is using the same TCP port.
jvm.heapsize
Heapsize as a JVM option string, such as "256M" or "2G".
--roleopt worker jvm.heapsize 8G
This is not correlated with the YARN memory: changes in the YARN memory allocation are not reflected in the JVM heapsize, and vice versa.
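Because the two settings are independent, it can be worth checking by hand that a role's heap fits inside its container. The sketch below is one way to do that; the helper names are hypothetical, only the K, M and G suffixes are handled, and it assumes the heap must be strictly smaller than the container so there is room for non-heap JVM memory.

```python
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def heapsize_bytes(heapsize):
    """Parse a JVM-style size string such as '256M' or '2G' into bytes."""
    unit = heapsize[-1].upper()
    return int(heapsize[:-1]) * UNIT_BYTES[unit]

def heap_fits_container(heapsize, yarn_memory_mb):
    """Assumed check: the heap must be strictly smaller than the container."""
    return heapsize_bytes(heapsize) < yarn_memory_mb * 1024 ** 2

fits_small = heap_fits_container("256M", 512)   # heap below container size
fits_large = heap_fits_container("8G", 2048)    # heap larger than container
print(fits_small, fits_large)
```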
All role options beginning with “env.” are automatically converted to environment variables which will be set for all instances of that role.
--roleopt worker env.MALLOC_ARENA 4
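The conversion of env. options can be sketched as follows. The function role_environment is hypothetical, and the sketch assumes that the variable name is simply the option key with the “env.” prefix removed.

```python
ENV_PREFIX = "env."

def role_environment(role_options):
    """Build the environment variables for a role from its 'env.' options."""
    return {
        key[len(ENV_PREFIX):]: str(value)
        for key, value in role_options.items()
        if key.startswith(ENV_PREFIX)
    }

# Role options as they might result from the --roleopt arguments shown here
worker_options = {"env.MALLOC_ARENA": "4", "jvm.heapsize": "8G"}
worker_env = role_environment(worker_options)
print(worker_env)
```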
Here are options specific to Accumulo clusters.
zk.home
Location of ZooKeeper on the target machine. This is needed by the Accumulo startup scripts.
hadoop.home
Location of Hadoop on the target machine. This is needed by the Accumulo startup scripts.
accumulo.password
This is the password used to control access to the Accumulo data. A random password (from a UUID, hence very low-entropy) is chosen when the cluster is created. A more rigorous password can be set on the command line at the time of cluster creation.
The Hoya Application Master has its own role, hoya, which can also be configured with role options. Currently only JVM and YARN options are supported:
--roleopt hoya jvm.heapsize 256M --roleopt hoya jvm.opts "-Djvm.property=true" --roleopt hoya yarn.memory 512
Normal memory requirements of the AM are low, except in the special case of starting an Accumulo cluster for the first time. In this case, bin/accumulo init needs to be run: the extra memory requirements of the Accumulo process need to be included in the hoya role's yarn.memory value.