<?xml version="1.0"?>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->

 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

 <document>

   <header>
     <title>Capacity Scheduler</title>
   </header>

   <body>

     <section>
       <title>Purpose</title>

       <p>This document describes the Capacity Scheduler, a pluggable
       MapReduce scheduler for Hadoop which provides a way to share
       large clusters.</p>
     </section>

     <section>
       <title>Features</title>

       <p>The Capacity Scheduler supports the following features:</p>
       <ul>
         <li>
           Multiple queues, possibly hierarchical/recursive, where a job is
           submitted to a queue.
         </li>
         <li>
           Queues are allocated a fraction of the capacity of the grid in the
           sense that a certain capacity of resources will be at their
           disposal. All jobs submitted to a queue will have access to the
           capacity allocated to the queue.
         </li>
         <li>
           Free resources can be allocated to any queue beyond its capacity.
           When queues running below capacity later demand these resources,
           the resources are assigned to jobs in those queues as the tasks
           currently scheduled on them complete.
         </li>
         <li>
           Queues optionally support job priorities (disabled by default).
         </li>
         <li>
           Within a queue, jobs with higher priority will have access to the
           queue's resources before jobs with lower priority. However, once a
           job is running, it will not be preempted for a higher priority job,
           though new tasks from the higher priority job will be
           preferentially scheduled.
         </li>
         <li>
           In order to prevent one or more users from monopolizing its
           resources, each queue enforces a limit on the percentage of
           resources allocated to a user at any given time, if there is
           competition for them.
         </li>
         <li>
           Queues can use idle resources of other queues. To prevent
           particular queues from monopolizing resources, a cap can be set
           on each queue limiting the maximum amount of resources it can
           expand to when other queues in the cluster have idle resources.
         </li>
         <li>
           Support for memory-intensive jobs, wherein a job can optionally
           specify higher memory-requirements than the default, and the tasks
           of the job will only be run on TaskTrackers that have enough memory
           to spare.
         </li>
         <li>
           Support for refreshing/reloading some of the queue-properties
           without restarting the JobTracker, taking advantage of the
           <a href="ext:cluster-setup/RefreshingQueueConfiguration">
           queue-refresh</a> feature in the framework.
         </li>
       </ul>
     </section>

     <section>
       <title>Picking a Task to Run</title>

       <p>Note that many of these steps can be, and will be, enhanced over time
       to provide better algorithms.</p>

       <p>Whenever a TaskTracker is free, the Capacity Scheduler picks
       the queue with the most free space, that is, the queue whose ratio
       of running slots to capacity is the lowest.</p>

       <p>Once a queue is selected, the Scheduler picks a job in the queue. Jobs
       are sorted based on when they're submitted and their priorities (if the
       queue supports priorities). Jobs are considered in order, and a job is
       selected if its user is within the user-quota for the queue, i.e., the
       user is not already using queue resources above his/her limit. The
       Scheduler also makes sure that there is enough free memory in the
       TaskTracker to run the job's task, in case the job has special memory
       requirements.</p>

       <p>Once a job is selected, the Scheduler picks a task to run. This logic
       to pick a task remains unchanged from earlier versions.</p>

       <section>
         <title>Scheduling Tasks Considering Memory Requirements</title>

         <p>
         The Capacity Scheduler supports scheduling of tasks on a
         TaskTracker based on a job's virtual memory requirements and
         the availability
         of enough virtual memory on the TaskTracker node. By doing so, it
         simplifies the virtual memory monitoring function on the
         TaskTracker node, described in the section on
         <a href="ext:cluster-setup/ConfiguringMemoryParameters">
         Monitoring Task Memory Usage</a> in the Cluster Setup guide.
         Refer to that section for more details on how memory for
         MapReduce tasks is handled.
         </p>

         <p>
         Virtual memory based task scheduling uses the same parameters as
         the memory monitoring function of the TaskTracker, and is enabled
         along with virtual memory monitoring. When enabled, the scheduler
         ensures that a task is scheduled on a TaskTracker only when the
         virtual memory required by the map or reduce task can be assured
         by the TaskTracker. That is, the task is scheduled only if the
         following constraint is satisfied:<br/>
         <code>
         The job's mapreduce.{map|reduce}.memory.mb &lt;=
           total virtual memory for all map or reduce tasks on the TaskTracker -
             total virtual memory required for all running map or reduce tasks
             on the TaskTracker
         </code><br/>
         </p>

         <p>
         When a task at the front of the scheduler's queue cannot be scheduled
         on a TaskTracker due to insufficient memory, the scheduler creates
         a virtual <em>reservation</em> for this task. This can continue
         for all pending tasks of a job, subject to other capacity constraints.
         Once all tasks are either scheduled or have reservations, the
         scheduler will proceed to schedule other jobs' tasks that are not
         necessarily at the front of the queue, but meet memory constraints
         of the TaskTracker.
         By following this reservation procedure of reserving just enough
         TaskTrackers, the scheduler balances between not starving jobs with
         high memory requirements and under-utilizing cluster resources.
         </p>

         <p>
         Tasks of jobs that require more virtual memory than the
         per-slot <code>mapreduce.cluster.{map|reduce}memory.mb</code>
         value are treated as occupying more than one slot, and count as
         correspondingly higher capacity usage for their queue.
         The number of slots they occupy is determined as:<br/>
         <code>
         Number of slots for a task = mapreduce.{map|reduce}.memory.mb /
         mapreduce.cluster.{map|reduce}memory.mb
         </code><br/>
         However, special tasks run by the framework like setup
         and cleanup tasks do not count for more than 1 slot,
         irrespective of their job's memory requirements.
         </p>
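         <p>
         As a worked illustration of the slot calculation above, the
         following sketch assumes a cluster whose per-slot map memory is
         1024 MB and a job that requests 2048 MB per map task. Both numbers
         are hypothetical and chosen so that the division is exact. Under
         these assumptions each map task of the job would count as
         2048 / 1024 = 2 slots against its queue's capacity.
         </p>
<source>
&lt;!-- Cluster side, in the site configuration.
     1024 is an assumed, illustrative value. --&gt;
&lt;property&gt;
  &lt;name&gt;mapreduce.cluster.mapmemory.mb&lt;/name&gt;
  &lt;value&gt;1024&lt;/value&gt;
&lt;/property&gt;

&lt;!-- Job side, in the job's own configuration.
     2048 is an assumed, illustrative value. --&gt;
&lt;property&gt;
  &lt;name&gt;mapreduce.map.memory.mb&lt;/name&gt;
  &lt;value&gt;2048&lt;/value&gt;
&lt;/property&gt;
</source>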

       </section>

     </section>

     <section>
       <title>Installation</title>

         <p>The Capacity Scheduler is available as a JAR file in the Hadoop
         tarball under the <em>contrib/capacity-scheduler</em> directory. The name of
         the JAR file is along the lines of hadoop-*-capacity-scheduler.jar.</p>
         <p>You can also build the Scheduler from source by executing
         <em>ant package</em>, in which case it is available under
         <em>build/contrib/capacity-scheduler</em>.</p>
         <p>To run the Capacity Scheduler in your Hadoop installation, you need
         to put it on the <em>CLASSPATH</em>. The easiest way is to copy the
         <code>hadoop-*-capacity-scheduler.jar</code>
         to <code>HADOOP_HOME/lib</code>. Alternatively, you can modify
         <em>HADOOP_CLASSPATH</em> in <code>conf/hadoop-env.sh</code>
         to include this JAR.</p>
     </section>

     <section>
       <title>Configuration</title>

       <section>
         <title>Using the Capacity Scheduler</title>
         <p>
           To make the Hadoop framework use the Capacity Scheduler, set up
           the following property in the site configuration:</p>
           <table>
             <tr>
               <th>Name</th>
               <th>Value</th>
             </tr>
             <tr>
               <td>mapreduce.jobtracker.taskscheduler</td>
               <td>org.apache.hadoop.mapred.CapacityTaskScheduler</td>
             </tr>
           </table>
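           <p>
             A minimal sketch of the corresponding entry is shown below.
             The property name and value are taken from the table above;
             the file name <em>conf/mapred-site.xml</em> is an assumption,
             since the text says only "the site configuration".
           </p>
<source>
&lt;!-- Assumed location: conf/mapred-site.xml --&gt;
&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;mapreduce.jobtracker.taskscheduler&lt;/name&gt;
    &lt;value&gt;org.apache.hadoop.mapred.CapacityTaskScheduler&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</source>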
       </section>

       <section>
         <title>Setting Up Queues</title>
         <p>
           With the Capacity Scheduler, you can define multiple, possibly
           hierarchical, queues to which users can submit jobs. To define
           queues, set properties in two configuration files:
           <a href="ext:cluster-setup/mapred-queues.xml">mapred-queues.xml</a>
           and
           <a href="ext:capacity-scheduler-conf">conf/capacity-scheduler.xml</a>.</p>
           <p><em>conf/capacity-scheduler.xml</em> can be used to configure (1)
           job-initialization-poller related properties and (2) the
           default values for various properties in the queues.</p>
           <p><em>conf/mapred-queues.xml</em> contains the actual queue
           configuration including (1) framework specific properties like ACLs
           for controlling which users or groups have access to the queues and
           state of the queues and (2) the scheduler specific properties for
           each queue. If any of these scheduler specific properties is
           not configured for a queue, the properties in
           <em>conf/capacity-scheduler.xml</em> are used to set default values.
           The properties that can be configured, and their semantics, are
           described below. A default template for
           mapred-queues.xml tailored for use with the
           Capacity Scheduler can be found
           <a href="ext:mapred-queues-capacity-scheduler">here</a>.</p>
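           <p>
             The skeleton below sketches what a hierarchical queue layout
             might look like in <em>conf/mapred-queues.xml</em>. It is an
             illustration only: the queue names are hypothetical, the
             element names are assumed to follow the format described in
             the Cluster Setup guide, and per-queue elements such as ACLs
             and state are omitted. Check the linked template before
             relying on it.
           </p>
<source>
&lt;!-- Hypothetical layout: one root-level leaf queue and one
     root-level container queue with two leaf children. --&gt;
&lt;queues&gt;
  &lt;queue&gt;
    &lt;name&gt;production&lt;/name&gt;
  &lt;/queue&gt;
  &lt;queue&gt;
    &lt;name&gt;research&lt;/name&gt;
    &lt;queue&gt;
      &lt;name&gt;ml&lt;/name&gt;
    &lt;/queue&gt;
    &lt;queue&gt;
      &lt;name&gt;adhoc&lt;/name&gt;
    &lt;/queue&gt;
  &lt;/queue&gt;
&lt;/queues&gt;
</source>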
       </section>

       <section>
         <title>Configuring Properties for Queues</title>

         <p>The Capacity Scheduler can be configured with several properties
         for each queue that control the behavior of the Scheduler. As
         described above, this scheduler specific configuration has to be in
         the <em>conf/mapred-queues.xml</em> along with the rest of the
         framework specific configuration. By
         default, the configuration is set up for one queue, named
         <em>default</em>.</p>
         <p>To specify a property for a specific queue that is defined in the
         mapred-queues.xml, you should set the corresponding property in a
         &lt;property&gt; tag explained
         <a href="ext:cluster-setup/property_tag">here</a>.
         </p>

         <p>The properties defined for queues and their descriptions are
         listed in the table below:</p>

         <table>
           <tr>
           	<th>Name</th>
             <th>
             	<a href="ext:commands-manual/RefreshQueues">
             	Refresh-able?</a>
            	</th>
             <th>Applicable to?</th>
             <th>Description</th>
           </tr>
           <tr>
           	<td>capacity</td>
           	<td>Yes</td>
           	<td>Container queues as well as leaf queues</td>
           	<td>For a root-level container queue, this is the percentage of the
           	number of slots in the cluster that will be available for all its
           	immediate children together. For a root-level leaf-queue, this is
           	the percentage of the number of slots in the cluster that will be
           	available for all its jobs.	For a non-root level container queue,
           	this is the percentage of the number of slots in its parent queue
           	that will be available for all its	children together. For a
           	non-root-level leaf queue, this	is the percentage of the number of
           	slots in its parent queue that will be available for jobs in this
          	queue. The sum of capacities for all children of a container queue
          	should be less than or equal to 100. The sum of capacities of all the
          	root-level queues should be less than or equal to 100.
             </td>
           </tr>
           <tr>
             <td>maximum-capacity</td>
             <td>Yes</td>
           	<td>Container queues as well as leaf queues</td>
             <td>
 	          A limit in percentage beyond which a non-root-level queue cannot use
 	          the capacity of its parent queue; for a root-level queue, this is
 	          the limit in percentage beyond which it cannot use the
 	          cluster-capacity. This property provides a means to limit how much
 	          excess capacity a queue can use.  It can be used to prevent queues
 	          with long running jobs from occupying more than a certain percentage
 	          of the parent-queue or the cluster, which, in the absence of
 	          pre-emption, can lead to capacity guarantees of other queues getting
 	          affected.

 	          The maximum-capacity of a queue can only be greater than or equal to
 	          its capacity. By default, there is no limit for a queue. For a
 	          non-root-level queue this means it can occupy up to the
 	          maximum-capacity of its parent; for a root-level queue, it means that
 	          it can occupy the whole cluster. A value of 100 implies that a queue
 	          can use the complete capacity of its parent, or the complete
 	          cluster-capacity in case of root-level-queues.
             </td>
           </tr>
           <tr>
           	<td>supports-priority</td>
           	<td>No</td>
           	<td>Leaf queues only</td>
           	<td>If true, priorities of jobs will be taken into account in scheduling
           	decisions.
           	</td>
           </tr>
           <tr>
           	<td>minimum-user-limit-percent</td>
           	<td>Yes</td>
           	<td>Leaf queues only</td>
           	<td>Each queue enforces a limit on the percentage of resources
           	allocated to a user at any given time, if there is competition
          	for them. This user limit varies with the number of users who have
          	submitted jobs: resources are shared equally among those users, but
          	the limit never drops below this property value. For example,
           	suppose the value of this property is 25. If two users have
           	submitted jobs to a queue, no single user can use more than 50%
           	of the queue resources. If a third user submits a job, no single
           	user can use more than 33% of the queue resources. With 4 or more
           	users, no user can use more than 25% of the queue's resources. A
           	value of 100 implies no user limits are imposed.
 	        </td>
           </tr>
           <tr>
             <td>maximum-initialized-jobs-per-user</td>
             <td>Yes</td>
           	<td>Leaf queues only</td>
             <td>
               Maximum number of jobs that can be pre-initialized for
               a particular user in the queue. Once a job is scheduled, i.e.,
               it starts running, it is no longer counted when the
               scheduler computes the maximum number of jobs a user is
               allowed to initialize.
             </td>
           </tr>
         </table>
         <p>See  <a href="ext:mapred-queues-capacity-scheduler">
         this configuration file</a> for a default configuration of queues in
         capacity-scheduler.</p>
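         <p>
           As a hedged sketch, a single leaf queue carrying the properties
           from the table above might be written as follows. The property
           keys come from the table; the values are illustrative
           assumptions, and the <code>key</code>/<code>value</code>
           attribute form of the &lt;property&gt; tag is assumed from the
           Cluster Setup guide rather than stated in this document.
         </p>
<source>
&lt;!-- Illustrative values only; not recommended settings. --&gt;
&lt;queue&gt;
  &lt;name&gt;default&lt;/name&gt;
  &lt;properties&gt;
    &lt;property key="capacity" value="100"/&gt;
    &lt;property key="maximum-capacity" value="100"/&gt;
    &lt;property key="supports-priority" value="false"/&gt;
    &lt;property key="minimum-user-limit-percent" value="25"/&gt;
    &lt;property key="maximum-initialized-jobs-per-user" value="2"/&gt;
  &lt;/properties&gt;
&lt;/queue&gt;
</source>
         <p>
           With <code>minimum-user-limit-percent</code> set to 25 as in
           this sketch, the per-user limit follows the example in the
           table: 50% with two users, 33% with three, and 25% with four
           or more.
         </p>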
       </section>

    <section>
         <title>Job Initialization Parameters</title>
         <p>The Capacity Scheduler lazily initializes jobs before they are
         scheduled, to reduce the memory footprint on the JobTracker.
         The following parameters, which control the laziness of job
         initialization, can be configured in capacity-scheduler.xml:
         </p>

         <table>
           <tr><th>Name</th><th>Description</th></tr>
           <tr>
             <td>
               mapred.capacity-scheduler.init-poll-interval
             </td>
             <td>
               Interval, in milliseconds, at which the scheduler job queue is
               polled for jobs to be initialized.
             </td>
           </tr>
           <tr>
             <td>
               mapred.capacity-scheduler.init-worker-threads
             </td>
             <td>
               Number of worker threads used by the initialization poller to
               initialize jobs in a set of queues. If this number equals the
               number of job queues, each thread is assigned jobs from one
               queue. If it is less than the number of queues, a thread can
               get jobs from more than one queue, which it initializes in a
               round-robin fashion. If it is greater than the number of
               queues, the number of threads spawned equals the number of
               job queues.
             </td>
           </tr>
         </table>
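         <p>
           A minimal sketch of these two parameters in
           <em>conf/capacity-scheduler.xml</em> follows. The property names
           are taken from the table above; the values 5000 and 5 are
           assumptions chosen for illustration and should be tuned for the
           cluster.
         </p>
<source>
&lt;configuration&gt;
  &lt;!-- Poll the job queues every 5 seconds (assumed value). --&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.capacity-scheduler.init-poll-interval&lt;/name&gt;
    &lt;value&gt;5000&lt;/value&gt;
  &lt;/property&gt;
  &lt;!-- Use up to 5 initialization threads (assumed value). --&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.capacity-scheduler.init-worker-threads&lt;/name&gt;
    &lt;value&gt;5&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</source>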
       </section>
       <section>
         <title>Reviewing the Configuration of the Capacity Scheduler</title>
         <p>
           Once the installation and configuration are completed, you can
           review them from the admin UI after starting the MapReduce cluster.
         </p>
         <ul>
           <li>Start the MapReduce cluster as usual.</li>
           <li>Open the JobTracker web UI.</li>
           <li>The queues you have configured should be listed under the <em>Scheduling
               Information</em> section of the page.</li>
           <li>The properties for the queues should be visible in the <em>Scheduling
               Information</em> column against each queue.</li>
         </ul>
       </section>

    </section>
   </body>

 </document>