<table class="configuration table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>cluster.io-pool.size</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Integer</td>
<td>The size of the IO executor pool used by the cluster to execute blocking IO operations (Master as well as TaskManager processes). By default it will use 4 * the number of CPU cores (hardware contexts) that the cluster process has access to. Increasing the pool size allows more IO operations to run concurrently.</td>
</tr>
<tr>
<td><h5>cluster.registration.error-delay</h5></td>
<td style="word-wrap: break-word;">10000</td>
<td>Long</td>
<td>The pause, in milliseconds, made after a registration attempt caused an exception (other than a timeout).</td>
</tr>
<tr>
<td><h5>cluster.registration.initial-timeout</h5></td>
<td style="word-wrap: break-word;">100</td>
<td>Long</td>
<td>Initial registration timeout between cluster components in milliseconds.</td>
</tr>
<tr>
<td><h5>cluster.registration.max-timeout</h5></td>
<td style="word-wrap: break-word;">30000</td>
<td>Long</td>
<td>Maximum registration timeout between cluster components in milliseconds.</td>
</tr>
<tr>
<td><h5>cluster.registration.refused-registration-delay</h5></td>
<td style="word-wrap: break-word;">30000</td>
<td>Long</td>
<td>The pause, in milliseconds, made after a registration attempt was refused.</td>
</tr>
<tr>
<td><h5>cluster.services.shutdown-timeout</h5></td>
<td style="word-wrap: break-word;">30000</td>
<td>Long</td>
<td>The shutdown timeout, in milliseconds, for cluster services such as executors.</td>
</tr>
<tr>
<td><h5>heartbeat.interval</h5></td>
<td style="word-wrap: break-word;">10000</td>
<td>Long</td>
<td>Time interval, in milliseconds, between heartbeat RPC requests from the sender to the receiver side.</td>
</tr>
<tr>
<td><h5>heartbeat.rpc-failure-threshold</h5></td>
<td style="word-wrap: break-word;">2</td>
<td>Integer</td>
<td>The number of consecutive failed heartbeat RPCs until a heartbeat target is marked as unreachable. Failed heartbeat RPCs can be used to detect dead targets faster because they no longer receive the RPCs. The detection time is <code class="highlighter-rouge">heartbeat.interval</code> * <code class="highlighter-rouge">heartbeat.rpc-failure-threshold</code>. In environments with a flaky network, setting this value too low can produce false positives. In this case, we recommend increasing this value, but not higher than <code class="highlighter-rouge">heartbeat.timeout</code> / <code class="highlighter-rouge">heartbeat.interval</code>. The mechanism can be disabled by setting this option to <code class="highlighter-rouge">-1</code>.</td>
</tr>
<tr>
<td><h5>heartbeat.timeout</h5></td>
<td style="word-wrap: break-word;">50000</td>
<td>Long</td>
<td>Timeout, in milliseconds, for requesting and receiving heartbeats on both the sender and receiver sides.</td>
</tr>
<tr>
<td><h5>jobmanager.execution.failover-strategy</h5></td>
<td style="word-wrap: break-word;">"region"</td>
<td>String</td>
<td>This option specifies how the job computation recovers from task failures. Accepted values are:<ul><li>'full': Restarts all tasks to recover the job.</li><li>'region': Restarts all tasks that could be affected by the task failure. More details can be found <a href="../../ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy">here</a>.</li></ul></td>
</tr>
</tbody>
</table>
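<p>As a rough illustration only, the options above can be set in the Flink configuration file (e.g. <code class="highlighter-rouge">flink-conf.yaml</code>). The values below are assumptions chosen to sketch a tuning for a flaky-network scenario, not recommended defaults:</p>
<pre>
# Sketch of a flink-conf.yaml excerpt; values are illustrative assumptions, not recommendations.

# Allow more blocking IO operations to run concurrently on JobManager/TaskManager processes.
cluster.io-pool.size: 16

# Send heartbeat RPCs every 10 s; consider a target dead after 60 s without a heartbeat.
heartbeat.interval: 10000
heartbeat.timeout: 60000

# Tolerate up to 4 consecutive failed heartbeat RPCs before marking a target unreachable
# (kept below heartbeat.timeout / heartbeat.interval = 6 to avoid false positives).
heartbeat.rpc-failure-threshold: 4

# Restart only the pipelined region affected by a task failure.
jobmanager.execution.failover-strategy: region
</pre>
<p>Note that <code class="highlighter-rouge">heartbeat.rpc-failure-threshold</code> only shortens detection of unreachable targets; <code class="highlighter-rouge">heartbeat.timeout</code> remains the upper bound on how long a missing heartbeat is tolerated.</p>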