| <table class="configuration table table-bordered"> |
| <thead> |
| <tr> |
| <th class="text-left" style="width: 20%">Key</th> |
| <th class="text-left" style="width: 15%">Default</th> |
| <th class="text-left" style="width: 10%">Type</th> |
| <th class="text-left" style="width: 55%">Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><h5>cluster.io-pool.size</h5></td> |
| <td style="word-wrap: break-word;">(none)</td> |
| <td>Integer</td> |
| <td>The size of the IO executor pool used by the cluster to execute blocking IO operations (Master as well as TaskManager processes). By default, it uses 4 * the number of CPU cores (hardware contexts) that the cluster process has access to. Increasing the pool size allows more IO operations to run concurrently.</td> |
| </tr> |
| <tr> |
| <td><h5>cluster.registration.error-delay</h5></td> |
| <td style="word-wrap: break-word;">10000</td> |
| <td>Long</td> |
| <td>The pause, in milliseconds, made after a registration attempt caused an exception (other than a timeout).</td> |
| </tr> |
| <tr> |
| <td><h5>cluster.registration.initial-timeout</h5></td> |
| <td style="word-wrap: break-word;">100</td> |
| <td>Long</td> |
| <td>Initial registration timeout between cluster components in milliseconds.</td> |
| </tr> |
| <tr> |
| <td><h5>cluster.registration.max-timeout</h5></td> |
| <td style="word-wrap: break-word;">30000</td> |
| <td>Long</td> |
| <td>Maximum registration timeout between cluster components in milliseconds.</td> |
| </tr> |
| <tr> |
| <td><h5>cluster.registration.refused-registration-delay</h5></td> |
| <td style="word-wrap: break-word;">30000</td> |
| <td>Long</td> |
| <td>The pause, in milliseconds, made after a registration attempt was refused.</td> |
| </tr> |
| <tr> |
| <td><h5>cluster.services.shutdown-timeout</h5></td> |
| <td style="word-wrap: break-word;">30000</td> |
| <td>Long</td> |
| <td>The shutdown timeout, in milliseconds, for cluster services such as executors.</td> |
| </tr> |
| <tr> |
| <td><h5>heartbeat.interval</h5></td> |
| <td style="word-wrap: break-word;">10000</td> |
| <td>Long</td> |
| <td>Time interval between heartbeat RPC requests from the sender to the receiver side.</td> |
| </tr> |
| <tr> |
| <td><h5>heartbeat.rpc-failure-threshold</h5></td> |
| <td style="word-wrap: break-word;">2</td> |
| <td>Integer</td> |
| <td>The number of consecutive failed heartbeat RPCs until a heartbeat target is marked as unreachable. Failed heartbeat RPCs can be used to detect dead targets faster because they no longer receive the RPCs. The detection time is <code class="highlighter-rouge">heartbeat.interval</code> * <code class="highlighter-rouge">heartbeat.rpc-failure-threshold</code>. In environments with a flaky network, setting this value too low can produce false positives. In this case, we recommend increasing this value, but not higher than <code class="highlighter-rouge">heartbeat.timeout</code> / <code class="highlighter-rouge">heartbeat.interval</code>. The mechanism can be disabled by setting this option to <code class="highlighter-rouge">-1</code>.</td> |
| </tr> |
| <tr> |
| <td><h5>heartbeat.timeout</h5></td> |
| <td style="word-wrap: break-word;">50000</td> |
| <td>Long</td> |
| <td>Timeout for requesting and receiving heartbeats for both sender and receiver sides.</td> |
| </tr> |
| <tr> |
| <td><h5>jobmanager.execution.failover-strategy</h5></td> |
| <td style="word-wrap: break-word;">"region"</td> |
| <td>String</td> |
| <td>This option specifies how the job computation recovers from task failures. Accepted values are:<ul><li>'full': Restarts all tasks to recover the job.</li><li>'region': Restarts all tasks that could be affected by the task failure. More details can be found <a href="../../ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy">here</a>.</li></ul></td> |
| </tr> |
| </tbody> |
| </table> |
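
The relationship between the three heartbeat options in the table can be checked with a little arithmetic. The following is a minimal sketch, not Flink code: the helper names `detection_time` and `max_recommended_threshold` are hypothetical, and the inputs are the default values listed in the table, in whatever time unit `heartbeat.interval` and `heartbeat.timeout` share.

```python
from typing import Optional


def detection_time(interval: int, failure_threshold: int) -> Optional[int]:
    """Time until a target is marked unreachable via failed heartbeat RPCs.

    Follows the formula from the table:
    heartbeat.interval * heartbeat.rpc-failure-threshold.
    Returns None when the mechanism is disabled (threshold of -1).
    """
    if failure_threshold == -1:
        return None
    return interval * failure_threshold


def max_recommended_threshold(timeout: int, interval: int) -> int:
    """Upper bound suggested by the table: heartbeat.timeout / heartbeat.interval."""
    return timeout // interval


# Defaults taken from the table above.
HEARTBEAT_INTERVAL = 10000
HEARTBEAT_TIMEOUT = 50000
RPC_FAILURE_THRESHOLD = 2

print(detection_time(HEARTBEAT_INTERVAL, RPC_FAILURE_THRESHOLD))
print(max_recommended_threshold(HEARTBEAT_TIMEOUT, HEARTBEAT_INTERVAL))
```

With the defaults, the detection time works out to 20000 and the recommended ceiling for `heartbeat.rpc-failure-threshold` is 5. Raising the threshold beyond that ceiling would make the RPC-failure path slower than the regular `heartbeat.timeout`, so it would no longer speed up detection.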