<table class="configuration table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>auth-token</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Authentication token for secured Triton servers.</td>
</tr>
<tr>
<td><h5>circuit-breaker-enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to enable circuit breaker protection. When enabled, the circuit breaker will automatically fail fast when the server is unhealthy, preventing cascading failures and reducing load on the failing server. The circuit breaker implements a three-state model: CLOSED (normal), OPEN (failing fast), and HALF_OPEN (testing recovery). Defaults to false.</td>
</tr>
<tr>
<td><h5>circuit-breaker-failure-threshold</h5></td>
<td style="word-wrap: break-word;">0.5</td>
<td>Double</td>
<td>Failure rate threshold (0.0-1.0) that triggers the circuit breaker to open. For example, 0.5 means the circuit will open when 50% of recent requests fail. Requires a minimum of 10 requests before evaluation. Defaults to 0.5 (50%). Only effective when circuit-breaker-enabled is true.</td>
</tr>
<tr>
<td><h5>circuit-breaker-half-open-requests</h5></td>
<td style="word-wrap: break-word;">3</td>
<td>Integer</td>
<td>Number of successful test requests required in HALF_OPEN state to close the circuit. If any request fails in HALF_OPEN state, the circuit immediately reopens. Defaults to 3 requests. Only effective when circuit-breaker-enabled is true.</td>
</tr>
<tr>
<td><h5>circuit-breaker-timeout</h5></td>
<td style="word-wrap: break-word;">1 min</td>
<td>Duration</td>
<td>Duration to keep the circuit breaker in OPEN state before transitioning to HALF_OPEN. In HALF_OPEN state, limited requests are allowed to probe if the server has recovered. Defaults to 60 seconds. Only effective when circuit-breaker-enabled is true.</td>
</tr>
<tr>
<td><h5>compression</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Compression algorithm for the request body. Currently only <code class="highlighter-rouge">gzip</code> is supported. When set, the request body is compressed to reduce network bandwidth.</td>
</tr>
<tr>
<td><h5>custom-headers</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Map</td>
<td>Custom HTTP headers as key-value pairs. Example: <code class="highlighter-rouge">'X-Custom-Header:value,X-Another:value2'</code></td>
</tr>
<tr>
<td><h5>default-value</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Fallback value to return when all retry attempts fail (transient errors) or when the request fails with a non-retryable error (4xx). This allows downstream processing to distinguish between successful and failed predictions without propagating exceptions. Format depends on output type: for STRING use plain text (e.g. <code class="highlighter-rouge">'FAILED'</code>); for numeric types use string representation (e.g. <code class="highlighter-rouge">'-1'</code>); for ARRAY types use JSON array format (e.g. <code class="highlighter-rouge">'[0.0, 0.0]'</code>); for SQL NULL use the literal <code class="highlighter-rouge">'null'</code>. Note: the lower-case literal <code class="highlighter-rouge">'null'</code> is ALWAYS interpreted as SQL NULL and cannot be used as a STRING sentinel; if you need a string-typed sentinel indicating failure, use <code class="highlighter-rouge">'NULL'</code>, <code class="highlighter-rouge">'FAILED'</code> or <code class="highlighter-rouge">'&lt;null&gt;'</code> instead. The value is parsed once at operator initialization; an unparseable value fails the job at startup rather than at the first error. If not specified, exceptions are thrown on failure.</td>
</tr>
<tr>
<td><h5>flatten-batch-dim</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to flatten the batch dimension for array inputs. When true, shape [1,N] becomes [N]. Defaults to false.</td>
</tr>
<tr>
<td><h5>health-check-enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to enable periodic health checks for the Triton server. When enabled, the health checker periodically calls the <code class="highlighter-rouge">/v2/health/live</code> endpoint to verify server availability. Defaults to false.</td>
</tr>
<tr>
<td><h5>health-check-interval</h5></td>
<td style="word-wrap: break-word;">30 s</td>
<td>Duration</td>
<td>Interval between health check requests. Shorter intervals provide faster failure detection but increase server load. Defaults to 30 seconds. Only effective when health-check-enabled is true.</td>
</tr>
<tr>
<td><h5>max-retries</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
<td>Maximum number of retries (additional attempts beyond the first) for failed inference requests. With max-retries=2 the request will be attempted up to 3 times in total (1 initial attempt + 2 retries). When set to 0 (default), no retry is performed. Only transient failures are retried: network errors and 5xx responses. Client-side 4xx errors, response parsing failures, and circuit breaker OPEN failures are never retried because they indicate a persistent condition. Must be &gt;= 0.</td>
</tr>
<tr>
<td><h5>priority</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Integer</td>
<td>Request priority level (0-255). Higher values indicate higher priority.</td>
</tr>
<tr>
<td><h5>retry-initial-backoff</h5></td>
<td style="word-wrap: break-word;">100 ms</td>
<td>Duration</td>
<td>Initial backoff duration between retry attempts. Retries use exponential backoff with equal jitter: the nominal delay is retry-initial-backoff * 2^n, where n is the zero-based retry index (the first retry waits this duration, the second waits 2x, the third 4x, and so on). The nominal delay is clamped to retry-max-backoff and then randomized within [delay/2, delay], so that concurrent retries do not all hit the server at the same instant (a thundering herd). Defaults to 100ms. Must be &gt; 0.</td>
</tr>
<tr>
<td><h5>retry-max-backoff</h5></td>
<td style="word-wrap: break-word;">30 s</td>
<td>Duration</td>
<td>Upper bound on the delay between retry attempts. Exponential backoff computed from retry-initial-backoff is clamped to this value so that a misconfigured max-retries cannot produce hours-long sleeps or overflow the delay computation. Defaults to 30s. Must be &gt;= retry-initial-backoff.</td>
</tr>
<tr>
<td><h5>sequence-end</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether this request marks the end of a sequence for stateful models. When true, Triton will release the model's state after processing this request. See <a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/architecture.html#stateful-models">Triton Stateful Models</a> for more details.</td>
</tr>
<tr>
<td><h5>sequence-id</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Sequence ID for stateful models. A sequence represents a series of inference requests that must be routed to the same model instance to maintain state across requests (e.g., for RNN/LSTM models). See <a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/architecture.html#stateful-models">Triton Stateful Models</a> for more details.</td>
</tr>
<tr>
<td><h5>sequence-start</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether this request marks the start of a new sequence for stateful models. When true, Triton will initialize the model's state before processing this request. See <a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/architecture.html#stateful-models">Triton Stateful Models</a> for more details.</td>
</tr>
</tbody>
</table>
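<p>The retry delay described for retry-initial-backoff and retry-max-backoff can be sketched as follows. This is a minimal illustration of exponential backoff with equal jitter under the defaults listed above (100 ms initial, 30 s maximum), not the connector's actual implementation. The function names <code class="highlighter-rouge">nominal_delay_ms</code> and <code class="highlighter-rouge">retry_delay_ms</code> are hypothetical, and the retry index is assumed to be zero-based so that the first retry waits the initial backoff.</p>

```python
import random


def nominal_delay_ms(attempt, initial_ms=100, max_ms=30_000):
    """Un-jittered delay before retry number `attempt` (0-based), in ms.

    Hypothetical helper: doubles the initial backoff per retry and clamps
    the result to the configured maximum.
    """
    return min(initial_ms * 2 ** attempt, max_ms)


def retry_delay_ms(attempt, initial_ms=100, max_ms=30_000, rng=random):
    """Jittered delay: a uniform draw from [nominal / 2, nominal]."""
    nominal = nominal_delay_ms(attempt, initial_ms, max_ms)
    return rng.uniform(nominal / 2, nominal)


if __name__ == "__main__":
    # Show the nominal and one sampled delay for the first few retries.
    for attempt in range(10):
        nominal = nominal_delay_ms(attempt)
        sampled = retry_delay_ms(attempt)
        print(f"retry {attempt}: nominal={nominal} ms, sampled={sampled:.0f} ms")
```

<p>With the defaults, the nominal delay reaches the 30 s cap at the tenth retry (index 9), so larger max-retries values add attempts without lengthening individual waits.</p>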