<table class="configuration table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>pipeline.auto-generate-uids</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>When auto-generated UIDs are disabled, users are forced to manually specify UIDs on DataStream applications.<br /><br />It is highly recommended that users specify UIDs before deploying to production, since UIDs are used to match the state in savepoints to the operators in a job. Because auto-generated IDs are likely to change when a job is modified, specifying custom IDs allows an application to evolve over time without discarding state.</td>
</tr>
<tr>
<td><h5>pipeline.auto-type-registration</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Controls whether Flink automatically registers all types in the user program with Kryo.</td>
</tr>
<tr>
<td><h5>pipeline.auto-watermark-interval</h5></td>
<td style="word-wrap: break-word;">200 ms</td>
<td>Duration</td>
<td>The interval of automatic watermark emission. Watermarks are used throughout the streaming system to keep track of the progress of time. They are used, for example, for time-based windowing.</td>
</tr>
<tr>
<td><h5>pipeline.cached-files</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>Files to be registered at the distributed cache under the given name. The files will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.<br /><br />Example:<br /><code class="highlighter-rouge">name:file1,path:'file:///tmp/file1';name:file2,path:'hdfs:///tmp/file2'</code></td>
</tr>
<tr>
<td><h5>pipeline.classpaths</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>A semicolon-separated list of the classpaths to package with the job jars to be sent to the cluster. These have to be valid URLs.</td>
</tr>
<tr>
<td><h5>pipeline.closure-cleaner-level</h5></td>
<td style="word-wrap: break-word;">RECURSIVE</td>
<td><p>Enum</p></td>
<td>Configures the mode in which the closure cleaner works.<br /><br />Possible values:<ul><li>"NONE": Disables the closure cleaner completely.</li><li>"TOP_LEVEL": Cleans only the top-level class without recursing into fields.</li><li>"RECURSIVE": Cleans all fields recursively.</li></ul></td>
</tr>
<tr>
<td><h5>pipeline.default-kryo-serializers</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>Semicolon-separated list of pairs of class names and Kryo serializer class names to be used as Kryo default serializers.<br /><br />Example:<br /><code class="highlighter-rouge">class:org.example.ExampleClass,serializer:org.example.ExampleSerializer1; class:org.example.ExampleClass2,serializer:org.example.ExampleSerializer2</code></td>
</tr>
<tr>
<td><h5>pipeline.force-avro</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Forces Flink to use the Apache Avro serializer for POJOs.<br /><br />Important: Make sure to include the <code class="highlighter-rouge">flink-avro</code> module.</td>
</tr>
<tr>
<td><h5>pipeline.force-kryo</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>If enabled, forces the TypeExtractor to use the Kryo serializer for POJOs even though they could be analyzed as POJOs. In some cases this might be preferable, for example, when using interfaces with subclasses that cannot be analyzed as POJOs.</td>
</tr>
<tr>
<td><h5>pipeline.generic-types</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>If the use of generic types is disabled, Flink will throw an <code class="highlighter-rouge">UnsupportedOperationException</code> whenever it encounters a data type that would go through Kryo for serialization.<br /><br />Disabling generic types can be helpful to eagerly find and eliminate the use of types that would go through Kryo serialization during runtime. Rather than checking types individually, using this option will throw exceptions eagerly in the places where generic types are used.<br /><br />We recommend using this option only during development and pre-production phases, not in actual production use. The application program and/or the input data may be such that new, previously unseen types occur at some point. In that case, setting this option would cause the program to fail.</td>
</tr>
<tr>
<td><h5>pipeline.global-job-parameters</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Map</td>
<td>Registers a custom, serializable user configuration object. The configuration can be accessed in operators.</td>
</tr>
<tr>
<td><h5>pipeline.jars</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>A semicolon-separated list of the jars to package with the job jars to be sent to the cluster. These have to be valid paths.</td>
</tr>
<tr>
<td><h5>pipeline.jobvertex-parallelism-overrides</h5></td>
<td style="word-wrap: break-word;"></td>
<td>Map</td>
<td>A parallelism override map (jobVertexId -&gt; parallelism) which will be used to update the parallelism of the corresponding job vertices of submitted JobGraphs.</td>
</tr>
<tr>
<td><h5>pipeline.max-parallelism</h5></td>
<td style="word-wrap: break-word;">-1</td>
<td>Integer</td>
<td>The program-wide maximum parallelism used for operators which haven't specified a maximum parallelism. The maximum parallelism specifies the upper limit for dynamic scaling and the number of key groups used for partitioned state. Changing this value explicitly when recovering from an existing job will lead to state incompatibility. Must be less than or equal to 32768.</td>
</tr>
<tr>
<td><h5>pipeline.name</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>The job name used for printing and logging.</td>
</tr>
<tr>
<td><h5>pipeline.object-reuse</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>When enabled, objects that Flink internally uses for deserialization and for passing data to user-code functions will be reused. Keep in mind that this can lead to bugs if the user-code function of an operation is not aware of this behaviour.</td>
</tr>
<tr>
<td><h5>pipeline.operator-chaining</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Operator chaining allows non-shuffle operations to be co-located in the same thread, fully avoiding serialization and deserialization.</td>
</tr>
<tr>
<td><h5>pipeline.registered-kryo-types</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>Semicolon-separated list of types to be registered with the serialization stack. If a type is eventually serialized as a POJO, then it is registered with the POJO serializer. If a type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.</td>
</tr>
<tr>
<td><h5>pipeline.registered-pojo-types</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>List&lt;String&gt;</td>
<td>Semicolon-separated list of types to be registered with the serialization stack. If a type is eventually serialized as a POJO, then it is registered with the POJO serializer. If a type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.</td>
</tr>
<tr>
<td><h5>pipeline.vertex-description-mode</h5></td>
<td style="word-wrap: break-word;">TREE</td>
<td><p>Enum</p></td>
<td>The mode used to organize the description of a job vertex.<br /><br />Possible values:<ul><li>"TREE"</li><li>"CASCADING"</li></ul></td>
</tr>
<tr>
<td><h5>pipeline.vertex-name-include-index-prefix</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether the name of a vertex includes its topological index. When true, the name is prefixed with the index of the vertex, e.g. '[vertex-0]Source: source'.</td>
</tr>
<tr>
<td><h5>pipeline.watermark-alignment.allow-unaligned-source-splits</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>If watermark alignment is used, sources with multiple splits will attempt to pause/resume split readers to avoid watermark drift across source splits. However, if split readers don't support pause/resume, an UnsupportedOperationException will be thrown when there is an attempt to pause/resume. To allow the use of split readers that don't support pause/resume, and hence to allow unaligned splits while still using watermark alignment, set this parameter to true. Note: This parameter may be removed in future releases.</td>
</tr>
</tbody>
</table>
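<p>All of the keys above are ordinary configuration options, so they can be set in the Flink configuration file before a job is submitted. A minimal sketch, with purely illustrative values (the cached-files entry reuses the example from the table):</p>
<pre><code class="highlighter-rouge">pipeline.name: example-job
pipeline.max-parallelism: 128
pipeline.object-reuse: true
pipeline.auto-watermark-interval: 200 ms
pipeline.cached-files: name:file1,path:'file:///tmp/file1'
</code></pre>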
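<p>The same options can also be set programmatically through <code class="highlighter-rouge">PipelineOptions</code> before creating the execution environment. The following is a minimal sketch against the DataStream API; the class name, job name, and option values are illustrative only:</p>
<pre><code class="highlighter-rouge">import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PipelineConfigExample {
    public static void main(String[] args) throws Exception {
        // Illustrative values; each constant maps to one key from the table above.
        Configuration conf = new Configuration();
        conf.set(PipelineOptions.NAME, "example-job");    // pipeline.name
        conf.set(PipelineOptions.OBJECT_REUSE, true);     // pipeline.object-reuse
        conf.set(PipelineOptions.MAX_PARALLELISM, 128);   // pipeline.max-parallelism

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // An explicit uid(), as recommended by pipeline.auto-generate-uids:
        // it lets savepoint state be matched back to this operator.
        env.fromElements(1, 2, 3).uid("numbers-source").print();
        env.execute();
    }
}
</code></pre>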