website/www/site/data/capability_matrix.yaml - beam - Git at Google

 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 capability-matrix:
   columns:
     - class: dataflow
       name: Google Cloud Dataflow
     - class: flink
       name: Apache Flink
     - class: spark-rdd
       name: Apache Spark (RDD/DStream based)
     - class: spark-dataset
       name: Apache Spark Structured Streaming (Dataset based)
     - class: samza
       name: Apache Samza
     - class: nemo
       name: Apache Nemo
     - class: jet
       name: Hazelcast Jet
     - class: twister2
       name: Twister2
     - class: python direct
       name: Python Direct FnRunner
     - class: go direct
       name: Go Direct Runner

   categories:
     - description: What is being computed?
       anchor: what
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: ParDo
           description: Element-wise transformation parameterized by a chunk of user code. Elements are processed in bundles, with initialization and termination hooks. Bundle size is chosen by the runner and cannot be controlled by user code. ParDo processes a main input PCollection one element at a time, but provides side input access to additional PCollections.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Batch mode uses large bundle sizes. Streaming uses smaller bundle sizes.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ParDo itself, as per-element transformation with UDFs, is fully supported by Flink for both batch and streaming.
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ParDo applies per-element transformations as Spark FlatMapFunction.
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ParDo applies per-element transformations as Spark FlatMapFunction.
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: Supported with per-element transformation.
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: GroupByKey
           description: Grouping of key-value pairs per key, window, and pane. (See also other tabs.)
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: "Uses Flink's keyBy for key grouping. When grouping by window in streaming (creating the panes) the Flink runner uses the Beam code. This guarantees support for all windowing and triggering mechanisms."
             - class: spark-rdd
               l1: "Partially"
               l2: fully supported in batch mode
               l3: "Using Spark's <tt>groupByKey</tt>. GroupByKey with multiple trigger firings in streaming mode is a work in progress."
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: "Using Spark's <tt>groupByKey</tt>."
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: "Uses Samza's partitionBy for key grouping and Beam's logic for window aggregation and triggering."
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Flatten
           description: Concatenates multiple homogenously typed collections together.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: Some corner cases like flatten on empty collections are not yet supported.
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Combine
           description: 'Application of an associative, commutative operation over all values ("globally") or over all values associated with each key ("per key"). Can be implemented using ParDo, but often more efficient implementations exist.'
           values:
             - class: dataflow
               l1: "Yes"
               l2: "efficient execution"
               l3: ""
             - class: flink
               l1: "Yes"
               l2: "fully supported"
               l3: Uses a combiner for pre-aggregation for batch and streaming.
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: "Using Spark's <tt>combineByKey</tt> and <tt>aggregate</tt> functions."
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: "Using Spark's <tt>Aggregator</tt> and agg function"
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: Use combiner for efficient pre-aggregation.
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: "Batch mode uses pre-aggregation"
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: "Batch mode uses pre-aggregation"
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Composite Transforms
           description: Allows easy extensibility for library writers.  In the near future, we expect there to be more information provided at this level -- customized metadata hooks for monitoring, additional runtime/environment hooks, etc.
           values:
             - class: dataflow
               l1: "Partially"
               l2: supported via inlining
               l3: Currently composite transformations are inlined during execution. The structure is later recreated from the names, but other transform level information (if added to the model) will be lost.
             - class: flink
               l1: "Partially"
               l2: supported via inlining
               l3: ""
             - class: spark-rdd
               l1: "Partially"
               l2: supported via inlining
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: supported via inlining only in batch mode
               l3: ""
             - class: samza
               l1: "Partially"
               l2: supported via inlining
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Partially"
               l2: supported via inlining
               l3: ""
             - class: twister2
               l1: "Partially"
               l2: supported via inlining
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Side Inputs
           description: Side inputs are additional <tt>PCollections</tt> whose contents are computed during pipeline execution and then made accessible to DoFn code. The exact shape of the side input depends both on the <tt>PCollectionView</tt> used to describe the access pattern (interable, map, singleton) and the window of the element from the main input that is currently being processed.
           values:
             - class: dataflow
               l1: "Yes"
               l2: some size restrictions in streaming
               l3: Batch mode supports a distributed implementation, but streaming mode may force some size restrictions. Neither mode is able to push lookups directly up into key-based sources.
             - class: flink
               l1: "Yes"
               l2: some size restrictions in streaming
               l3: Batch mode supports a distributed implementation, but streaming mode may force some size restrictions. Neither mode is able to push lookups directly up into key-based sources.
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: "Using Spark's broadcast variables. In streaming mode, side inputs may update but only between micro-batches."
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: "Using Spark's broadcast variables."
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: Uses Samza's broadcast operator to distribute the side inputs.
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Partially"
               l2: with restrictions
               l3: Supported only when the side input source is bounded and windowing uses global window
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Source API
           description: Allows users to provide additional input sources. Supports both bounded and unbounded data. Includes hooks necessary to provide efficient parallelization (size estimation, progress information, dynamic splitting, etc).
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Support includes autotuning features (https://cloud.google.com/dataflow/service/dataflow-service-desc#autotuning-features).
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3:
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3:
             - class: spark-dataset
               l1: "Partially"
               l2: bounded source only
               l3: "Using Spark's DatasourceV2 API in microbatch mode (Continuous streaming mode is tagged experimental in spark and does not support aggregation)."
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Metrics
           description: Allow transforms to gather simple metrics across bundles in a <tt>PTransform</tt>. Provide a mechanism to obtain both committed and attempted metrics. Semantically similar to using an additional output, but support partial results as the transform executes, and support both committed and attempted values. Will likely want to augment <tt>Metrics</tt> to be more useful for processing unbounded data by making them windowed.
           values:
             - class: dataflow
               l1: "Partially"
               l2: ""
               l3: Gauge metrics are not supported. All other metric types are supported.
             - class: flink
               l1: "Partially"
               l2: All metrics types are supported.
               l3: Only attempted values are supported. No committed values for metrics.
             - class: spark-rdd
               l1: "Partially"
               l2: All metric types are supported.
               l3: Only attempted values are supported. No committed values for metrics.
             - class: spark-dataset
               l1: "Partially"
               l2: All metric types are supported in batch mode.
               l3: Only attempted values are supported. No committed values for metrics.
             - class: samza
               l1: "Partially"
               l2: Counter and Gauge are supported.
               l3: Only attempted values are supported. No committed values for metrics.
             - class: nemo
               l1: "No"
               l2: not implemented
               l3: ""
             - class: jet
               l1: "Partially"
               l2: All metrics types supported, both in batching and streaming mode.
               l3: Doesn't differentiate between committed and attempted values.
             - class: twister2
               l1: "No"
               l2: not implemented
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
         - name: Stateful Processing
           description: Allows fine-grained access to per-key, per-window persistent state. Necessary for certain use cases (e.g. high-volume windows which store large amounts of data, but typically only access small portions of it; complex state machines; etc.) that are not easily or efficiently addressed via <tt>Combine</tt> or <tt>GroupByKey</tt>+<tt>ParDo</tt>.
           values:
             - class: dataflow
               l1: "Partially"
               l2: non-merging windows
               l3: State is supported for non-merging windows. SetState and MapState are not yet supported.
             - class: flink
               l1: "Partially"
               l2: non-merging windows
               l3: State is supported for non-merging windows. SetState and MapState are not yet supported.
             - class: spark-rdd
               l1: "Partially"
               l2: full support in batch mode
               l3:
             - class: spark-dataset
               l1: "No"
               l2: not implemented
               l3:
             - class: samza
               l1: "Partially"
               l2: non-merging windows
               l3: "States are backed up by either rocksDb KV store or in-memory hash map, and persist using changelog."
             - class: nemo
               l1: "No"
               l2: not implemented
               l3: ""
             - class: jet
               l1: "Partially"
               l2: non-merging windows
               l3: ""
             - class: twister2
               l1: "No"
               l2: not implemented
               l3: ""
             - class: python direct
               l1: ""
               l2:
               l3: ""
             - class: go direct
               l1: ""
               l2:
               l3: ""
     - description: Bounded Splittable DoFn Support Status
       anchor: what
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Base
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "Partially"
               l2: Only portable Flink Runner supports this.
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "Yes"
               l2:
               l3:
         - name: Side Inputs
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "Partially"
               l2: Only portable Flink Runner supports this.
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1:
               l2:
               l3:
             - class: go direct
               l1: "Yes"
               l2:
               l3:
         - name: Splittable DoFn Initiated Checkpointing
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner v2 supports this.
               l3: ""
             - class: flink
               l1: "Partially"
               l2: Only portable Flink Runner supports this.
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
         - name: Dynamic Splitting
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "No"
               l2:
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2: Only with Python SDK
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
         - name: Bundle Finalization
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "No"
               l2:
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
     - description: Unbounded Splittable DoFn Support Status
       anchor: what
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Base
           description: ""
           values:
             - class: dataflow
               l1: "Yes"
               l2:
               l3: ""
             - class: flink
               l1: "Yes"
               l2:
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
         - name: Side Inputs
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "Partially"
               l2: Only portable Flink Runner supports this.
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1:
               l2:
               l3:
             - class: go direct
               l1: "Yes"
               l2:
               l3:
         - name: Splittable DoFn Initiated Checkpointing
           description: ""
           values:
             - class: dataflow
               l1: "Yes"
               l2:
               l3: ""
             - class: flink
               l1: "Yes"
               l2:
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
         - name: Dynamic Splitting
           description: ""
           values:
             - class: dataflow
               l1: "No"
               l2:
               l3: ""
             - class: flink
               l1: "No"
               l2:
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "No"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
         - name: Bundle Finalization
           description: ""
           values:
             - class: dataflow
               l1: "Partially"
               l2: Only Dataflow Runner V2 supports this.
               l3: ""
             - class: flink
               l1: "Partially"
               l2: Only portable Flink Runner supports this with checkpointing enabled.
               l3: ""
             - class: spark-rdd
               l1:
               l2:
               l3: ""
             - class: spark-dataset
               l1:
               l2:
               l3: ""
             - class: samza
               l1:
               l2:
               l3: ""
             - class: nemo
               l1:
               l2:
               l3: ""
             - class: jet
               l1:
               l2:
               l3: ""
             - class: twister2
               l1:
               l2:
               l3: ""
             - class: python direct
               l1: "Yes"
               l2:
               l3:
             - class: go direct
               l1: "No"
               l2:
               l3:
     - description: Where in event time?
       anchor: where
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Global windows
           description: The default window which covers all of time. (Basically how traditional batch cases fit in the model.)
           values:
             - class: dataflow
               l1: "Yes"
               l2: default
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Fixed windows
           description: Fixed-size, timestamp-based windows. (Hourly, Daily, etc)
           values:
             - class: dataflow
               l1: "Yes"
               l2: built-in
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Sliding windows
           description: Possibly overlapping fixed-size timestamp-based windows (Every minute, use the last ten minutes of data.)
           values:
             - class: dataflow
               l1: "Yes"
               l2: built-in
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Session windows
           description: Based on bursts of activity separated by a gap size. Different per key.
           values:
             - class: dataflow
               l1: "Yes"
               l2: built-in
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Custom windows
           description: All windows must implement <tt>BoundedWindow</tt>, which specifies a max timestamp. Each <tt>WindowFn</tt> assigns elements to an associated window.
           values:
             - class: dataflow
               l1: "Yes"
               l2: supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Custom merging windows
           description: A custom <tt>WindowFn</tt> additionally specifies whether and how to merge windows.
           values:
             - class: dataflow
               l1: "Yes"
               l2: supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""
         - name: Timestamp control
           description: For a grouping transform, such as GBK or Combine, an OutputTimeFn specifies (1) how to combine input timestamps within a window and (2) how to merge aggregated timestamps when windows merge.
           values:
             - class: dataflow
               l1: "Yes"
               l2: supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: supported
               l3: ""

     - description: When in processing time?
       anchor: when
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Configurable triggering
           description: Triggering may be specified by the user (instead of simply driven by hardcoded defaults).
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Fully supported in streaming mode. In batch mode, intermediate trigger firings are effectively meaningless.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

         - name: Event-time triggers
           description: Triggers that fire in response to event-time completeness signals, such as watermarks progressing.
           values:
             - class: dataflow
               l1: "Yes"
               l2: yes in streaming, fixed granularity in batch
               l3: Fully supported in streaming mode. In batch mode, currently watermark progress jumps from the beginning of time to the end of time once the input has been fully consumed, thus no additional triggering granularity is available.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

         - name: Processing-time triggers
           description: Triggers that fire in response to processing-time advancing.
           values:
             - class: dataflow
               l1: "Yes"
               l2: yes in streaming, fixed granularity in batch
               l3: Fully supported in streaming mode. In batch mode, from the perspective of triggers, processing time currently jumps from the beginning of time to the end of time once the input has been fully consumed, thus no additional triggering granularity is available.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: "This is Spark streaming's native model"
               l3: "Spark processes streams in micro-batches. The micro-batch size is actually a pre-set, fixed, time interval. Currently, the runner takes the first window size in the pipeline and sets it's size as the batch interval. Any following window operations will be considered processing time windows and will affect triggering."
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3:
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

         - name: Count triggers
           description: Triggers that fire after seeing at least N elements.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Fully supported in streaming mode. In batch mode, elements are processed in the largest bundles possible, so count-based triggers are effectively meaningless.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

         - name: Composite triggers
           description: Triggers which compose other triggers in more complex structures, such as logical AND, logical OR, early/on-time/late, etc.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Partially"
               l2:
               l3: ""

         - name: Allowed lateness
           description: A way to bound the useful lifetime of a window (in event time), after which any unemitted results may be materialized, the window contents may be garbage collected, and any addtional late data that arrive for the window may be discarded.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Fully supported in streaming mode. In batch mode no data is ever late.
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "No"
               l2: ""
               l3: ""
             - class: spark-dataset
               l1: "No"
               l2: no streaming support in the runner
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Partially"
               l2:
               l3: ""

         - name: Timers
           description: A fine-grained mechanism for performing work at some point in the future, in either the event-time or processing-time domain. Useful for orchestrating delayed events, timeouts, etc in complex state per-key, per-window state machines.
           values:
             - class: dataflow
               l1: "Partially"
               l2: non-merging windows
               l3: Dataflow supports timers in non-merging windows.
             - class: flink
               l1: "Partially"
               l2: non-merging windows
               l3: The Flink Runner supports timers in non-merging windows.
             - class: spark-rdd
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: spark-dataset
               l1: "No"
               l2: not implemented
               l3: ""
             - class: samza
               l1: "Partially"
               l2: non-merging windows
               l3: The Samza Runner supports timers in non-merging windows.
             - class: nemo
               l1: "No"
               l2: not implemented
               l3: ""
             - class: jet
               l1: "Partially"
               l2: non-merging windows
               l3: ""
             - class: twister2
               l1: "Partially"
               l2:
               l3: ""

     - description: How do refinements relate?
       anchor: how
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Discarding
           description: Elements are discarded from accumulated state as their pane is fired.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "Yes"
               l2: fully supported
               l3: "Spark streaming natively discards elements after firing."
             - class: spark-dataset
               l1: "Partially"
               l2: fully supported in batch mode
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

         - name: Accumulating
           description: Elements are accumulated in state across multiple pane firings for the same window.
           values:
             - class: dataflow
               l1: "Yes"
               l2: fully supported
               l3: Requires that the accumulated pane fits in memory, after being passed through the combiner (if relevant)
             - class: flink
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: spark-rdd
               l1: "No"
               l2: ""
               l3: ""
             - class: spark-dataset
               l1: "No"
               l2: ""
               l3: ""
             - class: samza
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: nemo
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: jet
               l1: "Yes"
               l2: fully supported
               l3: ""
             - class: twister2
               l1: "Yes"
               l2: fully supported
               l3: ""

     - description: Additional common features not yet part of the Beam model
       anchor: misc
       color-y: "fff"
       color-yb: "f6f6f6"
       color-p: "f9f9f9"
       color-pb: "d8d8d8"
       color-n: "e1e0e0"
       color-nb: "bcbcbc"
       rows:
         - name: Drain
           description: APIs and semantics for draining a pipeline are under discussion. This would cause incomplete aggregations to be emitted regardless of trigger and tagged with metadata indicating it is incompleted.
           values:
             - class: dataflow
               l1: "Partially"
               l2:
               l3: Dataflow has a native drain operation, but it does not work in the presence of event time timer loops. Final implemention pending model support.
             - class: flink
               l1: "Partially"
               l2:
               l3: Flink supports taking a "savepoint" of the pipeline and shutting the pipeline down after its completion.
             - class: spark-rdd
               l1:
               l2:
               l3:
             - class: spark-dataset
               l1:
               l2:
               l3:
             - class: samza
               l1:
               l2:
               l3:
             - class: nemo
               l1:
               l2:
               l3:
             - class: jet
               l1:
               l2:
               l3:
             - class: twister2
               l1:
               l2:
               l3:
         - name: Checkpoint
           description: APIs and semantics for saving a pipeline checkpoint are under discussion. This would be a runner-specific materialization of the pipeline state required to resume or duplicate the pipeline.
           values:
             - class: dataflow
               l1: "No"
               l2:
               l3:
             - class: flink
               l1: "Partially"
               l2:
               l3: Flink has a native savepoint capability.
             - class: spark-rdd
               l1: "Partially"
               l2:
               l3: Spark has a native savepoint capability.
             - class: spark-dataset
               l1: "No"
               l2:
               l3: not implemented
             - class: samza
               l1: "Partially"
               l2:
               l3: Samza has a native checkpoint capability.
             - class: nemo
               l1:
               l2:
               l3:
             - class: jet
               l1:
               l2:
               l3:
             - class: twister2
               l1:
               l2:
               l3: