layout: section
title: 'Beam Design Documents'
section_menu: section-menu/contribute.html
permalink: /contribute/design-documents/

Design Documents

This is a collection of design documents. Some of them may be out of date.

Documents by category

Project Incubation (2016)

  • Technical Vision [doc], [slides]
  • Repository Structure [doc]
  • Flink runner: Current status and development roadmap [doc]
  • Spark Runner Technical Vision [doc]
  • PPMC deep dive [slides]

Beam Model

  • Checkpoints [doc]
  • A New DoFn [doc], [slides]
  • Proposed Splittable DoFn API changes [doc]
  • Splittable DoFn (Obsoletes Source API) [doc]
    • Reimplementing Beam API classes on top of Splittable DoFn on top of Source API [doc]
    • New TextIO features based on SDF [doc]
    • Watch transform [doc]
    • Bundles w/ SplittableDoFns [doc]
  • State and Timers for DoFn [doc]
  • ContextFn [doc]
  • Static Display Data [doc]
  • Lateness (and Panes) in Apache Beam [doc]
  • Triggers in Apache Beam [doc]
  • Triggering is for sinks [doc] (not implemented)
  • Pipeline Drain [doc]
  • Pipelines Considered Harmful [doc]
  • Side-Channel Inputs [doc]
  • Dynamic Pipeline Options [doc]
  • SDK Support for Reading Dynamic PipelineOptions [doc]
  • Fine-grained Resource Configuration in Beam [doc]
  • External Join with KV Stores [doc]
  • Error Reporting Callback (WIP) [doc]
  • Snapshotting and Updating Beam Pipelines [doc]
  • Requiring PTransform to set a coder on its resulting collections [mail]
  • Support of @RequiresStableInput annotation [doc], [mail]
  • [PROPOSAL] @OnWindowExpiration [mail]
  • AutoValue Coding and Row Support [doc]

IO / Filesystem

  • IOChannelFactory Redesign [doc]
  • Configurable BeamFileSystem [doc]
  • New API for writing files in Beam [doc]
  • Dynamic file-based sinks [doc]
  • Event Time and Watermarks in KafkaIO [doc]
  • Exactly-once Kafka sink [doc]

Metrics

  • Get Metrics API: Metric Extraction via proto RPC API. [doc]
  • Metrics API [doc]
  • I/O Metrics [doc]
  • Metrics extraction independent from runners / execution engines [doc]
  • Watermark Metrics [doc]
  • Support Dropwizard Metrics in Beam [doc]

Runners

  • Runner Authoring Guide [doc] (obsoletes [doc] and [doc])
  • Composite PInputs, POutputs, and the Runner API [doc]
  • Side Input Architecture for Apache Beam [doc]
  • Runner supported features plugin [doc]
  • Structured streaming Spark Runner [doc]

SQL / Schema

  • Streams and Tables [doc]
  • Streaming SQL [doc]
  • Schema-Aware PCollections [doc]
  • Pubsub to Beam SQL [doc]
  • Apache Beam Proposal: design of DSL SQL interface [doc]
  • Calcite/Beam SQL Windowing [doc]
  • Reject Unsupported Windowing Strategies in JOIN [doc]
  • Beam DSL_SQL branch API review [doc]
  • Complex Types Support for Beam SQL DDL [mail]
  • [SQL] Reject unsupported inputs to Joins [mail]
  • Integrating runners & IO [doc]
  • Beam SQL Pipeline Options [doc]
  • Unbounded limit [doc]
  • Portable Beam Schemas [doc]

Portability

  • Fn API
    • Apache Beam Fn API Overview [doc]
    • Processing a Bundle [doc]
    • Progress [doc]
    • Graphical view of progress [doc]
    • Fn State API and Bundle Processing [doc]
    • Checkpointing and splitting of Beam bundles over the Fn API, with application to SDF [doc]
    • How to send and receive data [doc]
    • Defining and adding SDK Metrics [doc]
    • SDK harness container contract [doc]
    • Structure and Lifting of Combines [doc]
  • Cross-language Beam Pipelines [doc]
  • SDK X with Runner Y using Runner API [doc]
  • Flink Portable Runner Overview [doc]
  • Launching portable pipeline on Flink Runner [doc]
  • Portability support [table]
  • Portability Prototype [doc]
  • Portable Artifact Staging [doc]
  • Portable Beam on Flink [doc]
  • Portability API: How to Checkpoint and Split Bundles [doc]
  • Portability API: How to Finalize Bundles [doc]
  • Side Input in Universal Reference Runner [doc]
  • Spark Portable Runner Overview [doc]
  • Cross-Language Pipelines & Legacy IO [doc]

Build / Testing

  • More Expressive PAsserts [doc]
  • Mergebot design document [doc]
  • Performance tests for commonly used file-based I/O PTransforms [doc]
  • Performance tests results analysis and basic regression detection [doc]
  • Eventual PAssert [doc]
  • Testing I/O Transforms in Apache Beam [doc]
  • Reproducible Environment for Jenkins Tests By Using Container [doc]
  • Keeping precommit times fast [doc]
  • Increase Beam post-commit tests stability [doc]
  • Beam-Site Automation Reliability [doc]
  • Managing outdated dependencies [doc]
  • Automation For Beam Dependency Check [doc]
  • Test performance of core Apache Beam operations [doc]
  • Add static code analysis quality gates to Beam [doc]

Python

  • Beam Python User State and Timer APIs [doc]
  • Python Kafka connector [doc]
  • Python 3 support [doc]
  • Splittable DoFn for Python SDK [doc]
  • Parquet IO for Python SDK [doc]
  • Building Python Wheels [doc]

Go

  • Apache Beam Go SDK design [doc]
  • Go SDK Vanity Import Path [doc]
  • Go SDK Integration Tests [doc]

Other

  • Euphoria - High-Level Java 8 DSL [doc]
  • Apache Beam Code Review Guide [doc]

Some of these documents are available on this Google Drive.

To add a new design document, we recommend using this design document template.