<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Apache Beam – Apache Beam Roadmap Highlights</title><link>/roadmap/</link><description>Recent content in Apache Beam Roadmap Highlights on Apache Beam</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/roadmap/index.xml" rel="self" type="application/rss+xml"/><item><title>Roadmap: Beam SQL Roadmap</title><link>/roadmap/sql/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/sql/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="sql">SQL&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/dsls/sql/overview">Beam SQL documentation&lt;/a>&lt;/li>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Adsl-sql">dsl-sql&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Connectors - Go SDK</title><link>/roadmap/connectors-go-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/connectors-go-sdk/</guid><description>
&lt;p>Roadmap for connectors developed using the Go SDK.&lt;/p>
&lt;ul>
&lt;li>The Go SDK plans to use currently available Java and Python connectors
through the cross-language transforms feature.
&lt;ul>
&lt;li>KafkaIO via Java - DONE&lt;/li>
&lt;li>BigQuery via Java - In Progress&lt;/li>
&lt;li>Beam SQL via Java&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The Go SDK supports SplittableDoFns for bounded pipelines, so scalable bounded pipelines are possible.
&lt;ul>
&lt;li>The textio package supports &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio#ReadSdf">ReadSdf&lt;/a> and &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio#ReadAllSdf">ReadAllSdf&lt;/a> for efficient batch text reads.&lt;/li>
&lt;li>A general FileIO will be produced to simplify adding new file-based connectors.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Connectors - Java SDK</title><link>/roadmap/connectors-java-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/connectors-java-sdk/</guid><description>
&lt;p>Roadmap for connectors developed using the Java SDK.&lt;/p>
&lt;h1 id="couchbase">Couchbase&lt;/h1>
&lt;p>Couchbase is a NoSQL document-oriented database. See
&lt;a href="https://github.com/apache/beam/issues/18381">Issue 18381&lt;/a> for more details on the
planned Beam connector for Couchbase.&lt;/p>
&lt;h1 id="influxdb">InfluxDB&lt;/h1>
&lt;p>InfluxDB is a database for fast and highly available storage and retrieval
of time series data. See &lt;a href="https://issues.apache.org/jira/browse/BEAM-2546">BEAM-2546&lt;/a> for
more details on the planned Beam connector for InfluxDB.&lt;/p>
&lt;h1 id="memcached">Memcached&lt;/h1>
&lt;p>Memcached is a distributed memory caching system. See
&lt;a href="https://issues.apache.org/jira/browse/BEAM-1678">BEAM-1678&lt;/a> for more details on the
planned Beam connector for Memcached.&lt;/p></description></item><item><title>Roadmap: Connectors - Python SDK</title><link>/roadmap/connectors-python-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/connectors-python-sdk/</guid><description>
&lt;p>Roadmap for connectors developed using the Python SDK.&lt;/p>
&lt;h1 id="kafka">Kafka&lt;/h1>
&lt;p>An Apache Kafka connector for the Python SDK, developed entirely with the
Splittable DoFn API, is planned. This work is partially blocked until the
Splittable DoFn work related to the portability framework is finalized.
See &lt;a href="https://issues.apache.org/jira/browse/BEAM-3788">BEAM-3788&lt;/a> for more details.&lt;/p>
&lt;h1 id="parquet">Parquet&lt;/h1>
&lt;p>A Python connector for the Parquet file format is currently in development.
See &lt;a href="https://issues.apache.org/jira/browse/BEAM-4444">BEAM-4444&lt;/a> for more details.&lt;/p></description></item><item><title>Roadmap: Euphoria API Roadmap</title><link>/roadmap/euphoria/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/euphoria/</guid><description>
&lt;h1 id="euphoria-api">Euphoria API&lt;/h1>
&lt;p>Euphoria is an easy-to-use Java 8 DSL for the Beam Java SDK. It provides a high-level abstraction of Beam transformations that is easy to both read and write, and it can complement existing Beam pipelines (convertible back and forth). For a glimpse of the API, see the &lt;a href="/documentation/sdks/java/euphoria/#wordcount-example">WordCount example&lt;/a>.&lt;/p>
&lt;ul>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Adsl-euphoria">dsl-euphoria&lt;/a> / &lt;a href="https://issues.apache.org/jira/browse/BEAM-3900">BEAM-3900&lt;/a>&lt;/li>
&lt;li>Contact: &lt;a href="mailto:dmvk@apache.org">David Moravek&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="salted-join-implementation">&amp;ldquo;Salted&amp;rdquo; join implementation&lt;/h2>
&lt;p>An implementation of a join that can handle large-scale joins of highly skewed data sets. It breaks
large keys into multiple splits, using a key distribution approximated by a count-min sketch data structure.&lt;/p>
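&lt;p>The idea can be sketched in plain Python, outside of Beam and Euphoria. This is a minimal illustration under stated assumptions, not the planned implementation: the names &lt;code>CountMinSketch&lt;/code>, &lt;code>salted_join&lt;/code>, &lt;code>hot_threshold&lt;/code> and &lt;code>splits&lt;/code> are hypothetical, and the join runs in a single process, whereas a real runner would process each salted bucket in parallel.&lt;/p>

```python
import hashlib
from collections import defaultdict


class CountMinSketch:
    """Approximate per-key counts in fixed memory (can overestimate, never underestimates)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += 1

    def estimate(self, key):
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


def salted_join(left, right, hot_threshold=100, splits=4):
    """Inner-join two lists of (key, value) pairs, spreading hot left keys over several salts."""
    sketch = CountMinSketch()
    for key, _ in left:
        sketch.add(key)

    def fanout(key):
        # Keys whose estimated frequency reaches the threshold are split; others are not.
        return splits if sketch.estimate(key) >= hot_threshold else 1

    # Left side: each record lands in exactly one salted bucket (round-robin per key).
    seen = defaultdict(int)
    buckets = defaultdict(list)
    for key, value in left:
        salt = seen[key] % fanout(key)
        seen[key] += 1
        buckets[(key, salt)].append(value)

    # Right side: replicate each record to every salt of its key so no match is lost.
    joined = []
    for key, right_value in right:
        for salt in range(fanout(key)):
            for left_value in buckets.get((key, salt), []):
                joined.append((key, left_value, right_value))
    return joined
```

&lt;p>Because the sketch never underestimates a count, a truly hot key is always split; an occasional overestimate only means that a moderately frequent key is split unnecessarily, which costs some replication but does not change the join result.&lt;/p>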
&lt;h2 id="pipeline-sampling">Pipeline sampling&lt;/h2>
&lt;p>In order to pick the right translation for the operator without user interference, we can leverage knowledge from
previous pipeline runs. We want to provide a convenient and portable way to gather this knowledge.&lt;/p>
&lt;h2 id="fluent-api">Fluent API&lt;/h2>
&lt;p>Implementation of an easy-to-use fluent API on top of the Euphoria DSL.&lt;/p>
&lt;h2 id="side-outputs">Side Outputs&lt;/h2>
&lt;p>A convenient API for multiple outputs.&lt;/p>
&lt;h2 id="table-stream-joins">Table-stream joins&lt;/h2>
&lt;p>Introduce an API for converting streams to tables (KStream &amp;lt;-&amp;gt; KTable approach) and various types of (windowed and unwindowed) joins on them.&lt;/p></description></item><item><title>Roadmap: Dataflow Runner Roadmap</title><link>/roadmap/dataflow-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/dataflow-runner/</guid><description>
&lt;h1 id="google-cloud-dataflow-runner-roadmap">Google Cloud Dataflow Runner Roadmap&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/dataflow">Runner documentation&lt;/a>&lt;/li>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Arunner-dataflow">runner-dataflow&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Flink Runner Roadmap</title><link>/roadmap/flink-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/flink-runner/</guid><description>
&lt;h1 id="apache-flink-runner-roadmap">Apache Flink Runner Roadmap&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/flink">Runner documentation&lt;/a>&lt;/li>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Arunner-flink">runner-flink&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Go SDK Roadmap</title><link>/roadmap/go-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/go-sdk/</guid><description>
&lt;h1 id="go-sdk-roadmap">Go SDK Roadmap&lt;/h1>
&lt;p>The Go SDK is &lt;a href="/blog/go-sdk-release/">fully released as of v2.33.0&lt;/a>.&lt;/p>
&lt;p>The Go SDK is the first SDK built purely on the &lt;a href="/roadmap/portability/">Beam Portability Framework&lt;/a>
and can execute pipelines on portable runners such as Flink, Spark, Samza, and Google Cloud Dataflow.&lt;/p>
&lt;p>Current roadmap:&lt;/p>
&lt;ul>
&lt;li>Continue building out features for unbounded pipelines, as described on the &lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK">Beam Dev Wiki&lt;/a>.&lt;/li>
&lt;li>Improve IO support via cross-language transforms, and add scalable native transforms. See the &lt;a href="/roadmap/connectors-go-sdk/">Go SDK Connector Roadmap&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Otherwise, improving examples and documentation for devs and users alike is ongoing.
Contributions are welcome. Please contact the &lt;a href="mailto:dev@beam.apache.org?subject=%5BGo%20SDK%5D%20How%20can%20I%20help%3F">dev list&lt;/a>
for assistance in finding a place to help out.&lt;/p>
&lt;ul>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Asdk-go">sdk-go&lt;/a>&lt;/li>
&lt;li>Contact: Robert Burke (@lostluck) &lt;a href="mailto:lostluck@apache.org?subject=%5BGo%20SDK%20Roadmap%5D">Email&lt;/a> - Please also cc the &lt;a href="mailto:dev@beam.apache.org">dev@beam.apache.org&lt;/a> list. I strongly prefer public discussion of Go SDK matters.&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Java SDK Roadmap</title><link>/roadmap/java-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/java-sdk/</guid><description>
&lt;h1 id="java-sdk-roadmap">Java SDK Roadmap&lt;/h1>
&lt;h2 id="next-java-lts-version-support-java-21">Next Java LTS version support (Java 21)&lt;/h2>
&lt;p>Work to support the next LTS release of Java is in progress. For details
on the scope and the individual tasks, see the GitHub issue.&lt;/p>
&lt;ul>
&lt;li>GitHub: &lt;a href="https://github.com/apache/beam/issues/28120">#28120&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Multi-SDK Connector Efforts</title><link>/roadmap/connectors-multi-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/connectors-multi-sdk/</guid><description>
&lt;p>Connector-related efforts that will benefit multiple SDKs.&lt;/p>
&lt;h1 id="splittable-dofn">Splittable DoFn&lt;/h1>
&lt;p>Splittable DoFn is the next-generation source framework for Beam that will
replace the current frameworks for developing bounded and unbounded sources.
Splittable DoFn is being developed alongside the current Beam portability
efforts. See the &lt;a href="/roadmap/portability/">Beam portability framework roadmap&lt;/a> for more details.&lt;/p>
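&lt;p>The core mechanism behind a Splittable DoFn is that an element is processed together with a restriction describing a portion of its work, and that restriction can be split both before and during processing. The following is a minimal sketch of that mechanism in plain Python. &lt;code>WorkRange&lt;/code> and &lt;code>WorkTracker&lt;/code> are hypothetical stand-ins written for this illustration; they are not the Beam SDK classes and omit checkpointing, watermarks, and error handling.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkRange:
    """A half-open range [start, stop) of work, for example byte offsets in a file."""
    start: int
    stop: int

    def split(self, desired_size):
        """Initial splitting: cut the range into chunks of at most `desired_size`."""
        pieces = []
        cursor = self.start
        while self.stop > cursor:
            end = min(cursor + desired_size, self.stop)
            pieces.append(WorkRange(cursor, end))
            cursor = end
        return pieces


class WorkTracker:
    """Tracks claimed positions in a range and can give away unclaimed work."""

    def __init__(self, restriction):
        self.restriction = restriction
        self.last_claimed = None

    def try_claim(self, position):
        """Return True if `position` still belongs to this tracker's restriction."""
        if position >= self.restriction.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self, fraction=0.5):
        """Dynamic splitting: shrink the current range and return the residual, or None."""
        if self.last_claimed is None:
            next_pos = self.restriction.start
        else:
            next_pos = self.last_claimed + 1
        remaining = self.restriction.stop - next_pos
        split_at = next_pos + int(remaining * fraction)
        if split_at >= self.restriction.stop:
            return None  # Nothing left to give away.
        residual = WorkRange(split_at, self.restriction.stop)
        self.restriction = WorkRange(self.restriction.start, split_at)
        return residual
```

&lt;p>In this sketch a runner could call &lt;code>split&lt;/code> up front to parallelize a large bounded input, and call &lt;code>try_split&lt;/code> on a running tracker to rebalance a straggler or to pause and resume an unbounded one.&lt;/p>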
&lt;h1 id="cross-language-transforms">Cross-language transforms&lt;/h1>
&lt;p>&lt;em>Last updated in May 2020.&lt;/em>&lt;/p>
&lt;p>As an added benefit of the Beam portability effort, Beam transforms can be used across SDKs. This has several benefits:&lt;/p>
&lt;ul>
&lt;li>Connector sharing across SDKs. For example,
&lt;ul>
&lt;li>Beam pipelines written using the Python and Go SDKs will be able to use the vast selection of connectors that are currently implemented for the Java SDK.&lt;/li>
&lt;li>The Java SDK will be able to use connectors for systems that only offer a Python API.&lt;/li>
&lt;li>The Go SDK will be able to use connectors currently available for the Java and Python SDKs.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Ease of developing and maintaining Beam transforms: with cross-language transforms, authors can implement a new Beam transform in a
language of their choice and use it from other languages, reducing maintenance and support overhead.&lt;/li>
&lt;li>&lt;a href="/documentation/dsls/sql/overview/">Beam SQL&lt;/a>, which is currently only available to the Java SDK, will become available to the Python and Go SDKs.&lt;/li>
&lt;li>&lt;a href="https://www.tensorflow.org/tfx/transform/get_started">Beam TFX transforms&lt;/a>, which are currently only available to Beam Python SDK pipelines, will become available to the Java and Go SDKs.&lt;/li>
&lt;/ul>
&lt;h2 id="completed-and-ongoing-efforts">Completed and Ongoing Efforts&lt;/h2>
&lt;p>Many efforts related to cross-language transforms are currently in flux. Some of the completed and ongoing efforts are given below.&lt;/p>
&lt;h3 id="cross-language-transforms-api-and-expansion-service">Cross-language transforms API and expansion service&lt;/h3>
&lt;p>Work related to developing/updating the cross-language transforms API for Java/Python/Go SDKs and work related to cross-language transform expansion services.&lt;/p>
&lt;ul>
&lt;li>Basic API for Java SDK - completed&lt;/li>
&lt;li>Basic API for Python SDK - completed&lt;/li>
&lt;li>Basic API for Go SDK - In progress&lt;/li>
&lt;li>Basic cross-language transform expansion service for Java and Python SDKs - completed&lt;/li>
&lt;li>Artifact staging - mostly completed - &lt;a href="https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E">email thread&lt;/a>, &lt;a href="https://docs.google.com/document/d/1XaiNekAY2sptuQRIXpjGAyaYdSc-wlJ-VKjl04c8N48/edit#heading=h.900gc947qrw8">doc&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="support-for-flink-runner">Support for Flink runner&lt;/h3>
&lt;p>Work related to making cross-language transforms available for Flink runner.&lt;/p>
&lt;ul>
&lt;li>Basic support for executing cross-language transforms on portable Flink runner - completed&lt;/li>
&lt;/ul>
&lt;h3 id="support-for-dataflow-runner">Support for Dataflow runner&lt;/h3>
&lt;p>Work related to making cross-language transforms available for Dataflow runner.&lt;/p>
&lt;ul>
&lt;li>Basic support for executing cross-language transforms on Dataflow runner
&lt;ul>
&lt;li>This work requires updates to Dataflow service&amp;rsquo;s job submission and job execution logic. This is currently being developed at Google.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="support-for-direct-runner">Support for Direct runner&lt;/h3>
&lt;p>Work related to making cross-language transforms available on Direct runner.&lt;/p>
&lt;ul>
&lt;li>Basic support for executing cross-language transforms on Python Direct runner - completed&lt;/li>
&lt;li>Basic support for executing cross-language transforms on Java Direct runner - Not started&lt;/li>
&lt;/ul>
&lt;h3 id="connectortransform-support">Connector/transform support&lt;/h3>
&lt;p>Ongoing and planned work related to making existing connectors/transforms available to other SDKs through the cross-language transforms framework.&lt;/p>
&lt;ul>
&lt;li>Java JdbcIO - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-10135">BEAM-10135&lt;/a>, &lt;a href="https://issues.apache.org/jira/browse/BEAM-10136">BEAM-10136&lt;/a>&lt;/li>
&lt;li>Java KafkaIO - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-7029">BEAM-7029&lt;/a>&lt;/li>
&lt;li>Java KinesisIO - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-10137">BEAM-10137&lt;/a>, &lt;a href="https://issues.apache.org/jira/browse/BEAM-10138">BEAM-10138&lt;/a>&lt;/li>
&lt;li>Java PubSubIO - In progress - &lt;a href="https://issues.apache.org/jira/browse/BEAM-7738">BEAM-7738&lt;/a>&lt;/li>
&lt;li>Java SnowflakeIO - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-9897">BEAM-9897&lt;/a>, &lt;a href="https://issues.apache.org/jira/browse/BEAM-9898">BEAM-9898&lt;/a>&lt;/li>
&lt;li>Java SpannerIO - In progress - &lt;a href="https://issues.apache.org/jira/browse/BEAM-10139">BEAM-10139&lt;/a>, &lt;a href="https://issues.apache.org/jira/browse/BEAM-10140">BEAM-10140&lt;/a>&lt;/li>
&lt;li>Java SQL - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-8603">BEAM-8603&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="portable-beam-schema">Portable Beam schema&lt;/h3>
&lt;p>Portable Beam schema support will provide a generalized mechanism for serializing and transferring data across language boundaries which will be extremely useful for pipelines that employ cross-language transforms.&lt;/p>
&lt;ul>
&lt;li>Make row coder a standard coder and implement it in Python - completed - &lt;a href="https://issues.apache.org/jira/browse/BEAM-7886">BEAM-7886&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="integrationperformance-testing">Integration/Performance testing&lt;/h3>
&lt;ul>
&lt;li>Add an integration test suite for cross-language transforms on Flink runner - In progress - &lt;a href="https://issues.apache.org/jira/browse/BEAM-6683">BEAM-6683&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="documentation">Documentation&lt;/h3>
&lt;p>Work related to adding documentation on cross-language transforms to the Beam website.&lt;/p>
&lt;ul>
&lt;li>Document cross-language transforms API for Java/Python - Not started&lt;/li>
&lt;li>Document API for making existing transforms available as cross-language transforms for Java/Python - Not started&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Nemo Runner Roadmap</title><link>/roadmap/nemo-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/nemo-runner/</guid><description>
&lt;h1 id="apache-nemo-runner-roadmap">Apache Nemo Runner Roadmap&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/nemo">Runner documentation&lt;/a>&lt;/li>
&lt;li>JIRA: &lt;a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-nemo">runner-nemo&lt;/a> / &lt;a href="https://issues.apache.org/jira/projects/NEMO/issues/filter=allopenissues">nemo-jira&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Portability Framework Roadmap</title><link>/roadmap/portability/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/portability/</guid><description>
&lt;h1 id="portability-framework-roadmap">Portability Framework Roadmap&lt;/h1>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Interoperability between SDKs and runners is a key aspect of Apache
Beam. Previously, the reality was that most runners supported the
Java SDK only, because each SDK-runner combination required non-trivial
work on both sides. Most runners are also currently written in Java,
which makes support of non-Java SDKs far more expensive. The
&lt;em>portability framework&lt;/em> rectified this situation and provided
full interoperability across the Beam ecosystem.&lt;/p>
&lt;p>The portability framework introduces well-defined, language-neutral
data structures and protocols between the SDK and runner. This interop
layer &amp;ndash; called the &lt;em>portability API&lt;/em> &amp;ndash; ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability
burden for both SDKs and runners to a constant effort. It notably
ensures that &lt;em>new&lt;/em> SDKs automatically work with existing runners and
vice versa. The framework introduces a new runner, the &lt;em>Universal
Local Runner (ULR)&lt;/em>, as a practical reference implementation that
complements the direct runners. Finally, it enables cross-language
pipelines (sharing I/O or transformations across SDKs) and
user-customized &lt;a href="/documentation/runtime/environments/">execution environments&lt;/a>
(&amp;ldquo;custom containers&amp;rdquo;).&lt;/p>
&lt;p>The portability API consists of a set of smaller contracts that
isolate SDKs and runners for job submission, management and
execution. These contracts use protobufs and &lt;a href="https://grpc.io">gRPC&lt;/a> for broad language
support.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Job submission and management&lt;/strong>: The &lt;em>Runner API&lt;/em> defines a
language-neutral pipeline representation with transformations
specifying the execution environment as a docker container
image. The latter both allows the execution side to set up the
right environment and opens the door for custom containers
and cross-environment pipelines. The &lt;em>Job API&lt;/em> allows pipeline
execution and configuration to be managed uniformly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Job execution&lt;/strong>: The &lt;em>SDK harness&lt;/em> is an SDK-provided
program responsible for executing user code and is run separately
from the runner. The &lt;em>Fn API&lt;/em> defines an execution-time binary
contract between the SDK harness and the runner that describes how
execution tasks are managed and how data is transferred. In
addition, the runner needs to handle progress and monitoring in an
efficient and language-neutral way. SDK harness initialization
relies on the &lt;em>Provision&lt;/em> and &lt;em>Artifact APIs&lt;/em> for obtaining staged
files, pipeline options and environment information. Docker
provides isolation between the runner and SDK/user environments to
the benefit of both as defined by the &lt;em>container contract&lt;/em>. The
containerization of the SDK gives it (and the user, unless the SDK
is closed) full control over its own environment without risk of
dependency conflicts. The runner has significant freedom regarding
how it manages the SDK harness containers.&lt;/p>
&lt;/li>
&lt;/ul>
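&lt;p>To make the job submission contract more concrete, the sketch below models a language-neutral pipeline representation in which every transform names the environment (container image) that executes it. It is an illustration only: the actual Runner API is defined with protobufs, whereas this sketch uses JSON as a stand-in, and the class names, field names, transform names, and image names are all hypothetical.&lt;/p>

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Environment:
    """Where a transform's user code runs, identified here by a container image."""
    id: str
    container_image: str


@dataclass
class Transform:
    """One pipeline step, tagged with the environment that can execute it."""
    name: str
    environment_id: str
    inputs: list = field(default_factory=list)


def encode_pipeline(transforms, environments):
    """SDK side: serialize the pipeline into a language-neutral document."""
    return json.dumps(
        {
            "transforms": [asdict(t) for t in transforms],
            "environments": [asdict(e) for e in environments],
        },
        sort_keys=True,
    )


def images_by_transform(encoded):
    """Runner side: decode the document and look up the image each transform needs."""
    doc = json.loads(encoded)
    images = {env["id"]: env["container_image"] for env in doc["environments"]}
    return {t["name"]: images[t["environment_id"]] for t in doc["transforms"]}


# A two-step pipeline that mixes a Java step and a Python step (hypothetical names).
environments = [
    Environment("java", "example.invalid/java-harness:latest"),
    Environment("python", "example.invalid/python-harness:latest"),
]
transforms = [
    Transform("ReadRecords", "java"),
    Transform("ParseRecords", "python", inputs=["ReadRecords"]),
]
encoded = encode_pipeline(transforms, environments)
```

&lt;p>Because the runner only sees the encoded document, it does not need to know which SDK produced each transform; it only needs to start the named container for each one, which is what makes cross-environment pipelines possible.&lt;/p>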
&lt;p>The goal is that all (non-direct) runners and SDKs eventually support
the portability API, perhaps exclusively.&lt;/p>
&lt;p>If you are interested in digging into the designs, you can find
them on the &lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Design+Documents">Beam developers&amp;rsquo; wiki&lt;/a>.
Another overview can be found &lt;a href="https://docs.google.com/presentation/d/1Yg8Xm4fb-oRjiLQjwLt5153hpwwTLclZrVOKP2hQifo/edit#slide=id.g42e4c9aad6_1_3070">here&lt;/a>.&lt;/p>
&lt;h2 id="status">Status&lt;/h2>
&lt;p>All SDKs currently support the portability framework.
There is also a Python Universal Local Runner and a shared Java runners library.
Performance is good and multi-language pipelines are supported.
Currently, the Flink and Spark runners support portable pipeline execution
(which is used by default for SDKs other than Java),
as does Dataflow when using the &lt;a href="https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2">Dataflow Runner v2&lt;/a>.
See the
&lt;a href="https://s.apache.org/apache-beam-portability-support-table">Portability support table&lt;/a>
for details.&lt;/p>
&lt;h2 id="issues">Issues&lt;/h2>
&lt;p>The portability effort touches every component, so the &amp;ldquo;portability&amp;rdquo;
label is used to identify all portability-related issues. Pure
design or proto definitions should use the &amp;ldquo;beam-model&amp;rdquo; component. A
common pattern for new portability features is that the overall
feature is in &amp;ldquo;beam-model&amp;rdquo; with subtasks for each SDK and runner in
their respective components.&lt;/p>
&lt;p>&lt;strong>Issues:&lt;/strong> &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Aportability">query&lt;/a>&lt;/p>
&lt;p>Prerequisites: &lt;a href="https://docs.docker.com/compose/install/">Docker&lt;/a>, &lt;a href="https://docs.python-guide.org/starting/install3/linux/">Python&lt;/a>, &lt;a href="https://openjdk.java.net/install/">Java 8&lt;/a>&lt;/p>
&lt;h3 id="python-on-flink">Running Python wordcount on Flink&lt;/h3>
&lt;p>The Beam Flink runner can run Python pipelines in batch and streaming modes.
Please see the &lt;a href="/documentation/runners/flink/">Flink Runner page&lt;/a> for more information on
how to run portable pipelines on top of Flink.&lt;/p>
&lt;h3 id="python-on-spark">Running Python wordcount on Spark&lt;/h3>
&lt;p>The Beam Spark runner can run Python pipelines in batch mode.
Please see the &lt;a href="/documentation/runners/spark/">Spark Runner page&lt;/a> for more information on
how to run portable pipelines on top of Spark.&lt;/p>
&lt;p>Python streaming mode is not yet supported on Spark.&lt;/p>
&lt;h2 id="sdk-harness-config">SDK Harness Configuration&lt;/h2>
&lt;p>See &lt;a href="/documentation/runtime/sdk-harness-config/">here&lt;/a> for more information on SDK harness deployment options
and &lt;a href="https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit?usp=sharing">here&lt;/a>
for what goes into writing a portable SDK.&lt;/p></description></item><item><title>Roadmap: Python SDK Roadmap</title><link>/roadmap/python-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/python-sdk/</guid><description>
&lt;h1 id="python-sdk-roadmap">Python SDK Roadmap&lt;/h1>
&lt;h2 id="sunsetting-python-2-support">Sunsetting Python 2 Support&lt;/h2>
&lt;p>The Apache Beam community voted to join &lt;a href="https://python3statement.org/">the pledge&lt;/a> to sunset Python 2 support in 2020. The Beam community will discontinue Python 2 support in 2020 and cannot guarantee long-term functional support or maintenance of Beam on Python 2. To ensure minimal disruption to your service, we strongly recommend that you upgrade to Python 3 as soon as possible.&lt;/p>
&lt;h2 id="python-3-support">Python 3 Support&lt;/h2>
&lt;p>Apache Beam 2.14.0 and higher support Python 3.5, 3.6, and 3.7. We&amp;rsquo;re continuing to &lt;a href="https://issues.apache.org/jira/browse/BEAM-1251?focusedCommentId=16890504&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1689050">improve&lt;/a> the experience for Python 3 users and add support for Python 3.x minor versions (&lt;a href="https://issues.apache.org/jira/browse/BEAM-8494">BEAM-8494&lt;/a>):&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&amp;amp;view=detail">Kanban Board&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.google.com/document/d/1s1BJVCY65LB_SYK1SU1u7NbZiFANoq-nEYaEvzRbYlA">Python 3 Conversion Quick Start Guide&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/BEAM-1251">Tracking Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE">Original Proposal&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Contributions and feedback are welcome!&lt;/p>
&lt;p>If you are interested in helping, you can select an unassigned issue on the Kanban board and assign it to yourself. If you cannot assign the issue to yourself, comment on the issue. When submitting a new PR, please tag &lt;a href="https://github.com/aaltay">@aaltay&lt;/a> and &lt;a href="https://github.com/tvalentyn">@tvalentyn&lt;/a>.&lt;/p>
&lt;p>To report a Python 3-related issue, create a subtask in &lt;a href="https://issues.apache.org/jira/browse/BEAM-1251">BEAM-1251&lt;/a> and cc: [~altay] and [~tvalentyn] in a JIRA comment. The best way to help us identify and investigate the issue is to include a minimal pipeline that reproduces it.&lt;/p>
&lt;p>You can also discuss issues you encounter on the user@ or dev@ mailing lists, as appropriate.&lt;/p></description></item><item><title>Roadmap: Samza Runner Roadmap</title><link>/roadmap/samza-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/samza-runner/</guid><description>
&lt;h1 id="samza-runner-roadmap">Samza Runner Roadmap&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/samza">Runner documentation&lt;/a>&lt;/li>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Arunner-samza">runner-samza&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Spark Runner Roadmap</title><link>/roadmap/spark-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/spark-runner/</guid><description>
&lt;h1 id="apache-spark-runner-roadmap">Apache Spark Runner Roadmap&lt;/h1>
&lt;h2 id="spark-3">Spark 3&lt;/h2>
&lt;p>Work on Spark 3 support in Beam&amp;rsquo;s Spark runner is ongoing. For details on the
individual tasks, refer to the JIRA ticket.&lt;/p>
&lt;ul>
&lt;li>JIRA: &lt;a href="https://issues.apache.org/jira/browse/BEAM-7093">BEAM-7093&lt;/a>&lt;/li>
&lt;li>Contact: &lt;a href="mailto:iemejia@apache.org">Ismaël Mejía&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Roadmap: Twister2 Runner Roadmap</title><link>/roadmap/twister2-runner/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/roadmap/twister2-runner/</guid><description>
&lt;h1 id="twister2-runner-roadmap">Twister2 Runner Roadmap&lt;/h1>
&lt;p>This roadmap is in progress. In the meantime, here are available resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/twister2">Runner documentation&lt;/a>&lt;/li>
&lt;li>Issues: &lt;a href="https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Arunner-twister2">runner-twister2&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>