<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Apache Beam – python</title><link>/categories/python/</link><description>Recent content in python on Apache Beam</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 09 Nov 2022 00:00:01 -0800</lastBuildDate><atom:link href="/categories/python/index.xml" rel="self" type="application/rss+xml"/><item><title>Blog: New Resources Available for Beam ML</title><link>/blog/ml-resources/</link><pubDate>Wed, 09 Nov 2022 00:00:01 -0800</pubDate><guid>/blog/ml-resources/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;p>If you&amp;rsquo;ve been paying attention, over the past year you&amp;rsquo;ve noticed that
Beam has released a number of features designed to make Machine Learning
easy. From the introduction of the &lt;code>RunInference&lt;/code>
transform to the continued refinement of &lt;code>Beam Dataframes&lt;/code>, this has been
an area where we&amp;rsquo;ve seen Beam make huge strides. Although development has
advanced quickly, until recently there has been a lack of
resources to help people discover and use these new features.&lt;/p>
&lt;p>Over the past several months, we&amp;rsquo;ve been hard at work building out
documentation and notebooks to make it easier to use these new features
and to show how Beam can be used to solve common Machine Learning problems.
We&amp;rsquo;re now happy to present this new and improved Beam ML experience!&lt;/p>
&lt;p>To get started, we encourage you to visit Beam&amp;rsquo;s new &lt;a href="/documentation/ml/overview/">AI/ML landing page&lt;/a>.
We&amp;rsquo;ve got plenty of content on things like &lt;a href="/documentation/ml/multi-model-pipelines/">multi-model pipelines&lt;/a>,
&lt;a href="/documentation/ml/runinference-metrics/">performing inference with metrics&lt;/a>,
&lt;a href="/documentation/ml/online-clustering/">online training&lt;/a>, and much more.&lt;/p>
&lt;p>&lt;img class="center-block"
src="/images/blog/ml-landing.png"
alt="ML landing page">&lt;/p>
&lt;p>We&amp;rsquo;ve also introduced a number of example &lt;a href="https://github.com/apache/beam/tree/master/examples/notebooks/beam-ml">Jupyter Notebooks&lt;/a>
showing how to use built-in Beam transforms like &lt;code>RunInference&lt;/code> and &lt;code>Beam Dataframes&lt;/code>.&lt;/p>
&lt;p>&lt;img class="center-block"
src="/images/blog/ensemble-model-notebook.png"
alt="Example ensemble notebook with RunInference">&lt;/p>
&lt;p>Adding more examples and notebooks will be a point of emphasis going forward.
For our next round of improvements, we are planning on adding examples of
using RunInference with &amp;gt;30GB models, with multi-language pipelines, with
common Beam concepts, and with TensorRT. We will also add examples showing
other pieces of the Machine Learning lifecycle like model evaluation with TFMA,
per-entity training, and more online training.&lt;/p>
&lt;p>We hope you find this useful! As always, if you see any areas for improvement, please &lt;a href="https://github.com/apache/beam/issues/new/choose">open an issue&lt;/a>
or a &lt;a href="https://github.com/apache/beam/pulls">pull request&lt;/a>!&lt;/p></description></item><item><title>Blog: Improved Annotation Support for the Python SDK</title><link>/blog/python-improved-annotations/</link><pubDate>Fri, 21 Aug 2020 00:00:01 -0800</pubDate><guid>/blog/python-improved-annotations/</guid><description>
&lt;p>The importance of static type checking in a dynamically
typed language like Python is not up for debate. Type hints
allow developers to leverage a strong typing system to:&lt;/p>
&lt;ul>
&lt;li>write better code,&lt;/li>
&lt;li>self-document ambiguous programming logic, and&lt;/li>
&lt;li>inform intelligent code completion in IDEs like PyCharm.&lt;/li>
&lt;/ul>
&lt;p>This is why we&amp;rsquo;re excited to announce upcoming improvements to
the &lt;code>typehints&lt;/code> module of Beam&amp;rsquo;s Python SDK, including support
for typed PCollections and Python 3 style annotations on PTransforms.&lt;/p>
&lt;h1 id="improved-annotations">Improved Annotations&lt;/h1>
&lt;p>Today, you have the option to declare type hints on PTransforms using either
class decorators or inline functions.&lt;/p>
&lt;p>For instance, a PTransform with decorated type hints might look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>@beam.typehints.with_input_types(int)
@beam.typehints.with_output_types(str)
class IntToStr(beam.PTransform):
def expand(self, pcoll):
return pcoll | beam.Map(lambda num: str(num))
strings = numbers | beam.ParDo(IntToStr())
&lt;/code>&lt;/pre>&lt;p>Using inline functions instead, the same transform would look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>class IntToStr(beam.PTransform):
def expand(self, pcoll):
return pcoll | beam.Map(lambda num: str(num))
strings = numbers | beam.ParDo(IntToStr()).with_input_types(int).with_output_types(str)
&lt;/code>&lt;/pre>&lt;p>Both methods have problems. Class decorators are syntax-heavy,
requiring two additional lines of code, whereas inline functions provide type hints
that aren&amp;rsquo;t reusable across other instances of the same transform. Additionally, both
methods are incompatible with static type checkers like MyPy.&lt;/p>
&lt;p>With Python 3 annotations, however, we can sidestep these problems to provide a
clean and reusable type hint experience. Our previous transform now looks like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>class IntToStr(beam.PTransform):
def expand(self, pcoll: PCollection[int]) -&amp;gt; PCollection[str]:
return pcoll | beam.Map(lambda num: str(num))
strings = numbers | beam.ParDo(IntToStr())
&lt;/code>&lt;/pre>&lt;p>These type hints will actively hook into the internal Beam typing system to
play a role in both pipeline type checking and runtime type checking.&lt;/p>
&lt;p>So how does this work?&lt;/p>
&lt;h2 id="typed-pcollections">Typed PCollections&lt;/h2>
&lt;p>You guessed it! The PCollection class inherits from &lt;code>typing.Generic&lt;/code>, allowing it to be
parameterized with either zero types (denoted &lt;code>PCollection&lt;/code>) or one type (denoted &lt;code>PCollection[T]&lt;/code>).&lt;/p>
&lt;ul>
&lt;li>A PCollection with zero types is implicitly converted to &lt;code>PCollection[Any]&lt;/code>.&lt;/li>
&lt;li>A PCollection with one type can have any nested type (e.g. &lt;code>Union[int, str]&lt;/code>).&lt;/li>
&lt;/ul>
&lt;p>Internally, Beam&amp;rsquo;s typing system makes these annotations compatible with other
type hints by removing the outer PCollection container.&lt;/p>
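&lt;p>As a rough illustration of that unwrapping step, here is a minimal sketch in plain Python. The &lt;code>PCollection&lt;/code> class below is a hypothetical stand-in rather than Beam&amp;rsquo;s own class, and &lt;code>strip_pcollection&lt;/code> is a name invented for this example. It only shows how a &lt;code>typing.Generic&lt;/code> subclass can be parameterized and then reduced to its element type.&lt;/p>

```python
from typing import Any, Generic, TypeVar, Union, get_args, get_origin

T = TypeVar("T")


class PCollection(Generic[T]):
    """Hypothetical stand-in for Beam's PCollection, for illustration only."""


def strip_pcollection(hint):
    """Return the element type carried by a PCollection annotation."""
    if hint is PCollection:
        # Zero type parameters: treated the same as PCollection[Any].
        return Any
    if get_origin(hint) is PCollection:
        # One type parameter: unwrap it, however deeply it is nested.
        (element_type,) = get_args(hint)
        return element_type
    raise TypeError(f"not a PCollection annotation: {hint!r}")


print(strip_pcollection(PCollection))
print(strip_pcollection(PCollection[int]))
print(strip_pcollection(PCollection[Union[int, str]]))
```

&lt;p>Once the outer container is removed, the remaining element type can be compared with hints that were declared through decorators or inline methods.&lt;/p>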
&lt;h2 id="pbegin-pdone-none">PBegin, PDone, None&lt;/h2>
&lt;p>Finally, besides PCollection, a valid annotation on the &lt;code>expand(...)&lt;/code> method of a PTransform is
&lt;code>PBegin&lt;/code> or &lt;code>None&lt;/code>. These are generally used for PTransforms that begin or end with an I/O operation.&lt;/p>
&lt;p>For instance, when saving data, your transform&amp;rsquo;s output type should be &lt;code>None&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>class SaveResults(beam.PTransform):
def expand(self, pcoll: PCollection[str]) -&amp;gt; None:
return pcoll | beam.io.WriteToBigQuery(...)
&lt;/code>&lt;/pre>&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>What are you waiting for? Start using annotations on your transforms!&lt;/p>
&lt;p>For more background on type hints in Python, see:
&lt;a href="/documentation/sdks/python-type-safety/">Ensuring Python Type Safety&lt;/a>.&lt;/p>
&lt;p>Finally, please
&lt;a href="/community/contact-us/">let us know&lt;/a>
if you encounter any issues.&lt;/p></description></item><item><title>Blog: Performance-Driven Runtime Type Checking for the Python SDK</title><link>/blog/python-performance-runtime-type-checking/</link><pubDate>Fri, 21 Aug 2020 00:00:01 -0800</pubDate><guid>/blog/python-performance-runtime-type-checking/</guid><description>
&lt;p>In this blog post, we&amp;rsquo;re announcing the upcoming release of a new, opt-in
runtime type checking system for Beam&amp;rsquo;s Python SDK that&amp;rsquo;s optimized for performance
in both development and production environments.&lt;/p>
&lt;p>But let&amp;rsquo;s take a step back - why do we even care about runtime type checking
in the first place? Let&amp;rsquo;s look at an example.&lt;/p>
&lt;pre tabindex="0">&lt;code>class MultiplyNumberByTwo(beam.DoFn):
def process(self, element: int):
return element * 2
p = Pipeline()
p | beam.Create([&amp;#39;1&amp;#39;, &amp;#39;2&amp;#39;] | beam.ParDo(MultiplyNumberByTwo())
&lt;/code>&lt;/pre>&lt;p>In this code, we passed a list of strings to a DoFn that&amp;rsquo;s clearly intended for use with
integers. Luckily, this code will throw an error during pipeline construction because
the inferred output type of &lt;code>beam.Create(['1', '2'])&lt;/code> is &lt;code>str&lt;/code> which is incompatible with
the declared input type of &lt;code>MultiplyNumberByTwo.process&lt;/code> which is &lt;code>int&lt;/code>.&lt;/p>
&lt;p>However, what if we turned pipeline type checking off using the &lt;code>no_pipeline_type_check&lt;/code>
flag? Or more realistically, what if the input PCollection to &lt;code>MultiplyNumberByTwo&lt;/code> arrived
from a database, meaning that the output data type can only be known at runtime?&lt;/p>
&lt;p>In either case, no error would be thrown during pipeline construction.
And even at runtime, this code works. Each string would be multiplied by 2,
yielding a result of &lt;code>['11', '22']&lt;/code>, but that&amp;rsquo;s certainly not the outcome we want.&lt;/p>
&lt;p>So how do you debug this breed of &amp;ldquo;hidden&amp;rdquo; errors? More broadly speaking, how do you debug
any typing or serialization error in Beam?&lt;/p>
&lt;p>The answer is to use runtime type checking.&lt;/p>
&lt;h1 id="runtime-type-checking-rtc">Runtime Type Checking (RTC)&lt;/h1>
&lt;p>This feature works by checking that actual input and output values satisfy the declared
type constraints during pipeline execution. If you ran the code from before with
&lt;code>runtime_type_check&lt;/code> on, you would receive the following error message:&lt;/p>
&lt;pre tabindex="0">&lt;code>Type hint violation for &amp;#39;ParDo(MultiplyByTwo)&amp;#39;: requires &amp;lt;class &amp;#39;int&amp;#39;&amp;gt; but got &amp;lt;class &amp;#39;str&amp;#39;&amp;gt; for element
&lt;/code>&lt;/pre>&lt;p>This is an actionable error message - it tells you that either your code has a bug
or that your declared type hints are incorrect. Sounds simple enough, so what&amp;rsquo;s the catch?&lt;/p>
&lt;p>&lt;em>It is soooo slowwwwww.&lt;/em> See for yourself.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Number of Elements&lt;/th>
&lt;th>Normal Pipeline&lt;/th>
&lt;th>Runtime Type Checking Pipeline&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>5.3 sec&lt;/td>
&lt;td>5.6 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2,001&lt;/td>
&lt;td>9.4 sec&lt;/td>
&lt;td>57.2 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10,001&lt;/td>
&lt;td>24.5 sec&lt;/td>
&lt;td>259.8 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>18,001&lt;/td>
&lt;td>38.7 sec&lt;/td>
&lt;td>450.5 sec&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>In this micro-benchmark, the pipeline with runtime type checking was over 10x slower
at the larger input sizes, with the gap only increasing as our input PCollection increased in size.&lt;/p>
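&lt;p>To get a feel for where this kind of cost comes from, here is a minimal sketch of checking every call through a wrapper. It is not Beam&amp;rsquo;s implementation. The &lt;code>runtime_type_check&lt;/code> decorator and the &lt;code>multiply_number_by_two&lt;/code> function are hypothetical names used only for illustration, and the sketch handles only plain classes such as &lt;code>int&lt;/code> or &lt;code>str&lt;/code>.&lt;/p>

```python
import functools
import inspect


def runtime_type_check(func):
    """Hypothetical decorator: validate annotated arguments on every call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # no annotation, nothing to check
            # Only plain classes are handled in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__} requires {expected} but got "
                    f"{type(value)} for {name}"
                )
        return func(*args, **kwargs)

    return wrapper


@runtime_type_check
def multiply_number_by_two(element: int):
    return element * 2


print(multiply_number_by_two(1))  # 2
try:
    multiply_number_by_two("1")
except TypeError as error:
    print(error)
```

&lt;p>Every call pays for argument binding and an &lt;code>isinstance&lt;/code> test per argument, which is cheap once but adds up when it runs for every element.&lt;/p>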
&lt;p>So, is there any production-friendly alternative?&lt;/p>
&lt;h1 id="performance-runtime-type-check">Performance Runtime Type Check&lt;/h1>
&lt;p>There is! We developed a new flag called &lt;code>performance_runtime_type_check&lt;/code> that
minimizes its footprint on the pipeline&amp;rsquo;s time complexity using a combination of&lt;/p>
&lt;ul>
&lt;li>efficient Cython code,&lt;/li>
&lt;li>smart sampling techniques, and&lt;/li>
&lt;li>optimized mega type-hints.&lt;/li>
&lt;/ul>
&lt;p>So what do the new numbers look like?&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Number of Elements&lt;/th>
&lt;th>Normal&lt;/th>
&lt;th>RTC&lt;/th>
&lt;th>Performance RTC&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>5.3 sec&lt;/td>
&lt;td>5.6 sec&lt;/td>
&lt;td>5.4 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2,001&lt;/td>
&lt;td>9.4 sec&lt;/td>
&lt;td>57.2 sec&lt;/td>
&lt;td>11.2 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10,001&lt;/td>
&lt;td>24.5 sec&lt;/td>
&lt;td>259.8 sec&lt;/td>
&lt;td>25.5 sec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>18,001&lt;/td>
&lt;td>38.7 sec&lt;/td>
&lt;td>450.5 sec&lt;/td>
&lt;td>39.4 sec&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>On average, the new Performance RTC is 4.4% slower than a normal pipeline whereas the old RTC
is over 900% slower! Additionally, as the size of the input PCollection increases, the fixed cost
of setting up the Performance RTC system is spread across each element, decreasing the relative
impact on the overall pipeline. With 18,001 elements, the difference is less than 1 second.&lt;/p>
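&lt;p>The per-row overheads can be recomputed directly from the table. The sketch below copies the timings from the table above and prints the relative slowdown of each variant for each input size. The post does not say how its summary averages were derived, so only per-row values are shown, and &lt;code>overhead_percent&lt;/code> is a helper name chosen for this example.&lt;/p>

```python
# Timings in seconds copied from the table above:
# (number of elements, normal, RTC, Performance RTC).
timings = [
    (1, 5.3, 5.6, 5.4),
    (2001, 9.4, 57.2, 11.2),
    (10001, 24.5, 259.8, 25.5),
    (18001, 38.7, 450.5, 39.4),
]


def overhead_percent(baseline, measured):
    """Extra running time relative to the baseline, as a percentage."""
    return (measured - baseline) / baseline * 100


for count, normal, rtc, perf_rtc in timings:
    print(
        f"{count:6d} elements: RTC +{overhead_percent(normal, rtc):.1f}%, "
        f"Performance RTC +{overhead_percent(normal, perf_rtc):.1f}% "
        f"({perf_rtc - normal:.1f} sec)"
    )
```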
&lt;h2 id="how-does-it-work">How does it work?&lt;/h2>
&lt;p>There are three key factors responsible for this upgrade in performance.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Instead of type checking all values, we only type check a subset of values, known as
a sample in statistics. Initially, we sample a substantial number of elements, but as our
confidence that the element type won&amp;rsquo;t change over time increases, we reduce our
sampling rate (down to a fixed minimum).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Whereas the old RTC system used heavy wrappers to perform the type check, the new RTC system
moves the type check to a Cython-optimized, non-decorated portion of the codebase. For reference,
Cython is a programming language that gives C-like performance to Python code.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Finally, we use a single mega type hint to type-check only the output values of transforms
instead of type-checking both the input and output values separately. This mega type hint is composed of
the original transform&amp;rsquo;s output type constraints along with all consumer transforms&amp;rsquo; input type
constraints. Using this mega type hint allows us to reduce overhead while simultaneously allowing
us to throw &lt;em>more actionable errors&lt;/em>. For instance, consider the following error (which was
generated from the old RTC system):&lt;/p>
&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>Runtime type violation detected within ParDo(DownstreamDoFn): Type-hint for argument: &amp;#39;element&amp;#39; violated. Expected an instance of &amp;lt;class ‘str’&amp;gt;, instead found 9, an instance of &amp;lt;class ‘int’&amp;gt;.
&lt;/code>&lt;/pre>&lt;p>This error tells us that the &lt;code>DownstreamDoFn&lt;/code> received an &lt;code>int&lt;/code> when it was expecting a &lt;code>str&lt;/code>, but doesn&amp;rsquo;t tell us
who created that &lt;code>int&lt;/code> in the first place. Who is the offending upstream transform that&amp;rsquo;s responsible for
this &lt;code>int&lt;/code>? Presumably, &lt;em>that&lt;/em> transform&amp;rsquo;s output type hints were too expansive (e.g. &lt;code>Any&lt;/code>) or otherwise non-existent because
no error was thrown during the runtime type check of its output.&lt;/p>
&lt;p>The problem here boils down to a lack of context. If we knew who our consumers were when type
checking our output, we could simultaneously type check our output value against our output type
constraints and every consumer&amp;rsquo;s input type constraints to know whether there is &lt;em>any&lt;/em> possibility
for a mismatch. This is exactly what the mega type hint does, and it allows us to throw errors
at the point of declaration rather than the point of exception, saving you valuable time
while providing higher quality error messages.&lt;/p>
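&lt;p>The idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not Beam&amp;rsquo;s implementation: the &lt;code>Transform&lt;/code> class and &lt;code>check_output&lt;/code> function are hypothetical, and type constraints are limited to plain classes.&lt;/p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transform:
    """Hypothetical stand-in for a pipeline step with simple class constraints."""

    name: str
    input_type: type = object
    output_type: type = object
    consumers: List["Transform"] = field(default_factory=list)


def check_output(producer, value):
    """Check one output value against the producer's combined constraints."""
    # The combined hint: the producer's own output type plus the input type
    # of every consumer that will receive the value.
    constraints = [(producer.name, producer.output_type)]
    constraints += [(c.name, c.input_type) for c in producer.consumers]
    for owner, expected in constraints:
        if not isinstance(value, expected):
            # The error is attributed to the producer, where the value was made.
            raise TypeError(
                f"{owner} expected {expected.__name__}, found {value!r} "
                f"[while running {producer.name}]"
            )


downstream = Transform("ParDo(DownstreamDoFn)", input_type=str)
upstream = Transform("ParDo(UpstreamDoFn)", consumers=[downstream])

check_output(upstream, "ok")  # passes: a str satisfies both constraints
try:
    check_output(upstream, 9)
except TypeError as error:
    print(error)
```

&lt;p>Because the check runs while the producer&amp;rsquo;s output is being handled, the failure can name the upstream step even though the violated constraint belongs to the downstream one.&lt;/p>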
&lt;p>So what would the same error look like using Performance RTC? It&amp;rsquo;s the exact same string but with one additional line:&lt;/p>
&lt;pre tabindex="0">&lt;code>[while running &amp;#39;ParDo(UpstreamDoFn)&amp;#39;]
&lt;/code>&lt;/pre>&lt;p>And that&amp;rsquo;s much more actionable for an investigation :)&lt;/p>
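&lt;p>Returning to the first factor listed above, the sampling idea can also be sketched briefly. The exact strategy Beam uses is not described here, so the &lt;code>SamplingChecker&lt;/code> class, its parameters, and the doubling schedule below are all assumptions made for illustration: every element is checked during a warm-up period, and afterwards the gap between checks grows until it reaches a fixed limit.&lt;/p>

```python
class SamplingChecker:
    """Hypothetical sampler: check often at first, then progressively less."""

    def __init__(self, expected_type, warmup=100, max_interval=1000):
        self.expected_type = expected_type
        self.warmup = warmup              # elements that are always checked
        self.max_interval = max_interval  # widest allowed gap between checks
        self.interval = 1                 # check every interval-th element
        self.seen = 0
        self.checked = 0

    def observe(self, element):
        self.seen += 1
        if self.seen % self.interval != 0:
            return  # skipped: not part of the sample
        self.checked += 1
        if not isinstance(element, self.expected_type):
            raise TypeError(
                f"expected {self.expected_type.__name__}, "
                f"got {type(element).__name__}"
            )
        # After the warm-up, each successful check doubles the gap between
        # samples until the fixed minimum sampling rate is reached.
        if self.seen >= self.warmup:
            self.interval = min(self.interval * 2, self.max_interval)


checker = SamplingChecker(int)
for number in range(18001):
    checker.observe(number)
print(f"checked {checker.checked} of {checker.seen} elements")
```

&lt;p>The trade-off is that a wrongly typed element arriving late in the stream may be skipped, which is the price paid for the lower overhead.&lt;/p>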
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Go play with the new &lt;code>performance_runtime_type_check&lt;/code> feature!&lt;/p>
&lt;p>It&amp;rsquo;s in an experimental state so please
&lt;a href="/community/contact-us/">let us know&lt;/a>
if you encounter any issues.&lt;/p></description></item><item><title>Blog: Python SDK Typing Changes</title><link>/blog/python-typing/</link><pubDate>Thu, 28 May 2020 00:00:01 -0800</pubDate><guid>/blog/python-typing/</guid><description>
&lt;p>Beam Python has recently increased its support for and integration of Python 3 type
annotations for improved code clarity and type correctness checks.
Read on to find out what&amp;rsquo;s new.&lt;/p>
&lt;p>Python supports type annotations on functions (PEP 484). Static type checkers,
such as mypy, are used to verify adherence to these types.
For example:&lt;/p>
&lt;pre tabindex="0">&lt;code>def f(v: int) -&amp;gt; int:
return v[0]
&lt;/code>&lt;/pre>&lt;p>Running mypy on the above code will give the error:
&lt;code>Value of type &amp;quot;int&amp;quot; is not indexable&lt;/code>.&lt;/p>
&lt;p>We&amp;rsquo;ve recently made changes to Beam in two areas:&lt;/p>
&lt;p>First, we&amp;rsquo;ve added type annotations throughout Beam. Type annotations make a large and
sophisticated codebase like Beam easier to comprehend and navigate in your
favorite IDE.&lt;/p>
&lt;p>Second, we&amp;rsquo;ve added support for Python 3 type annotations. This allows SDK
users to specify a DoFn&amp;rsquo;s type hints in one place.
We&amp;rsquo;ve also expanded Beam&amp;rsquo;s support of &lt;code>typing&lt;/code> module types.&lt;/p>
&lt;p>For more background see:
&lt;a href="/documentation/sdks/python-type-safety/">Ensuring Python Type Safety&lt;/a>.&lt;/p>
&lt;h1 id="beam-is-typed">Beam Is Typed&lt;/h1>
&lt;p>In tandem with the new type annotation support within DoFns, we&amp;rsquo;ve invested a
great deal of time adding type annotations to the Beam Python code itself.
With this in place, we have begun using mypy, a static type
checker, as part of Beam&amp;rsquo;s code review process, which ensures higher quality
contributions and fewer bugs.
The context and insight that type annotations add throughout Beam are
useful for all Beam developers, contributors and end users alike, but
it is especially beneficial for developers who are new to the project.
If you use an IDE that understands type annotations, it will provide richer
type completions and warnings than before.
You&amp;rsquo;ll also be able to use your IDE to inspect the types of Beam functions and
transforms to better understand how they work, which will ease your own
development.
Finally, once Beam is fully annotated, end users will be able to benefit from
the use of static type analysis on their own pipelines and custom transforms.&lt;/p>
&lt;h1 id="new-ways-to-annotate">New Ways to Annotate&lt;/h1>
&lt;h2 id="python-3-syntax-annotations">Python 3 Syntax Annotations&lt;/h2>
&lt;p>Coming in Beam 2.21 (BEAM-8280), you will be able to use Python annotation
syntax to specify input and output types.&lt;/p>
&lt;p>For example, this new form:&lt;/p>
&lt;pre tabindex="0">&lt;code>class MyDoFn(beam.DoFn):
def process(self, element: int) -&amp;gt; typing.Text:
yield str(element)
&lt;/code>&lt;/pre>&lt;p>is equivalent to this:&lt;/p>
&lt;pre tabindex="0">&lt;code>@apache_beam.typehints.with_input_types(int)
@apache_beam.typehints.with_output_types(typing.Text)
class MyDoFn(beam.DoFn):
def process(self, element):
yield str(element)
&lt;/code>&lt;/pre>&lt;p>One of the advantages of the new form is that you may already be using it
in tandem with a static type checker such as mypy, thus getting additional
runtime type checking for free.&lt;/p>
&lt;p>This feature will be enabled by default, and there will be 2 mechanisms in
place to disable it:&lt;/p>
&lt;ol>
&lt;li>Calling &lt;code>apache_beam.typehints.disable_type_annotations()&lt;/code> before pipeline
construction will disable the new feature completely.&lt;/li>
&lt;li>Decorating a function with &lt;code>@apache_beam.typehints.no_annotations&lt;/code> will
tell Beam to ignore annotations for it.&lt;/li>
&lt;/ol>
&lt;p>Uses of Beam&amp;rsquo;s &lt;code>with_input_types&lt;/code> and &lt;code>with_output_types&lt;/code> methods and decorators will
still work and take precedence over annotations.&lt;/p>
&lt;h3 id="sidebar">Sidebar&lt;/h3>
&lt;p>You might ask: couldn&amp;rsquo;t we use mypy to type check Beam pipelines?
There are several reasons why this is not yet practical.&lt;/p>
&lt;ul>
&lt;li>Pipelines are constructed at runtime and may depend on information that is
only known at that time, such as a config file or database table schema.&lt;/li>
&lt;li>PCollections don&amp;rsquo;t have the necessary type information, so mypy sees them as
effectively containing any element type.
This may change in the future.&lt;/li>
&lt;li>Transforms using lambdas (ex: &lt;code>beam.Map(lambda x: (1, x))&lt;/code>) cannot be
annotated properly using PEP 484.
However, Beam does a best-effort attempt to analyze the output type
from the bytecode.&lt;/li>
&lt;/ul>
&lt;h2 id="typing-module-support">Typing Module Support&lt;/h2>
&lt;p>Python&amp;rsquo;s &lt;a href="https://docs.python.org/3/library/typing.html">typing&lt;/a> module defines
types used in type annotations. These are what we call &amp;ldquo;native&amp;rdquo; types.
While Beam has its own typing types, it also supports native types.
Although both are supported, for new code we encourage using
native typing types, as these are supported by additional tools.&lt;/p>
&lt;p>While working on Python 3 annotations syntax support, we&amp;rsquo;ve also discovered and
fixed issues with native type support. There may still be bugs and unsupported
native types. Please
&lt;a href="/community/contact-us/">let us know&lt;/a> if you encounter
issues.&lt;/p></description></item><item><title>Blog: Dataflow Python SDK is now public!</title><link>/blog/python-sdk-now-public/</link><pubDate>Thu, 25 Feb 2016 13:00:00 -0800</pubDate><guid>/blog/python-sdk-now-public/</guid><description>
&lt;p>When the Apache Beam project proposed entry into the &lt;a href="https://wiki.apache.org/incubator/BeamProposal">Apache Incubator&lt;/a>, the proposal
included the &lt;a href="https://github.com/GoogleCloudPlatform/DataflowJavaSDK">Dataflow Java SDK&lt;/a>. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.&lt;/p>
&lt;p>Today, Google published the &lt;a href="https://github.com/GoogleCloudPlatform/DataflowPythonSDK">Dataflow Python (2.x) SDK&lt;/a> on GitHub. Google is committed to including the in-progress Python SDK in Apache Beam and, in that spirit, we&amp;rsquo;ve moved development of the Python SDK to a public repository. While this SDK will not be included with the initial (incubating) releases of Apache Beam, we plan on incorporating the Python SDK into Beam during incubation. We want to take the time to implement changes from the &lt;a href="https://goo.gl/nk5OM0">technical vision&lt;/a> into the Java SDK before we introduce a Python SDK for Apache Beam. We believe this will allow us to work on the model and SDKs in an ordered fashion.&lt;/p>
&lt;p>You can look for the Apache Beam Python SDK in the coming months once we finish forking and refactoring the Java SDK.&lt;/p>
&lt;p>Best,&lt;/p>
&lt;p>Apache Beam Team&lt;/p></description></item></channel></rss>