| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" |
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| |
| |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
| |
| <title>apache_beam.io package — Apache Beam documentation</title> |
| |
| <link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" /> |
| <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> |
| |
| <script type="text/javascript"> |
| var DOCUMENTATION_OPTIONS = { |
| URL_ROOT: './', |
| VERSION: '', |
| COLLAPSE_INDEX: false, |
| FILE_SUFFIX: '.html', |
| HAS_SOURCE: true, |
| SOURCELINK_SUFFIX: '.txt' |
| }; |
| </script> |
| <script type="text/javascript" src="_static/jquery.js"></script> |
| <script type="text/javascript" src="_static/underscore.js"></script> |
| <script type="text/javascript" src="_static/doctools.js"></script> |
| <link rel="index" title="Index" href="genindex.html" /> |
| <link rel="search" title="Search" href="search.html" /> |
| <link rel="next" title="apache_beam.io.gcp package" href="apache_beam.io.gcp.html" /> |
| <link rel="prev" title="apache_beam.internal.gcp package" href="apache_beam.internal.gcp.html" /> |
| </head> |
| <body role="document"> |
| <div class="related" role="navigation" aria-label="related navigation"> |
| <h3>Navigation</h3> |
| <ul> |
| <li class="right" style="margin-right: 10px"> |
| <a href="genindex.html" title="General Index" |
| accesskey="I">index</a></li> |
| <li class="right" > |
| <a href="py-modindex.html" title="Python Module Index" |
| >modules</a> |</li> |
| <li class="right" > |
| <a href="apache_beam.io.gcp.html" title="apache_beam.io.gcp package" |
| accesskey="N">next</a> |</li> |
| <li class="right" > |
| <a href="apache_beam.internal.gcp.html" title="apache_beam.internal.gcp package" |
| accesskey="P">previous</a> |</li> |
| <li class="nav-item nav-item-0"><a href="index.html">Apache Beam documentation</a> »</li> |
| <li class="nav-item nav-item-1"><a href="apache_beam.html" accesskey="U">apache_beam package</a> »</li> |
| </ul> |
| </div> |
| <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> |
| <div class="sphinxsidebarwrapper"> |
| <h3><a href="index.html">Table Of Contents</a></h3> |
| <ul> |
| <li><a class="reference internal" href="#">apache_beam.io package</a><ul> |
| <li><a class="reference internal" href="#subpackages">Subpackages</a></li> |
| <li><a class="reference internal" href="#submodules">Submodules</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.avroio">apache_beam.io.avroio module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.concat_source">apache_beam.io.concat_source module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.filebasedsink">apache_beam.io.filebasedsink module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.filebasedsource">apache_beam.io.filebasedsource module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.filesystem">apache_beam.io.filesystem module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.filesystems">apache_beam.io.filesystems module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.iobase">apache_beam.io.iobase module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.localfilesystem">apache_beam.io.localfilesystem module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.range_trackers">apache_beam.io.range_trackers module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.source_test_utils">apache_beam.io.source_test_utils module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.textio">apache_beam.io.textio module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io.tfrecordio">apache_beam.io.tfrecordio module</a></li> |
| <li><a class="reference internal" href="#module-apache_beam.io">Module contents</a></li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h4>Previous topic</h4> |
| <p class="topless"><a href="apache_beam.internal.gcp.html" |
| title="previous chapter">apache_beam.internal.gcp package</a></p> |
| <h4>Next topic</h4> |
| <p class="topless"><a href="apache_beam.io.gcp.html" |
| title="next chapter">apache_beam.io.gcp package</a></p> |
| <div role="note" aria-label="source link"> |
| <h3>This Page</h3> |
| <ul class="this-page-menu"> |
| <li><a href="_sources/apache_beam.io.rst.txt" |
| rel="nofollow">Show Source</a></li> |
| </ul> |
| </div> |
| <div id="searchbox" style="display: none" role="search"> |
| <h3>Quick search</h3> |
| <form class="search" action="search.html" method="get"> |
| <div><input type="text" name="q" /></div> |
| <div><input type="submit" value="Go" /></div> |
| <input type="hidden" name="check_keywords" value="yes" /> |
| <input type="hidden" name="area" value="default" /> |
| </form> |
| </div> |
| <script type="text/javascript">$('#searchbox').show(0);</script> |
| </div> |
| </div> |
| |
| <div class="document"> |
| <div class="documentwrapper"> |
| <div class="bodywrapper"> |
| <div class="body" role="main"> |
| |
| <div class="section" id="apache-beam-io-package"> |
| <h1>apache_beam.io package<a class="headerlink" href="#apache-beam-io-package" title="Permalink to this headline">¶</a></h1> |
| <div class="section" id="subpackages"> |
| <h2>Subpackages<a class="headerlink" href="#subpackages" title="Permalink to this headline">¶</a></h2> |
| <div class="toctree-wrapper compound"> |
| <ul> |
| <li class="toctree-l1"><a class="reference internal" href="apache_beam.io.gcp.html">apache_beam.io.gcp package</a><ul> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#subpackages">Subpackages</a><ul> |
| <li class="toctree-l3"><a class="reference internal" href="apache_beam.io.gcp.datastore.html">apache_beam.io.gcp.datastore package</a><ul> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.datastore.html#subpackages">Subpackages</a><ul> |
| <li class="toctree-l5"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html">apache_beam.io.gcp.datastore.v1 package</a><ul> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#submodules">Submodules</a></li> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#module-apache_beam.io.gcp.datastore.v1.datastoreio">apache_beam.io.gcp.datastore.v1.datastoreio module</a></li> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#module-apache_beam.io.gcp.datastore.v1.fake_datastore">apache_beam.io.gcp.datastore.v1.fake_datastore module</a></li> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#module-apache_beam.io.gcp.datastore.v1.helper">apache_beam.io.gcp.datastore.v1.helper module</a></li> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#module-apache_beam.io.gcp.datastore.v1.query_splitter">apache_beam.io.gcp.datastore.v1.query_splitter module</a></li> |
| <li class="toctree-l6"><a class="reference internal" href="apache_beam.io.gcp.datastore.v1.html#module-apache_beam.io.gcp.datastore.v1">Module contents</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.datastore.html#module-apache_beam.io.gcp.datastore">Module contents</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l3"><a class="reference internal" href="apache_beam.io.gcp.internal.html">apache_beam.io.gcp.internal package</a><ul> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.internal.html#module-apache_beam.io.gcp.internal">Module contents</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l3"><a class="reference internal" href="apache_beam.io.gcp.tests.html">apache_beam.io.gcp.tests package</a><ul> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.tests.html#submodules">Submodules</a></li> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.tests.html#module-apache_beam.io.gcp.tests.bigquery_matcher">apache_beam.io.gcp.tests.bigquery_matcher module</a></li> |
| <li class="toctree-l4"><a class="reference internal" href="apache_beam.io.gcp.tests.html#module-apache_beam.io.gcp.tests">Module contents</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#submodules">Submodules</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#module-apache_beam.io.gcp.bigquery">apache_beam.io.gcp.bigquery module</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#module-apache_beam.io.gcp.gcsfilesystem">apache_beam.io.gcp.gcsfilesystem module</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#module-apache_beam.io.gcp.gcsio">apache_beam.io.gcp.gcsio module</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#module-apache_beam.io.gcp.pubsub">apache_beam.io.gcp.pubsub module</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="apache_beam.io.gcp.html#module-apache_beam.io.gcp">Module contents</a></li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="section" id="submodules"> |
| <h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline">¶</a></h2> |
| </div> |
| <div class="section" id="module-apache_beam.io.avroio"> |
| <span id="apache-beam-io-avroio-module"></span><h2>apache_beam.io.avroio module<a class="headerlink" href="#module-apache_beam.io.avroio" title="Permalink to this headline">¶</a></h2> |
| <p>Implements a source for reading Avro files.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.avroio.ReadFromAvro"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.avroio.</code><code class="descname">ReadFromAvro</code><span class="sig-paren">(</span><em>file_pattern=None</em>, <em>min_bundle_size=0</em>, <em>validate=True</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#ReadFromAvro"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.ReadFromAvro" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A <code class="docutils literal"><span class="pre">PTransform</span></code> for reading avro files.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.avroio.ReadFromAvro.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#ReadFromAvro.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.ReadFromAvro.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.avroio.ReadFromAvro.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pvalue</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#ReadFromAvro.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.ReadFromAvro.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.avroio.WriteToAvro"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.avroio.</code><code class="descname">WriteToAvro</code><span class="sig-paren">(</span><em>file_path_prefix</em>, <em>schema</em>, <em>codec='deflate'</em>, <em>file_name_suffix=''</em>, <em>num_shards=0</em>, <em>shard_name_template=None</em>, <em>mime_type='application/x-avro'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#WriteToAvro"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.WriteToAvro" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A <code class="docutils literal"><span class="pre">PTransform</span></code> for writing avro files.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.avroio.WriteToAvro.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#WriteToAvro.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.WriteToAvro.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.avroio.WriteToAvro.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pcoll</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/avroio.html#WriteToAvro.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.avroio.WriteToAvro.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.concat_source"> |
| <span id="apache-beam-io-concat-source-module"></span><h2>apache_beam.io.concat_source module<a class="headerlink" href="#module-apache_beam.io.concat_source" title="Permalink to this headline">¶</a></h2> |
| <p>For internal use only; no backwards-compatibility guarantees.</p> |
| <p>Concat Source, which reads the union of several other sources.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.concat_source.</code><code class="descname">ConcatRangeTracker</code><span class="sig-paren">(</span><em>start</em>, <em>end</em>, <em>source_bundles</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.RangeTracker" title="apache_beam.io.iobase.RangeTracker"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.RangeTracker</span></code></a></p> |
| <p>For internal use only; no backwards-compatibility guarantees.</p> |
| <p>Range tracker for ConcatSource</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.fraction_consumed"> |
| <code class="descname">fraction_consumed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.fraction_consumed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.fraction_consumed" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.global_to_local"> |
| <code class="descname">global_to_local</code><span class="sig-paren">(</span><em>frac</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.global_to_local"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.global_to_local" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.local_to_global"> |
| <code class="descname">local_to_global</code><span class="sig-paren">(</span><em>source_ix</em>, <em>source_frac</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.local_to_global"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.local_to_global" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.position_at_fraction"> |
| <code class="descname">position_at_fraction</code><span class="sig-paren">(</span><em>fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.position_at_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.position_at_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.set_current_position"> |
| <code class="descname">set_current_position</code><span class="sig-paren">(</span><em>pos</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.set_current_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.set_current_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.start_position"> |
| <code class="descname">start_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.start_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.start_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.stop_position"> |
| <code class="descname">stop_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.stop_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.stop_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.sub_range_tracker"> |
| <code class="descname">sub_range_tracker</code><span class="sig-paren">(</span><em>source_ix</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.sub_range_tracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.sub_range_tracker" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.try_claim"> |
| <code class="descname">try_claim</code><span class="sig-paren">(</span><em>pos</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.try_claim"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.try_claim" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatRangeTracker.try_split"> |
| <code class="descname">try_split</code><span class="sig-paren">(</span><em>pos</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatRangeTracker.try_split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatRangeTracker.try_split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.concat_source.ConcatSource"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.concat_source.</code><code class="descname">ConcatSource</code><span class="sig-paren">(</span><em>sources</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.BoundedSource" title="apache_beam.io.iobase.BoundedSource"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.BoundedSource</span></code></a></p> |
| <p>For internal use only; no backwards-compatibility guarantees.</p> |
| <p>A <code class="docutils literal"><span class="pre">BoundedSource</span></code> that can group a set of <code class="docutils literal"><span class="pre">BoundedSources</span></code>.</p> |
| <p>Primarily for internal use, use the <code class="docutils literal"><span class="pre">apache_beam.Flatten</span></code> transform |
| to create the union of several reads.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.default_output_coder"> |
| <code class="descname">default_output_coder</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource.default_output_coder"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.default_output_coder" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.estimate_size"> |
| <code class="descname">estimate_size</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource.estimate_size"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.estimate_size" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.get_range_tracker"> |
| <code class="descname">get_range_tracker</code><span class="sig-paren">(</span><em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource.get_range_tracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.get_range_tracker" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.read"> |
| <code class="descname">read</code><span class="sig-paren">(</span><em>range_tracker</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource.read"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.read" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.sources"> |
| <code class="descname">sources</code><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.sources" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.concat_source.ConcatSource.split"> |
| <code class="descname">split</code><span class="sig-paren">(</span><em>desired_bundle_size=None</em>, <em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/concat_source.html#ConcatSource.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.concat_source.ConcatSource.split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.filebasedsink"> |
| <span id="apache-beam-io-filebasedsink-module"></span><h2>apache_beam.io.filebasedsink module<a class="headerlink" href="#module-apache_beam.io.filebasedsink" title="Permalink to this headline">¶</a></h2> |
| <p>File-based sink.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filebasedsink.</code><code class="descname">FileBasedSink</code><span class="sig-paren">(</span><em>file_path_prefix</em>, <em>coder</em>, <em>file_name_suffix=''</em>, <em>num_shards=0</em>, <em>shard_name_template=None</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.Sink" title="apache_beam.io.iobase.Sink"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.Sink</span></code></a></p> |
| <p>A sink to a GCS or local files.</p> |
| <p>To implement a file-based sink, extend this class and override |
| either <code class="docutils literal"><span class="pre">write_record()</span></code> or <code class="docutils literal"><span class="pre">write_encoded_record()</span></code>.</p> |
| <p>If needed, also overwrite <code class="docutils literal"><span class="pre">open()</span></code> and/or <code class="docutils literal"><span class="pre">close()</span></code> to customize the |
| file handling or write headers and footers.</p> |
| <p>The output of this write is a PCollection of all written shards.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.close"> |
| <code class="descname">close</code><span class="sig-paren">(</span><em>file_handle</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.close"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.close" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Finalize and close the file handle returned from <code class="docutils literal"><span class="pre">open()</span></code>.</p> |
| <p>Called after all records are written.</p> |
| <p>By default, calls <code class="docutils literal"><span class="pre">file_handle.close()</span></code> iff it is not None.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.finalize_write"> |
| <code class="descname">finalize_write</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.finalize_write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.finalize_write" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.initialize_write"> |
| <code class="descname">initialize_write</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.initialize_write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.initialize_write" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.open"> |
| <code class="descname">open</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.open"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.open" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Opens <code class="docutils literal"><span class="pre">temp_path</span></code>, returning an opaque file handle object.</p> |
| <p>The returned file handle is passed to <code class="docutils literal"><span class="pre">write_[encoded_]record</span></code> and |
| <code class="docutils literal"><span class="pre">close</span></code>.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.open_writer"> |
| <code class="descname">open_writer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.open_writer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.open_writer" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.write_encoded_record"> |
| <code class="descname">write_encoded_record</code><span class="sig-paren">(</span><em>file_handle</em>, <em>encoded_value</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.write_encoded_record"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.write_encoded_record" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Writes a single encoded record to the file handle returned by <code class="docutils literal"><span class="pre">open()</span></code>.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsink.FileBasedSink.write_record"> |
| <code class="descname">write_record</code><span class="sig-paren">(</span><em>file_handle</em>, <em>value</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsink.html#FileBasedSink.write_record"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsink.FileBasedSink.write_record" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Writes a single record go the file handle returned by <code class="docutils literal"><span class="pre">open()</span></code>.</p> |
| <p>By default, calls <code class="docutils literal"><span class="pre">write_encoded_record</span></code> after encoding the record with |
| this sink’s Coder.</p> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.filebasedsource"> |
| <span id="apache-beam-io-filebasedsource-module"></span><h2>apache_beam.io.filebasedsource module<a class="headerlink" href="#module-apache_beam.io.filebasedsource" title="Permalink to this headline">¶</a></h2> |
| <p>A framework for developing sources for new file types.</p> |
| <p>To create a source for a new file type a sub-class of <code class="docutils literal"><span class="pre">FileBasedSource</span></code> should |
| be created. Sub-classes of <code class="docutils literal"><span class="pre">FileBasedSource</span></code> must implement the method |
| <code class="docutils literal"><span class="pre">FileBasedSource.read_records()</span></code>. Please read the documentation of that method |
| for more details.</p> |
| <p>For an example implementation of <code class="docutils literal"><span class="pre">FileBasedSource</span></code> see <code class="docutils literal"><span class="pre">avroio.AvroSource</span></code>.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filebasedsource.</code><code class="descname">FileBasedSource</code><span class="sig-paren">(</span><em>file_pattern</em>, <em>min_bundle_size=0</em>, <em>compression_type='auto'</em>, <em>splittable=True</em>, <em>validate=True</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.BoundedSource" title="apache_beam.io.iobase.BoundedSource"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.BoundedSource</span></code></a></p> |
| <p>A <code class="docutils literal"><span class="pre">BoundedSource</span></code> for reading a file glob of a given type.</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.MIN_FRACTION_OF_FILES_TO_STAT"> |
| <code class="descname">MIN_FRACTION_OF_FILES_TO_STAT</code><em class="property"> = 0.01</em><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.MIN_FRACTION_OF_FILES_TO_STAT" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.MIN_NUMBER_OF_FILES_TO_STAT"> |
| <code class="descname">MIN_NUMBER_OF_FILES_TO_STAT</code><em class="property"> = 100</em><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.MIN_NUMBER_OF_FILES_TO_STAT" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.estimate_size"> |
| <code class="descname">estimate_size</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.estimate_size"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.estimate_size" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.get_range_tracker"> |
| <code class="descname">get_range_tracker</code><span class="sig-paren">(</span><em>start_position</em>, <em>stop_position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.get_range_tracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.get_range_tracker" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.open_file"> |
| <code class="descname">open_file</code><span class="sig-paren">(</span><em>file_name</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.open_file"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.open_file" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.read"> |
| <code class="descname">read</code><span class="sig-paren">(</span><em>range_tracker</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.read"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.read" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.read_records"> |
| <code class="descname">read_records</code><span class="sig-paren">(</span><em>file_name</em>, <em>offset_range_tracker</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.read_records"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.read_records" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a generator of records created by reading file ‘file_name’.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>file_name</strong> – a <code class="docutils literal"><span class="pre">string</span></code> that gives the name of the file to be read. Method |
| <code class="docutils literal"><span class="pre">FileBasedSource.open_file()</span></code> must be used to open the file |
| and create a seekable file object.</li> |
| <li><strong>offset_range_tracker</strong> – a object of type <code class="docutils literal"><span class="pre">OffsetRangeTracker</span></code>. This |
| defines the byte range of the file that should be |
| read. See documentation in |
| <code class="docutils literal"><span class="pre">iobase.BoundedSource.read()</span></code> for more information |
| on reading records while complying to the range |
| defined by a given <code class="docutils literal"><span class="pre">RangeTracker</span></code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">an iterator that gives the records read from the given file.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.split"> |
| <code class="descname">split</code><span class="sig-paren">(</span><em>desired_bundle_size=None</em>, <em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filebasedsource.html#FileBasedSource.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filebasedsource.FileBasedSource.splittable"> |
| <code class="descname">splittable</code><a class="headerlink" href="#apache_beam.io.filebasedsource.FileBasedSource.splittable" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.filesystem"> |
| <span id="apache-beam-io-filesystem-module"></span><h2>apache_beam.io.filesystem module<a class="headerlink" href="#module-apache_beam.io.filesystem" title="Permalink to this headline">¶</a></h2> |
| <p>File system abstraction for file-based sources and sinks.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystem.</code><code class="descname">CompressionTypes</code><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressionTypes"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>Enum-like class representing known compression types.</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.AUTO"> |
| <code class="descname">AUTO</code><em class="property"> = 'auto'</em><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.AUTO" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.BZIP2"> |
| <code class="descname">BZIP2</code><em class="property"> = 'bzip2'</em><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.BZIP2" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.GZIP"> |
| <code class="descname">GZIP</code><em class="property"> = 'gzip'</em><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.GZIP" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.UNCOMPRESSED"> |
| <code class="descname">UNCOMPRESSED</code><em class="property"> = 'uncompressed'</em><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.UNCOMPRESSED" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.detect_compression_type"> |
| <em class="property">classmethod </em><code class="descname">detect_compression_type</code><span class="sig-paren">(</span><em>file_path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressionTypes.detect_compression_type"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.detect_compression_type" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the compression type of a file (based on its suffix).</p> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.is_valid_compression_type"> |
| <em class="property">classmethod </em><code class="descname">is_valid_compression_type</code><span class="sig-paren">(</span><em>compression_type</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressionTypes.is_valid_compression_type"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.is_valid_compression_type" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns True for valid compression types, False otherwise.</p> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.filesystem.CompressionTypes.mime_type"> |
| <em class="property">classmethod </em><code class="descname">mime_type</code><span class="sig-paren">(</span><em>compression_type</em>, <em>default='application/octet-stream'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressionTypes.mime_type"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressionTypes.mime_type" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystem.CompressedFile"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystem.</code><code class="descname">CompressedFile</code><span class="sig-paren">(</span><em>fileobj</em>, <em>compression_type='gzip'</em>, <em>read_size=16777216</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>File wrapper for easier handling of compressed files.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.close"> |
| <code class="descname">close</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.close"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.close" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.closed"> |
| <code class="descname">closed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.closed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.closed" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.flush"> |
| <code class="descname">flush</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.flush"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.flush" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.read"> |
| <code class="descname">read</code><span class="sig-paren">(</span><em>num_bytes</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.read"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.read" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.readable"> |
| <code class="descname">readable</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.readable"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.readable" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.readline"> |
| <code class="descname">readline</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.readline"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.readline" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Equivalent to standard file.readline(). Same return conventions apply.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.seek"> |
| <code class="descname">seek</code><span class="sig-paren">(</span><em>offset</em>, <em>whence=0</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.seek"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.seek" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Set the file’s current offset.</p> |
| <dl class="docutils"> |
| <dt>Seeking behavior:</dt> |
| <dd><ul class="first last simple"> |
| <li>seeking from the end (SEEK_END) the whole file is decompressed once to |
| determine it’s size. Therefore it is preferred to use |
| SEEK_SET or SEEK_CUR to avoid the processing overhead</li> |
| <li>seeking backwards from the current position rewinds the file to 0 |
| and decompresses the chunks to the requested offset</li> |
| <li>seeking is only supported in files opened for reading</li> |
| <li>if the new offset is out of bound, it is adjusted to either 0 or EOF.</li> |
| </ul> |
| </dd> |
| </dl> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>offset</strong> – seek offset in the uncompressed content represented as number</li> |
| <li><strong>whence</strong> – seek mode. Supported modes are os.SEEK_SET (absolute seek), |
| os.SEEK_CUR (seek relative to the current position), and os.SEEK_END |
| (seek relative to the end, offset should be negative).</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><ul class="first last simple"> |
| <li><code class="xref py py-exc docutils literal"><span class="pre">IOError</span></code> – When this buffer is closed.</li> |
| <li><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – When whence is invalid or the file is not seekable</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.seekable"> |
| <code class="descname">seekable</code><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.seekable" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.tell"> |
| <code class="descname">tell</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.tell"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.tell" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns current position in uncompressed file.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.write"> |
| <code class="descname">write</code><span class="sig-paren">(</span><em>data</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.write" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Write data to file.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.CompressedFile.writeable"> |
| <code class="descname">writeable</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#CompressedFile.writeable"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.CompressedFile.writeable" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystem.FileMetadata"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystem.</code><code class="descname">FileMetadata</code><span class="sig-paren">(</span><em>path</em>, <em>size_in_bytes</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileMetadata"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileMetadata" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>Metadata about a file path that is the output of FileSystem.match</p> |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystem.FileSystem"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystem.</code><code class="descname">FileSystem</code><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>A class that defines the functions that can be performed on a filesystem.</p> |
| <p>All methods are abstract and they are for file system providers to |
| implement. Clients should use the FileSystemUtil class to interact with |
| the correct file system based on the provided file pattern scheme.</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystem.FileSystem.CHUNK_SIZE"> |
| <code class="descname">CHUNK_SIZE</code><em class="property"> = 1</em><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.CHUNK_SIZE" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.copy"> |
| <code class="descname">copy</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.copy"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.copy" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively copy the file tree from the source to the destination</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – list of source file objects that needs to be copied</li> |
| <li><strong>destination_file_names</strong> – list of destination of the new object</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the copy operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.create"> |
| <code class="descname">create</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.create"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.create" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a write channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a close function for the user to use</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.delete"> |
| <code class="descname">delete</code><span class="sig-paren">(</span><em>paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.delete"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.delete" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Deletes files or directories at the provided paths. |
| Directories will be deleted recursively.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>paths</strong> – list of paths that give the file objects to be deleted</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the delete operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.exists"> |
| <code class="descname">exists</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.exists"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.exists" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Check if the provided path exists on the FileSystem.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path that needs to be checked.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: boolean flag indicating if path exists</p> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.filesystem.FileSystem.get_all_subclasses"> |
| <em class="property">classmethod </em><code class="descname">get_all_subclasses</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.get_all_subclasses"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.get_all_subclasses" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Get all the subclasses of the FileSystem class</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.join"> |
| <code class="descname">join</code><span class="sig-paren">(</span><em>basepath</em>, <em>*paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.join"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.join" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Join two or more pathname components for the filesystem</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>basepath</strong> – string path of the first component of the path</li> |
| <li><strong>paths</strong> – path components to be added</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: full path after combining all the passed components</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.match"> |
| <code class="descname">match</code><span class="sig-paren">(</span><em>patterns</em>, <em>limits=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.match"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.match" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Find all matching paths to the patterns provided.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>patterns</strong> – list of string for the file path pattern to match against</li> |
| <li><strong>limits</strong> – list of maximum number of responses that need to be fetched</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: list of <code class="docutils literal"><span class="pre">MatchResult</span></code> objects.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the pattern match operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.mkdirs"> |
| <code class="descname">mkdirs</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.mkdirs"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.mkdirs" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively create directories for the provided path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path of the directory structure that should be created</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body">IOError if leaf directory already exists.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.open"> |
| <code class="descname">open</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.open"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.open" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a read channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a close function for the user to use</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.rename"> |
| <code class="descname">rename</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.rename"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.rename" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Rename the files at the source list to the destination list. |
| Source and destination lists should be of the same size.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – List of file paths that need to be moved</li> |
| <li><strong>destination_file_names</strong> – List of destination_file_names for the files</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the rename operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.filesystem.FileSystem.scheme"> |
| <em class="property">classmethod </em><code class="descname">scheme</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.scheme"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.scheme" title="Permalink to this definition">¶</a></dt> |
| <dd><p>URI scheme for the FileSystem</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.filesystem.FileSystem.split"> |
| <code class="descname">split</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#FileSystem.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.FileSystem.split" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Splits the given path into two parts.</p> |
| <p>Splits the path into a pair (head, tail) such that tail contains the last |
| component of the path and head contains everything up to that.</p> |
| <p>For file-systems other than the local file-system, head should include the |
| prefix.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – path as a string</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">a pair of path components as strings.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystem.MatchResult"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystem.</code><code class="descname">MatchResult</code><span class="sig-paren">(</span><em>pattern</em>, <em>metadata_list</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystem.html#MatchResult"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystem.MatchResult" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>Result from the <code class="docutils literal"><span class="pre">FileSystem</span></code> match operation which contains the list |
| of matched FileMetadata.</p> |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.filesystems"> |
| <span id="apache-beam-io-filesystems-module"></span><h2>apache_beam.io.filesystems module<a class="headerlink" href="#module-apache_beam.io.filesystems" title="Permalink to this headline">¶</a></h2> |
| <p>FileSystems interface class for accessing the correct filesystem</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.filesystems.FileSystems"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.filesystems.</code><code class="descname">FileSystems</code><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>A class that defines the functions that can be performed on a filesystem. |
| All methods are static and access the underlying registered filesystems.</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.filesystems.FileSystems.URI_SCHEMA_PATTERN"> |
| <code class="descname">URI_SCHEMA_PATTERN</code><em class="property"> = <_sre.SRE_Pattern object></em><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.URI_SCHEMA_PATTERN" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.copy"> |
| <em class="property">static </em><code class="descname">copy</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.copy"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.copy" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively copy the file list from the source to the destination</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – list of source file objects that needs to be copied</li> |
| <li><strong>destination_file_names</strong> – list of destination of the new object</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the copy operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.create"> |
| <em class="property">static </em><code class="descname">create</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.create"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.create" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a write channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object. See |
| <code class="docutils literal"><span class="pre">CompressionTypes</span></code> for possible values.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a <code class="docutils literal"><span class="pre">close</span></code> function for the user to use.</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.delete"> |
| <em class="property">static </em><code class="descname">delete</code><span class="sig-paren">(</span><em>paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.delete"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.delete" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Deletes files or directories at the provided paths. |
| Directories will be deleted recursively.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>paths</strong> – list of paths that give the file objects to be deleted</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the delete operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.exists"> |
| <em class="property">static </em><code class="descname">exists</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.exists"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.exists" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Check if the provided path exists on the FileSystem.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path that needs to be checked.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: boolean flag indicating if path exists</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.get_chunk_size"> |
| <em class="property">static </em><code class="descname">get_chunk_size</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.get_chunk_size"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.get_chunk_size" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Get the correct chunk size for the FileSystem.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path that needs to be checked.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: integer size for parallelization in the FS operations.</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.get_filesystem"> |
| <em class="property">static </em><code class="descname">get_filesystem</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.get_filesystem"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.get_filesystem" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Get the correct filesystem for the specified path</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.get_scheme"> |
| <em class="property">static </em><code class="descname">get_scheme</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.get_scheme"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.get_scheme" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.join"> |
| <em class="property">static </em><code class="descname">join</code><span class="sig-paren">(</span><em>basepath</em>, <em>*paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.join"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.join" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Join two or more pathname components for the filesystem</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>basepath</strong> – string path of the first component of the path</li> |
| <li><strong>paths</strong> – path components to be added</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: full path after combining all the passed components</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.match"> |
| <em class="property">static </em><code class="descname">match</code><span class="sig-paren">(</span><em>patterns</em>, <em>limits=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.match"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.match" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Find all matching paths to the patterns provided.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>patterns</strong> – list of string for the file path pattern to match against</li> |
| <li><strong>limits</strong> – list of maximum number of responses that need to be fetched</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: list of <code class="docutils literal"><span class="pre">MatchResult</span></code> objects.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the pattern match operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.mkdirs"> |
| <em class="property">static </em><code class="descname">mkdirs</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.mkdirs"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.mkdirs" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively create directories for the provided path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path of the directory structure that should be created</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body">IOError if leaf directory already exists.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.open"> |
| <em class="property">static </em><code class="descname">open</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.open"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.open" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a read channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object. See |
| <code class="docutils literal"><span class="pre">CompressionTypes</span></code> for possible values.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a <code class="docutils literal"><span class="pre">close</span></code> function for the user to use.</p> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.rename"> |
| <em class="property">static </em><code class="descname">rename</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.rename"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.rename" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Rename the files at the source list to the destination list. |
| Source and destination lists should be of the same size.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – List of file paths that need to be moved</li> |
| <li><strong>destination_file_names</strong> – List of destination_file_names for the files</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the rename operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="staticmethod"> |
| <dt id="apache_beam.io.filesystems.FileSystems.split"> |
| <em class="property">static </em><code class="descname">split</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/filesystems.html#FileSystems.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.filesystems.FileSystems.split" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Splits the given path into two parts.</p> |
| <p>Splits the path into a pair (head, tail) such that tail contains the last |
| component of the path and head contains everything up to that.</p> |
| <p>For file-systems other than the local file-system, head should include the |
| prefix.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – path as a string</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">a pair of path components as strings.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.iobase"> |
| <span id="apache-beam-io-iobase-module"></span><h2>apache_beam.io.iobase module<a class="headerlink" href="#module-apache_beam.io.iobase" title="Permalink to this headline">¶</a></h2> |
| <p>Sources and sinks.</p> |
| <p>A Source manages record-oriented data input from a particular kind of source |
| (e.g. a set of files, a database table, etc.). The reader() method of a source |
| returns a reader object supporting the iterator protocol; iteration yields |
| raw records of unprocessed, serialized data.</p> |
| <p>A Sink manages record-oriented data output to a particular kind of sink |
| (e.g. a set of files, a database table, etc.). The writer() method of a sink |
| returns a writer object supporting writing records of serialized data to |
| the sink.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.BoundedSource"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">BoundedSource</code><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.display.HasDisplayData" title="apache_beam.transforms.display.HasDisplayData"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.display.HasDisplayData</span></code></a></p> |
| <p>A source that reads a finite amount of input records.</p> |
| <p>This class defines following operations which can be used to read the source |
| efficiently.</p> |
| <ul class="simple"> |
| <li>Size estimation - method <code class="docutils literal"><span class="pre">estimate_size()</span></code> may return an accurate |
| estimation in bytes for the size of the source.</li> |
| <li>Splitting into bundles of a given size - method <code class="docutils literal"><span class="pre">split()</span></code> can be used to |
| split the source into a set of sub-sources (bundles) based on a desired |
| bundle size.</li> |
| <li>Getting a RangeTracker - method <code class="docutils literal"><span class="pre">get_range_tracker()</span></code> should return a |
| <code class="docutils literal"><span class="pre">RangeTracker</span></code> object for a given position range for the position type |
| of the records returned by the source.</li> |
| <li>Reading the data - method <code class="docutils literal"><span class="pre">read()</span></code> can be used to read data from the |
| source while respecting the boundaries defined by a given |
| <code class="docutils literal"><span class="pre">RangeTracker</span></code>.</li> |
| </ul> |
| <p>A runner will perform reading the source in two steps.</p> |
| <ol class="arabic simple"> |
| <li>Method <code class="docutils literal"><span class="pre">get_range_tracker()</span></code> will be invoked with start and end |
| positions to obtain a <code class="docutils literal"><span class="pre">RangeTracker</span></code> for the range of positions the |
| runner intends to read. Source must define a default initial start and end |
| position range. These positions must be used if the start and/or end |
| positions passed to the method <code class="docutils literal"><span class="pre">get_range_tracker()</span></code> are <code class="docutils literal"><span class="pre">None</span></code></li> |
| <li>Method read() will be invoked with the <code class="docutils literal"><span class="pre">RangeTracker</span></code> obtained in the |
| previous step.</li> |
| </ol> |
| <p><strong>Mutability</strong></p> |
| <p>A <code class="docutils literal"><span class="pre">BoundedSource</span></code> object should not be mutated while |
| its methods (for example, <code class="docutils literal"><span class="pre">read()</span></code>) are being invoked by a runner. Runner |
| implementations may invoke methods of <code class="docutils literal"><span class="pre">BoundedSource</span></code> objects through |
| multi-threaded and/or reentrant execution modes.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.BoundedSource.default_output_coder"> |
| <code class="descname">default_output_coder</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource.default_output_coder"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource.default_output_coder" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Coder that should be used for the records returned by the source.</p> |
| <p>Should be overridden by sources that produce objects that can be encoded |
| more efficiently than pickling.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.BoundedSource.estimate_size"> |
| <code class="descname">estimate_size</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource.estimate_size"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource.estimate_size" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Estimates the size of source in bytes.</p> |
| <p>An estimate of the total size (in bytes) of the data that would be read |
| from this source. This estimate is in terms of external storage size, |
| before performing decompression or other processing.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">estimated size of the source if the size can be determined, <code class="docutils literal"><span class="pre">None</span></code> |
| otherwise.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.BoundedSource.get_range_tracker"> |
| <code class="descname">get_range_tracker</code><span class="sig-paren">(</span><em>start_position</em>, <em>stop_position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource.get_range_tracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource.get_range_tracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a RangeTracker for a given position range.</p> |
| <p>Framework may invoke <code class="docutils literal"><span class="pre">read()</span></code> method with the RangeTracker object returned |
| here to read data from the source.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>start_position</strong> – starting position of the range. If ‘None’ default start |
| position of the source must be used.</li> |
| <li><strong>stop_position</strong> – ending position of the range. If ‘None’ default stop |
| position of the source must be used.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">a <code class="docutils literal"><span class="pre">RangeTracker</span></code> for the given position range.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.BoundedSource.read"> |
| <code class="descname">read</code><span class="sig-paren">(</span><em>range_tracker</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource.read"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource.read" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns an iterator that reads data from the source.</p> |
| <p>The returned set of data must respect the boundaries defined by the given |
| <code class="docutils literal"><span class="pre">RangeTracker</span></code> object. For example:</p> |
| <blockquote> |
| <div><ul class="simple"> |
| <li>Returned set of data must be for the range |
| <code class="docutils literal"><span class="pre">[range_tracker.start_position,</span> <span class="pre">range_tracker.stop_position)</span></code>. Note |
| that a source may decide to return records that start after |
| <code class="docutils literal"><span class="pre">range_tracker.stop_position</span></code>. See documentation in class |
| <code class="docutils literal"><span class="pre">RangeTracker</span></code> for more details. Also, note that framework might |
| invoke <code class="docutils literal"><span class="pre">range_tracker.try_split()</span></code> to perform dynamic split |
| operations. range_tracker.stop_position may be updated |
| dynamically due to successful dynamic split operations.</li> |
| <li>Method <code class="docutils literal"><span class="pre">range_tracker.try_split()</span></code> must be invoked for every record |
| that starts at a split point.</li> |
| <li>Method <code class="docutils literal"><span class="pre">range_tracker.record_current_position()</span></code> may be invoked for |
| records that do not start at split points.</li> |
| </ul> |
| </div></blockquote> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>range_tracker</strong> – a <code class="docutils literal"><span class="pre">RangeTracker</span></code> whose boundaries must be respected |
| when reading data from the source. A runner that reads this |
| source muss pass a <code class="docutils literal"><span class="pre">RangeTracker</span></code> object that is not |
| <code class="docutils literal"><span class="pre">None</span></code>.</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">an iterator of data read by the source.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.BoundedSource.split"> |
| <code class="descname">split</code><span class="sig-paren">(</span><em>desired_bundle_size</em>, <em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#BoundedSource.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.BoundedSource.split" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Splits the source into a set of bundles.</p> |
| <p>Bundles should be approximately of size <code class="docutils literal"><span class="pre">desired_bundle_size</span></code> bytes.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>desired_bundle_size</strong> – the desired size (in bytes) of the bundles returned.</li> |
| <li><strong>start_position</strong> – if specified the given position must be used as the |
| starting position of the first bundle.</li> |
| <li><strong>stop_position</strong> – if specified the given position must be used as the ending |
| position of the last bundle.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">an iterator of objects of type ‘SourceBundle’ that gives information about |
| the generated bundles.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.RangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">RangeTracker</code><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>A thread safe object used by Dataflow source framework.</p> |
| <p>A Dataflow source is defined using a ‘’BoundedSource’’ and a ‘’RangeTracker’’ |
| pair. A ‘’RangeTracker’’ is used by Dataflow source framework to perform |
| dynamic work rebalancing of position-based sources.</p> |
| <p><strong>Position-based sources</strong></p> |
| <p>A position-based source is one where the source can be described by a range |
| of positions of an ordered type and the records returned by the reader can be |
| described by positions of the same type.</p> |
| <p>In case a record occupies a range of positions in the source, the most |
| important thing about the record is the position where it starts.</p> |
| <p>Defining the semantics of positions for a source is entirely up to the source |
| class, however the chosen definitions have to obey certain properties in order |
| to make it possible to correctly split the source into parts, including |
| dynamic splitting. Two main aspects need to be defined:</p> |
| <ol class="arabic simple"> |
| <li>How to assign starting positions to records.</li> |
| <li>Which records should be read by a source with a range ‘[A, B)’.</li> |
| </ol> |
| <p>Moreover, reading a range must be <em>efficient</em>, i.e., the performance of |
| reading a range should not significantly depend on the location of the range. |
| For example, reading the range [A, B) should not require reading all data |
| before ‘A’.</p> |
| <p>The sections below explain exactly what properties these definitions must |
| satisfy, and how to use a <code class="docutils literal"><span class="pre">RangeTracker</span></code> with a properly defined source.</p> |
| <p><strong>Properties of position-based sources</strong></p> |
| <p>The main requirement for position-based sources is <em>associativity</em>: reading |
| records from ‘[A, B)’ and records from ‘[B, C)’ should give the same |
| records as reading from ‘[A, C)’, where ‘A <= B <= C’. This property |
| ensures that no matter how a range of positions is split into arbitrarily many |
| sub-ranges, the total set of records described by them stays the same.</p> |
| <p>The other important property is how the source’s range relates to positions of |
| records in the source. In many sources each record can be identified by a |
| unique starting position. In this case:</p> |
| <ul class="simple"> |
| <li>All records returned by a source ‘[A, B)’ must have starting positions in |
| this range.</li> |
| <li>All but the last record should end within this range. The last record may or |
| may not extend past the end of the range.</li> |
| <li>Records should not overlap.</li> |
| </ul> |
| <p>Such sources should define “read ‘[A, B)’” as “read from the first record |
| starting at or after ‘A’, up to but not including the first record starting |
| at or after ‘B’”.</p> |
| <p>Some examples of such sources include reading lines or CSV from a text file, |
| reading keys and values from a BigTable, etc.</p> |
| <p>The concept of <em>split points</em> allows to extend the definitions for dealing |
| with sources where some records cannot be identified by a unique starting |
| position.</p> |
| <p>In all cases, all records returned by a source ‘[A, B)’ must <em>start</em> at or |
| after ‘A’.</p> |
| <p><strong>Split points</strong></p> |
| <p>Some sources may have records that are not directly addressable. For example, |
| imagine a file format consisting of a sequence of compressed blocks. Each |
| block can be assigned an offset, but records within the block cannot be |
| directly addressed without decompressing the block. Let us refer to this |
| hypothetical format as <i>CBF (Compressed Blocks Format)</i>.</p> |
| <p>Many such formats can still satisfy the associativity property. For example, |
| in CBF, reading ‘[A, B)’ can mean “read all the records in all blocks whose |
| starting offset is in ‘[A, B)’”.</p> |
| <p>To support such complex formats, we introduce the notion of <em>split points</em>. We |
| say that a record is a split point if there exists a position ‘A’ such that |
| the record is the first one to be returned when reading the range |
| ‘[A, infinity)’. In CBF, the only split points would be the first records |
| in each block.</p> |
| <p>Split points allow us to define the meaning of a record’s position and a |
| source’s range in all cases:</p> |
| <ul class="simple"> |
| <li>For a record that is at a split point, its position is defined to be the |
| largest ‘A’ such that reading a source with the range ‘[A, infinity)’ |
| returns this record.</li> |
| <li>Positions of other records are only required to be non-decreasing.</li> |
| <li>Reading the source ‘[A, B)’ must return records starting from the first |
| split point at or after ‘A’, up to but not including the first split point |
| at or after ‘B’. In particular, this means that the first record returned |
| by a source MUST always be a split point.</li> |
| <li>Positions of split points must be unique.</li> |
| </ul> |
| <p>As a result, for any decomposition of the full range of the source into |
| position ranges, the total set of records will be the full set of records in |
| the source, and each record will be read exactly once.</p> |
| <p><strong>Consumed positions</strong></p> |
| <p>As the source is being read, and records read from it are being passed to the |
| downstream transforms in the pipeline, we say that positions in the source are |
| being <em>consumed</em>. When a reader has read a record (or promised to a caller |
| that a record will be returned), positions up to and including the record’s |
| start position are considered <em>consumed</em>.</p> |
| <p>Dynamic splitting can happen only at <em>unconsumed</em> positions. If the reader |
| just returned a record at offset 42 in a file, dynamic splitting can happen |
| only at offset 43 or beyond, as otherwise that record could be read twice (by |
| the current reader and by a reader of the task starting at 43).</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.iobase.RangeTracker.SPLIT_POINTS_UNKNOWN"> |
| <code class="descname">SPLIT_POINTS_UNKNOWN</code><em class="property"> = <object object></em><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.SPLIT_POINTS_UNKNOWN" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.fraction_consumed"> |
| <code class="descname">fraction_consumed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.fraction_consumed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.fraction_consumed" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the approximate fraction of consumed positions in the source.</p> |
| <p>** Thread safety **</p> |
| <p>Methods of the class <code class="docutils literal"><span class="pre">RangeTracker</span></code> including this method may get invoked |
| by different threads, hence must be made thread-safe, e.g. by using a single |
| lock object.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">the approximate fraction of positions that have been consumed by |
| successful ‘try_split()’ and ‘report_current_position()’ calls, or |
| 0.0 if no such calls have happened.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.position_at_fraction"> |
| <code class="descname">position_at_fraction</code><span class="sig-paren">(</span><em>fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.position_at_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.position_at_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the position at the given fraction.</p> |
| <p>Given a fraction within the range [0.0, 1.0) this method will return the |
| position at the given fraction compared to the position range |
| [self.start_position, self.stop_position).</p> |
| <p>** Thread safety **</p> |
| <p>Methods of the class <code class="docutils literal"><span class="pre">RangeTracker</span></code> including this method may get invoked |
| by different threads, hence must be made thread-safe, e.g. by using a single |
| lock object.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>fraction</strong> – a float value within the range [0.0, 1.0).</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">a position within the range [self.start_position, self.stop_position).</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.set_current_position"> |
| <code class="descname">set_current_position</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.set_current_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.set_current_position" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Updates the last-consumed position to the given position.</p> |
| <p>A source may invoke this method for records that do not start at split |
| points. This may modify the internal state of the <code class="docutils literal"><span class="pre">RangeTracker</span></code>. If the |
| record starts at a split point, method <code class="docutils literal"><span class="pre">try_claim()</span></code> <strong>must</strong> be invoked |
| instead of this method.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>position</strong> – starting position of a record being read by a source.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.set_split_points_unclaimed_callback"> |
| <code class="descname">set_split_points_unclaimed_callback</code><span class="sig-paren">(</span><em>callback</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.set_split_points_unclaimed_callback"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.set_split_points_unclaimed_callback" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Sets a callback for determining the unclaimed number of split points.</p> |
| <p>By invoking this function, a <code class="docutils literal"><span class="pre">BoundedSource</span></code> can set a callback function |
| that may get invoked by the <code class="docutils literal"><span class="pre">RangeTracker</span></code> to determine the number of |
| unclaimed split points. A split point is unclaimed if |
| <code class="docutils literal"><span class="pre">RangeTracker.try_claim()</span></code> method has not been successfully invoked for |
| that particular split point. The callback function accepts a single |
| parameter, a stop position for the BoundedSource (stop_position). If the |
| record currently being consumed by the <code class="docutils literal"><span class="pre">BoundedSource</span></code> is at position |
| current_position, callback should return the number of split points within |
| the range (current_position, stop_position). Note that, this should not |
| include the split point that is currently being consumed by the source.</p> |
| <p>This function must be implemented by subclasses before being used.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>callback</strong> – a function that takes a single parameter, a stop position, |
| and returns unclaimed number of split points for the source read |
| operation that is calling this function. Value returned from |
| callback should be either an integer larger than or equal to |
| zero or <code class="docutils literal"><span class="pre">RangeTracker.SPLIT_POINTS_UNKNOWN</span></code>.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.split_points"> |
| <code class="descname">split_points</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.split_points"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.split_points" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Gives the number of split points consumed and remaining.</p> |
| <p>For a <code class="docutils literal"><span class="pre">RangeTracker</span></code> used by a <code class="docutils literal"><span class="pre">BoundedSource</span></code> (within a |
| <code class="docutils literal"><span class="pre">BoundedSource.read()</span></code> invocation) this method produces a 2-tuple that |
| gives the number of split points consumed by the <code class="docutils literal"><span class="pre">BoundedSource</span></code> and the |
| number of split points remaining within the range of the <code class="docutils literal"><span class="pre">RangeTracker</span></code> |
| that has not been consumed by the <code class="docutils literal"><span class="pre">BoundedSource</span></code>.</p> |
| <p>More specifically, given that the position of the current record being read |
| by <code class="docutils literal"><span class="pre">BoundedSource</span></code> is current_position this method produces a tuple that |
| consists of |
| (1) number of split points in the range [self.start_position(), |
| current_position) without including the split point that is currently being |
| consumed. This represents the total amount of parallelism in the consumed |
| part of the source. |
| (2) number of split points within the range |
| [current_position, self.stop_position()) including the split point that is |
| currently being consumed. This represents the total amount of parallelism in |
| the unconsumed part of the source.</p> |
| <p>Methods of the class <code class="docutils literal"><span class="pre">RangeTracker</span></code> including this method may get invoked |
| by different threads, hence must be made thread-safe, e.g. by using a single |
| lock object.</p> |
| <dl class="docutils"> |
| <dt>** General information about consumed and remaining number of split</dt> |
| <dd><blockquote class="first"> |
| <div>points returned by this method. **</div></blockquote> |
| <ul class="last simple"> |
| <li>Before a source read (<code class="docutils literal"><span class="pre">BoundedSource.read()</span></code> invocation) claims the |
| first split point, number of consumed split points is 0. This condition |
| holds independent of whether the input is “splittable”. A splittable |
| source is a source that has more than one split point.</li> |
| <li>Any source read that has only claimed one split point has 0 consumed |
| split points since the first split point is the current split point and |
| is still being processed. This condition holds independent of whether |
| the input is splittable.</li> |
| <li>For an empty source read which never invokes |
| <code class="docutils literal"><span class="pre">RangeTracker.try_claim()</span></code>, the consumed number of split points is 0. |
| This condition holds independent of whether the input is splittable.</li> |
| <li>For a source read which has invoked <code class="docutils literal"><span class="pre">RangeTracker.try_claim()</span></code> n |
| times, the consumed number of split points is n -1.</li> |
| <li>If a <code class="docutils literal"><span class="pre">BoundedSource</span></code> sets a callback through function |
| <code class="docutils literal"><span class="pre">set_split_points_unclaimed_callback()</span></code>, <code class="docutils literal"><span class="pre">RangeTracker</span></code> can use that |
| callback when determining remaining number of split points.</li> |
| <li>Remaining split points should include the split point that is currently |
| being consumed by the source read. Hence if the above callback returns |
| an integer value n, remaining number of split points should be (n + 1).</li> |
| <li>After last split point is claimed remaining split points becomes 1, |
| because this unfinished read itself represents an unfinished split |
| point.</li> |
| <li>After all records of the source has been consumed, remaining number of |
| split points becomes 0 and consumed number of split points becomes equal |
| to the total number of split points within the range being read by the |
| source. This method does not address this condition and will continue to |
| report number of consumed split points as |
| (“total number of split points” - 1) and number of remaining split |
| points as 1. A runner that performs the reading of the source can |
| detect when all records have been consumed and adjust remaining and |
| consumed number of split points accordingly.</li> |
| </ul> |
| </dd> |
| </dl> |
| <p>** Examples **</p> |
| <ol class="arabic"> |
| <li><p class="first">A “perfectly splittable” input which can be read in parallel down to the |
| individual records.</p> |
| <p>Consider a perfectly splittable input that consists of 50 split points.</p> |
| </li> |
| </ol> |
| <blockquote> |
| <div><ul class="simple"> |
| <li>Before a source read (<code class="docutils literal"><span class="pre">BoundedSource.read()</span></code> invocation) claims the |
| first split point, number of consumed split points is 0 number of |
| remaining split points is 50.</li> |
| <li>After claiming first split point, consumed number of split points is 0 |
| and remaining number of split is 50.</li> |
| <li>After claiming split point #30, consumed number of split points is 29 |
| and remaining number of split points is 21.</li> |
| <li>After claiming all 50 split points, consumed number of split points is |
| 49 and remaining number of split points is 1.</li> |
| </ul> |
| </div></blockquote> |
| <ol class="arabic" start="2"> |
| <li><p class="first">a “block-compressed” file format such as <code class="docutils literal"><span class="pre">avroio</span></code>, in which a block of |
| records has to be read as a whole, but different blocks can be read in |
| parallel.</p> |
| <p>Consider a block compressed input that consists of 5 blocks.</p> |
| </li> |
| </ol> |
| <blockquote> |
| <div><ul class="simple"> |
| <li>Before a source read (<code class="docutils literal"><span class="pre">BoundedSource.read()</span></code> invocation) claims the |
| first split point (first block), number of consumed split points is 0 |
| number of remaining split points is 5.</li> |
| <li>After claiming first split point, consumed number of split points is 0 |
| and remaining number of split is 5.</li> |
| <li>After claiming split point #3, consumed number of split points is 2 |
| and remaining number of split points is 3.</li> |
| <li>After claiming all 5 split points, consumed number of split points is |
| 4 and remaining number of split points is 1.</li> |
| </ul> |
| </div></blockquote> |
| <ol class="arabic" start="3"> |
| <li><p class="first">an “unsplittable” input such as a cursor in a database or a gzip |
| compressed file.</p> |
| <p>Such an input is considered to have only a single split point. Number of |
| consumed split points is always 0 and number of remaining split points |
| is always 1.</p> |
| </li> |
| </ol> |
| <p>By default <code class="docutils literal"><span class="pre">RangeTracker`</span> <span class="pre">returns</span> <span class="pre">``RangeTracker.SPLIT_POINTS_UNKNOWN</span></code> for |
| both consumed and remaining number of split points, which indicates that the |
| number of split points consumed and remaining is unknown.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">A pair that gives consumed and remaining number of split points. Consumed |
| number of split points should be an integer larger than or equal to zero |
| or <code class="docutils literal"><span class="pre">RangeTracker.SPLIT_POINTS_UNKNOWN</span></code>. Remaining number of split points |
| should be an integer larger than zero or |
| <code class="docutils literal"><span class="pre">RangeTracker.SPLIT_POINTS_UNKNOWN</span></code>.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.start_position"> |
| <code class="descname">start_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.start_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.start_position" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the starting position of the current range, inclusive.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.stop_position"> |
| <code class="descname">stop_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.stop_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.stop_position" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the ending position of the current range, exclusive.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.try_claim"> |
| <code class="descname">try_claim</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.try_claim"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.try_claim" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Atomically determines if a record at a split point is within the range.</p> |
| <p>This method should be called <strong>if and only if</strong> the record is at a split |
| point. This method may modify the internal state of the <code class="docutils literal"><span class="pre">RangeTracker</span></code> by |
| updating the last-consumed position to <code class="docutils literal"><span class="pre">position</span></code>.</p> |
| <p>** Thread safety **</p> |
| <p>Methods of the class <code class="docutils literal"><span class="pre">RangeTracker</span></code> including this method may get invoked |
| by different threads, hence must be made thread-safe, e.g. by using a single |
| lock object.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>position</strong> – starting position of a record being read by a source.</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><code class="docutils literal"><span class="pre">True</span></code>, if the given position falls within the current range, returns |
| <code class="docutils literal"><span class="pre">False</span></code> otherwise.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.RangeTracker.try_split"> |
| <code class="descname">try_split</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#RangeTracker.try_split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.RangeTracker.try_split" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Atomically splits the current range.</p> |
| <p>Determines a position to split the current range, split_position, based on |
| the given position. In most cases split_position and position will be the |
| same.</p> |
| <p>Splits the current range ‘[self.start_position, self.stop_position)’ |
| into a “primary” part ‘[self.start_position, split_position)’ and a |
| “residual” part ‘[split_position, self.stop_position)’, assuming the |
| current last-consumed position is within |
| ‘[self.start_position, split_position)’ (i.e., split_position has not been |
| consumed yet).</p> |
| <p>If successful, updates the current range to be the primary and returns a |
| tuple (split_position, split_fraction). split_fraction should be the |
| fraction of size of range ‘[self.start_position, split_position)’ compared |
| to the original (before split) range |
| ‘[self.start_position, self.stop_position)’.</p> |
| <p>If the split_position has already been consumed, returns <code class="docutils literal"><span class="pre">None</span></code>.</p> |
| <p>** Thread safety **</p> |
| <p>Methods of the class <code class="docutils literal"><span class="pre">RangeTracker</span></code> including this method may get invoked |
| by different threads, hence must be made thread-safe, e.g. by using a single |
| lock object.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>position</strong> – suggested position where the current range should try to |
| be split at.</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">a tuple containing the split position and split fraction if split is |
| successful. Returns <code class="docutils literal"><span class="pre">None</span></code> otherwise.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.Read"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">Read</code><span class="sig-paren">(</span><em>source</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Read"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Read" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A transform that reads a PCollection.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Read.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Read.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Read.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Read.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pbegin</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Read.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Read.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Read.get_windowing"> |
| <code class="descname">get_windowing</code><span class="sig-paren">(</span><em>unused_inputs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Read.get_windowing"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Read.get_windowing" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.Sink"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">Sink</code><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Sink"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Sink" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.display.HasDisplayData" title="apache_beam.transforms.display.HasDisplayData"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.display.HasDisplayData</span></code></a></p> |
| <p>This class is deprecated, no backwards-compatibility guarantees.</p> |
| <p>A resource that can be written to using the <code class="docutils literal"><span class="pre">beam.io.Write</span></code> transform.</p> |
| <p>Here <code class="docutils literal"><span class="pre">beam</span></code> stands for Apache Beam Python code imported in following manner. |
| <code class="docutils literal"><span class="pre">import</span> <span class="pre">apache_beam</span> <span class="pre">as</span> <span class="pre">beam</span></code>.</p> |
| <p>A parallel write to an <code class="docutils literal"><span class="pre">iobase.Sink</span></code> consists of three phases:</p> |
| <ol class="arabic simple"> |
| <li>A sequential <em>initialization</em> phase (e.g., creating a temporary output |
| directory, etc.)</li> |
| <li>A parallel write phase where workers write <em>bundles</em> of records</li> |
| <li>A sequential <em>finalization</em> phase (e.g., committing the writes, merging |
| output files, etc.)</li> |
| </ol> |
| <p>Implementing a new sink requires extending two classes.</p> |
| <ol class="arabic simple"> |
| <li>iobase.Sink</li> |
| </ol> |
| <p><code class="docutils literal"><span class="pre">iobase.Sink</span></code> is an immutable logical description of the location/resource |
| to write to. Depending on the type of sink, it may contain fields such as the |
| path to an output directory on a filesystem, a database table name, |
| etc. <code class="docutils literal"><span class="pre">iobase.Sink</span></code> provides methods for performing a write operation to the |
| sink described by it. To this end, implementors of an extension of |
| <code class="docutils literal"><span class="pre">iobase.Sink</span></code> must implement three methods: |
| <code class="docutils literal"><span class="pre">initialize_write()</span></code>, <code class="docutils literal"><span class="pre">open_writer()</span></code>, and <code class="docutils literal"><span class="pre">finalize_write()</span></code>.</p> |
| <ol class="arabic simple" start="2"> |
| <li>iobase.Writer</li> |
| </ol> |
| <p><code class="docutils literal"><span class="pre">iobase.Writer</span></code> is used to write a single bundle of records. An |
| <code class="docutils literal"><span class="pre">iobase.Writer</span></code> defines two methods: <code class="docutils literal"><span class="pre">write()</span></code> which writes a |
| single record from the bundle and <code class="docutils literal"><span class="pre">close()</span></code> which is called once |
| at the end of writing a bundle.</p> |
| <p>See also <code class="docutils literal"><span class="pre">apache_beam.io.filebasedsink.FileBasedSink</span></code> which provides a |
| simpler API for writing sinks that produce files.</p> |
| <p><strong>Execution of the Write transform</strong></p> |
| <p><code class="docutils literal"><span class="pre">initialize_write()</span></code> and <code class="docutils literal"><span class="pre">finalize_write()</span></code> are conceptually called once: |
| at the beginning and end of a <code class="docutils literal"><span class="pre">Write</span></code> transform. However, implementors must |
| ensure that these methods are <em>idempotent</em>, as they may be called multiple |
| times on different machines in the case of failure/retry or for redundancy.</p> |
| <p><code class="docutils literal"><span class="pre">initialize_write()</span></code> should perform any initialization that needs to be done |
| prior to writing to the sink. <code class="docutils literal"><span class="pre">initialize_write()</span></code> may return a result |
| (let’s call this <code class="docutils literal"><span class="pre">init_result</span></code>) that contains any parameters it wants to |
| pass on to its writers about the sink. For example, a sink that writes to a |
| file system may return an <code class="docutils literal"><span class="pre">init_result</span></code> that contains a dynamically |
| generated unique directory to which data should be written.</p> |
| <p>To perform writing of a bundle of elements, Dataflow execution engine will |
| create an <code class="docutils literal"><span class="pre">iobase.Writer</span></code> using the implementation of |
| <code class="docutils literal"><span class="pre">iobase.Sink.open_writer()</span></code>. When invoking <code class="docutils literal"><span class="pre">open_writer()</span></code> execution |
| engine will provide the <code class="docutils literal"><span class="pre">init_result</span></code> returned by <code class="docutils literal"><span class="pre">initialize_write()</span></code> |
| invocation as well as a <em>bundle id</em> (let’s call this <code class="docutils literal"><span class="pre">bundle_id</span></code>) that is |
| unique for each invocation of <code class="docutils literal"><span class="pre">open_writer()</span></code>.</p> |
| <p>Execution engine will then invoke <code class="docutils literal"><span class="pre">iobase.Writer.write()</span></code> implementation for |
| each element that has to be written. Once all elements of a bundle are |
| written, execution engine will invoke <code class="docutils literal"><span class="pre">iobase.Writer.close()</span></code> implementation |
| which should return a result (let’s call this <code class="docutils literal"><span class="pre">write_result</span></code>) that contains |
| information that encodes the result of the write and, in most cases, some |
| encoding of the unique bundle id. For example, if each bundle is written to a |
| unique temporary file, <code class="docutils literal"><span class="pre">close()</span></code> method may return an object that contains |
| the temporary file name. After writing of all bundles is complete, execution |
| engine will invoke <code class="docutils literal"><span class="pre">finalize_write()</span></code> implementation. As parameters to this |
| invocation execution engine will provide <code class="docutils literal"><span class="pre">init_result</span></code> as well as an |
| iterable of <code class="docutils literal"><span class="pre">write_result</span></code>.</p> |
| <p>The execution of a write transform can be illustrated using following pseudo |
| code (assume that the outer for loop happens in parallel across many |
| machines):</p> |
| <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">init_result</span> <span class="o">=</span> <span class="n">sink</span><span class="o">.</span><span class="n">initialize_write</span><span class="p">()</span> |
| <span class="n">write_results</span> <span class="o">=</span> <span class="p">[]</span> |
| <span class="k">for</span> <span class="n">bundle</span> <span class="ow">in</span> <span class="n">partition</span><span class="p">(</span><span class="n">pcoll</span><span class="p">):</span> |
| <span class="n">writer</span> <span class="o">=</span> <span class="n">sink</span><span class="o">.</span><span class="n">open_writer</span><span class="p">(</span><span class="n">init_result</span><span class="p">,</span> <span class="n">generate_bundle_id</span><span class="p">())</span> |
| <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">bundle</span><span class="p">:</span> |
| <span class="n">writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">elem</span><span class="p">)</span> |
| <span class="n">write_results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">writer</span><span class="o">.</span><span class="n">close</span><span class="p">())</span> |
| <span class="n">sink</span><span class="o">.</span><span class="n">finalize_write</span><span class="p">(</span><span class="n">init_result</span><span class="p">,</span> <span class="n">write_results</span><span class="p">)</span> |
| </pre></div> |
| </div> |
| <p><strong>init_result</strong></p> |
| <p>Methods of ‘iobase.Sink’ should agree on the ‘init_result’ type that will be |
| returned when initializing the sink. This type can be a client-defined object |
| or an existing type. The returned type must be picklable using Dataflow coder |
| <code class="docutils literal"><span class="pre">coders.PickleCoder</span></code>. Returning an init_result is optional.</p> |
| <p><strong>bundle_id</strong></p> |
| <p>In order to ensure fault-tolerance, a bundle may be executed multiple times |
| (e.g., in the event of failure/retry or for redundancy). However, exactly one |
| of these executions will have its result passed to the |
| <code class="docutils literal"><span class="pre">iobase.Sink.finalize_write()</span></code> method. Each call to |
| <code class="docutils literal"><span class="pre">iobase.Sink.open_writer()</span></code> is passed a unique bundle id when it is called |
| by the <code class="docutils literal"><span class="pre">WriteImpl</span></code> transform, so even redundant or retried bundles will have |
| a unique way of identifying their output.</p> |
| <p>The bundle id should be used to guarantee that a bundle’s output is unique. |
| This uniqueness guarantee is important; if a bundle is to be output to a file, |
| for example, the name of the file must be unique to avoid conflicts with other |
| writers. The bundle id should be encoded in the writer result returned by the |
| writer and subsequently used by the <code class="docutils literal"><span class="pre">finalize_write()</span></code> method to identify |
| the results of successful writes.</p> |
| <p>For example, consider the scenario where a Writer writes files containing |
| serialized records and the <code class="docutils literal"><span class="pre">finalize_write()</span></code> is to merge or rename these |
| output files. In this case, a writer may use its unique id to name its output |
| file (to avoid conflicts) and return the name of the file it wrote as its |
| writer result. The <code class="docutils literal"><span class="pre">finalize_write()</span></code> will then receive an <code class="docutils literal"><span class="pre">Iterable</span></code> of |
| output file names that it can then merge or rename using some bundle naming |
| scheme.</p> |
| <p><strong>write_result</strong></p> |
| <p><code class="docutils literal"><span class="pre">iobase.Writer.close()</span></code> and <code class="docutils literal"><span class="pre">finalize_write()</span></code> implementations must agree |
| on type of the <code class="docutils literal"><span class="pre">write_result</span></code> object returned when invoking |
| <code class="docutils literal"><span class="pre">iobase.Writer.close()</span></code>. This type can be a client-defined object or |
| an existing type. The returned type must be picklable using Dataflow coder |
| <code class="docutils literal"><span class="pre">coders.PickleCoder</span></code>. Returning a <code class="docutils literal"><span class="pre">write_result</span></code> when |
| <code class="docutils literal"><span class="pre">iobase.Writer.close()</span></code> is invoked is optional but if unique |
| <code class="docutils literal"><span class="pre">write_result</span></code> objects are not returned, sink should, guarantee idempotency |
| when same bundle is written multiple times due to failure/retry or redundancy.</p> |
| <p><strong>More information</strong></p> |
| <p>For more information on creating new sinks please refer to the official |
| documentation at |
| <code class="docutils literal"><span class="pre">https://beam.apache.org/documentation/sdks/python-custom-io#creating-sinks</span></code></p> |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Sink.finalize_write"> |
| <code class="descname">finalize_write</code><span class="sig-paren">(</span><em>init_result</em>, <em>writer_results</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Sink.finalize_write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Sink.finalize_write" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Finalizes the sink after all data is written to it.</p> |
| <p>Given the result of initialization and an iterable of results from bundle |
| writes, performs finalization after writing and closes the sink. Called |
| after all bundle writes are complete.</p> |
| <p>The bundle write results that are passed to finalize are those returned by |
| bundles that completed successfully. Although bundles may have been run |
| multiple times (for fault-tolerance), only one writer result will be passed |
| to finalize for each bundle. An implementation of finalize should perform |
| clean up of any failed and successfully retried bundles. Note that these |
| failed bundles will not have their writer result passed to finalize, so |
| finalize should be capable of locating any temporary/partial output written |
| by failed bundles.</p> |
| <p>If all retries of a bundle fails, the whole pipeline will fail <em>without</em> |
| finalize_write() being invoked.</p> |
| <p>A best practice is to make finalize atomic. If this is impossible given the |
| semantics of the sink, finalize should be idempotent, as it may be called |
| multiple times in the case of failure/retry or for redundancy.</p> |
| <p>Note that the iteration order of the writer results is not guaranteed to be |
| consistent if finalize is called multiple times.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>init_result</strong> – the result of <code class="docutils literal"><span class="pre">initialize_write()</span></code> invocation.</li> |
| <li><strong>writer_results</strong> – an iterable containing results of <code class="docutils literal"><span class="pre">Writer.close()</span></code> |
| invocations. This will only contain results of successful writes, and |
| will only contain the result of a single successful write for a given |
| bundle.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Sink.initialize_write"> |
| <code class="descname">initialize_write</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Sink.initialize_write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Sink.initialize_write" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Initializes the sink before writing begins.</p> |
| <p>Invoked before any data is written to the sink.</p> |
| <p>Please see documentation in <code class="docutils literal"><span class="pre">iobase.Sink</span></code> for an example.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">An object that contains any sink specific state generated by |
| initialization. This object will be passed to open_writer() and |
| finalize_write() methods.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Sink.open_writer"> |
| <code class="descname">open_writer</code><span class="sig-paren">(</span><em>init_result</em>, <em>uid</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Sink.open_writer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Sink.open_writer" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Opens a writer for writing a bundle of elements to the sink.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>init_result</strong> – the result of initialize_write() invocation.</li> |
| <li><strong>uid</strong> – a unique identifier generated by the system.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">an <code class="docutils literal"><span class="pre">iobase.Writer</span></code> that can be used to write a bundle of records to the |
| current sink.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.Write"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">Write</code><span class="sig-paren">(</span><em>sink</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Write" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A <code class="docutils literal"><span class="pre">PTransform</span></code> that writes to a sink.</p> |
| <p>A sink should inherit <code class="docutils literal"><span class="pre">iobase.Sink</span></code>. Such implementations are |
| handled using a composite transform that consists of three <code class="docutils literal"><span class="pre">ParDo``s</span> <span class="pre">-</span> |
| <span class="pre">(1)</span> <span class="pre">a</span> <span class="pre">``ParDo</span></code> performing a global initialization (2) a <code class="docutils literal"><span class="pre">ParDo</span></code> performing |
| a parallel write and (3) a <code class="docutils literal"><span class="pre">ParDo</span></code> performing a global finalization. In the |
| case of an empty <code class="docutils literal"><span class="pre">PCollection</span></code>, only the global initialization and |
| finalization will be performed. Currently only batch workflows support custom |
| sinks.</p> |
| <p>Example usage:</p> |
| <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pcollection</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">MySink</span><span class="p">())</span> |
| </pre></div> |
| </div> |
| <p>This returns a <code class="docutils literal"><span class="pre">pvalue.PValue</span></code> object that represents the end of the |
| Pipeline.</p> |
| <p>The sink argument may also be a full PTransform, in which case it will be |
| applied directly. This allows composite sink-like transforms (e.g. a sink |
| with some pre-processing DoFns) to be used the same as all other sinks.</p> |
| <p>This transform also supports sinks that inherit <code class="docutils literal"><span class="pre">iobase.NativeSink</span></code>. These |
| are sinks that are implemented natively by the Dataflow service and hence |
| should not be updated by users. These sinks are processed using a Dataflow |
| native write transform.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Write.display_data"> |
| <code class="descname">display_data</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Write.display_data"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Write.display_data" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Write.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pcoll</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Write.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Write.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.iobase.Writer"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.iobase.</code><code class="descname">Writer</code><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Writer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Writer" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <code class="xref py py-class docutils literal"><span class="pre">object</span></code></p> |
| <p>This class is deprecated, no backwards-compatibility guarantees.</p> |
| <p>Writes a bundle of elements from a <code class="docutils literal"><span class="pre">PCollection</span></code> to a sink.</p> |
| <p>A Writer <code class="docutils literal"><span class="pre">iobase.Writer.write()</span></code> writes and elements to the sink while |
| <code class="docutils literal"><span class="pre">iobase.Writer.close()</span></code> is called after all elements in the bundle have been |
| written.</p> |
| <p>See <code class="docutils literal"><span class="pre">iobase.Sink</span></code> for more detailed documentation about the process of |
| writing to a sink.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Writer.close"> |
| <code class="descname">close</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Writer.close"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Writer.close" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Closes the current writer.</p> |
| <p>Please see documentation in <code class="docutils literal"><span class="pre">iobase.Sink</span></code> for an example.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">An object representing the writes that were performed by the current |
| writer.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.iobase.Writer.write"> |
| <code class="descname">write</code><span class="sig-paren">(</span><em>value</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/iobase.html#Writer.write"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.iobase.Writer.write" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Writes a value to the sink using the current writer.</p> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.localfilesystem"> |
| <span id="apache-beam-io-localfilesystem-module"></span><h2>apache_beam.io.localfilesystem module<a class="headerlink" href="#module-apache_beam.io.localfilesystem" title="Permalink to this headline">¶</a></h2> |
| <p>Local File system implementation for accessing files on disk.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.localfilesystem.</code><code class="descname">LocalFileSystem</code><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.filesystem.FileSystem" title="apache_beam.io.filesystem.FileSystem"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.filesystem.FileSystem</span></code></a></p> |
| <p>A Local <code class="docutils literal"><span class="pre">FileSystem</span></code> implementation for accessing files on disk.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.copy"> |
| <code class="descname">copy</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.copy"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.copy" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively copy the file tree from the source to the destination</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – list of source file objects that needs to be copied</li> |
| <li><strong>destination_file_names</strong> – list of destination of the new object</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the copy operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.create"> |
| <code class="descname">create</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.create"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.create" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a write channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a close function for the user to use</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.delete"> |
| <code class="descname">delete</code><span class="sig-paren">(</span><em>paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.delete"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.delete" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Deletes files or directories at the provided paths. |
| Directories will be deleted recursively.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>paths</strong> – list of paths that give the file objects to be deleted</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the delete operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.exists"> |
| <code class="descname">exists</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.exists"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.exists" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Check if the provided path exists on the FileSystem.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path that needs to be checked.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: boolean flag indicating if path exists</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.join"> |
| <code class="descname">join</code><span class="sig-paren">(</span><em>basepath</em>, <em>*paths</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.join"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.join" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Join two or more pathname components for the filesystem</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>basepath</strong> – string path of the first component of the path</li> |
| <li><strong>paths</strong> – path components to be added</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: full path after combining all the passed components</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.match"> |
| <code class="descname">match</code><span class="sig-paren">(</span><em>patterns</em>, <em>limits=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.match"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.match" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Find all matching paths to the pattern provided.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>patterns</strong> – list of string for the file path pattern to match against</li> |
| <li><strong>limits</strong> – list of maximum number of responses that need to be fetched</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: list of <code class="docutils literal"><span class="pre">MatchResult</span></code> objects.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the pattern match operations fail</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.mkdirs"> |
| <code class="descname">mkdirs</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.mkdirs"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.mkdirs" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Recursively create directories for the provided path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – string path of the directory structure that should be created</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body">IOError if leaf directory already exists.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.open"> |
| <code class="descname">open</code><span class="sig-paren">(</span><em>path</em>, <em>mime_type='application/octet-stream'</em>, <em>compression_type='auto'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.open"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.open" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns a read channel for the given file path.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>path</strong> – string path of the file object to be written to the system</li> |
| <li><strong>mime_type</strong> – MIME type to specify the type of content in the file object</li> |
| <li><strong>compression_type</strong> – Type of compression to be used for this object</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p>Returns: file handle with a close function for the user to use</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.rename"> |
| <code class="descname">rename</code><span class="sig-paren">(</span><em>source_file_names</em>, <em>destination_file_names</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.rename"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.rename" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Rename the files at the source list to the destination list. |
| Source and destination lists should be of the same size.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source_file_names</strong> – List of file paths that need to be moved</li> |
| <li><strong>destination_file_names</strong> – List of destination_file_names for the files</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="docutils literal"><span class="pre">BeamIOError</span></code> if any of the rename operations fail</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.scheme"> |
| <em class="property">classmethod </em><code class="descname">scheme</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.scheme"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.scheme" title="Permalink to this definition">¶</a></dt> |
| <dd><p>URI scheme for the FileSystem</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.localfilesystem.LocalFileSystem.split"> |
| <code class="descname">split</code><span class="sig-paren">(</span><em>path</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/localfilesystem.html#LocalFileSystem.split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.localfilesystem.LocalFileSystem.split" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Splits the given path into two parts.</p> |
| <p>Splits the path into a pair (head, tail) such that tail contains the last |
| component of the path and head contains everything up to that.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>path</strong> – path as a string</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body">a pair of path components as strings.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.range_trackers"> |
| <span id="apache-beam-io-range-trackers-module"></span><h2>apache_beam.io.range_trackers module<a class="headerlink" href="#module-apache_beam.io.range_trackers" title="Permalink to this headline">¶</a></h2> |
| <p>iobase.RangeTracker implementations provided with Dataflow SDK.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.range_trackers.</code><code class="descname">OffsetRangeTracker</code><span class="sig-paren">(</span><em>start</em>, <em>end</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.RangeTracker" title="apache_beam.io.iobase.RangeTracker"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.RangeTracker</span></code></a></p> |
| <p>A ‘RangeTracker’ for non-negative positions of type ‘long’.</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.OFFSET_INFINITY"> |
| <code class="descname">OFFSET_INFINITY</code><em class="property"> = inf</em><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.OFFSET_INFINITY" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.fraction_consumed"> |
| <code class="descname">fraction_consumed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.fraction_consumed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.fraction_consumed" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="attribute"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.last_record_start"> |
| <code class="descname">last_record_start</code><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.last_record_start" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.position_at_fraction"> |
| <code class="descname">position_at_fraction</code><span class="sig-paren">(</span><em>fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.position_at_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.position_at_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.set_current_position"> |
| <code class="descname">set_current_position</code><span class="sig-paren">(</span><em>record_start</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.set_current_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.set_current_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.set_split_points_unclaimed_callback"> |
| <code class="descname">set_split_points_unclaimed_callback</code><span class="sig-paren">(</span><em>callback</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.set_split_points_unclaimed_callback"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.set_split_points_unclaimed_callback" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.split_points"> |
| <code class="descname">split_points</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.split_points"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.split_points" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.start_position"> |
| <code class="descname">start_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.start_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.start_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.stop_position"> |
| <code class="descname">stop_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.stop_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.stop_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.try_claim"> |
| <code class="descname">try_claim</code><span class="sig-paren">(</span><em>record_start</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.try_claim"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.try_claim" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OffsetRangeTracker.try_split"> |
| <code class="descname">try_split</code><span class="sig-paren">(</span><em>split_offset</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OffsetRangeTracker.try_split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OffsetRangeTracker.try_split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.range_trackers.LexicographicKeyRangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.range_trackers.</code><code class="descname">LexicographicKeyRangeTracker</code><span class="sig-paren">(</span><em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#LexicographicKeyRangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.LexicographicKeyRangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker" title="apache_beam.io.range_trackers.OrderedPositionRangeTracker"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.range_trackers.OrderedPositionRangeTracker</span></code></a></p> |
| <p>A range tracker that tracks progress through a lexicographically |
| ordered keyspace of strings.</p> |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.range_trackers.LexicographicKeyRangeTracker.fraction_to_position"> |
| <em class="property">classmethod </em><code class="descname">fraction_to_position</code><span class="sig-paren">(</span><em>fraction</em>, <em>start=None</em>, <em>end=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#LexicographicKeyRangeTracker.fraction_to_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.LexicographicKeyRangeTracker.fraction_to_position" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Linearly interpolates a key that is lexicographically |
| fraction of the way between start and end.</p> |
| </dd></dl> |
| |
| <dl class="classmethod"> |
| <dt id="apache_beam.io.range_trackers.LexicographicKeyRangeTracker.position_to_fraction"> |
| <em class="property">classmethod </em><code class="descname">position_to_fraction</code><span class="sig-paren">(</span><em>key</em>, <em>start=None</em>, <em>end=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#LexicographicKeyRangeTracker.position_to_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.LexicographicKeyRangeTracker.position_to_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Returns the fraction of keys in the range [start, end) that |
| are less than the given key.</p> |
| </dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.range_trackers.</code><code class="descname">OrderedPositionRangeTracker</code><span class="sig-paren">(</span><em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.RangeTracker" title="apache_beam.io.iobase.RangeTracker"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.RangeTracker</span></code></a></p> |
| <p>An abstract base class for range trackers whose positions are comparable.</p> |
| <p>Subclasses only need to implement the mapping from position ranges |
| to and from the closed interval [0, 1].</p> |
| <dl class="attribute"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.UNSTARTED"> |
| <code class="descname">UNSTARTED</code><em class="property"> = <object object></em><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.UNSTARTED" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.fraction_consumed"> |
| <code class="descname">fraction_consumed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.fraction_consumed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.fraction_consumed" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.fraction_to_position"> |
| <code class="descname">fraction_to_position</code><span class="sig-paren">(</span><em>fraction</em>, <em>start</em>, <em>end</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.fraction_to_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.fraction_to_position" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Converts a fraction between 0 and 1 to a position between start and end.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.position_at_fraction"> |
| <code class="descname">position_at_fraction</code><span class="sig-paren">(</span><em>fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.position_at_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.position_at_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.position_to_fraction"> |
| <code class="descname">position_to_fraction</code><span class="sig-paren">(</span><em>pos</em>, <em>start</em>, <em>end</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.position_to_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.position_to_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Converts a position <cite>pos</cite> betweeen <cite>start</cite> and <cite>end</cite> (inclusive) to a |
| fraction between 0 and 1.</p> |
| </dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.start_position"> |
| <code class="descname">start_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.start_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.start_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.stop_position"> |
| <code class="descname">stop_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.stop_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.stop_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.try_claim"> |
| <code class="descname">try_claim</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.try_claim"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.try_claim" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.OrderedPositionRangeTracker.try_split"> |
| <code class="descname">try_split</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#OrderedPositionRangeTracker.try_split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.OrderedPositionRangeTracker.try_split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.range_trackers.</code><code class="descname">UnsplittableRangeTracker</code><span class="sig-paren">(</span><em>range_tracker</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="#apache_beam.io.iobase.RangeTracker" title="apache_beam.io.iobase.RangeTracker"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.io.iobase.RangeTracker</span></code></a></p> |
| <p>A RangeTracker that always ignores split requests.</p> |
| <p>This can be used to make a given <code class="docutils literal"><span class="pre">RangeTracker</span></code> object unsplittable by |
| ignoring all calls to <code class="docutils literal"><span class="pre">try_split()</span></code>. All other calls will be delegated to |
| the given <code class="docutils literal"><span class="pre">RangeTracker</span></code>.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.fraction_consumed"> |
| <code class="descname">fraction_consumed</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.fraction_consumed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.fraction_consumed" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.position_at_fraction"> |
| <code class="descname">position_at_fraction</code><span class="sig-paren">(</span><em>fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.position_at_fraction"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.position_at_fraction" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.set_current_position"> |
| <code class="descname">set_current_position</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.set_current_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.set_current_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.set_split_points_unclaimed_callback"> |
| <code class="descname">set_split_points_unclaimed_callback</code><span class="sig-paren">(</span><em>callback</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.set_split_points_unclaimed_callback"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.set_split_points_unclaimed_callback" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.split_points"> |
| <code class="descname">split_points</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.split_points"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.split_points" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.start_position"> |
| <code class="descname">start_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.start_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.start_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.stop_position"> |
| <code class="descname">stop_position</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.stop_position"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.stop_position" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.try_claim"> |
| <code class="descname">try_claim</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.try_claim"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.try_claim" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| <dl class="method"> |
| <dt id="apache_beam.io.range_trackers.UnsplittableRangeTracker.try_split"> |
| <code class="descname">try_split</code><span class="sig-paren">(</span><em>position</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/range_trackers.html#UnsplittableRangeTracker.try_split"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.range_trackers.UnsplittableRangeTracker.try_split" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.source_test_utils"> |
| <span id="apache-beam-io-source-test-utils-module"></span><h2>apache_beam.io.source_test_utils module<a class="headerlink" href="#module-apache_beam.io.source_test_utils" title="Permalink to this headline">¶</a></h2> |
| <p>Helper functions and test harnesses for source implementations.</p> |
| <p>This module contains helper functions and test harnesses for checking |
| correctness of source (a subclass of <code class="docutils literal"><span class="pre">iobase.BoundedSource</span></code>) and range |
| tracker (a subclass of``iobase.RangeTracker``) implementations.</p> |
| <p>Contains a few lightweight utilities (e.g. reading items from a source such as |
| <code class="docutils literal"><span class="pre">readFromSource()</span></code>, as well as heavyweight property testing and stress |
| testing harnesses that help getting a large amount of test coverage with few |
| code.</p> |
| <p>Most notable ones are: |
| * <code class="docutils literal"><span class="pre">assertSourcesEqualReferenceSource()</span></code> helps testing that the data read by |
| the union of sources produced by <code class="docutils literal"><span class="pre">BoundedSource.split()</span></code> is the same as data |
| read by the original source. |
| * If your source implements dynamic work rebalancing, use the |
| <code class="docutils literal"><span class="pre">assertSplitAtFraction()</span></code> family of functions - they test behavior of |
| <code class="docutils literal"><span class="pre">RangeTracker.try_split()</span></code>, in particular, that various consistency |
| properties are respected and the total set of data read by the source is |
| preserved when splits happen. Use <code class="docutils literal"><span class="pre">assertSplitAtFractionBehavior()</span></code> to test |
| individual cases of <code class="docutils literal"><span class="pre">RangeTracker.try_split()</span></code> and use |
| <code class="docutils literal"><span class="pre">assertSplitAtFractionExhaustive()</span></code> as a heavy-weight stress test including |
| concurrency. We strongly recommend to use both.</p> |
| <dl class="docutils"> |
| <dt>For example usages, see the unit tests of modules such as</dt> |
| <dd><ul class="first last simple"> |
| <li>apache_beam.io.source_test_utils_test.py</li> |
| <li>apache_beam.io.avroio_test.py</li> |
| </ul> |
| </dd> |
| </dl> |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.read_from_source"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">read_from_source</code><span class="sig-paren">(</span><em>source</em>, <em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#read_from_source"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.read_from_source" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Reads elements from the given <code class="docutils literal"><span class="pre">`BoundedSource`</span></code>.</p> |
| <p>Only reads elements within the given position range. |
| :param source: <code class="docutils literal"><span class="pre">iobase.BoundedSource</span></code> implementation. |
| :param start_position: start position for reading. |
| :param stop_position: stop position for reading.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body">the set of values read from the sources.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_sources_equal_reference_source"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_sources_equal_reference_source</code><span class="sig-paren">(</span><em>reference_source_info</em>, <em>sources_info</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_sources_equal_reference_source"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_sources_equal_reference_source" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Tests if a reference source is equal to a given set of sources.</p> |
| <p>Given a reference source (a <code class="docutils literal"><span class="pre">BoundedSource</span></code> and a position range) and a |
| list of sources, assert that the union of the records |
| read from the list of sources is equal to the records read from the |
| reference source.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>reference_source_info</strong> – a three-tuple that gives the reference |
| <code class="docutils literal"><span class="pre">iobase.BoundedSource</span></code>, position to start reading |
| at, and position to stop reading at.</li> |
| <li><strong>sources_info</strong> – a set of sources. Each source is a three-tuple that is of |
| the same format described above.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – if the set of data produced by the reference source and the |
| given set of sources are not equivalent.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_reentrant_reads_succeed"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_reentrant_reads_succeed</code><span class="sig-paren">(</span><em>source_info</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_reentrant_reads_succeed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_reentrant_reads_succeed" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Tests if a given source can be read in a reentrant manner.</p> |
| <p>Assume that given source produces the set of values {v1, v2, v3, ... vn}. For |
| i in range [1, n-1] this method performs a reentrant read after reading i |
| elements and verifies that both the original and reentrant read produce the |
| expected set of values.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>source_info</strong> – a three-tuple that gives the reference |
| <code class="docutils literal"><span class="pre">iobase.BoundedSource</span></code>, position to start reading at, and a |
| position to stop reading at.</td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – if source is too trivial or reentrant read result in an |
| incorrect read.</td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_split_at_fraction_behavior"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_split_at_fraction_behavior</code><span class="sig-paren">(</span><em>source</em>, <em>num_items_to_read_before_split</em>, <em>split_fraction</em>, <em>expected_outcome</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_split_at_fraction_behavior"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_split_at_fraction_behavior" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Verifies the behaviour of splitting a source at a given fraction.</p> |
| <p>Asserts that splitting a <code class="docutils literal"><span class="pre">BoundedSource</span></code> either fails after reading |
| <code class="docutils literal"><span class="pre">num_items_to_read_before_split</span></code> items, or succeeds in a way that is |
| consistent according to <code class="docutils literal"><span class="pre">assertSplitAtFractionSucceedsAndConsistent()</span></code>.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source</strong> – the source to perform dynamic splitting on.</li> |
| <li><strong>num_items_to_read_before_split</strong> – number of items to read before splitting.</li> |
| <li><strong>split_fraction</strong> – fraction to split at.</li> |
| <li><strong>expected_outcome</strong> – a value from ‘ExpectedSplitOutcome’.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">a tuple that gives the number of items produced by reading the two ranges |
| produced after dynamic splitting. If splitting did not occur, the first |
| value of the tuple will represent the full set of records read by the |
| source while the second value of the tuple will be ‘-1’.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_split_at_fraction_binary"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_split_at_fraction_binary</code><span class="sig-paren">(</span><em>source</em>, <em>expected_items</em>, <em>num_items_to_read_before_split</em>, <em>left_fraction</em>, <em>left_result</em>, <em>right_fraction</em>, <em>right_result</em>, <em>stats</em>, <em>start_position=None</em>, <em>stop_position=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_split_at_fraction_binary"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_split_at_fraction_binary" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Performs dynamic work rebalancing for fractions within a given range.</p> |
| <p>Asserts that given a start position, a source can be split at every |
| interesting fraction (halfway between two fractions that differ by at |
| least one item) and the results are consistent if a split succeeds.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>source</strong> – source to perform dynamic splitting on.</li> |
| <li><strong>expected_items</strong> – total set of items expected when reading the source.</li> |
| <li><strong>num_items_to_read_before_split</strong> – number of items to read before splitting.</li> |
| <li><strong>left_fraction</strong> – left fraction for binary splitting.</li> |
| <li><strong>left_result</strong> – result received by splitting at left fraction.</li> |
| <li><strong>right_fraction</strong> – right fraction for binary splitting.</li> |
| <li><strong>right_result</strong> – result received by splitting at right fraction.</li> |
| <li><strong>stats</strong> – a <code class="docutils literal"><span class="pre">SplitFractionStatistics</span></code> for storing results.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_split_at_fraction_exhaustive"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_split_at_fraction_exhaustive</code><span class="sig-paren">(</span><em>source</em>, <em>start_position=None</em>, <em>stop_position=None</em>, <em>perform_multi_threaded_test=True</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_split_at_fraction_exhaustive"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_split_at_fraction_exhaustive" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Performs and tests dynamic work rebalancing exhaustively.</p> |
| <p>Asserts that for each possible start position, a source can be split at |
| every interesting fraction (halfway between two fractions that differ by at |
| least one item) and the results are consistent if a split succeeds. |
| Verifies multi threaded splitting as well.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> |
| <li><strong>source</strong> – the source to perform dynamic splitting on.</li> |
| <li><strong>perform_multi_threaded_test</strong> – if true performs a multi-threaded test |
| otherwise this test is skipped.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – if the exhaustive splitting test fails.</p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_split_at_fraction_fails"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_split_at_fraction_fails</code><span class="sig-paren">(</span><em>source</em>, <em>num_items_to_read_before_split</em>, <em>split_fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_split_at_fraction_fails"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_split_at_fraction_fails" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Asserts that dynamic work rebalancing at a given fraction fails.</p> |
| <p>Asserts that trying to perform dynamic splitting after reading |
| ‘num_items_to_read_before_split’ items from the source fails.</p> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>source</strong> – source to perform dynamic splitting on.</li> |
| <li><strong>num_items_to_read_before_split</strong> – number of items to read before splitting.</li> |
| <li><strong>split_fraction</strong> – fraction to split at.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| <dl class="function"> |
| <dt id="apache_beam.io.source_test_utils.assert_split_at_fraction_succeeds_and_consistent"> |
| <code class="descclassname">apache_beam.io.source_test_utils.</code><code class="descname">assert_split_at_fraction_succeeds_and_consistent</code><span class="sig-paren">(</span><em>source</em>, <em>num_items_to_read_before_split</em>, <em>split_fraction</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/source_test_utils.html#assert_split_at_fraction_succeeds_and_consistent"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.source_test_utils.assert_split_at_fraction_succeeds_and_consistent" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Verifies some consistency properties of dynamic work rebalancing.</p> |
| <p>Equivalent to the following pseudocode::</p> |
| <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">original_range_tracker</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">getRangeTracker</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span> |
| <span class="n">original_reader</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">original_range_tracker</span><span class="p">)</span> |
| <span class="n">items_before_split</span> <span class="o">=</span> <span class="n">read</span> <span class="n">N</span> <span class="n">items</span> <span class="kn">from</span> <span class="nn">original_reader</span> |
| <span class="n">suggested_split_position</span> <span class="o">=</span> <span class="n">original_range_tracker</span><span class="o">.</span><span class="n">position_for_fraction</span><span class="p">(</span> |
| <span class="n">split_fraction</span><span class="p">)</span> |
| <span class="n">original_stop_position</span> <span class="o">-</span> <span class="n">original_range_tracker</span><span class="o">.</span><span class="n">stop_position</span><span class="p">()</span> |
| <span class="n">split_result</span> <span class="o">=</span> <span class="n">range_tracker</span><span class="o">.</span><span class="n">try_split</span><span class="p">()</span> |
| <span class="n">split_position</span><span class="p">,</span> <span class="n">split_fraction</span> <span class="o">=</span> <span class="n">split_result</span> |
| <span class="n">primary_range_tracker</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">get_range_tracker</span><span class="p">(</span> |
| <span class="n">original_range_tracker</span><span class="o">.</span><span class="n">start_position</span><span class="p">(),</span> <span class="n">split_position</span><span class="p">)</span> |
| <span class="n">residual_range_tracker</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">get_range_tracker</span><span class="p">(</span><span class="n">split_position</span><span class="p">,</span> |
| <span class="n">original_stop_position</span><span class="p">)</span> |
| |
| <span class="k">assert</span> <span class="n">that</span><span class="p">:</span> <span class="n">items</span> <span class="n">when</span> <span class="n">reading</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">primary_range_tracker</span><span class="p">)</span> <span class="o">==</span> |
| <span class="n">items_before_split</span> <span class="o">+</span> <span class="n">items</span> <span class="kn">from</span> <span class="nn">continuing</span> <span class="n">to</span> <span class="n">read</span> <span class="s1">'original_reader'</span> |
| <span class="k">assert</span> <span class="n">that</span><span class="p">:</span> <span class="n">items</span> <span class="n">when</span> <span class="n">reading</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">original_range_tracker</span><span class="p">)</span> <span class="o">=</span> |
| <span class="n">items</span> <span class="n">when</span> <span class="n">reading</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">primary_range_tracker</span><span class="p">)</span> <span class="o">+</span> <span class="n">items</span> <span class="n">when</span> <span class="n">reading</span> |
| <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">residual_range_tracker</span><span class="p">)</span> |
| </pre></div> |
| </div> |
| <table class="docutils field-list" frame="void" rules="none"> |
| <col class="field-name" /> |
| <col class="field-body" /> |
| <tbody valign="top"> |
| <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple"> |
| <li><strong>source</strong> – source to perform dynamic work rebalancing on.</li> |
| <li><strong>num_items_to_read_before_split</strong> – number of items to read before splitting.</li> |
| <li><strong>split_fraction</strong> – fraction to split at.</li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.textio"> |
| <span id="apache-beam-io-textio-module"></span><h2>apache_beam.io.textio module<a class="headerlink" href="#module-apache_beam.io.textio" title="Permalink to this headline">¶</a></h2> |
| <p>A source and a sink for reading from and writing to text files.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.textio.ReadFromText"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.textio.</code><code class="descname">ReadFromText</code><span class="sig-paren">(</span><em>file_pattern=None</em>, <em>min_bundle_size=0</em>, <em>compression_type='auto'</em>, <em>strip_trailing_newlines=True</em>, <em>coder=StrUtf8Coder</em>, <em>validate=True</em>, <em>skip_header_lines=0</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/textio.html#ReadFromText"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.textio.ReadFromText" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A PTransform for reading text files.</p> |
| <p>Parses a text file as newline-delimited elements, by default assuming |
| UTF-8 encoding. Supports newline delimiters ‘n’ and ‘rn’.</p> |
| <p>This implementation only supports reading text encoded using UTF-8 or ASCII. |
| This does not support other encodings such as UTF-16 or UTF-32.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.textio.ReadFromText.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pvalue</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/textio.html#ReadFromText.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.textio.ReadFromText.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.textio.WriteToText"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.textio.</code><code class="descname">WriteToText</code><span class="sig-paren">(</span><em>file_path_prefix</em>, <em>file_name_suffix=''</em>, <em>append_trailing_newlines=True</em>, <em>num_shards=0</em>, <em>shard_name_template=None</em>, <em>coder=ToStringCoder</em>, <em>compression_type='auto'</em>, <em>header=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/textio.html#WriteToText"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.textio.WriteToText" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>A PTransform for writing to text files.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.textio.WriteToText.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pcoll</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/textio.html#WriteToText.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.textio.WriteToText.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io.tfrecordio"> |
| <span id="apache-beam-io-tfrecordio-module"></span><h2>apache_beam.io.tfrecordio module<a class="headerlink" href="#module-apache_beam.io.tfrecordio" title="Permalink to this headline">¶</a></h2> |
| <p>TFRecord sources and sinks.</p> |
| <dl class="class"> |
| <dt id="apache_beam.io.tfrecordio.ReadFromTFRecord"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.tfrecordio.</code><code class="descname">ReadFromTFRecord</code><span class="sig-paren">(</span><em>file_pattern</em>, <em>coder=BytesCoder</em>, <em>compression_type='auto'</em>, <em>validate=True</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/tfrecordio.html#ReadFromTFRecord"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.tfrecordio.ReadFromTFRecord" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>Transform for reading TFRecord sources.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.tfrecordio.ReadFromTFRecord.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pvalue</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/tfrecordio.html#ReadFromTFRecord.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.tfrecordio.ReadFromTFRecord.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| <dl class="class"> |
| <dt id="apache_beam.io.tfrecordio.WriteToTFRecord"> |
| <em class="property">class </em><code class="descclassname">apache_beam.io.tfrecordio.</code><code class="descname">WriteToTFRecord</code><span class="sig-paren">(</span><em>file_path_prefix</em>, <em>coder=BytesCoder</em>, <em>file_name_suffix=''</em>, <em>num_shards=0</em>, <em>shard_name_template='-SSSSS-of-NNNNN'</em>, <em>compression_type='auto'</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/tfrecordio.html#WriteToTFRecord"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.tfrecordio.WriteToTFRecord" title="Permalink to this definition">¶</a></dt> |
| <dd><p>Bases: <a class="reference internal" href="apache_beam.transforms.html#apache_beam.transforms.ptransform.PTransform" title="apache_beam.transforms.ptransform.PTransform"><code class="xref py py-class docutils literal"><span class="pre">apache_beam.transforms.ptransform.PTransform</span></code></a></p> |
| <p>Transform for writing to TFRecord sinks.</p> |
| <dl class="method"> |
| <dt id="apache_beam.io.tfrecordio.WriteToTFRecord.expand"> |
| <code class="descname">expand</code><span class="sig-paren">(</span><em>pcoll</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/io/tfrecordio.html#WriteToTFRecord.expand"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.io.tfrecordio.WriteToTFRecord.expand" title="Permalink to this definition">¶</a></dt> |
| <dd></dd></dl> |
| |
| </dd></dl> |
| |
| </div> |
| <div class="section" id="module-apache_beam.io"> |
| <span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-apache_beam.io" title="Permalink to this headline">¶</a></h2> |
| <p>A package defining several input sources and output sinks.</p> |
| </div> |
| </div> |
| |
| |
| </div> |
| </div> |
| </div> |
| <div class="clearer"></div> |
| </div> |
| <div class="related" role="navigation" aria-label="related navigation"> |
| <h3>Navigation</h3> |
| <ul> |
| <li class="right" style="margin-right: 10px"> |
| <a href="genindex.html" title="General Index" |
| >index</a></li> |
| <li class="right" > |
| <a href="py-modindex.html" title="Python Module Index" |
| >modules</a> |</li> |
| <li class="right" > |
| <a href="apache_beam.io.gcp.html" title="apache_beam.io.gcp package" |
| >next</a> |</li> |
| <li class="right" > |
| <a href="apache_beam.internal.gcp.html" title="apache_beam.internal.gcp package" |
| >previous</a> |</li> |
| <li class="nav-item nav-item-0"><a href="index.html">Apache Beam documentation</a> »</li> |
| <li class="nav-item nav-item-1"><a href="apache_beam.html" >apache_beam package</a> »</li> |
| </ul> |
| </div> |
| <div class="footer" role="contentinfo"> |
| © Copyright . |
| Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.5.5. |
| </div> |
| </body> |
| </html> |