<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>apache_beam.dataframe.convert module &mdash; Apache Beam documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="apache_beam.dataframe.doctests module" href="apache_beam.dataframe.doctests.html" />
<link rel="prev" title="apache_beam.dataframe package" href="apache_beam.dataframe.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> Apache Beam
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="apache_beam.coders.html">apache_beam.coders package</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="apache_beam.dataframe.html">apache_beam.dataframe package</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="apache_beam.dataframe.html#submodules">Submodules</a><ul class="current">
<li class="toctree-l3 current"><a class="current reference internal" href="#">apache_beam.dataframe.convert module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.doctests.html">apache_beam.dataframe.doctests module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.expressions.html">apache_beam.dataframe.expressions module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.frame_base.html">apache_beam.dataframe.frame_base module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.frames.html">apache_beam.dataframe.frames module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.io.html">apache_beam.dataframe.io module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.pandas_top_level_functions.html">apache_beam.dataframe.pandas_top_level_functions module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.partitionings.html">apache_beam.dataframe.partitionings module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.schemas.html">apache_beam.dataframe.schemas module</a></li>
<li class="toctree-l3"><a class="reference internal" href="apache_beam.dataframe.transforms.html">apache_beam.dataframe.transforms module</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.io.html">apache_beam.io package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.metrics.html">apache_beam.metrics package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.ml.html">apache_beam.ml package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.options.html">apache_beam.options package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.portability.html">apache_beam.portability package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.runners.html">apache_beam.runners package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.transforms.html">apache_beam.transforms package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.typehints.html">apache_beam.typehints package</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.utils.html">apache_beam.utils package</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.error.html">apache_beam.error module</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.pipeline.html">apache_beam.pipeline module</a></li>
<li class="toctree-l1"><a class="reference internal" href="apache_beam.pvalue.html">apache_beam.pvalue module</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Apache Beam</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> &raquo;</li>
<li><a href="apache_beam.dataframe.html">apache_beam.dataframe package</a> &raquo;</li>
<li>apache_beam.dataframe.convert module</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/apache_beam.dataframe.convert.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="module-apache_beam.dataframe.convert">
<span id="apache-beam-dataframe-convert-module"></span><h1>apache_beam.dataframe.convert module<a class="headerlink" href="#module-apache_beam.dataframe.convert" title="Permalink to this headline"></a></h1>
<dl class="function">
<dt id="apache_beam.dataframe.convert.to_dataframe">
<code class="descclassname">apache_beam.dataframe.convert.</code><code class="descname">to_dataframe</code><span class="sig-paren">(</span><em>pcoll</em>, <em>proxy=None</em>, <em>label=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/apache_beam/dataframe/convert.html#to_dataframe"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.dataframe.convert.to_dataframe" title="Permalink to this definition"></a></dt>
<dd><p>Converts a PCollection to a deferred dataframe-like object, which can
be manipulated with pandas methods like <cite>filter</cite> and <cite>groupby</cite>.</p>
<p>For example, one might write:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pcoll</span> <span class="o">=</span> <span class="o">...</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">to_dataframe</span><span class="p">(</span><span class="n">pcoll</span><span class="p">,</span> <span class="n">proxy</span><span class="o">=...</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s1">&#39;col&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="n">pcoll_result</span> <span class="o">=</span> <span class="n">to_pcollection</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</pre></div>
</div>
<p>A proxy object, such as an empty pandas DataFrame or Series with the expected
columns and dtypes, must be given if the schema for the PCollection is not known.</p>
</dd></dl>
<dl class="function">
<dt id="apache_beam.dataframe.convert.to_pcollection">
<code class="descclassname">apache_beam.dataframe.convert.</code><code class="descname">to_pcollection</code><span class="sig-paren">(</span><em>*dataframes</em>, <em>label=None</em>, <em>always_return_tuple=False</em>, <em>yield_elements='schemas'</em>, <em>include_indexes=False</em>, <em>pipeline=None</em><span class="sig-paren">)</span> &#x2192; Union[apache_beam.pvalue.PCollection, Tuple[apache_beam.pvalue.PCollection, ...]]<a class="reference internal" href="_modules/apache_beam/dataframe/convert.html#to_pcollection"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#apache_beam.dataframe.convert.to_pcollection" title="Permalink to this definition"></a></dt>
<dd><p>Converts one or more deferred dataframe-like objects back to a PCollection.</p>
<p>This method creates and applies the actual Beam operations that compute
the given deferred dataframes, returning one PCollection of results per
dataframe. By default the resulting PCollections are schema-aware PCollections
where each element is one row from the output dataframes, excluding indexes.
This behavior can be modified with the <cite>yield_elements</cite> and <cite>include_indexes</cite>
arguments.</p>
<p>Also accepts non-deferred pandas dataframes, which are converted to deferred,
schema’d PCollections. In this case the contents of the entire dataframe are
serialized into the graph, so for large amounts of data it is preferable to
write them to disk and read them with one of the read methods.</p>
<p>If more than one (related) result is desired, it can be more efficient to
pass them all at the same time to this method.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>label</strong> – (optional, default: “ToPCollection(…)”) The label to use for the
conversion transform.</li>
<li><strong>always_return_tuple</strong> – (optional, default: False) If true, always return
a tuple of PCollections, even if there’s only one output.</li>
<li><strong>yield_elements</strong> – (optional, default: “schemas”) If set to “pandas”, return
PCollections containing the raw pandas objects (DataFrames or Series).
If set to “schemas”, return an element-wise PCollection, where DataFrame
and Series instances are expanded to one element per row. DataFrames are
converted to schema-aware PCollections, where column values can be
accessed by attribute.</li>
<li><strong>include_indexes</strong> – (optional, default: False) When yield_elements=“schemas”
and include_indexes=True, attempt to include index columns in the output
schema for expanded DataFrames. Raises an error if any of the index
levels are unnamed (name=None), or if any of the names are not unique
among all column and index names.</li>
<li><strong>pipeline</strong> – (optional, unless non-deferred dataframes are passed) Used when
creating a PCollection from a non-deferred dataframe.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="apache_beam.dataframe.doctests.html" class="btn btn-neutral float-right" title="apache_beam.dataframe.doctests module" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="apache_beam.dataframe.html" class="btn btn-neutral float-left" title="apache_beam.dataframe package" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>