<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="A builder used to construct a `ParquetRecordBatchStream` for `async` reading of a parquet file"><title>ParquetRecordBatchStreamBuilder in parquet::arrow::async_reader - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-46f98efaafac5295.ttf.woff2,FiraSans-Regular-018c141bf0843ffd.woff2,FiraSans-Medium-8f9a781e4970d388.woff2,SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2,SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../../static.files/rustdoc-dd39b87e5fcfba68.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="parquet" data-themes="" data-resource-suffix="" data-rustdoc-version="1.81.0-nightly (d7f6ebace 2024-06-16)" data-channel="nightly" data-search-js="search-0fe7219eb170c82e.js" data-settings-js="settings-4313503d2e1961c2.js" ><script src="../../../static.files/storage-118b08c4c78b968e.js"></script><script defer src="sidebar-items.js"></script><script defer src="../../../static.files/main-20a3ad099b048cf2.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-df360f571f6edeae.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-2c020d218678b618.svg"></head><body class="rustdoc type"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky 
things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle" title="show sidebar"></button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../parquet/index.html">parquet</a><span class="version">52.0.0</span></h2></div><h2 class="location"><a href="#">ParquetRecordBatchStreamBuilder</a></h2><div class="sidebar-elems"><section><h3><a href="#aliased-type">Aliased type</a></h3><h3><a href="#fields">Fields</a></h3><ul class="block field"><li><a href="#structfield.batch_size">batch_size</a></li><li><a href="#structfield.fields">fields</a></li><li><a href="#structfield.filter">filter</a></li><li><a href="#structfield.input">input</a></li><li><a href="#structfield.limit">limit</a></li><li><a href="#structfield.metadata">metadata</a></li><li><a href="#structfield.offset">offset</a></li><li><a href="#structfield.projection">projection</a></li><li><a href="#structfield.row_groups">row_groups</a></li><li><a href="#structfield.schema">schema</a></li><li><a href="#structfield.selection">selection</a></li></ul><h3><a href="#implementations">Methods</a></h3><ul class="block method"><li><a href="#method.build">build</a></li><li><a href="#method.get_row_group_column_bloom_filter">get_row_group_column_bloom_filter</a></li><li><a href="#method.new">new</a></li><li><a href="#method.new_with_metadata">new_with_metadata</a></li><li><a href="#method.new_with_options">new_with_options</a></li></ul></section><h2><a href="index.html">In parquet::arrow::async_reader</a></h2></div></nav><div class="sidebar-resizer"></div><main><div class="width-limiter"><rustdoc-search></rustdoc-search><section id="main-content" class="content"><div class="main-heading"><h1>Type Alias <a href="../../index.html">parquet</a>::<wbr><a href="../index.html">arrow</a>::<wbr><a href="index.html">async_reader</a>::<wbr><a class="type" href="#">ParquetRecordBatchStreamBuilder</a><button id="copy-path" title="Copy item path to clipboard">Copy item 
path</button></h1><span class="out-of-band"><a class="src" href="../../../src/parquet/arrow/async_reader/mod.rs.html#244">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>&#x2212;</span>]</button></span></div><pre class="rust item-decl"><code>pub type ParquetRecordBatchStreamBuilder&lt;T&gt; = <a class="struct" href="../arrow_reader/struct.ArrowReaderBuilder.html" title="struct parquet::arrow::arrow_reader::ArrowReaderBuilder">ArrowReaderBuilder</a>&lt;AsyncReader&lt;T&gt;&gt;;</code></pre><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>A builder used to construct a <a href="struct.ParquetRecordBatchStream.html" title="struct parquet::arrow::async_reader::ParquetRecordBatchStream"><code>ParquetRecordBatchStream</code></a> for <code>async</code> reading of a parquet file</p>
<p>In particular, this handles reading the parquet file metadata, allowing consumers
to use this information to select which columns, row groups, etc.
should be read by the resulting stream.</p>
<p>See <a href="../arrow_reader/struct.ArrowReaderBuilder.html" title="struct parquet::arrow::arrow_reader::ArrowReaderBuilder"><code>ArrowReaderBuilder</code></a> for additional methods.</p>
</div></details><h2 id="aliased-type" class="section-header">Aliased Type<a href="#aliased-type" class="anchor">§</a></h2><pre class="rust item-decl"><code>struct ParquetRecordBatchStreamBuilder&lt;T&gt; {
pub(crate) input: AsyncReader&lt;T&gt;,
pub(crate) metadata: <a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;<a class="struct" href="../../file/metadata/struct.ParquetMetaData.html" title="struct parquet::file::metadata::ParquetMetaData">ParquetMetaData</a>&gt;,
pub(crate) schema: <a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;Schema&gt;,
pub(crate) fields: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;ParquetField&gt;&gt;,
pub(crate) batch_size: <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>,
pub(crate) row_groups: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="https://doc.rust-lang.org/nightly/alloc/vec/struct.Vec.html" title="struct alloc::vec::Vec">Vec</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;&gt;,
pub(crate) projection: <a class="struct" href="../struct.ProjectionMask.html" title="struct parquet::arrow::ProjectionMask">ProjectionMask</a>,
pub(crate) filter: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="../arrow_reader/filter/struct.RowFilter.html" title="struct parquet::arrow::arrow_reader::filter::RowFilter">RowFilter</a>&gt;,
pub(crate) selection: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="../arrow_reader/selection/struct.RowSelection.html" title="struct parquet::arrow::arrow_reader::selection::RowSelection">RowSelection</a>&gt;,
pub(crate) limit: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;,
pub(crate) offset: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;,
}</code></pre><h2 id="fields" class="fields section-header">Fields<a href="#fields" class="anchor">§</a></h2><span id="structfield.input" class="structfield section-header"><a href="#structfield.input" class="anchor field">§</a><code>input: AsyncReader&lt;T&gt;</code></span><span id="structfield.metadata" class="structfield section-header"><a href="#structfield.metadata" class="anchor field">§</a><code>metadata: <a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;<a class="struct" href="../../file/metadata/struct.ParquetMetaData.html" title="struct parquet::file::metadata::ParquetMetaData">ParquetMetaData</a>&gt;</code></span><span id="structfield.schema" class="structfield section-header"><a href="#structfield.schema" class="anchor field">§</a><code>schema: <a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;Schema&gt;</code></span><span id="structfield.fields" class="structfield section-header"><a href="#structfield.fields" class="anchor field">§</a><code>fields: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="https://doc.rust-lang.org/nightly/alloc/sync/struct.Arc.html" title="struct alloc::sync::Arc">Arc</a>&lt;ParquetField&gt;&gt;</code></span><span id="structfield.batch_size" class="structfield section-header"><a href="#structfield.batch_size" class="anchor field">§</a><code>batch_size: <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a></code></span><span id="structfield.row_groups" class="structfield section-header"><a href="#structfield.row_groups" class="anchor field">§</a><code>row_groups: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" 
href="https://doc.rust-lang.org/nightly/alloc/vec/struct.Vec.html" title="struct alloc::vec::Vec">Vec</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;&gt;</code></span><span id="structfield.projection" class="structfield section-header"><a href="#structfield.projection" class="anchor field">§</a><code>projection: <a class="struct" href="../struct.ProjectionMask.html" title="struct parquet::arrow::ProjectionMask">ProjectionMask</a></code></span><span id="structfield.filter" class="structfield section-header"><a href="#structfield.filter" class="anchor field">§</a><code>filter: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="../arrow_reader/filter/struct.RowFilter.html" title="struct parquet::arrow::arrow_reader::filter::RowFilter">RowFilter</a>&gt;</code></span><span id="structfield.selection" class="structfield section-header"><a href="#structfield.selection" class="anchor field">§</a><code>selection: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="../arrow_reader/selection/struct.RowSelection.html" title="struct parquet::arrow::arrow_reader::selection::RowSelection">RowSelection</a>&gt;</code></span><span id="structfield.limit" class="structfield section-header"><a href="#structfield.limit" class="anchor field">§</a><code>limit: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;</code></span><span id="structfield.offset" class="structfield section-header"><a href="#structfield.offset" class="anchor field">§</a><code>offset: <a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" 
title="enum core::option::Option">Option</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>&gt;</code></span><h2 id="implementations" class="section-header">Implementations<a href="#implementations" class="anchor">§</a></h2><div id="implementations-list"><details class="toggle implementors-toggle" open><summary><section id="impl-ArrowReaderBuilder%3CAsyncReader%3CT%3E%3E" class="impl"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#246-455">source</a><a href="#impl-ArrowReaderBuilder%3CAsyncReader%3CT%3E%3E" class="anchor">§</a><h3 class="code-header">impl&lt;T: <a class="trait" href="trait.AsyncFileReader.html" title="trait parquet::arrow::async_reader::AsyncFileReader">AsyncFileReader</a> + <a class="trait" href="https://doc.rust-lang.org/nightly/core/marker/trait.Send.html" title="trait core::marker::Send">Send</a> + 'static&gt; <a class="type" href="type.ParquetRecordBatchStreamBuilder.html" title="type parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder">ParquetRecordBatchStreamBuilder</a>&lt;T&gt;</h3></section></summary><div class="impl-items"><details class="toggle method-toggle" open><summary><section id="method.new" class="method"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#279-281">source</a><h4 class="code-header">pub async fn <a href="#method.new" class="fn">new</a>(input: T) -&gt; <a class="type" href="../../errors/type.Result.html" title="type parquet::errors::Result">Result</a>&lt;Self&gt;</h4></section></summary><div class="docblock"><p>Create a new <a href="type.ParquetRecordBatchStreamBuilder.html" title="type parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder"><code>ParquetRecordBatchStreamBuilder</code></a> with the provided parquet file</p>
<h5 id="example"><a class="doc-anchor" href="#example">§</a>Example</h5>
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="comment">// Open async file containing parquet data
</span><span class="kw">let </span><span class="kw-2">mut </span>file = tokio::fs::File::from_std(file);
<span class="comment">// construct the reader
</span><span class="kw">let </span><span class="kw-2">mut </span>reader = ParquetRecordBatchStreamBuilder::new(file)
.<span class="kw">await</span>.unwrap().build().unwrap();
<span class="comment">// Read a batch
</span><span class="kw">let </span>batch: RecordBatch = reader.next().<span class="kw">await</span>.unwrap().unwrap();</code></pre></div>
</div></details><details class="toggle method-toggle" open><summary><section id="method.new_with_options" class="method"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#285-288">source</a><h4 class="code-header">pub async fn <a href="#method.new_with_options" class="fn">new_with_options</a>(
input: T,
options: <a class="struct" href="../arrow_reader/struct.ArrowReaderOptions.html" title="struct parquet::arrow::arrow_reader::ArrowReaderOptions">ArrowReaderOptions</a>,
) -&gt; <a class="type" href="../../errors/type.Result.html" title="type parquet::errors::Result">Result</a>&lt;Self&gt;</h4></section></summary><div class="docblock"><p>Create a new <a href="type.ParquetRecordBatchStreamBuilder.html" title="type parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder"><code>ParquetRecordBatchStreamBuilder</code></a> with the provided parquet file
and <a href="../arrow_reader/struct.ArrowReaderOptions.html" title="struct parquet::arrow::arrow_reader::ArrowReaderOptions"><code>ArrowReaderOptions</code></a></p>
</div></details><details class="toggle method-toggle" open><summary><section id="method.new_with_metadata" class="method"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#335-337">source</a><h4 class="code-header">pub fn <a href="#method.new_with_metadata" class="fn">new_with_metadata</a>(input: T, metadata: <a class="struct" href="../arrow_reader/struct.ArrowReaderMetadata.html" title="struct parquet::arrow::arrow_reader::ArrowReaderMetadata">ArrowReaderMetadata</a>) -&gt; Self</h4></section></summary><div class="docblock"><p>Create a <a href="type.ParquetRecordBatchStreamBuilder.html" title="type parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder"><code>ParquetRecordBatchStreamBuilder</code></a> from the provided <a href="../arrow_reader/struct.ArrowReaderMetadata.html" title="struct parquet::arrow::arrow_reader::ArrowReaderMetadata"><code>ArrowReaderMetadata</code></a></p>
<p>This allows loading the metadata once and using it to create multiple builders,
potentially with different settings, whose resulting streams can be read in parallel.</p>
<h5 id="example-of-reading-from-multiple-streams-in-parallel"><a class="doc-anchor" href="#example-of-reading-from-multiple-streams-in-parallel">§</a>Example of reading from multiple streams in parallel</h5>
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="comment">// open file with parquet data
</span><span class="kw">let </span><span class="kw-2">mut </span>file = tokio::fs::File::from_std(file);
<span class="comment">// load metadata once
</span><span class="kw">let </span>meta = ArrowReaderMetadata::load_async(<span class="kw-2">&amp;mut </span>file, Default::default()).<span class="kw">await</span>.unwrap();
<span class="comment">// create two readers, a and b, from the same underlying file
// without reading the metadata again
</span><span class="kw">let </span><span class="kw-2">mut </span>a = ParquetRecordBatchStreamBuilder::new_with_metadata(
file.try_clone().<span class="kw">await</span>.unwrap(),
meta.clone()
).build().unwrap();
<span class="kw">let </span><span class="kw-2">mut </span>b = ParquetRecordBatchStreamBuilder::new_with_metadata(file, meta).build().unwrap();
<span class="comment">// Can read batches from both readers in parallel
</span><span class="macro">assert_eq!</span>(
a.next().<span class="kw">await</span>.unwrap().unwrap(),
b.next().<span class="kw">await</span>.unwrap().unwrap(),
);</code></pre></div>
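<p>The aliased type above declares <code>metadata</code> as an <code>Arc&lt;ParquetMetaData&gt;</code>, which suggests that handing the same metadata to several builders shares one allocation rather than copying it. The following is a minimal, std-only sketch of that sharing pattern. <code>FileMetadata</code>, <code>Builder</code>, <code>two_builders</code> and the default batch size of 1024 are hypothetical stand-ins chosen for illustration, not parquet crate items.</p>

```rust
use std::sync::Arc;

/// Hypothetical stand-in for parsed file metadata (not the parquet type).
#[derive(Debug)]
pub struct FileMetadata {
    pub num_rows: usize,
    pub num_row_groups: usize,
}

/// Hypothetical stand-in for a builder that keeps shared metadata
/// next to its own per-reader settings.
#[derive(Debug)]
pub struct Builder {
    pub metadata: Arc<FileMetadata>,
    pub batch_size: usize,
}

impl Builder {
    /// Takes an already-loaded metadata handle; nothing is re-read or deep-copied.
    /// The default batch size here is an illustrative assumption.
    pub fn new_with_metadata(metadata: Arc<FileMetadata>) -> Self {
        Builder { metadata, batch_size: 1024 }
    }

    /// Each builder can still be configured independently of the others.
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }
}

/// Creates two builders with different settings from one metadata handle.
pub fn two_builders(metadata: Arc<FileMetadata>) -> (Builder, Builder) {
    // Cloning an Arc only bumps a reference count.
    let a = Builder::new_with_metadata(Arc::clone(&metadata)).with_batch_size(512);
    let b = Builder::new_with_metadata(metadata);
    (a, b)
}
```

<p>In this sketch both builders point at the same <code>FileMetadata</code> allocation, while their batch sizes differ.</p>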
</div></details><details class="toggle method-toggle" open><summary><section id="method.get_row_group_column_bloom_filter" class="method"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#343-400">source</a><h4 class="code-header">pub async fn <a href="#method.get_row_group_column_bloom_filter" class="fn">get_row_group_column_bloom_filter</a>(
&amp;mut self,
row_group_idx: <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>,
column_idx: <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a>,
) -&gt; <a class="type" href="../../errors/type.Result.html" title="type parquet::errors::Result">Result</a>&lt;<a class="enum" href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html" title="enum core::option::Option">Option</a>&lt;<a class="struct" href="../../bloom_filter/struct.Sbbf.html" title="struct parquet::bloom_filter::Sbbf">Sbbf</a>&gt;&gt;</h4></section></summary><div class="docblock"><p>Read the bloom filter for a column in a row group.
Returns <code>None</code> if the column does not have a bloom filter.</p>
<p>Call this function after other forms of pruning, such as projection and predicate pushdown.</p>
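<p>The ordering advice above can be pictured with a toy model: a cheap check (here, min/max statistics) narrows the candidate row groups first, and a filter is consulted only for the survivors. This matters because the method is <code>async</code> and presumably has to fetch the filter bytes from the input. The sketch is std-only. <code>RowGroupInfo</code> and <code>candidate_row_groups</code> are hypothetical names, and a <code>HashSet</code> stands in for the bloom filter, so unlike a real bloom filter it never reports false positives.</p>

```rust
use std::collections::HashSet;

/// Hypothetical row-group summary; not a parquet crate type.
pub struct RowGroupInfo {
    pub min: i64,
    pub max: i64,
    /// Stand-in for a bloom filter; `None` means the column has no filter.
    pub filter: Option<HashSet<i64>>,
}

/// Returns the indices of row groups that may contain `value`, together
/// with the number of filters that had to be consulted.
pub fn candidate_row_groups(groups: &[RowGroupInfo], value: i64) -> (Vec<usize>, usize) {
    let mut kept = Vec::new();
    let mut filter_checks = 0;
    for (idx, rg) in groups.iter().enumerate() {
        // Step 1: cheap pruning on statistics, no filter lookup needed.
        if value < rg.min || value > rg.max {
            continue;
        }
        // Step 2: consult the filter only for row groups that survived step 1.
        match &rg.filter {
            Some(filter) => {
                filter_checks += 1;
                if filter.contains(&value) {
                    kept.push(idx);
                }
            }
            // Without a filter the row group cannot be ruled out.
            None => kept.push(idx),
        }
    }
    (kept, filter_checks)
}
```

<p>Row groups rejected by the statistics check never trigger a filter lookup, which is the point of pruning first.</p>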
</div></details><details class="toggle method-toggle" open><summary><section id="method.build" class="method"><a class="src rightside" href="../../../src/parquet/arrow/async_reader/mod.rs.html#403-454">source</a><h4 class="code-header">pub fn <a href="#method.build" class="fn">build</a>(self) -&gt; <a class="type" href="../../errors/type.Result.html" title="type parquet::errors::Result">Result</a>&lt;<a class="struct" href="struct.ParquetRecordBatchStream.html" title="struct parquet::arrow::async_reader::ParquetRecordBatchStream">ParquetRecordBatchStream</a>&lt;T&gt;&gt;</h4></section></summary><div class="docblock"><p>Build a new <a href="struct.ParquetRecordBatchStream.html" title="struct parquet::arrow::async_reader::ParquetRecordBatchStream"><code>ParquetRecordBatchStream</code></a></p>
</div></details></div></details></div><script src="../../../type.impl/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.js" data-self-path="parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder" async></script></section></div></main></body></html>