/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Provides a second-generation row set (AKA "record batch") writer used
 * by client code to:<ul>
* <li>Define the schema of a result set.</li>
* <li>Write data into the vectors backing a row set.</li></ul>
* <p>
* <h4>Terminology</h4>
* The code here follows the "row/column" naming convention rather than
* the "record/field" convention.
* <dl>
* <dt>Result set</dt>
 * <dd>A set of zero or more row sets that hold rows of data.</dd>
* <dt>Row set</dt>
 * <dd>A collection of rows with a common schema. Also called a "row
 * batch" or "record batch." (But, in Drill, the term "record batch" also
 * usually means an operator on that set of records. Here, a row set is
 * just the rows &ndash; separate from operations on that data.)</dd>
* <dt>Row</dt>
* <dd>A single row of data, in the usual database sense. Here, a row is
* a kind of tuple (see below) allowing both name and index access to
* columns.</dd>
* <dt>Tuple</dt>
* <dd>In relational theory, a row is a tuple: a collection of values
* defined by a schema. Tuple values are indexed by position or name.</dd>
* <dt>Column</dt>
 * <dd>A single value within a row or row set. (Generally, the context
 * makes clear whether the term refers to a single value or to all values for
 * a column across a row set.) Columns are backed by value vectors.</dd>
* <dt>Map</dt>
* <dd>In Drill, a map is what other systems call a "structure". It is,
* in fact, a nested tuple. In a Java or Python map, each map instance has
* a distinct set of name/value pairs. But, in Drill, all map instances have
* the same schema; hence the so-called "map" is really a tuple. This
 * implementation exploits that fact and treats the row, and nested maps,
 * almost identically: both provide columns indexed by name or position
 * (see the sketch after this list).</dd>
* <dt>Row Set Mutator</dt>
* <dd>An awkward name, but retains the "mutator" name from the previous
 * generation. The mechanism to build a result set as a series of row sets.</dd>
* <dt>Tuple Loader</dt>
* <dd>Mechanism to build a single tuple (row or map) by providing name
 * or index access to columns. A better name would be "tuple writer", but
* that name is already used elsewhere.</dd>
* <dt>Column Loader</dt>
 * <dd>Mechanism to write values to a single column.</dd>
* </dl>
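 * <p>
 * Because a map is just a nested tuple, the writer for a map column looks
 * and behaves like the writer for the row itself. A minimal sketch, assuming
 * a {@code tuple()} accessor and typed {@code set} methods (the exact names
 * may differ):
 * <pre>{@code
 * // Row schema: (a INT, m MAP(x INT, y VARCHAR))
 * TupleLoader row = ...;             // writer for the row (a tuple)
 * row.column("a").setInt(10);        // top-level column
 * TupleLoader m = row.tuple("m");    // nested tuple for the map column
 * m.column("x").setInt(20);
 * m.column("y").setString("fred");
 * }</pre>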
* <h4>Building the Schema</h4>
* The row set mutator works for two cases: a known schema or a discovered
 * schema. A known schema occurs in cases, such as JDBC, where the
 * underlying data source can describe the schema before reading any rows.
* In this case, client code can build the schema and pass that schema to
* the mutator directly. Alternatively, the client code can build the
* schema column-by-column before the first row is read.
* <p>
* Readers that discover schema can build the schema incrementally: add
* a column, load data for that column for one row, discover the next
* column, and so on. Almost any kind of column can be added at any time
* within the first batch:<ul>
 * <li>Required columns are "back-filled" with zeros in the active batch,
 * if that value makes sense for the column. (Date and Interval columns will
 * throw an exception if added after the first row, as there is no good "zero"
 * value for those columns. Varchar columns are back-filled with blanks.)</li>
* <li>Optional (nullable) columns can be added at any time; they are
* back-filled with nulls in the active batch. In general, if a column is
* added after the first row, it should be nullable, not required, unless
* the data source has a "missing = blank or zero" policy.</li>
 * <li>Repeated (array) columns can be added at any time; they are
 * back-filled with empty entries in the first batch.</li></ul>
 * Client code must be aware of the semantics of adding columns at various
 * times (see the sketch after this list):<ul>
* <li>Columns added before or during the first row are the trivial case;
* this works for all data types and modes.</li>
 * <li>Required (non-nullable) structured columns (Date, Period) cannot be
 * added after the first row (as there is no good zero-fill value).</li>
* <li>Columns added within the first batch appear to the rest of Drill as
* if they were added before the first row: the downstream operators see the
* same schema from batch to batch.</li>
* <li>Columns added <i>after</i> the first batch will trigger a
* schema-change event downstream.</li>
 * <li>The above also holds during an "overflow row" (see below). Once
* overflow occurs, columns added later in that overflow row will actually
* appear in the next batch, and will trigger a schema change when that
* batch is returned. That is, overflow "time shifts" a row addition from
* one batch to the next, and so it also time-shifts the column addition.
* </li></ul>
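 * <p>
 * As a sketch of discovered-schema loading (the method names follow the
 * terminology above; {@code columnSchema()} stands in for whatever factory
 * creates column metadata and is purely illustrative):
 * <pre>{@code
 * RowSetLoader writer = rsLoader.writer();
 * writer.startRow();
 * writer.column("a").setInt(10);
 * // Discover a new column "b" mid-batch. Declare it nullable so earlier
 * // rows in this batch can be back-filled with nulls.
 * writer.addColumn(columnSchema("b", MinorType.VARCHAR, DataMode.OPTIONAL));
 * writer.column("b").setString("fred");
 * writer.saveRow();
 * }</pre>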
* Use the {@link org.apache.drill.exec.record.metadata.TupleBuilder} class
* to build the schema. The schema class is part of the
* {@link org.apache.drill.exec.physical.resultSet.RowSetLoader} object available from the
* {@link org.apache.drill.exec.physical.resultSet.ResultSetLoader#writer()} method.
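 * <p>
 * For the known-schema case, the schema can be declared up front. A minimal
 * sketch in the fluent style of the schema builders in
 * {@code org.apache.drill.exec.record.metadata} (the exact method names may
 * differ):
 * <pre>{@code
 * TupleMetadata schema = new SchemaBuilder()
 *     .add("id", MinorType.INT)               // required column
 *     .addNullable("name", MinorType.VARCHAR) // optional column
 *     .addArray("scores", MinorType.FLOAT8)   // repeated column
 *     .buildSchema();
 * // Hand the schema to the result set loader before reading the first row.
 * }</pre>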
* <h4>Using the Schema</h4>
 * The loader presents columns using a physical schema. That is, map columns
 * appear as columns that provide a nested map schema. The design presumes
 * that column access is primarily structural: first get a map, then process
 * all columns for the map.
* <p>
* If the input is a flat structure, then the physical schema has a
* flattened schema as the degenerate case.
* <p>
* In both cases, access to columns is by index or by name. If new columns
* are added while loading, their index is always at the end of the existing
* columns.
* <h4>Writing Data to the Batch</h4>
* Each batch is delimited by a call to {@link org.apache.drill.exec.physical.resultSet.ResultSetLoader#startBatch()}
* and a call to {@link org.apache.drill.exec.physical.resultSet.impl.VectorState#harvestWithLookAhead()}
* to obtain the completed batch. Note that readers do not
* call these methods; the scan operator does this work.
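 * <p>
 * A sketch of the batch lifecycle as driven by the scan operator (the
 * {@code harvest()} call name and the {@code reader} object are
 * illustrative):
 * <pre>{@code
 * rsLoader.startBatch();                       // begin a new batch
 * reader.load(rsLoader.writer());              // reader writes rows (see below)
 * VectorContainer batch = rsLoader.harvest();  // obtain the completed batch
 * // ... send the batch downstream, then start the next batch
 * }</pre>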
* <p>
 * Each row is delimited by a call to {@code startRow()} and a call to
 * {@code saveRow()}. <tt>startRow()</tt> performs initialization necessary
* for some vectors such as repeated vectors. <tt>saveRow()</tt> moves the
* row pointer ahead.
* <p>
 * A reader can easily reject a row by calling <tt>startRow()</tt>, beginning
 * to load the row, but omitting the call to <tt>saveRow()</tt>. In this case,
* the next call to <tt>startRow()</tt> repositions the row pointer to the
* same row, and new data will overwrite the previous data, effectively erasing
* the unwanted row. This also works for the last row; omitting the call to
* <tt>saveRow()</tt> causes the batch to hold only the rows actually
* saved.
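 * <p>
 * A sketch of the per-row protocol, including rejecting a row (names as
 * described above):
 * <pre>{@code
 * writer.startRow();
 * writer.column(0).setInt(10);
 * writer.column(1).setString("fred");
 * writer.saveRow();               // keep this row
 *
 * writer.startRow();
 * writer.column(0).setInt(20);
 * // Reject the row: simply skip saveRow(). The next startRow() call
 * // repositions to the same spot and overwrites the unwanted data.
 * }</pre>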
* <p>
* Readers then write to each column. Columns are accessible via index
 * ({@link org.apache.drill.exec.physical.resultSet.RowSetLoader#column(int)}) or by name
 * ({@link org.apache.drill.exec.physical.resultSet.RowSetLoader#column(String)}).
* Indexed access is much faster.
* Column indexes are defined by the order that columns are added. The first
* column is column 0, the second is column 1 and so on.
* <p>
* Each call to the above methods returns the same column writer, allowing the
* reader to cache column writers for additional performance.
* <p>
* All column writers are of the same class; there is no need to cast to a
* type corresponding to the vector. Instead, they provide a variety of
* <tt>set<i>Type</i></tt> methods, where the type is one of various Java
* primitive or structured types. Most vectors provide just one method, but
* others (such as VarChar) provide two. The implementation will throw an
* exception if the vector does not support a particular type.
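 * <p>
 * A sketch that caches column loaders and uses the typed {@code set} methods
 * (the loader type and the {@code reader} calls are illustrative):
 * <pre>{@code
 * ColumnLoader idCol = writer.column("id");      // cache once, reuse per row
 * ColumnLoader nameCol = writer.column("name");
 * while (reader.hasNext()) {                     // hypothetical reader
 *   writer.startRow();
 *   idCol.setInt(reader.nextId());
 *   nameCol.setString(reader.nextName());
 *   writer.saveRow();
 * }
 * }</pre>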
* <p>
* Note that this class uses the term "loader" for row and column writers
* since the term "writer" is already used by the legacy record set mutator
* and column writers.
* <h4>Handling Batch Limits</h4>
* The mutator enforces two sets of batch limits:<ol>
* <li>The number of rows per batch. The limit defaults to 64K (the Drill
* maximum), but can be set lower by the client.</li>
 * <li>The size of the largest vector, which is capped at 16 MB. (A future
 * version may allow adjustable caps, or cap the memory of the entire
 * batch.)</li></ol>
* Both limits are presented to the client via the
* {@link org.apache.drill.exec.physical.resultSet.RowSetLoader#isFull()} method.
* After each call to {@code saveRow()},
 * the client should call <tt>isFull()</tt> to determine whether it can add
 * another row. Note that failing to do this check will cause the next call to
* {@link org.apache.drill.exec.physical.resultSet.ResultSetLoader#startBatch()} to throw an exception.
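 * <p>
 * A sketch of the limit check; testing {@code isFull()} before each new row
 * is equivalent to testing it after each {@code saveRow()}:
 * <pre>{@code
 * while (!writer.isFull() && reader.hasNext()) {  // reader is illustrative
 *   writer.startRow();
 *   reader.loadRow(writer);                       // write the columns
 *   writer.saveRow();
 * }
 * // Once isFull() returns true, stop writing; the scan operator harvests
 * // this batch and starts the next one.
 * }</pre>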
* <p>
* The limits have subtle differences, however. Row limits are simple: at
* the end of the last row, the mutator notices that no more rows are possible,
* and so does not allow starting a new row.
* <p>
* Vector overflow is more complex. A row may consist of columns (a, b, c).
* The client may write column a, but then column b might trigger a vector
* overflow. (For example, b is a Varchar, and the value for b is larger than
* the space left in the vector.) The client cannot stop and rewrite a. Instead,
* the client simply continues writing the row. The mutator, internally, moves
* this "overflow" row to a new batch. The overflow row becomes the first row
* of the next batch rather than the first row of the current batch.
* <p>
 * For this reason, the client can treat the two limit cases identically,
* as described above.
* <p>
 * There are some subtle differences between the two cases that clients may
 * occasionally need to handle:<ul>
 * <li>When a vector overflow occurs, the returned batch will have one
 * fewer row than the client might expect if it is simply counting the rows
* written.</li>
* <li>A new column added to the batch after overflow occurs will appear in
* the <i>next</i> batch, triggering a schema change between the current and
* next batches.</li></ul>
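 * <p>
 * Because overflow can silently shift the last written row into the next
 * batch, a client should take the row count from the harvested batch rather
 * than from its own counter. A sketch (the {@code harvest()} name is
 * illustrative):
 * <pre>{@code
 * VectorContainer batch = rsLoader.harvest();
 * int actualRows = batch.getRecordCount();  // may be one less than rows written
 * }</pre>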
*/
package org.apache.drill.exec.physical.resultSet;