/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Provides run-time semantic analysis of the projection list for the
* scan operator. The project list can include table columns and a
* variety of special columns. Requested columns can exist in the table,
* or may be "missing" with null values applied. The code here prepares
* a run-time projection plan based on the actual table schema.
*
* <h4>Overview</h4>
*
 * The projection framework looks at schema as a set of transforms:
* <p>
* <ul>
* <li>Scan level: physical plan projection list and optional provided
* schema information.</li>
 * <li>File level: materializes implicit file and partition columns.</li>
* <li>Reader level: integrates the actual schema discovered by the
* reader with the scan-level projection list.</li>
* </ul>
* <p>
* Projection turns out to be a very complex operation in a schema-on-read
* system such as Drill. Provided schema helps resolve ambiguities inherent
* in schema-on-read, but at the cost of some additional complexity.
*
* <h4>Background</h4>
*
 * The scan-level projection holds the list of columns
 * (or the wildcard) as requested by the user in the query. The planner
 * determines which columns to project. In Drill, projection is speculative:
 * it is a list of names which the planner hopes will appear in the data
 * files. The reader must make up columns (the infamous nullable INT) when
 * it turns out that no such column exists. Otherwise, the reader must figure
 * out the data type for any column that does exist.
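 * For example, given <tt>SELECT a, b, c</tt> against a file containing
 * only columns <tt>a</tt> and <tt>b</tt>, the reader must invent a
 * null-filled column <tt>c</tt>.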
* <p>
* With the advent of provided schema in Drill 1.16, the scan level projection
* integrates that schema information with the projection list provided in
* the physical operator. If a schema is provided, then each scan-level
* column tracks the schema information for that column.
* <p>
* The scan-level projection also
* implements the special rules for a "strict" provided schema: if the operator
* projection list contains a wildcard, a schema is provided, and the schema
* is strict, then the scan level projection expands the wildcard into the
* set of columns in the provided schema. Doing so ensures that the scan
* output contains exactly those columns from the schema, even if the columns
* must be null or at a default value. (The result set loader does additional
* filtering as well.)
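 * <p>
 * As a rough sketch, a provided schema might be built and marked strict
 * as follows. The builder calls are from the column metadata classes;
 * the exact name of the strict-schema property is an assumption here and
 * may differ by Drill version:
 * <pre>{@code
 * // Build a provided schema of (a INT REQUIRED, b VARCHAR OPTIONAL).
 * TupleMetadata schema = new SchemaBuilder()
 *     .add("a", MinorType.INT)
 *     .addNullable("b", MinorType.VARCHAR)
 *     .buildSchema();
 *
 * // Assumption: "strict" is a boolean schema-level property. With it
 * // set, a wildcard expands to exactly the columns (a, b) above.
 * schema.setBooleanProperty(TupleMetadata.IS_STRICT_SCHEMA_PROP, true);
 * }</pre>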
* <p>
* The scan project list defines the set of columns which the scan operator
* is obliged to send downstream. Ideally, the scan operator sends exactly the
* same schema (the project list with types filled in) for all batches. Since
* batches may come from different files, the scan operator is obligated to
* unify the schemas from those files (or blocks.)
* <p>
* Reader (file)-level projection occurs for each reader. A single scan
* may use multiple readers to read data. From the reader's perspective, it
* offers the schema it discovers in the file. The reader itself is rather
* inflexible: it must deal with the data it finds, of the type found in
* the data source.
* <p>
 * The reader thus tells the result set loader that it has such-and-such a schema.
* It does that either at open time (so-called "early" schema, such as for
* CSV, JDBC or Parquet) or as it discovers the columns (so-called "late"
* schema as in JSON.) Again, in each case, the data source schema is what
* it is; it can't be changed due to the wishes of the scan-level projection.
* <p>
* Readers obtain column schema from the file or data source. For example,
* a Parquet reader can obtain schema information
* from the Parquet headers. A JDBC reader obtains schema information from the
* returned schema. As noted above, we use the term "early schema" when type
* information is available at open time, before reading the first row of data.
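 * <p>
 * A minimal sketch of an early-schema reader, assuming the result set
 * loader API described here ({@code hasNextRow()} and {@code nextId()}
 * are hypothetical reader helpers):
 * <pre>{@code
 * // Declare the schema at open time, before reading any rows.
 * RowSetLoader writer = rsLoader.writer();
 * writer.addColumn(MetadataUtils.newScalar("custId",
 *     Types.required(MinorType.INT)));
 *
 * // Write rows; the loader enforces batch size limits and handles
 * // projection via the column writers.
 * rsLoader.startBatch();
 * while (!writer.isFull() && hasNextRow()) {
 *   writer.start();
 *   writer.scalar("custId").setInt(nextId());
 *   writer.save();
 * }
 * VectorContainer batch = rsLoader.harvest();
 * }</pre>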
* <p>
 * By contrast, readers such as JSON and CSV are "late schema": they don't know the data
* schema until they read the file. This is true "schema on read." Further, for
* JSON, the data may change from one batch to the next as the reader "discovers"
* fields that did not appear in earlier batches. This requires some amount of
* "schema smoothing": the ability to preserve a consistent output schema even
* as the input schema jiggles around some.
* <p>
* Drill supports many kinds of data sources via plugins. The DFS plugin works
* with files in a distributed store such as HDFS. Such file-based readers
* add implicit file or partition columns. Since these columns are generic to
* all format plugins, they are factored out into a file scan framework which
* inserts the "implicit" columns separate from the reader-provided columns.
*
* <h4>Design</h4>
*
* This leads to a multi-stage merge operation. The result set loader is
* presented with each column one-by-one (either at open time or during read.)
* When a column is presented, the projection framework makes a number of
* decisions:
* <p>
* <ul>
* <li>Is the column projected? For example, if a query is <tt>SELECT a, b, c</tt>
 * and the reader offers column <tt>d</tt>, then column <tt>d</tt> will not
 * be projected.
* In the wildcard case, "special" columns will be omitted from the column
* expansion and will be unprojected.</li>
* <li>Is type conversion needed? If a schema is provided, and the type of the
* column requested in the provided schema differs from that offered by the
* reader, the framework can insert a type-conversion "shim", assuming that
 * the framework knows how to do the conversion. Otherwise, an error is raised.</li>
* <li>Is the column type and mode consistent with the projection list?
* Suppose the query is <tt>SELECT a, b[10], c.d</tt>. Column `a` matches
* any reader column. But, column `b` is valid only for an array (not a map
 * and not a scalar.) Column `c` must be a map (or array of maps.) And so on;
 * see the sketch after this list.</li>
* </ul>
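 * The type/mode consistency check can be sketched as follows (the class
 * and method names are illustrative, not the exact framework API):
 * <pre>{@code
 * // For SELECT a, b[10], c.d: a projection item with array indexes
 * // must match an array column; one with child names must match a map.
 * void checkConsistency(RequestedColumn proj, ColumnMetadata readCol) {
 *   if (proj.isArray() && !readCol.isArray()) {
 *     throw UserException.validationError()
 *         .message("Column %s must be an array", proj.name())
 *         .build(logger);
 *   }
 *   if (proj.isTuple() && !readCol.isMap()) {
 *     throw UserException.validationError()
 *         .message("Column %s must be a map", proj.name())
 *         .build(logger);
 *   }
 * }
 * }</pre>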
* <p>
* The result is a refined schema: the scan level schema with more information
* filled in. For Parquet, all projection information can be filled in. For
* CSV or JSON, we can only add file metadata information, but not yet the
* actual data schema.
* <p>
 * Batch-level schema: once a reader reads actual data, it knows
 * exactly what it read. This is the "schema on read" model. Thus, after reading
 * a batch, any remaining uncertainty about the projected schema is removed.
 * The actual data defines the data types and so on.
* <p>
* The goal of this mechanism is to handle the above use cases cleanly, in a
* common set of classes, and to avoid the need for each reader to figure out
* all these issues for themselves (as was the case with earlier versions of
* Drill.)
* <p>
* Because these issues are complex, the code itself is complex. To make the
* code easier to manage, each bit of functionality is encapsulated in a
* distinct class. Classes combine via composition to create a "framework"
* suitable for each kind of reader: whether it be early or late schema,
* file-based or something else, etc.
*
* <h4>Nuances of Reader-Level Projection</h4>
*
* We've said that the scan-level projection identifies what the query
* <i>wants</i>. We've said that the reader identifies what the external
* data actually <i>is</i>. We've mentioned how we bridge between the
* two. Here we explore this in more detail.
* <p>
* Run-time schema resolution occurs at various stages:
* <p>
* <ul>
* <li>The per-column resolution identified earlier: matching types,
* type conversion, and so on.</li>
 * <li>The reader provides some set of columns. We don't know which
 * columns those are until the end of the first (or, more generally, every)
 * batch. Suppose the query wants <tt>SELECT a, b, c</tt> but the reader turns
 * out to provide only `a` and `b`. Only after the first batch do we
 * realize that we need column `c` as a "null" column (of a type defined
 * in the provided schema, specified by the plugin, or the good-old nullable
 * INT); see the sketch after this list.</li>
* <li>The result set loader will have created "dummy" columns for
* unprojected columns. The reader can still write to such columns
* (because they represent data in the file), but the associated column
* writer simply ignores the data. As a result, the result set loader
 * should produce only a (possibly complete) subset of the projected columns.</li>
* <li>After each reader batch, the projection framework goes to work
* filling in implicit columns, and filling in missing columns. It is
 * important to remember that this pass <i>must</i> be done <i>after</i> a batch
 * is read, since we don't know the columns that the reader can provide
 * until after a batch is read.</li>
 * <li>Some readers, such as JSON, can "change their minds" about the
* schema across batches. For example, the first batch may include
* only columns a and b. Later in the JSON file, the reader may
* discover column c. This means that the above post-batch analysis
* must be repeated each time the reader changes the schema. (The result
* set loader tracks schema changes for this purpose.)</li>
 * <li>File schemas evolve. The same changes noted above can occur
 * across files. Maybe file 1 has column `x` as a BIGINT, while file 2
 * has column `x` as INT. A "smoothing" step attempts to avoid hard
* schema changes if they can be avoided. While smoothing is a clever
* idea, it only handles some cases. Provided schema is a more reliable
* solution (but is not yet widely deployed.)</li>
* </ul>
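 * <p>
 * The post-batch "missing column" pass mentioned above can be sketched
 * as follows (names are illustrative; the actual classes differ):
 * <pre>{@code
 * // After a batch: keep the reader's column where one appeared, else
 * // make up a null column. Absent a provided schema or plugin default,
 * // the made-up type is the classic nullable INT.
 * for (ColumnProjection col : scanProj.columns()) {
 *   ColumnMetadata actual = readerSchema.metadata(col.name());
 *   if (actual != null) {
 *     output.project(actual);          // reader supplied the column
 *   } else {
 *     output.projectNull(col.name(),
 *         Types.optional(MinorType.INT));
 *   }
 * }
 * }</pre>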
*
* <h4>Reader-Level Projection Set</h4>
*
 * The Projection Set mechanism is designed to handle the increasing nuances
 * of Drill run-time projection by providing a source of information about
 * each column that the reader may discover (a usage sketch follows the list):
* <ul>
 * <li>Is the column projected?<ul>
* <li>If the query is explicit (<tt>SELECT a, b, c</tt>), is the column
* in the projection list?</li>
* <li>If the query is a wildcard (<tt>SELECT *</tt>), is the column
* marked as special (not included in the wildcard)?</li>
* <li>If the query is wildcard, and a strict schema is provided, is
* the column part of the provided schema?</li></ul></li>
 * <li>Verify that the column is consistent with the projection.</li>
* <li>Type conversion, if needed.</li>
* </ul>
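 * <p>
 * A usage sketch (the class and method names are illustrative of the
 * idea, not the exact interface):
 * <pre>{@code
 * // The reader asks the projection set about each discovered column.
 * ColumnMetadata readerCol = MetadataUtils.newScalar("d",
 *     Types.required(MinorType.VARCHAR));
 * ColumnReadProjection colProj = projSet.readProjection(readerCol);
 * if (colProj.isProjected()) {
 *   // Create a real column writer, inserting a type-conversion shim
 *   // if the provided schema requests a different type.
 * } else {
 *   // Create a dummy writer; data written to it is discarded.
 * }
 * }</pre>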
*
* <h4>Projection Via Rewrites</h4>
*
* The core concept is one of successive refinement of the project
* list through a set of rewrites:
* <p>
* <ul>
* <li>Scan-level rewrite: convert {@link SchemaPath} entries into
* internal column nodes, tagging the nodes with the column type:
* wildcard, unresolved table column, or special columns (such as
* file metadata.) The scan-level rewrite is done once per scan
* operator.</li>
* <li>Reader-level rewrite: convert the internal column nodes into
* other internal nodes, leaving table column nodes unresolved. The
* typical use is to fill in metadata columns with information about a
* specific file.</li>
* <li>Schema-level rewrite: given the actual schema of a record batch,
* rewrite the reader-level projection to describe the final projection
* from incoming data to output container. This step fills in missing
* columns, expands wildcards, etc.</li>
* </ul>
* The following outlines the steps from scan plan to per-file data
* loading to producing the output batch. The center path is the
* projection metadata which turns into an actual output batch.
* <pre>
* Scan Plan
* |
* v
* +--------------+
* | Project List |
* | Parser |
* +--------------+
* |
* v
* +------------+
* | Scan Level | +----------------+
* | Projection | --->| Projection Set |
* +------------+ +----------------+
* | |
* v v
* +------+ +------------+ +------------+ +-----------+
* | File | ---> | File Level | | Result Set | ---> | Data File |
* | Data | | Projection | | Loader | <--- | Reader |
* +------+ +------------+ +------------+ +-----------+
* | |
* v |
* +--------------+ Reader |
* | Reader Level | Schema |
* | Projection | <---------+
* +--------------+ |
* | |
* v |
* +--------+ Loaded |
* | Output | Vectors |
* | Mapper | <------------+
* +--------+
* |
* v
* Output Batch
* </pre>
* <p>
* The left side can be thought of as the "what we want" description of the
* schema, with the right side being "what the reader actually discovered."
* <p>
* The output mapper includes mechanisms to populate implicit columns, create
* null columns, and to merge implicit, null and data columns, omitting
* unprojected data columns.
* <p>
* In all cases, projection must handle maps, which are a recursive structure
 * much like a row. That is, Drill data consists of nested tuples (the row and maps),
* each of which contains columns which can be maps. Thus, there is a set of
* alternating layers of tuples, columns, tuples, and so on until we get to leaf
* (non-map) columns. As a result, most of the above structures are in the form
* of tuple trees, requiring recursive algorithms to apply rules down through the
* nested layers of tuples.
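 * <p>
 * One way to picture the recursion (a simplified sketch, with
 * {@code resolveColumn()} standing in for the per-column rules above):
 * <pre>{@code
 * // Walk the projected and actual schemas in parallel, recursing into
 * // maps, which are themselves tuples.
 * void resolveTuple(TupleMetadata projected, TupleMetadata actual) {
 *   for (ColumnMetadata col : projected) {
 *     ColumnMetadata match = actual.metadata(col.name());
 *     resolveColumn(col, match);
 *     if (col.isMap() && match != null && match.isMap()) {
 *       resolveTuple(col.tupleSchema(), match.tupleSchema());
 *     }
 *   }
 * }
 * }</pre>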
* <p>
* The above mechanism is done at runtime, in each scan fragment. Since Drill is
* schema-on-read, and has no plan-time schema concept, run-time projection is
* required. On the other hand, if Drill were ever to support the "classic"
* plan-time schema resolution, then much of this work could be done at plan
* time rather than (redundantly) at runtime. The main change would be to do
* the work abstractly, working with column and row descriptions, rather than
* concretely with vectors as is done here. Then, that abstract description
* would feed directly into these mechanisms with the "final answer" about
* projection, batch layout, and so on. The parts of this mechanism that
* create and populate vectors would remain.
*/
package org.apache.drill.exec.physical.impl.scan.project;