 /*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 /**
  * Defines the scan operation implementation. The scan operator is a generic mechanism
  * that fits into the Drill Volcano-based iterator protocol to return record batches
  * from one or more readers.
  * <p>
  * Two versions of the scan operator exist:<ul>
  * <li>{@link ScanBatch}: the original version that uses readers based on the
  * {@link RecordReader} interface. <tt>ScanBatch</tt> cannot, however, handle
  * limited-length vectors.</li>
  * <li>{@link ScanOperatorExec}: the revised version that uses a more modular
  * design and that offers a mutator that is a bit easier to use, and can limit
  * vector sizes.</li></ul>
  * New code should use the new version; existing code will continue to use the
  * <tt>ScanBatch</tt> version until all readers are converted to the new format.
  * <p>
  * Further, the new version is designed to allow intensive unit testing without
  * the need for the Drill server. New readers should exploit this feature to
  * include intensive tests to keep Drill quality high.
  * <p>
  * See {@link ScanOperatorExec} for details of the scan operator protocol
  * and components.
  *
  * <h4>Traditional Class Structure</h4>
  * The original design was simple, but required each reader to handle many
  * detailed tasks.
  * <pre><code>
  *  +------------+          +-----------+
  *  | Scan Batch |    +---> | ScanBatch |
  *  |  Creator   |    |     +-----------+
  *  +------------+    |           |
  *         |          |           |
  *         v          |           |
  *  +------------+    |           v
  *  |   Format   | ---+   +---------------+
  *  |   Plugin   | -----> | Record Reader |
  *  +------------+        +---------------+
  *
  * </code></pre>
  *
  * The scan batch creator is unique to each storage plugin and is created
  * based on the physical operator configuration ("pop config"). The
  * scan batch creator delegates to the format plugin to create both the
  * scan batch (the scan operator) and the set of readers which the scan
  * batch will manage.
  * <p>
  * The scan batch provides a <code>Mutator</code> that creates the vectors
  * used by the record readers. Schema continuity comes from reusing the
  * <code>Mutator</code> from one file/block to the next.
  * <p>
  * One characteristic of this system is that all the record readers are
  * created up front. If we must read 1000 blocks, we'll create 1000 record
  * readers. Developers must be very careful to only allocate resources when
  * the reader is opened, and release resources when the reader is closed.
  * Otherwise, resource bloat becomes a large problem.
  *
  * <h4>Revised Class Structure</h4>
  *
  * The new design is more complex because it divides tasks up into separate
  * classes. The class structure is larger, but each class is smaller, more
  * focused and does just one task.
  * <pre><code>
  *  +------------+          +---------------+
  *  | Scan Batch | -------> | Format Plugin |
  *  |  Creator   |          +---------------+
  *  +------------+          /        |       \
  *                         /         |        \
  *    +---------------------+        |         \ +---------------+
  *    | OperatorRecordBatch |        |     +---->| ScanFramework |
  *    +---------------------+        |     |     +---------------+
  *                                   v     |            |
  *                         +------------------+         |
  *                         | ScanOperatorExec |         |
  *                         +------------------+         v
  *                                   |            +--------------+
  *                                   +----------> | Batch Reader |
  *                                                +--------------+
  * </code></pre>
  *
  * Here, the scan batch creator again delegates to the format plugin. The
  * format plugin creates three objects:
  * <ul>
  * <li>The <code>OperatorRecordBatch</code>, which encapsulates the Volcano
  * iterator protocol. It also holds onto the output batch. This allows the
  * operator implementation to just focus on its specific job.</li>
  * <li>The <code>ScanOperatorExec</code> is the operator implementation for
  * the new result-set-loader based scan.</li>
  * <li>The scan framework is specific to each kind of reader. It handles
  * everything which is unique to that reader. Rather than inheriting from
  * the scan itself, the framework follows the strategy pattern: it says how
  * to do a scan for the target format.</li>
  * </ul>
  *
  * The overall structure uses the "composition" pattern: what is combined
  * into a small set of classes in the traditional model is broken out into
  * focused classes in the revised model.
  * <p>
  * A key part of the scan strategy is the batch reader. ("Batch" because
  * it reads an entire batch at a time, using the result set loader.) The
  * framework creates batch readers one by one as needed. Resource bloat
  * is less of an issue because only one batch reader instance exists at
  * any time for each scan operator instance.
  * <p>
  * Each of the above is further broken down into additional classes to
  * handle projection and so on.
  */

 package org.apache.drill.exec.physical.impl.scan;