 /*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 /**
  * Defines a mock data source which generates dummy test data for use
  * in testing. The data source operates in two modes:
  * <ul>
  * <li><b>Classic:</b> used in physical plans in many unit tests.
  * The plan specifies a set of columns; data is generated by the
  * vectors themselves based on two alternating values.</li>
  * <li><b>Enhanced:</b> available for use in newer unit tests.
  * Enhances the physical plan description to allow specifying a data
  * generator class (for various types, data formats, etc.) Also
  * provides a data storage engine framework to allow using mock
  * tables in SQL queries.</li>
  * </ul>
  * <h3>Classic Mode</h3>
  * Create a scan operator that looks like the following (from
  * <tt>/src/test/resources/functions/cast/two_way_implicit_cast.json</tt>,
  * used in {@link TestReverseImplicitCast}):
  * <pre><code>
  *    graph:[
  *        {
  *            @id:1,
  *            pop:"mock-scan",
  *            url: "http://apache.org",
  *            entries:[
  *                {records: 1, types: [
  *                    {name: "col1", type: "FLOAT4", mode: "REQUIRED"},
  *                    {name: "col2", type: "FLOAT8", mode: "REQUIRED"}
  *                ]}
  *            ]
  *        },
  *        ...
  *    ]
  * </code></pre>
  * Here:
  * <ul>
  * <li>The <tt>pop</tt> must be <tt>mock-scan</tt>.</li>
  * <li>The <tt>url</tt> is unused.</li>
  * <li>The <tt>entries</tt> section can have one or more entries. With
  * more than one entry, the storage engine enables parallel scans
  * up to the number of entries, as though each entry were a different
  * file or group.</li>
  * <li>The column <tt>name</tt> is arbitrary, though color names seem
  * to be the traditional names used in Drill tests.</li>
  * <li>The <tt>type</tt> is one of the supported Drill
  * {@link MinorType} names.</li>
  * <li>The <tt>mode</tt> is one of the supported Drill
  * {@link DataMode} names: usually <tt>OPTIONAL</tt> or <tt>REQUIRED</tt>.</li>
  * </ul>
  * <p>
  * Recent extensions include:
  * <ul>
  * <li><tt>repeat</tt> in either the "entry" or "record" elements allows
  * repeating entries (simulating multiple blocks or row groups) and
  * repeating fields (to easily create, say, a dozen fields of some type).</li>
  * <li><tt>generator</tt> in a field definition lets you specify a
  * specific data generator (see below).</li>
  * <li><tt>properties</tt> in a field definition lets you pass
  * generator-specific values to the data generator (such as a minimum
  * and maximum value).</li>
  * </ul>
  *
  * <h3>Enhanced Mode</h3>
  * Enhanced mode builds on Classic mode to add further capabilities.
  * It can be used either in a physical plan or in SQL. Data
  * is randomly generated over a wide range of values and can be
  * controlled by custom generator classes. When used
  * in a physical plan, the <tt>records</tt> section has additional
  * attributes as described in {@link MockTableDef.MockColumn}:
  * <ul>
  * <li>The <tt>generator</tt> attribute lets you specify a class to generate
  * the sample data. The class name can be either fully qualified (with its
  * package path) or a bare class name. A bare class name is assumed to
  * reside in this package. For example, to generate
  * an ISO date into a string, use <tt>DateGen</tt>. Additional generators
  * can (and should) be added as the need arises.</li>
  * <li>The <tt>repeat</tt> attribute lets you create a very wide row by
  * repeating a column the specified number of times. Actual column names
  * have a numeric suffix. For example, if the base name is "blue" and
  * is repeated twice, actual columns are "blue1" and "blue2".</li>
  * </ul>
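  * <p>
  * As a rough sketch (not copied from an existing test plan, and assuming
  * each attribute is written as a JSON key of the same name), a column list
  * using both attributes might look like the following. The column name
  * "hired" is made up for illustration:
  * <pre><code>
  *    types: [
  *        {name: "blue", type: "INT", mode: "REQUIRED", repeat: 2},
  *        {name: "hired", type: "VARCHAR", mode: "REQUIRED", generator: "DateGen"}
  *    ]
  * </code></pre>
  * Under the rules above, this should yield the columns "blue1", "blue2"
  * and "hired", with "hired" holding ISO date strings.
  * <p>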
  * When used in SQL, use the <tt>mock</tt> name space as follows:
  * <pre><code>
  * SELECT id_i, name_s50 FROM `mock`.`employee_500`;
  * </code></pre>
  * Both the column names and table names encode information that specifies
  * what data to generate.
  * <p>
  * Columns are of the form <tt><i>name</i>_<i>type</i><i>length</i>?</tt>.
  * <ul>
  * <li>The name is anything you want ("id" and "name" in the example.)</li>
  * <li>The underscore is required to separate the type from the name.</li>
  * <li>The type is one of "i" (integer), "d" (double) or "s" (string).
  * Other types can be added as needed: n (decimal number), l (long), etc.</li>
  * <li>The length is optional and is used only for string (<tt>VARCHAR</tt>)
  * columns. The default string length is 10.</li>
  * <li>Columns do not yet support nulls. When they do, the encoding will
  * be "_n<i>percent</i>" where the percent specifies the percent of rows
  * that should contain null values in this column.</li>
  * <li>The column is known to SQL as its full name, that is "id_i" or
  * "name_s50".</li>
  * </ul>
  * <p>
  * Tables are of the form <tt><i>name</i>_<i>rows</i><i>unit</i>?</tt> where:
  * <ul>
  * <li>The name is anything you want. ("employee" in the example.)</li>
  * <li>The underscore is required to separate the row count from the name.</li>
  * <li>The row count specifies the number of rows to return.</li>
  * <li>The count unit can be omitted, K (multiply the row count by 1,000) or M
  * (multiply the row count by one million); the unit is case insensitive.</li>
  * <li>Another field (not yet implemented) might specify the split count.</li>
  * </ul>
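  * <p>
  * For example, applying the column and table rules above (the names
  * "amount", "orders" and "events" are made up for illustration):
  * <pre><code>
  * SELECT id_i, amount_d, name_s FROM `mock`.`orders_10K`;
  * SELECT id_i FROM `mock`.`events_2m`;
  * </code></pre>
  * Per those rules, the first query should return 10,000 rows with an
  * integer column, a double column and a string column of the default
  * length 10; the second should return two million rows.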
  * <h3>Enhanced Mode with Definition File</h3>
  * You can reference a mock data definition file directly from SQL as follows:
  * <pre><code>SELECT * FROM `mock`.`your_defn_file.json`</code></pre>
  * <h3>Data Generators</h3>
  * The classic mode uses data generators built into each vector to generate
  * the sample data. These generators use a very simple black/white alternating
  * series of two values. Simple, but limited. The enhanced mode allows custom
  * data generators. Unfortunately, this requires a separate generator class for
  * each data type. As a result, we presently support just a few key data types.
  * On the other hand, the custom generators do allow tests to specify a custom
  * generator class to generate the kind of data needed for that test.
  * <p>
  * All data generators implement the {@link FieldGen} interface, and must have
  * a no-argument constructor to allow dynamic instantiation. The mock data
  * source either picks a default generator (if no <tt>generator</tt> is provided)
  * or uses the custom generator specified in <tt>generator</tt>. Generators
  * are independent (though one could, perhaps, write generators that correlate
  * field values).
  */
 package org.apache.drill.exec.store.mock;