| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| /** |
| * Defines a mock data source which generates dummy test data for use |
| * in testing. The data source operates in two modes: |
| * <ul> |
| * <li><b>Classic:</b> used in physical plans in many unit tests. |
| * The plan specifies a set of columns; data is generated by the |
| * vectors themselves based on two alternating values.</li> |
| * <li><b>Enhanced:</b> available for use in newer unit tests. |
| * Enhances the physical plan description to allow specifying a data |
| * generator class (for various types, data formats, etc.) Also |
| * provides a data storage engine framework to allow using mock |
| * tables in SQL queries.</li> |
| * </ul> |
| * <h3>Classic Mode</h3> |
| * Create a scan operator that looks like the following (from |
| * <tt>/src/test/resources/functions/cast/two_way_implicit_cast.json</tt>, |
| * used in {@link TestReverseImplicitCast}): |
| * <pre><code> |
| * graph:[ |
| *   { |
| *     {@literal @}id:1, |
| *     pop:"mock-scan", |
| *     url: "http://apache.org", |
| *     entries:[ |
| *       {records: 1, types: [ |
| *         {name: "col1", type: "FLOAT4", mode: "REQUIRED"}, |
| *         {name: "col2", type: "FLOAT8", mode: "REQUIRED"} |
| *       ]} |
| *     ] |
| *   }, ... |
| * </code></pre> |
| * Here: |
| * <ul> |
| * <li>The <tt>pop</tt> must be <tt>mock-scan</tt>.</li> |
| * <li>The <tt>url</tt> is unused.</li> |
| * <li>The <tt>entries</tt> section can have one or more entries. If |
| * more than one entry, the storage engine will enable parallel scans |
| * up to the number of entries, as though each entry was a different |
| * file or group.</li> |
| * <li>The entry <tt>name</tt> is arbitrary, though color names seem |
| * to be the traditional names used in Drill tests.</li> |
| * <li>The <tt>type</tt> is one of the supported Drill |
| * {@link MinorType} names.</li> |
| * <li>The <tt>mode</tt> is one of the supported Drill |
| * {@link DataMode} names: usually <tt>OPTIONAL</tt> or <tt>REQUIRED</tt>.</li> |
| * </ul> |
| * <p> |
| * Recent extensions include: |
| * <ul> |
| * <li><tt>repeat</tt> in either the "entry" or "record" elements allows |
| * repeating entries (simulating multiple blocks or row groups) and |
| * repeating fields (easily create a dozen fields of some type.)</li> |
| * <li><tt>generator</tt> in a field definition lets you specify a |
| * specific data generator (see below.)</li> |
| * <li><tt>properties</tt> in a field definition lets you pass |
| * generator-specific values to the data generator (such as, say |
| * a minimum and maximum value.)</li> |
| * </ul> |
| * |
| * <h3>Enhanced Mode</h3> |
| * Enhanced mode builds on Classic mode to add further capabilities. |
| * Enhanced mode can be used either in a physical plan or in SQL. Data |
| * is randomly generated over a wide range of values and can be |
| * controlled by custom generator classes. When |
| * in a physical plan, the <tt>records</tt> section has additional |
| * attributes as described in {@link MockTableDef.MockColumn}: |
| * <ul> |
| * <li>The <tt>generator</tt> lets you specify a class to generate the |
| * sample data. The class name can be either fully qualified (with its |
| * package path) or a simple class name. A simple class name is |
| * assumed to reside in this package. For example, to generate |
| * an ISO date into a string, use <tt>DateGen</tt>. Additional generators |
| * can (and should) be added as the need arises.</li> |
| * <li>The <tt>repeat</tt> attribute lets you create a very wide row by |
| * repeating a column the specified number of times. Actual column names |
| * have a numeric suffix. For example, if the base name is "blue" and |
| * is repeated twice, actual columns are "blue1" and "blue2".</li> |
| * </ul> |
| * When used in SQL, use the <tt>mock</tt> name space as follows: |
| * <pre><code> |
| * SELECT id_i, name_s50 FROM `mock`.`employee_500`; |
| * </code></pre> |
| * Both the column names and table names encode information that specifies |
| * what data to generate. |
| * <p> |
| * Columns are of the form <tt><i>name</i>_<i>type</i><i>length</i>?</tt>. |
| * <ul> |
| * <li>The name is anything you want ("id" and "name" in the example.)</li> |
| * <li>The underscore is required to separate the type from the name.</li> |
| * <li>The type is one of "i" (integer), "d" (double) or "s" (string). |
| * Other types can be added as needed: n (decimal number), l (long), etc.</li> |
| * <li>The length is optional and is used only for string (<tt>VARCHAR</tt>) |
| * columns. The default string length is 10.</li> |
| * <li>Columns do not yet support nulls. When they do, the encoding will |
| * be "_n<i>percent</i>" where the percent specifies the percent of rows |
| * that should contain null values in this column.</li> |
| * <li>The column is known to SQL as its full name, that is "id_i" or |
| * "name_s50".</li> |
| * </ul> |
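| * <p> |
| * As a rough illustration of the column-name encoding above, the sketch below |
| * splits a name such as "name_s50" into its parts. It is not the parser used by |
| * the mock data source: the class <tt>MockColumnNameSketch</tt>, its regular |
| * expression, the lowercase-only type letters, and the choice to report a width |
| * of 0 for non-string columns are all assumptions made for this example. |

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the mock data source.
class MockColumnNameSketch {
  // Assumed pattern: name, underscore, type letter (i, d or s), optional length.
  private static final Pattern COLUMN = Pattern.compile("(\\w+)_([ids])(\\d+)?");

  // Default string length stated in the column description above.
  static final int DEFAULT_STRING_LENGTH = 10;

  // Returns the base name: everything before the last underscore.
  static String baseName(String column) {
    return match(column).group(1);
  }

  // Returns the type letter: "i", "d" or "s".
  static String typeCode(String column) {
    return match(column).group(2);
  }

  // Returns the string width, or 0 for non-string columns (an assumption).
  static int width(String column) {
    Matcher m = match(column);
    if (!m.group(2).equals("s")) {
      return 0;
    }
    return m.group(3) == null ? DEFAULT_STRING_LENGTH : Integer.parseInt(m.group(3));
  }

  private static Matcher match(String column) {
    Matcher m = COLUMN.matcher(column);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a mock column name: " + column);
    }
    return m;
  }

  public static void main(String[] args) {
    String column = "name_s50";
    System.out.println(baseName(column) + " " + typeCode(column) + " " + width(column));
  }
}
```

| * Whatever the parser does internally, SQL still sees the full column name |
| * ("name_s50"), as noted in the list above. |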
| * <p> |
| * Tables are of the form <tt><i>name</i>_<i>rows</i><i>unit</i>?</tt> where: |
| * <ul> |
| * <li>The name is anything you want. ("employee" in the example.)</li> |
| * <li>The underscore is required to separate the row count from the name.</li> |
| * <li>The row count specifies the number of rows to return.</li> |
| * <li>The count unit can be none, K (multiply count by 1000) or M |
| * (multiply row count by one million), case insensitive.</li> |
| * <li>Another field (not yet implemented) might specify the split count.</li> |
| * </ul> |
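| * <p> |
| * The table-name rules above can be sketched the same way. The example below |
| * derives a row count from names such as "employee_500" or "employee_10K". |
| * The class <tt>MockTableNameSketch</tt> and its regular expression are |
| * hypothetical and only mirror the rules listed here; the real implementation |
| * may differ. |

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the mock data source.
class MockTableNameSketch {
  // Assumed pattern: name, underscore, row count, optional unit (K or M, any case).
  private static final Pattern TABLE = Pattern.compile("(\\w+)_(\\d+)([kKmM])?");

  // Returns the number of rows encoded in the table name.
  static long rowCount(String table) {
    Matcher m = TABLE.matcher(table);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a mock table name: " + table);
    }
    long count = Long.parseLong(m.group(2));
    String unit = m.group(3);
    if (unit == null) {
      return count;                       // no unit: use the count as written
    }
    return unit.equalsIgnoreCase("k")
        ? count * 1_000L                  // K: thousands
        : count * 1_000_000L;             // M: millions
  }

  public static void main(String[] args) {
    System.out.println(rowCount("employee_500"));
    System.out.println(rowCount("employee_10K"));
  }
}
```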
| * <h3>Enhanced Mode with Definition File</h3> |
| * You can reference a mock data definition file directly from SQL as follows: |
| * <pre><code>SELECT * FROM `mock`.`your_defn_file.json`</code></pre> |
| * <h3>Data Generators</h3> |
| * The classic mode uses data generators built into each vector to generate |
| * the sample data. These generators use a very simple black/white alternating |
| * series of two values. Simple, but limited. The enhanced mode allows custom |
| * data generators. Unfortunately, this requires a separate generator class for |
| * each data type. As a result, we presently support just a few key data types. |
| * On the other hand, the custom generators do allow tests to specify a custom |
| * generator class to generate the kind of data needed for that test. |
| * <p> |
| * All data generators implement the {@link FieldGen} interface, and must have |
| * a no-argument constructor to allow dynamic instantiation. The mock data |
| * source either picks a default generator (if no <tt>generator</tt> is provided) |
| * or uses the custom generator specified in <tt>generator</tt>. Generators |
| * are independent (though one could, perhaps, write generators that correlate |
| * field values.) |
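| * <p> |
| * To make the class-name rule and the no-argument constructor requirement |
| * concrete, here is a minimal sketch of how a generator name might be resolved |
| * and instantiated by reflection. The class <tt>GeneratorLookupSketch</tt> and |
| * its methods are hypothetical, the dot test for a qualified name is an |
| * assumption, and <tt>java.lang.StringBuilder</tt> merely stands in for a |
| * generator class so the example runs without Drill on the classpath. |

```java
// Hypothetical helper for illustration; not part of the mock data source.
class GeneratorLookupSketch {
  // The package named in this file's package declaration.
  static final String MOCK_PACKAGE = "org.apache.drill.exec.store.mock";

  // A name without a dot is treated as a simple class name in MOCK_PACKAGE.
  static String qualify(String generator) {
    return generator.indexOf('.') >= 0 ? generator : MOCK_PACKAGE + "." + generator;
  }

  // Creates an instance through the class's no-argument constructor.
  static Object instantiate(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create generator " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(qualify("DateGen"));
    // StringBuilder is a stand-in for a real generator class.
    System.out.println(instantiate("java.lang.StringBuilder").getClass().getName());
  }
}
```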
| */ |
| package org.apache.drill.exec.store.mock; |