commit | 2224ee1014cfdcc2b76f1a8290171ca63db76a53 | [log] [tgz] |
---|---|---|
author | Paul Rogers <par0328@yahoo.com> | Wed Jun 12 18:28:16 2019 -0700 |
committer | Arina Ielchiieva <arina.yelchiyeva@gmail.com> | Mon Jul 15 13:25:20 2019 +0300 |
tree | 3cc9ff06b31c944fee64b803efa65ad46c3cd974 | |
parent | 77cf7e2ee61fb40e7efd85148ac76947d13dda38 | [diff] |
DRILL-7293: Convert the regex ("log") plugin to use EVF

Converts the log format plugin (which uses a regex for parsing) to work with the Enhanced Vector Framework (EVF).

User-visible behavior changes added to the README file.

* Use the plugin config object to pass config to the Easy framework.
* Use the EVF scan mechanism in place of the legacy "ScanBatch" mechanism.
* Minor code and README cleanup.
* Replace ad-hoc type conversion with built-in conversions

The provided schema support in EVF provides automatic conversions from VARCHAR to most types. The log format plugin was created before EVF was available and provided its own conversion mechanism. This commit removes the ad-hoc conversion code and instead uses the log plugin config schema information to create an "output schema" just as if it was provided by the provided schema framework.

Because we need the schema in the plugin (rather than the reader), moved the schema-parsing code out of the reader into the plugin. The plugin creates two schemas: an "output schema" with the desired output types, and a "reader schema" that uses only VARCHAR. This causes the EVF to perform conversions.

* Enable provided schema support

Allows the user to specify types using either the format config (as previously) or a provided schema. If a schema is provided, it will match columns using names specified in the format config.

The provided schema can specify both types and modes (nullable or not null).

If a schema is provided, then the types specified in the plugin config are ignored. No attempt is made to merge schemas.

If a schema is provided, but a column is omitted from the schema, the type defaults to VARCHAR.

* Added ability to specify regex in table properties

Allows the user to specify the regex, and the column schema, using a CREATE SCHEMA statement. The README file provides the details. Unit tests demonstrate and verify the functionality.

* Used the custom error context provided by EVF to enhance the log format reader error messages.
* Added user name to default EVF error context
* Added support for table functions

Can set the regex and maxErrors fields, but not the schema. Schema will default to "field_0", "field_1", etc. of type VARCHAR.

* Added unit tests to verify the functionality.
* Added a check, and a test, for a regex with no groups.
* Added columns array support

When the log regex plugin is given no schema, it previously created a list of columns "field_0", "field_1", etc. After this change, the plugin instead follows the pattern set by the text plugin: it will place all fields into the columns array. (The two special fields are still separate.)

A few adjustments were necessary to the columns array framework to allow use of the special columns along with the `columns` column.

Modified unit tests and the README to reflect this change. The change should be backward compatible because few users are likely relying on the dummy field names.

Added unit tests to verify that schema-based table functions work. A test shows that, due to the unfortunate config property name "schema", users of this plugin cannot combine a config table function with the schema attribute in the way promised in DRILL-6965.
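The behavior described in this commit message can be illustrated with a small sketch. It is not Drill code: the Python functions `compile_log_regex` and `parse_line`, the `CONVERTERS` table, and the sample regex are all hypothetical names chosen for illustration. The sketch assumes only what the message states: a regex with no groups is rejected, fields land in a `columns` array when no schema is given, and fields are read as text (VARCHAR) and then converted to the declared type, with VARCHAR as the default for columns whose type is omitted.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the VARCHAR-to-type conversions that the
# commit message attributes to EVF. Only three types are modeled here.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "VARCHAR": str,
    "INT": int,
    "DOUBLE": float,
}


def compile_log_regex(pattern: str) -> re.Pattern:
    """Compile the pattern and reject one that has no capturing groups."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        raise ValueError(f"regex has no capturing groups: {pattern!r}")
    return regex


def parse_line(
    regex: re.Pattern,
    line: str,
    names: Optional[List[str]] = None,
    types: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, object]]:
    """Turn one log line into a row, or return None if it does not match."""
    match = regex.match(line)
    if match is None:
        return None
    values = list(match.groups())  # every field starts out as text
    if names is None:
        # No schema: all fields go into a single "columns" array.
        return {"columns": values}
    types = types or {}
    row: Dict[str, object] = {}
    # This sketch ignores any groups beyond the listed names.
    for name, text in zip(names, values):
        type_name = types.get(name, "VARCHAR")  # omitted type: VARCHAR
        row[name] = None if text is None else CONVERTERS[type_name](text)
    return row


if __name__ == "__main__":
    regex = compile_log_regex(r"(\d{4})-(\d{2}) (\w+)")
    print(parse_line(regex, "2019-06 INFO"))
    print(parse_line(regex, "2019-06 INFO",
                     ["year", "month", "level"], {"year": "INT"}))
```

Keeping the parse step text-only and applying conversions afterwards mirrors the "reader schema" versus "output schema" split that the message describes, though the real plugin delegates that conversion to the framework.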
Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
Please read Environment.md for setting up and running Apache Drill. For complete developer documentation, see DevDocs.md.
Please see the Apache Drill Website or the Apache Drill Documentation for more information.
Apache Drill is an Apache Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or our Slack Channel if you need help with using or developing Apache Drill. (More information can be found on the Apache Drill website.)
This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code. The following provides more details on the included cryptographic software: Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication. The Jetty Web Server is used to provide communication via HTTPS. The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.