DRILL-7293: Convert the regex ("log") plugin to use EVF

Converts the log format plugin (which uses a regex for parsing) to work
with the Enhanced Vector Framework (EVF).

User-visible behavior changes added to the README file.

* Use the plugin config object to pass config to the Easy framework.
* Use the EVF scan mechanism in place of the legacy "ScanBatch"
mechanism.
* Minor code and README cleanup.
* Replace ad-hoc type conversion with builtin conversions

The provided schema support in the enhanced vector framework (EVF)
provides automatic conversions from VARCHAR to most types. The log
format plugin was created before EVF was available and provided its own
conversion mechanism. This commit removes the ad-hoc conversion code and
instead uses the log plugin config schema information to create an
"output schema", just as if it had been supplied through the provided
schema framework.

Because we need the schema in the plugin (rather than the reader), moved
the schema-parsing code out of the reader into the plugin. The plugin
creates two schemas: an "output schema" with the desired output types,
and a "reader schema" that uses only VARCHAR. This causes the EVF to
perform conversions.
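
The two-schema idea above can be sketched in plain Java. This is an
illustration only and does not use Drill's EVF classes: the class name,
the ColType enum and both methods are hypothetical stand-ins. It assumes
the output schema lists one column per regex group, in group order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain Java, not Drill's EVF API.
final class TwoSchemaSketch {

  // Hypothetical stand-ins for Drill column types.
  enum ColType { VARCHAR, INT, DOUBLE }

  // "Reader schema" step: every regex group is captured as a string (VARCHAR).
  static List<String> readFields(Pattern pattern, String line) {
    Matcher m = pattern.matcher(line);
    if (!m.matches()) {
      return null; // the caller decides how to report the unmatched line
    }
    List<String> fields = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      fields.add(m.group(i));
    }
    return fields;
  }

  // "Output schema" step: convert each string to the declared output type.
  // Assumes one schema column per captured field, in the same order.
  static Map<String, Object> convert(Map<String, ColType> outputSchema,
                                     List<String> fields) {
    Map<String, Object> row = new LinkedHashMap<>();
    int i = 0;
    for (Map.Entry<String, ColType> col : outputSchema.entrySet()) {
      String raw = fields.get(i++);
      switch (col.getValue()) {
        case INT:
          row.put(col.getKey(), Integer.parseInt(raw));
          break;
        case DOUBLE:
          row.put(col.getKey(), Double.parseDouble(raw));
          break;
        default:
          row.put(col.getKey(), raw); // VARCHAR needs no conversion
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("(\\d+) (\\w+) ([\\d.]+)");
    Map<String, ColType> schema = new LinkedHashMap<>();
    schema.put("id", ColType.INT);
    schema.put("level", ColType.VARCHAR);
    schema.put("elapsed", ColType.DOUBLE);
    System.out.println(convert(schema, readFields(p, "42 INFO 1.5")));
  }
}
```

In the sketch the reader never sees the output types, which mirrors the
split described above: the reader produces strings and the conversion is
driven entirely by the output schema.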

* Enable provided schema support

Allows the user to specify types using either the format config (as
previously) or a provided schema. If a schema is provided, it will match
columns using names specified in the format config.

The provided schema can specify both types and modes (nullable or not
null).

If a schema is provided, then the types specified in the plugin config
are ignored. No attempt is made to merge schemas.

If a schema is provided, but a column is omitted from the schema, the
type defaults to VARCHAR.
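
The precedence rules above can be summarized with a small plain-Java
sketch. The class and method names are hypothetical and types are
modeled as strings; this is not the plugin's code. The VARCHAR default
for a config column that declares no type is an assumption of the
sketch; the text above only states that default for columns omitted from
a provided schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the type-resolution rules, not Drill's classes.
final class SchemaResolutionSketch {

  // Returns the resolved type for each column named in the format config.
  // A null providedSchema means that no schema was provided.
  static Map<String, String> resolve(List<String> configNames,
                                     Map<String, String> configTypes,
                                     Map<String, String> providedSchema) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (String name : configNames) {
      String type;
      if (providedSchema != null) {
        // The provided schema wins; config types are ignored, not merged.
        // A column omitted from the provided schema falls back to VARCHAR.
        type = providedSchema.getOrDefault(name, "VARCHAR");
      } else {
        // Assumption of this sketch: an untyped config column is VARCHAR.
        type = configTypes.getOrDefault(name, "VARCHAR");
      }
      resolved.put(name, type);
    }
    return resolved;
  }
}
```

For example, with config columns `a` (INT) and `b` (DOUBLE) and a
provided schema that only declares `a` as DATE, the sketch resolves `a`
to DATE and `b` to VARCHAR; the DOUBLE from the config is not used.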

* Added ability to specify regex in table properties

Allows the user to specify the regex, and the column schema,
using a CREATE SCHEMA statement. The README file provides the details.
Unit tests demonstrate and verify the functionality.

* Used the custom error context provided by EVF to enhance the log format
reader error messages.
* Added user name to default EVF error context
* Added support for table functions

Can set the regex and maxErrors fields, but not the schema.
Schema will default to "field_0", "field_1", etc. of type
VARCHAR.

* Added unit tests to verify the functionality.
* Added a check, and a test, for a regex with no groups.
* Added columns array support

When the log regex plugin is given no schema, it previously
created a list of columns "field_0", "field_1", etc. After
this change, the plugin instead follows the pattern set by
the text plugin: it will place all fields into the columns
array. (The two special fields are still separate.)

A few adjustments were necessary to the columns array
framework to allow use of the special columns along with
the `columns` column.

Modified unit tests and the README to reflect this change.
The change should be backward compatible because few users
are likely relying on the dummy field names.
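
As a rough illustration of the columns-array behavior, together with the
no-groups check mentioned earlier, the following plain-Java sketch
gathers all regex groups into a single array instead of separate
"field_N" columns. The class and method names are hypothetical and the
error message is invented for the example; only java.util.regex is used,
not Drill's columns array framework.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not Drill's columns array framework.
final class ColumnsArraySketch {

  // Rejects a pattern with no capturing groups: it would yield no fields.
  static Pattern compileChecked(String regex) {
    Pattern pattern = Pattern.compile(regex);
    if (pattern.matcher("").groupCount() == 0) {
      throw new IllegalArgumentException(
          "Regex must contain at least one group: " + regex);
    }
    return pattern;
  }

  // With no schema, every group goes into one array, in group order,
  // rather than into separate field_0, field_1, ... columns.
  static String[] toColumns(Pattern pattern, String line) {
    Matcher m = pattern.matcher(line);
    if (!m.matches()) {
      return null; // unmatched line; the caller decides what to do
    }
    String[] columns = new String[m.groupCount()];
    for (int i = 0; i < columns.length; i++) {
      columns[i] = m.group(i + 1); // group 0 is the whole match
    }
    return columns;
  }
}
```

A query against such a result would then address fields by array
position, in the same way as for the text plugin's `columns` column.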

Added unit tests to verify that schema-based table
functions work. A test shows that, due to the unfortunate
config property name "schema", users of this plugin cannot
combine a config table function with the schema attribute
in the way promised in DRILL-6965.
19 files changed
tree: 3cc9ff06b31c944fee64b803efa65ad46c3cd974
  1. .circleci/
  2. .mvn/
  3. common/
  4. contrib/
  5. distribution/
  6. docs/
  7. drill-shaded/
  8. drill-yarn/
  9. exec/
  10. logical/
  11. metastore/
  12. protocol/
  13. sample-data/
  14. src/
  15. tools/
  16. .gitignore
  17. .travis.yml
  18. header
  19. KEYS
  20. LICENSE
  21. NOTICE
  22. pom.xml
  23. README.md
README.md

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

Developers

Please read Environment.md for setting up and running Apache Drill. For complete developer documentation, see DevDocs.md.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Ways to get involved or discuss Drill

Join the community!

Apache Drill is an Apache Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or join our Slack Channel if you need help with using or developing Apache Drill. (More information can be found on the Apache Drill website.)

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

  • Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication.
  • The Jetty Web Server is used to provide communication via HTTPS.
  • The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.