DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"
logic.

Drill has an informal standard that if a batch has no rows, then
offset vectors within that batch should have zero size. Contrast
this with batches of size 1 that should have offset vectors of
size 2. Changed to enforce this rule throughout.
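
As an illustration of the sizing rule only, here is a minimal sketch.
The class and method names (OffsetVectorRule, expectedOffsetCount,
buildOffsets) are hypothetical and do not exist in Drill, and a plain
int[] stands in for a real offset vector.

```java
import java.nio.charset.StandardCharsets;

public class OffsetVectorRule {

  // Hypothetical helper: number of entries an offset vector should
  // hold for a batch of rowCount rows. Zero rows means zero entries;
  // otherwise one start offset plus one end offset per row.
  static int expectedOffsetCount(int rowCount) {
    if (rowCount < 0) {
      throw new IllegalArgumentException("rowCount must be >= 0");
    }
    return rowCount == 0 ? 0 : rowCount + 1;
  }

  // Builds offsets for variable-width values, following the same rule.
  static int[] buildOffsets(String[] values) {
    int[] offsets = new int[expectedOffsetCount(values.length)];
    for (int i = 0; i < values.length; i++) {
      int width = values[i].getBytes(StandardCharsets.UTF_8).length;
      offsets[i + 1] = offsets[i] + width;
    }
    return offsets;
  }
}
```

With this sketch, an empty batch yields an empty array, while a batch
holding the single value "ab" yields the two offsets 0 and 2.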

Nullable, repeated and variable-width vectors have "fill empties"
logic that is used in two places: when setting the value count and
when preparing to write a new value. The existing logic was not
quite right for either case. Added tests and fixed the code to
properly handle each case.
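
The two uses of "fill empties" can be sketched roughly as follows.
This is a simplified model under stated assumptions, not the Drill
implementation: FillEmptiesSketch and its methods are hypothetical, an
int[] stands in for the offset vector, and only value lengths are
tracked. The buffer is deliberately filled with junk so that the
sketch does not rely on freshly allocated memory being zeroed.

```java
import java.util.Arrays;

public class FillEmptiesSketch {
  private final int[] offsets;
  private int lastSet = -1;   // index of the last position written
  private int valueCount = 0;

  FillEmptiesSketch(int capacity) {
    offsets = new int[capacity + 1];
    Arrays.fill(offsets, 0xDEADBEEF); // simulate a "dirty" buffer
  }

  // Marks positions lastSet+1 .. upTo-1 as empty by copying the
  // previous end offset forward.
  private void fillEmpties(int upTo) {
    offsets[0] = 0; // a dirty buffer cannot be trusted to start at 0
    for (int i = lastSet + 1; i < upTo; i++) {
      offsets[i + 1] = offsets[i];
    }
  }

  // Use 1: before writing a new value, fill any skipped positions.
  void set(int index, int length) {
    fillEmpties(index);
    offsets[index + 1] = offsets[index] + length;
    lastSet = index;
  }

  // Use 2: when setting the value count, fill trailing positions.
  // A count of 0 leaves the offsets untouched and reports none.
  void setValueCount(int count) {
    if (count > 0) {
      fillEmpties(count);
    }
    valueCount = count;
    lastSet = count - 1;
  }

  int[] offsets() {
    return valueCount == 0
        ? new int[0]
        : Arrays.copyOf(offsets, valueCount + 1);
  }
}
```

For example, writing a 3-byte value at position 0 and a 2-byte value
at position 3, then setting the value count to 6, gives the offsets
0, 3, 3, 3, 5, 5, 5 in this model.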

Revised the batch validator to enforce the rule that offset vectors
have length 0 in 0-sized batches. The result was much simpler code.
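
A check of this kind might look like the sketch below. It is only an
assumption about the shape of such a validation: the class
OffsetValidatorSketch and its validate method are hypothetical, and
the checks in the real batch validator may differ.

```java
import java.util.Optional;

public class OffsetValidatorSketch {

  // Returns an error message, or an empty Optional if the offsets
  // are consistent with the given row count.
  static Optional<String> validate(int rowCount, int[] offsets) {
    int expected = rowCount == 0 ? 0 : rowCount + 1;
    if (offsets.length != expected) {
      return Optional.of("offset vector has " + offsets.length
          + " entries, expected " + expected);
    }
    if (expected == 0) {
      return Optional.empty(); // nothing more to check
    }
    if (offsets[0] != 0) {
      return Optional.of("first offset is " + offsets[0]
          + ", expected 0");
    }
    for (int i = 1; i < offsets.length; i++) {
      if (offsets[i] < offsets[i - 1]) {
        return Optional.of("offset at position " + i
            + " is smaller than the previous offset");
      }
    }
    return Optional.empty();
  }
}
```

Because the zero-row case needs no special handling beyond the length
check, the remaining checks apply uniformly to every non-empty batch.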

Added tools to easily print a batch, restoring some code that
was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure
logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule
for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for
nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: it turns out
that it is not actually used.

closes #1896
44 files changed
tree: 8c4650e51daa5d1f92929f8667f50d40bb07d86e
  1. .circleci/
  2. .mvn/
  3. common/
  4. contrib/
  5. distribution/
  6. docs/
  7. drill-shaded/
  8. drill-yarn/
  9. exec/
  10. logical/
  11. metastore/
  12. protocol/
  13. sample-data/
  14. src/
  15. tools/
  16. .gitignore
  17. .travis.yml
  18. header
  19. KEYS
  20. LICENSE
  21. NOTICE
  22. pom.xml
  23. README.md
README.md

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

Developers

Please read Environment.md for instructions on setting up and running Apache Drill. For complete developer documentation, see DevDocs.md.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Find out ways to be involved or discuss Drill

Join the community!

Apache Drill is an Apache Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or our Slack Channel if you need help with using or developing Apache Drill. (More information can be found on the Apache Drill website.)

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software: Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication. The Jetty Web Server is used to provide communication via HTTPS. The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.