Refactor documentation for 2.0 encodings
diff --git a/Encodings.md b/Encodings.md
index 75ed475..cf72854 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -136,9 +136,9 @@
This encoding is always preferred over PLAIN for byte array columns.
For this encoding, we will take all the byte array lengths and encode them using delta
-encoding. The byte array data follows all of the length data just concatenated back to
-back. The expected savings is from the cost of encoding the lengths and possibly
-better compression in the data (it is no longer interleaved with the lengths).
+encoding (DELTA_BINARY_PACKED). The byte array data follows all of the length data,
+concatenated back to back. The expected savings come from the lower cost of encoding the
+lengths and possibly better compression of the data (it is no longer interleaved with the lengths).
The data stream looks like:
@@ -153,8 +153,7 @@
Supported Types: BYTE_ARRAY
This is also known as incremental encoding or front compression: for each element in a
-sorted sequence of strings, store the prefix length of the previous entry plus the
-suffix.
+sequence of strings, store the length of the prefix shared with the previous entry plus the suffix.
For a longer description, see http://en.wikipedia.org/wiki/Incremental_encoding.
diff --git a/src/thrift/parquet.thrift b/src/thrift/parquet.thrift
index 71807f9..2e93ede 100644
--- a/src/thrift/parquet.thrift
+++ b/src/thrift/parquet.thrift
@@ -130,7 +130,8 @@
*/
PLAIN = 0;
- /** Group VarInt encoding for INT32/INT64. */
+ /** Group VarInt encoding for INT32/INT64.
+ */
GROUP_VAR_INT = 1;
/** Dictionary encoding. The values in the dictionary are encoded in the
@@ -139,22 +140,27 @@
PLAIN_DICTIONARY = 2;
/** Group packed run length encoding. Usable for definition/reptition levels
- * encoding */
+ * encoding
+ */
RLE = 3;
/** Bit packed encoding. This can only be used if the data has a known max
- * width. Usable for definition/repetition levels encoding. **/
+ * width. Usable for definition/repetition levels encoding.
+ */
BIT_PACKED = 4;
/** Delta encoding for integers. This can be used for int columns and works best
- * on sorted data */
+ * on sorted data
+ */
DELTA_BINARY_PACKED = 5;
/** Encoding for byte arrays to separate the length values and the data. The lengths
- * are encoded using DELTA_RLE **/
+ * are encoded using DELTA_BINARY_PACKED
+ */
DELTA_LENGTH_BYTE_ARRAY = 6;
- /** Delta-encoded sorted strings.
+ /** Incremental-encoded strings. Prefix lengths are encoded using DELTA_BINARY_PACKED.
+ * Suffixes are stored as delta length byte arrays.
*/
DELTA_STRINGS = 7;
}
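
For readers of this patch, the two byte-array encodings it documents can be illustrated with a minimal Python sketch. This is not the Parquet implementation: the function names are hypothetical, and the integer lists are left as plain Python lists, whereas the documentation says they are encoded with DELTA_BINARY_PACKED.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest prefix shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def incremental_encode(values):
    """Incremental (front) encoding: per value, the length of the prefix
    shared with the previous value, plus the remaining suffix."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        p = common_prefix_len(prev, v)
        prefix_lengths.append(p)
        suffixes.append(v[p:])
        prev = v
    return prefix_lengths, suffixes


def incremental_decode(prefix_lengths, suffixes):
    """Rebuild each value from the previous value's prefix and its suffix."""
    out = []
    prev = b""
    for p, s in zip(prefix_lengths, suffixes):
        prev = prev[:p] + s
        out.append(prev)
    return out


def split_lengths_and_data(values):
    """Delta-length layout: all lengths first, then the bytes back to back.
    (Assumption: the lengths would then be delta encoded; omitted here.)"""
    return [len(v) for v in values], b"".join(values)


def join_lengths_and_data(lengths, data):
    """Inverse of split_lengths_and_data."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out


values = [b"axis", b"axle", b"babble", b"babyhood"]
prefixes, suffixes = incremental_encode(values)
lengths, data = split_lengths_and_data(suffixes)
restored = incremental_decode(prefixes, join_lengths_and_data(lengths, data))
```

The input does not need to be sorted for the round trip to work, which matches the removal of "sorted" from the description; sorting only makes the shared prefixes longer.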