PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY (#189) DELTA_BYTE_ARRAY has been supported for FIXED_LEN_BYTE_ARRAY by parquet-mr since 2015 (see PARQUET-152). Update the spec in consequence. Also improve wording, markup and add an example.

commit: bfc549b93e6927cb1fc425466e4084f76edc6d22 [log] [tgz]
author: Antoine Pitrou <antoine@python.org> Thu Jan 19 23:20:21 2023 +0100
committer: GitHub <noreply@github.com> Thu Jan 19 23:20:21 2023 +0100
tree: 77108dc96122d3566807732bebf558ea8eca957e
parent: 613a1cf4475c662457a0fd81a894ce4709799e3b [diff]
diff --git a/Encodings.md b/Encodings.md
index 40e2177..a84cb02 100644
--- a/Encodings.md
+++ b/Encodings.md

@@ -280,16 +280,19 @@
 and possibly better compression in the data (it is no longer interleaved with the lengths).
 
 The data stream looks like:
-
+```
 <Delta Encoded Lengths> <Byte Array Data>
+```
 
-For example, if the data was "Hello", "World", "Foobar", "ABCDEF":
+For example, if the data was "Hello", "World", "Foobar", "ABCDEF"
 
-The encoded data would be DeltaEncoding(5, 5, 6, 6) "HelloWorldFoobarABCDEF"
+then the encoded data would be comprised of the following segments:
+- DeltaEncoding(5, 5, 6, 6) (the string lengths)
+- "HelloWorldFoobarABCDEF"
 
 ### Delta Strings: (DELTA_BYTE_ARRAY = 7)
 
-Supported Types: BYTE_ARRAY
+Supported Types: BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
 
 This is also known as incremental encoding or front compression: for each element in a
 sequence of strings, store the prefix length of the previous entry plus the suffix.
@@ -299,9 +302,18 @@
 This is stored as a sequence of delta-encoded prefix lengths (DELTA_BINARY_PACKED), followed by
 the suffixes encoded as delta length byte arrays (DELTA_LENGTH_BYTE_ARRAY).
 
+For example, if the data was "axis", "axle", "babble", "babyhood"
+
+then the encoded data would be comprised of the following segments:
+- DeltaEncoding(0, 2, 0, 3) (the prefix lengths)
+- DeltaEncoding(4, 2, 6, 5) (the suffix lengths)
+- "axislebabbleyhood"
+
+Note that, even for FIXED_LEN_BYTE_ARRAY, all lengths are encoded despite the redundancy.
+
 ### Byte Stream Split: (BYTE_STREAM_SPLIT = 9)
 
-Supported Types: FLOAT DOUBLE
+Supported Types: FLOAT, DOUBLE
 
 This encoding does not reduce the size of the data but can lead to a significantly better
 compression ratio and speed when a compression algorithm is used afterwards.
commit	bfc549b93e6927cb1fc425466e4084f76edc6d22	[log] [tgz]
author	Antoine Pitrou <antoine@python.org>	Thu Jan 19 23:20:21 2023 +0100
committer	GitHub <noreply@github.com>	Thu Jan 19 23:20:21 2023 +0100
tree	77108dc96122d3566807732bebf558ea8eca957e
parent	613a1cf4475c662457a0fd81a894ce4709799e3b [diff]