Merge pull request #69 from Parquet/delta_encoding

spec for delta encoding
diff --git a/Encodings.md b/Encodings.md
index d725fa4..d39ac47 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -54,4 +54,80 @@
 a complete group should still be output with 0's filling in for the remainder.
 For example, if the input was (1,2,3,4,5): the resulting encoding should
 behave as if the input was (1,2,3,4,5,0,0,0) and the two groups should be
-encoded back to back.
\ No newline at end of file
+encoded back to back.
+
+### Delta Encoding
+Supported Types: INT32, INT64
+
+This encoding is adapted from the Binary packing described in ["Decoding billions of integers per second through vectorization"](http://arxiv.org/pdf/1209.2137v5.pdf) by D. Lemire and L. Boytsov
+
+Delta encoding consists of a header followed by blocks of delta encoded values binary packed. Each block is made of miniblocks each binary packed with its own bit width. When there are not enough values to encode a full block we pad with zeros (added to the frame of reference).
+The header contains:
+```
+<block size in values> <number of miniblocks in a block> <total value count> <first value>
+```
+the block size is a multiple of 128 stored as VLQ int
+the miniblock count per block is a diviser of the block size stored as VLQ int the number of values in the miniblock is a multiple of 32.
+the total value count is stored as a VLQ int
+the first value is stored as a zigzag VLQ int
+
+Each block contains 
+```
+<min delta> <list of bitwidths of miniblocks> <miniblocks>
+```
+the min delta is a VLQ int (we compute a minimum as we need positive integers for bit packing)
+the bitwidth of each block is stored as a byte
+each miniblock is a list of bit packed ints according to the bit width stored at the begining of the block
+
+Having multiple blocks allows us to escape values and restart from a new base value.
+
+To encode each delta block, we will:
+
+1. Compute the deltas
+
+2. Encode the first value as zigzag VLQ int
+
+3. For each block, compute the frame of reference(minimum of the deltas) for the deltas. This guarantees
+all deltas are positive.
+
+4. encode the frame of reference delta as VLQ int followed by the delta values (minus the minimum) encoded as bit packed per miniblock.
+
+Steps 2 and 3 are skipped if the number of values in the block is 1.
+
+#### Example 1
+1, 2, 3, 4, 5
+
+After step 1), we compute the deltas as:
+
+1, 1, 1, 1
+
+The minimum delta is 1 and after step 2, the deltas become
+
+0, 0, 0, 0
+
+The final encoded data is:
+
+ header:
+8 (block size), 1 (miniblock count), 5 (value count), 1 (first value)
+
+ block
+1 (minimum delta), 0 (bitwidth), (no data needed for bitwidth 0)
+
+#### Example 2
+7, 5, 3, 1, 2, 3, 4, 5, the deltas would be
+
+-2, -2, -2, 1, 1, 1, 1
+
+The minimum is -2, so the relative deltas are:
+
+0, 0, 0, 3, 3, 3, 3
+
+The encoded data is
+
+ header:
+8 (block size), 1 (miniblock count), 8 (value count), 7 (first value)
+
+ block
+0 (minimum delta), 2 (bitwidth), 000000111111b (0,0,0,3,3,3 packed on 2 bits)
+
+
diff --git a/src/thrift/parquet.thrift b/src/thrift/parquet.thrift
index bddf831..6762578 100644
--- a/src/thrift/parquet.thrift
+++ b/src/thrift/parquet.thrift
@@ -145,6 +145,10 @@
   /** Bit packed encoding.  This can only be used if the data has a known max
    * width.  Usable for definition/repetition levels encoding.  **/
   BIT_PACKED = 4;
+
+  /** Delta encoding for integers. This can be used for int columns and works best 
+   * on sorted data */
+  DELTA_BINARY_PACKED = 5;
 }
 
 /**