| This file describes the svndiff format used by the Subversion code. |
| Its design borrows many ideas from the vdelta and vcdiff encoding |
| formats from AT&T Research Labs, but it is much simpler and thus a |
| little less compact. |
| |
| From the point of view of svndiff, a delta is a sequence of windows, |
| each containing a list of instructions for reconstructing a contiguous |
| section of the target using a contiguous section of the source as a |
| reference. The section of the target being reconstructed is called |
| the "target view"; the section of the source being referenced is |
| called the "source view." Source views must not slide backwards from |
| one window to the next; this allows svndiffs to be applied using a |
| single pass through the source file. Instructions in a window direct |
| copies to be made into the target view from one of three places: from |
| the source view, from the portion of the target view which has already |
| been reconstructed, or from a block of new data encoded inside the |
| window. |
| |
| An svndiff document begins with four bytes, "SVN" followed by a zero |
| byte which represents a version number. After the header come one or |
| more windows, until the document ends. (So the decoder must have |
| external context indicating when there is no more svndiff data.) |
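
As a minimal sketch of the header check described above (the names SVNDIFF_MAGIC and strip_header are hypothetical, not taken from the Subversion code), a decoder might verify the four-byte header before reading any windows:

```python
# Hypothetical helper; the names are illustrative and not from Subversion.
SVNDIFF_MAGIC = b"SVN\x00"  # "SVN" followed by the version byte 0


def strip_header(data: bytes) -> bytes:
    """Return the bytes that follow the 4-byte header, or raise ValueError."""
    if data[:4] != SVNDIFF_MAGIC:
        raise ValueError("not a version 0 svndiff document")
    return data[4:]
```

Because the format has no end marker, the caller is assumed to pass exactly the bytes that belong to the svndiff document.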
| |
| A window is the concatenation of the following: |
| |
| The source view offset |
| The source view length |
| The target view length |
| The length of the instructions in bytes |
| The length of the new data in bytes |
| The window's instructions |
| The window's new data (as raw data) |
| |
| Integers (including the first five items listed above) are encoded |
| using a variable-length format. The high bit of each byte is used as |
| a continuation bit; 1 indicates that there is more data and 0 |
| indicates the final byte. The other seven bits of each byte are data. |
| Higher-order bits are encoded before lower-order bits. As an example, |
| 130 would be encoded as two bytes, 10000001 followed by 00000010. |
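
The integer encoding above can be sketched in Python as follows. The function names encode_int and decode_int are assumptions made for illustration, and rejecting negative values is likewise an assumption (the format has no way to represent them):

```python
from typing import Tuple


def encode_int(n: int) -> bytes:
    """Encode n with the most significant 7-bit group first."""
    if n < 0:
        raise ValueError("cannot encode a negative integer")  # assumption
    groups = [n & 0x7F]                   # final byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(groups))


def decode_int(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer at data[pos]; return (value, next position)."""
    value = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated integer")
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:               # continuation bit clear: last byte
            return value, pos


print(encode_int(130).hex())              # 8102, i.e. 10000001 00000010
```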
| |
| Instructions are encoded as follows. The two high bits of the first |
| byte form an instruction selector: |
| |
| 00 Copy from source view |
| 01 Copy from target view |
| 10 Copy from new data |
| 11 invalid |
| |
| The remaining six bits of the first byte indicate the length of the |
| copy. If those six bits are all zero, then the length is encoded as |
| an integer immediately following the first byte of the instruction. |
| If the instruction selector is 00 or 01, then the instruction encoding |
| continues with an offset encoded as an integer. If the instruction |
| selector is 10, then the offset into the new data is implicit; each |
| copy from the new data is always for "the next <length> bytes" after |
| the last copy. |
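
The following is a rough sketch of decoding a single instruction according to the rules above. The Instruction tuple, read_int and decode_instruction are hypothetical names introduced only for this example:

```python
from typing import NamedTuple, Optional, Tuple

SELECTOR_NAMES = {0b00: "source", 0b01: "target", 0b10: "new"}


class Instruction(NamedTuple):
    kind: str               # "source", "target" or "new"
    length: int
    offset: Optional[int]   # None for copies from new data (implicit offset)


def read_int(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one variable-length integer; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def decode_instruction(data: bytes, pos: int = 0) -> Tuple[Instruction, int]:
    """Decode the instruction at data[pos]; return it and the next position."""
    first = data[pos]
    pos += 1
    selector = first >> 6                 # two high bits
    if selector == 0b11:
        raise ValueError("invalid instruction selector 11")
    length = first & 0x3F                 # remaining six bits
    if length == 0:                       # length follows as an integer
        length, pos = read_int(data, pos)
    offset = None
    if selector in (0b00, 0b01):          # source and target copies carry an offset
        offset, pos = read_int(data, pos)
    return Instruction(SELECTOR_NAMES[selector], length, offset), pos


print(decode_instruction(bytes([0b00001011, 0b00000000])))
```

The decoder returns the position of the next instruction so that a caller can walk the instruction section of a window byte by byte.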
| |
| A copy from the target view must begin at a location before the |
| current position in the target view, but its length may extend past |
| the current position. In this case, the target data copied is |
| repeated, as happens naturally if the copy is performed byte by byte |
| starting at the beginning. |
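
To illustrate the repetition described above, here is a small sketch of a byte-by-byte target copy. The function name copy_from_target is hypothetical, and the bytearray is assumed to hold only the part of the target view reconstructed so far:

```python
def copy_from_target(target: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes to `target`, reading from `offset` one at a time."""
    if not 0 <= offset < len(target):
        raise ValueError("target copy must begin before the current position")
    for i in range(length):
        # `target` grows with every append, so offset + i stays in range
        # even when the copy runs past the original end of the data.
        target.append(target[offset + i])


buf = bytearray(b"abc")
copy_from_target(buf, 1, 5)   # start at 'b' and copy 5 bytes
print(buf)                    # bytearray(b'abcbcbcb')
```

Copying with a single slice of the existing data would stop at the current position, which is why the sketch appends one byte at a time.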
| |
| Following are some example instruction encodings. |
| |
| Copy 11 bytes from offset 0 in source view: |
| 00001011 00000000 |
| |
| Copy 64 bytes from offset 128 in target view: |
| 01000000 01000000 10000001 00000000 |
| |
| Copy the next 63 bytes of new data: |
| 10111111 |
| |
| Following is a complete example of an svndiff between the source |
| document "aaaabbbbcccc" and the target document "aaaaccccdddddddd": |
| |
| 01010011 01010110 01001110 00000000 Header ("SVN\0") |
| |
| 00000000 Source view offset 0 |
| 00001100 Source view length 12 |
| 00010000 Target view length 16 |
| 00000111 Instruction length 7 |
| 00000001 New data length 1 |
| |
| 00000100 00000000 Source, len 4, offset 0 |
| 00000100 00001000 Source, len 4, offset 8 |
| 10000001 New, len 1 |
| 01000111 00001000 Target, len 7, offset 8 |
| |
| 01100100 The new data: 'd' |
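
As a cross-check of the example, the sketch below applies such a delta to the source document. It is a simplified illustration under stated assumptions, not Subversion's implementation: apply_svndiff and read_int are hypothetical names, the whole delta and source are assumed to be in memory, and error handling is minimal. The instruction section is 7 bytes long (2 + 2 + 1 + 2).

```python
def read_int(data, pos):
    """Read one variable-length integer; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def apply_svndiff(delta: bytes, source: bytes) -> bytes:
    if delta[:4] != b"SVN\x00":
        raise ValueError("missing svndiff version 0 header")
    pos, out = 4, bytearray()
    while pos < len(delta):               # one iteration per window
        header = []
        for _ in range(5):                # the five integers that start a window
            value, pos = read_int(delta, pos)
            header.append(value)
        sview_off, sview_len, tview_len, ins_len, new_len = header
        ins_end = pos + ins_len
        new_data = delta[ins_end:ins_end + new_len]
        sview = source[sview_off:sview_off + sview_len]
        tview, new_pos = bytearray(), 0
        while pos < ins_end:
            first = delta[pos]
            pos += 1
            selector, length = first >> 6, first & 0x3F
            if length == 0:               # length stored as a separate integer
                length, pos = read_int(delta, pos)
            if selector == 0b00:          # copy from source view
                offset, pos = read_int(delta, pos)
                tview += sview[offset:offset + length]
            elif selector == 0b01:        # copy from target view, byte by byte
                offset, pos = read_int(delta, pos)
                for i in range(length):
                    tview.append(tview[offset + i])
            elif selector == 0b10:        # next `length` bytes of new data
                tview += new_data[new_pos:new_pos + length]
                new_pos += length
            else:
                raise ValueError("invalid instruction selector 11")
        if len(tview) != tview_len:
            raise ValueError("window did not fill its target view")
        out += tview
        pos = ins_end + new_len           # skip past this window's new data
    return bytes(out)


delta = bytes([
    0x53, 0x56, 0x4E, 0x00,        # header "SVN\0"
    0x00, 0x0C, 0x10, 0x07, 0x01,  # offset 0, lengths 12 and 16, 7 instruction bytes, 1 new byte
    0x04, 0x00,                    # source, len 4, offset 0
    0x04, 0x08,                    # source, len 4, offset 8
    0x81,                          # new, len 1
    0x47, 0x08,                    # target, len 7, offset 8
    0x64,                          # new data: 'd'
])
print(apply_svndiff(delta, b"aaaabbbbcccc"))   # b'aaaaccccdddddddd'
```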