|  | This file describes the format produced by 'svnadmin dump' and | 
|  | consumed by 'svnadmin load'. | 
|  |  | 
|  | The format has undergone revisions over time.  They are presented in | 
|  | reverse chronological order here.  You may wish to start with the | 
|  | VERSION 1 description in order to get a baseline understanding first. | 
|  |  | 
|  | ===== SVN DUMPFILE VERSION 3 FORMAT ===== | 
|  |  | 
|  | (generated by SVN versions 1.1.0-present, if requested by the user) | 
|  |  | 
|  | This format is equivalent to the VERSION 2 format except for the | 
|  | following: | 
|  |  | 
|  | 1.) The format starts with the new version number of the dump format | 
|  | ("SVN-fs-dump-format-version: 3\n"). | 
|  |  | 
|  | 2.) There are several new optional headers for node changes: | 
|  |  | 
|  | [Text-delta: true|false] | 
|  | [Prop-delta: true|false] | 
|  | [Text-delta-base-md5: blob] | 
|  | [Text-delta-base-sha1: blob] | 
|  | [Text-copy-source-sha1: blob] | 
|  | [Text-content-sha1: blob] | 
|  |  | 
|  | The default value for the boolean headers is "false".  If Text-delta is |
|  | "true", the text contents are treated as a delta against the previous |
|  | contents of the node; if Prop-delta is "true", the same holds for the |
|  | property contents.  The previous contents are determined by copy |
|  | history for adds with history, or by the value in the previous |
|  | revision for changes (just as with commits). |
|  |  | 
|  | Property deltas have the same format as regular property lists except | 
|  | that (1) properties with the same value as in the previous contents of | 
|  | the node are not printed, and (2) deleted properties will be written | 
|  | out as | 
|  |  | 
|  | D <name length> | 
|  | <name> | 
|  |  | 
|  | just as a regular property is printed, but with the "K " changed to a | 
|  | "D " and with no value part. | 
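The property list layout described above can be read with a few lines of
Python.  This is only a minimal sketch: the function name parse_props and
its return shape are illustrative assumptions, not part of Subversion, and
it assumes every name and value is followed by a single newline.

```python
import io


def parse_props(data: bytes):
    """Parse a property block into (changed, deleted).

    changed maps property names to values (from K/V pairs);
    deleted lists names removed by "D" entries (property deltas only).
    """
    stream = io.BytesIO(data)
    changed, deleted = {}, []
    while True:
        line = stream.readline()
        if line == b"PROPS-END\n":
            return changed, deleted
        if not line:
            raise ValueError("property block ended without PROPS-END")
        tag, length = line.split()
        name = stream.read(int(length))
        stream.read(1)  # newline that follows the name
        if tag == b"D":
            deleted.append(name)
        elif tag == b"K":
            vtag, vlen = stream.readline().split()
            if vtag != b"V":
                raise ValueError("expected a V line after a K entry")
            changed[name] = stream.read(int(vlen))
            stream.read(1)  # newline that follows the value
        else:
            raise ValueError(f"unexpected entry tag: {tag!r}")


# A hypothetical property delta: one property set, one property deleted.
delta = b"K 10\nsvn:ignore\nV 4\nTAGS\nD 14\nsvn:executable\nPROPS-END\n"
changed, deleted = parse_props(delta)
```

Because lengths are given in bytes, the sketch reads exact byte counts
rather than splitting on newlines, so values that contain newlines are
handled the same way as any other value.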
|  |  | 
|  | Text deltas are written out as a series of svndiff0 windows.  If | 
|  | Text-delta-base-md5 is provided, it is the checksum of the base to | 
|  | which the text delta is applied; note that older versions (pre-1.5) of | 
|  | 'svnadmin load' may ignore the checksum. | 
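A loader that does honor the checksum could compare it against the base
text before applying the delta.  The sketch below assumes the header value
is a hex-encoded MD5 digest (the text above only calls it a "blob"); the
helper name base_matches is hypothetical.

```python
import hashlib


def base_matches(base_text: bytes, expected_md5_hex: str) -> bool:
    """Return True if base_text hashes to the Text-delta-base-md5 value.

    Assumes the header carries a hex digest; surrounding whitespace and
    letter case are ignored.
    """
    actual = hashlib.md5(base_text).hexdigest()
    return actual == expected_md5_hex.strip().lower()


# Illustrative values only: the header is derived from the base itself.
base = b"Here is the previous fulltext.\n"
header_value = hashlib.md5(base).hexdigest()
ok = base_matches(base, header_value)
corrupt = base_matches(base + b"extra", header_value)
```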
|  |  | 
|  | Text-delta-base-sha1, Text-copy-source-sha1, and Text-content-sha1 are not | 
|  | currently used by the loader.  They are written by 1.6-and-later versions of | 
|  | Subversion so that future loaders can optionally choose which checksum to | 
|  | use for checking for corruption. | 
|  |  | 
|  | ===== SVN DUMPFILE VERSION 2 FORMAT ===== | 
|  |  | 
|  | (generated by SVN versions 0.18.0-present, by default) | 
|  |  | 
|  | This format is equivalent to the VERSION 1 format in every respect, | 
|  | except for the following: | 
|  |  | 
|  | 1.) The format starts with the new version number of the dump format | 
|  | ("SVN-fs-dump-format-version: 2\n"). | 
|  |  | 
|  | 2.) In addition to "Revision Records", another sort of record is supported: | 
|  | the "UUID" record, which should be of the form: | 
|  |  | 
|  | UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e | 
|  |  | 
|  | This should be used to indicate the UUID of the originating repository. | 
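As a small illustration, a UUID record line can be validated with
Python's standard uuid module.  The function name parse_uuid_record is an
assumption made for this sketch, and it handles only the single header
line shown above.

```python
import uuid


def parse_uuid_record(line: str) -> uuid.UUID:
    """Parse a 'UUID: <value>' header line into a uuid.UUID object."""
    name, _, value = line.partition(":")
    if name != "UUID":
        raise ValueError(f"not a UUID record: {line!r}")
    # uuid.UUID raises ValueError if the value is not a well-formed UUID.
    return uuid.UUID(value.strip())


repo_uuid = parse_uuid_record("UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e\n")
```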
|  |  | 
|  | ===== SVN DUMPFILE VERSION 1 FORMAT ===== | 
|  |  | 
|  | (generated by SVN versions prior to 0.18.0) | 
|  |  | 
|  | The binary format starts with the version number of the dump format | 
|  | ("SVN-fs-dump-format-version: 1\n"), followed by a series of revision | 
|  | records.  Each revision record starts with information about the | 
|  | revision, followed by a variable number of node changes for that | 
|  | revision.  Fields in [brackets] are optional, and unknown headers are |
|  | always ignored, for backwards compatibility. | 
|  |  | 
|  | Revision-number: N | 
|  | Prop-content-length: P | 
|  | Content-length: L | 
|  |  | 
|  | ...P bytes of property data.  Properties are stored in the same | 
|  | human-readable hashdump format used by working copy property files, | 
|  | except that they end with "PROPS-END\n" for better readability. | 
|  |  | 
|  | Node-path: absolute/path/to/node/in/filesystem | 
|  | Node-kind: file | dir  (1) | 
|  | Node-action: change | add | delete | replace | 
|  | [Node-copyfrom-rev: X] | 
|  | [Node-copyfrom-path: path ] | 
|  | [Text-copy-source-md5: blob] (2) | 
|  | [Text-content-md5: blob] | 
|  | [Text-content-length: T] | 
|  | [Prop-content-length: P] | 
|  | Content-length: Y (3) | 
|  |  | 
|  | ... Y bytes of content data, divided into P bytes of "property" | 
|  | data and T bytes of "text" data.  The properties come first; their | 
|  | total length (including formatting) is Prop-content-length, and is |
|  | included in Content-length.  The "PROPS-END\n" line always |
|  | terminates the property section if there are props.  The remainder | 
|  | of the Y bytes (expected to be equivalent to Text-content-length) | 
|  | represent the contents of the node. | 
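The split of the Y content bytes into property and text parts can be
sketched as follows.  The helper names parse_headers and
split_node_content are hypothetical, and the sketch assumes the headers
and the content bytes have already been read from the stream; the sample
data is a file node with two properties and two lines of text.

```python
def parse_headers(block: str) -> dict:
    """Turn 'Name: value' header lines into a dictionary."""
    headers = {}
    for line in block.splitlines():
        name, _, value = line.partition(": ")
        headers[name] = value
    return headers


def split_node_content(headers: dict, content: bytes):
    """Split a node's content into (property bytes, text bytes)."""
    prop_len = int(headers.get("Prop-content-length", 0))
    text_len = int(headers.get("Text-content-length", 0))
    total = int(headers["Content-length"])
    if total != prop_len + text_len or total != len(content):
        raise ValueError("content lengths are inconsistent")
    props, text = content[:prop_len], content[prop_len:]
    if prop_len and not props.endswith(b"PROPS-END\n"):
        raise ValueError("property section must end with PROPS-END")
    return props, text


header_block = (
    "Node-path: bar/baz/bop\n"
    "Node-kind: file\n"
    "Node-action: add\n"
    "Prop-content-length: 76\n"
    "Text-content-length: 54\n"
    "Content-length: 130\n"
)
content = (
    b"K 14\nsvn:executable\nV 2\non\n"
    b"K 12\nsvn:keywords\nV 15\nLastChangedDate\nPROPS-END\n"
    b"Here is the text of the newly added 'bop' file.\nWhee.\n"
)
headers = parse_headers(header_block)
props, text = split_node_content(headers, content)
```

The sketch rejects records whose lengths disagree; a real loader may
choose to be more lenient.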
|  |  | 
|  |  | 
|  | Notes: | 
|  |  | 
|  | (1) If the node represents a deletion, this field is optional. |
|  |  | 
|  | (2) This is a checksum of the source of the copy.  A loader process |
|  | can use this checksum to determine that the copyfrom path/rev | 
|  | already present in a filesystem is really the *correct* one to | 
|  | use. | 
|  |  | 
|  | (3) The Content-length header is technically unnecessary, since the |
|  | information it holds (and more) can be found in the | 
|  | Prop-content-length and Text-content-length fields.  Though | 
|  | Subversion itself does not make use of the header when reading | 
|  | a dumpfile, we include it for compatibility with generic RFC822 | 
|  | parsers. | 
|  |  | 
|  | (4) There are actually two types of version 1 dump streams.  The |
|  | regular ones have been generated since r2634 (svn 0.14.0).  Older ones |
|  | also claim to be version 1, but lack the Prop-content-length |
|  | and Text-content-length fields in the block header.  In those |
|  | days there was *always* a properties block. |
|  |  | 
|  | EXAMPLE: | 
|  |  | 
|  | Here's an example of revision 1422, in which I added a new directory |
|  | "baz", added a new file "bop" inside it, and modified the file "foo.c": | 
|  |  | 
|  | Revision-number: 1422 | 
|  | Prop-content-length: 80 | 
|  | Content-length: 80 | 
|  |  | 
|  | K 6 | 
|  | author | 
|  | V 7 | 
|  | sussman | 
|  | K 3 | 
|  | log | 
|  | V 33 | 
|  | Added two files, changed a third. | 
|  | PROPS-END | 
|  |  | 
|  | Node-path: bar/baz | 
|  | Node-kind: dir | 
|  | Node-action: add | 
|  | Prop-content-length: 35 | 
|  | Content-length: 35 | 
|  |  | 
|  | K 10 | 
|  | svn:ignore | 
|  | V 4 | 
|  | TAGS | 
|  | PROPS-END | 
|  |  | 
|  |  | 
|  | Node-path: bar/baz/bop | 
|  | Node-kind: file | 
|  | Node-action: add | 
|  | Prop-content-length: 76 | 
|  | Text-content-length: 54 | 
|  | Content-length: 130 | 
|  |  | 
|  | K 14 | 
|  | svn:executable | 
|  | V 2 | 
|  | on | 
|  | K 12 | 
|  | svn:keywords | 
|  | V 15 | 
|  | LastChangedDate | 
|  | PROPS-END | 
|  | Here is the text of the newly added 'bop' file. | 
|  | Whee. | 
|  |  | 
|  | Node-path: bar/foo.c | 
|  | Node-kind: file | 
|  | Node-action: change | 
|  | Text-content-length: 102 | 
|  | Content-length: 102 | 
|  |  | 
|  | Here is the fulltext of my change to an existing /bar/foo.c. | 
|  | Notice that this file has no properties. | 
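The length headers in the example count bytes, including the newline
after every line and the "PROPS-END\n" terminator.  As a check, the sketch
below serializes the revision properties and the directory's properties
and measures the result; dump_props is a hypothetical helper written for
this illustration, not a Subversion function.

```python
def dump_props(props: dict) -> bytes:
    """Serialize a {name: value} mapping of bytes as a property block."""
    out = bytearray()
    for name, value in props.items():
        out += b"K %d\n%s\n" % (len(name), name)
        out += b"V %d\n%s\n" % (len(value), value)
    out += b"PROPS-END\n"
    return bytes(out)


rev_props = {
    b"author": b"sussman",
    b"log": b"Added two files, changed a third.",
}
dir_props = {b"svn:ignore": b"TAGS"}

rev_len = len(dump_props(rev_props))  # matches Prop-content-length: 80
dir_len = len(dump_props(dir_props))  # matches Prop-content-length: 35
```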
|  |  | 
|  | -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- | 
|  |  | 
|  | Old discussion: | 
|  |  | 
|  | (This file started as a proposal, preserved here for posterity.) | 
|  |  | 
|  | A proposal for an svn filesystem dump/restore format. | 
|  |  | 
|  | Two problems we want to solve | 
|  | ============================= | 
|  |  | 
|  | 1.  When we change our node-id schema, we need to migrate all of our | 
|  | data (by dumping and restoring). | 
|  |  | 
|  | 2.  Serves as a backup format.  Could be read by other software tools | 
|  | someday. | 
|  |  | 
|  |  | 
|  | Design Goals | 
|  | ============ | 
|  |  | 
|  | A.  Written as two new public functions in svn_fs.h.  To be invoked | 
|  | by new 'svnadmin' subcommands. | 
|  |  | 
|  | B.  Format uses only timeless fs concepts. | 
|  |  | 
|  | The dump format needs to reference concepts that we *know* are | 
|  | general enough to never change.  These concepts must exist | 
|  | independently of any internal node-id schema, or any DB storage | 
|  | backend.  In other words, we're talking about the basic ideas in | 
|  | our original "design spec" from May 2000. | 
|  |  | 
|  |  | 
|  | Format Semantics | 
|  | ================ | 
|  |  | 
|  | Here are the timeless semantics of our fs design -- the things that | 
|  | would be stored in our dump format. | 
|  |  | 
|  | - A filesystem is an array of trees. | 
|  | Each tree is called a "revision" and has unversioned properties attached. | 
|  |  | 
|  | - A revision has a tree of "nodes" hanging off of it. | 
|  | Actually, the nodes in the filesystem form a DAG.  A revision | 
|  | always points to an initial node that represents the 'root' of some tree. | 
|  |  | 
|  | - The majority of a tree's nodes are hard-links (references) to | 
|  | nodes that were created in earlier trees. | 
|  |  | 
|  | - A node contains | 
|  |  | 
|  | - versioned text | 
|  | - versioned properties | 
|  | - predecessor history:  "which node am I a variant of?" | 
|  | - copy history:  "which node am I a copy of?" | 
|  |  | 
|  | The history values can be non-existent (meaning the node is | 
|  | completely new), or can have a value of {revision, path}. | 
|  |  | 
|  |  | 
|  | ------------------------------------------------------------------------ | 
|  | Refinement of proposal #2:  (after discussion with gstein) | 
|  | ========================= | 
|  |  | 
|  | Each node starts with RFC822-style headers at the top.  The final | 
|  | header is a 'Content-length:', followed by the content, so record | 
|  | boundaries can be inferred. | 
|  |  | 
|  | The content section has two implicit parts: a property hash, and the | 
|  | fulltext.  The division between these two sections is implied by the | 
|  | "PROPS-END\n" tag at the end of the prophash.  In the case of a | 
|  | directory node or a revision, only the prophash is present. |