| This file describes the format produced by 'svnadmin dump' and |
| consumed by 'svnadmin load'. |
| |
| The format has undergone revisions over time. They are presented in |
| reverse chronological order here. You may wish to start with the |
| VERSION 1 description in order to get a baseline understanding first. |
| |
| ===== SVN DUMPFILE VERSION 3 FORMAT ===== |
| |
| (generated by SVN versions 1.1.0-present, if requested by the user) |
| |
| This format is equivalent to the VERSION 2 format except for the |
| following: |
| |
| 1.) The format starts with the new version number of the dump format |
| ("SVN-fs-dump-format-version: 3\n"). |
| |
| 2.) There are several new optional headers for node changes: |
| |
| [Text-delta: true|false] |
| [Prop-delta: true|false] |
| [Text-delta-base-md5: blob] |
| [Text-delta-base-sha1: blob] |
| [Text-copy-source-sha1: blob] |
| [Text-content-sha1: blob] |
| |
| The default value for the boolean headers is "false". If Text-delta is |
| set to "true", then the text contents are treated as a delta against |
| the previous contents of the node; likewise, if Prop-delta is "true", |
| the property contents are treated as a delta. The previous contents |
| are determined by copy history for adds with history, or by the value |
| in the previous revision for changes, just as with commits. |
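As a minimal sketch of the defaulting rule described here, the following Python helper reads the two boolean headers from an already-parsed header dictionary. The name `delta_flags` and the dict-of-strings representation are illustrative assumptions, not part of Subversion. Rejecting values other than "true" or "false" is also an assumption of this sketch.

```python
def delta_flags(headers):
    """Return (text_is_delta, props_are_delta) for one node record.

    `headers` is assumed to map header names to string values.
    """
    def flag(name):
        # A missing header means "false", per the format notes.
        value = headers.get(name, "false")
        if value not in ("true", "false"):
            # Strictness here is an assumption of this sketch.
            raise ValueError("bad value for %s: %r" % (name, value))
        return value == "true"

    return flag("Text-delta"), flag("Prop-delta")
```

A record with neither header is therefore handled exactly like a VERSION 2 record: both its text and its properties are full contents.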
| |
| Property deltas have the same format as regular property lists except |
| that (1) properties with the same value as in the previous contents of |
| the node are not printed, and (2) deleted properties will be written |
| out as |
| |
| D <name length> |
| <name> |
| |
| just as a regular property is printed, but with the "K " changed to a |
| "D " and with no value part. |
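The property delta layout can be sketched in Python roughly as follows. The function names `parse_prop_delta` and `apply_prop_delta` are hypothetical, and the sketch assumes the delta ends with "PROPS-END\n" like a regular property list and that each counted name or value is followed by a single newline.

```python
def parse_prop_delta(data):
    """Parse a property delta (bytes) into (changed, deleted).

    changed maps property name -> new value; deleted lists removed names.
    """
    changed, deleted = {}, []
    pos = 0

    def read_line():
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end]
        pos = end + 1
        return line

    def read_counted(length):
        nonlocal pos
        chunk = data[pos:pos + length]
        pos += length + 1  # also skip the newline after the counted bytes
        return chunk

    while True:
        line = read_line()
        if line == b"PROPS-END":
            return changed, deleted
        tag, length = line.split(b" ")
        name = read_counted(int(length))
        if tag == b"D":
            deleted.append(name)  # a deletion has no "V" part
        elif tag == b"K":
            vtag, vlen = read_line().split(b" ")
            if vtag != b"V":
                raise ValueError("expected a V line after a K entry")
            changed[name] = read_counted(int(vlen))
        else:
            raise ValueError("unexpected tag %r" % tag)


def apply_prop_delta(previous, changed, deleted):
    """Return the new property dict; unmentioned properties carry over."""
    result = dict(previous)
    result.update(changed)
    for name in deleted:
        result.pop(name, None)
    return result
```

Properties that the delta does not mention keep their previous values, which is why `apply_prop_delta` starts from a copy of the old property set.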
| |
| Text deltas are written out as a series of svndiff0 windows. If |
| Text-delta-base-md5 is provided, it is the checksum of the base to |
| which the text delta is applied; note that older versions (pre-1.5) of |
| 'svnadmin load' may ignore the checksum. |
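A loader-side check of the delta base might look like the sketch below. It assumes the header "blob" is a hexadecimal MD5 digest and that headers have already been parsed into a dict of strings; the name `check_delta_base` is hypothetical.

```python
import hashlib


def check_delta_base(base_text, headers):
    """Return True if base_text matches Text-delta-base-md5, or if the
    optional header is absent (nothing to verify in that case)."""
    expected = headers.get("Text-delta-base-md5")
    if expected is None:
        return True
    # Assumption: the header value is a hex digest, compared case-insensitively.
    return hashlib.md5(base_text).hexdigest() == expected.strip().lower()
```

A loader would typically run this check before applying the svndiff windows, so that a delta is never applied to the wrong base.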
| |
| Text-delta-base-sha1, Text-copy-source-sha1, and Text-content-sha1 are not |
| currently used by the loader. They are written by 1.6-and-later versions of |
| Subversion so that future loaders can choose which checksum to use when |
| checking for corruption. |
| |
| ===== SVN DUMPFILE VERSION 2 FORMAT ===== |
| |
| (generated by SVN versions 0.18.0-present, by default) |
| |
| This format is equivalent to the VERSION 1 format in every respect, |
| except for the following: |
| |
| 1.) The format starts with the new version number of the dump format |
| ("SVN-fs-dump-format-version: 2\n"). |
| |
| 2.) In addition to "Revision Records", another sort of record is supported: |
| the "UUID" record, which should be of the form: |
| |
| UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e |
| |
| This should be used to indicate the UUID of the originating repository. |
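The start of a VERSION 2 stream could be read along these lines. This is a sketch, not Subversion's own parser: the name `read_preamble` is hypothetical, the stream is assumed to be seekable and opened in binary mode, and the tolerance for blank lines before the UUID record is an assumption.

```python
import io
import uuid

MAGIC = b"SVN-fs-dump-format-version: "


def read_preamble(stream):
    """Return (format_version, repository_uuid_or_None)."""
    first = stream.readline()
    if not first.startswith(MAGIC):
        raise ValueError("not an svn dumpfile: %r" % first)
    version = int(first[len(MAGIC):].strip())

    # Skip blank lines, then look for an optional UUID record.
    pos = stream.tell()
    line = stream.readline()
    while line == b"\n":
        pos = stream.tell()
        line = stream.readline()

    if line.startswith(b"UUID: "):
        return version, uuid.UUID(line[len(b"UUID: "):].strip().decode("ascii"))

    # No UUID record: rewind so the caller sees the next record intact.
    stream.seek(pos)
    return version, None
```

Because the UUID record is looked for rather than required, the same helper also accepts a VERSION 1 stream, which has no such record.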
| |
| ===== SVN DUMPFILE VERSION 1 FORMAT ===== |
| |
| (generated by SVN versions prior to 0.18.0) |
| |
| The binary format starts with the version number of the dump format |
| ("SVN-fs-dump-format-version: 1\n"), followed by a series of revision |
| records. Each revision record starts with information about the |
| revision, followed by a variable number of node changes for that |
| revision. Fields in [brackets] are optional, and unknown headers are |
| always ignored, for backwards compatibility. |
| |
| Revision-number: N |
| Prop-content-length: P |
| Content-length: L |
| |
| ...P bytes of property data. Properties are stored in the same |
| human-readable hashdump format used by working copy property files, |
| except that they end with "PROPS-END\n" for better readability. |
| |
| Node-path: absolute/path/to/node/in/filesystem |
| Node-kind: file | dir (1) |
| Node-action: change | add | delete | replace |
| [Node-copyfrom-rev: X] |
| [Node-copyfrom-path: path] |
| [Text-copy-source-md5: blob] (2) |
| [Text-content-md5: blob] |
| [Text-content-length: T] |
| [Prop-content-length: P] |
| Content-length: Y (3) |
| |
| ... Y bytes of content data, divided into P bytes of "property" |
| data and T bytes of "text" data. The properties come first; their |
| total length (including formatting) is Prop-content-length, and is |
| included in Content-length. The "PROPS-END\n" line always |
| terminates the property section if there are props. The remainder |
| of the Y bytes (expected to be equivalent to Text-content-length) |
| represents the contents of the node. |
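The split of a node's content into its property and text parts can be sketched as follows. The name `split_node_content` is hypothetical, and the headers are assumed to be available as a dict of strings; a missing length header is treated as zero.

```python
def split_node_content(headers, content):
    """Split the content bytes of a node record into (props, text)."""
    prop_len = int(headers.get("Prop-content-length", 0))
    text_len = int(headers.get("Text-content-length", 0))
    if prop_len + text_len != len(content):
        raise ValueError("length headers do not match the content size")

    # Properties come first; the text is whatever remains.
    props, text = content[:prop_len], content[prop_len:]
    if prop_len and not props.endswith(b"PROPS-END\n"):
        raise ValueError("property section is not terminated by PROPS-END")
    return props, text
```

The sum check reflects the relationship between the three length headers: the property bytes and text bytes together should account for all of the content.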
| |
| |
| Notes: |
| |
| (1) If the node represents a deletion, this field is optional. |
| |
| (2) This is a checksum of the source of the copy. A loader process |
| can use this checksum to determine that the copyfrom path/rev |
| already present in a filesystem is really the *correct* one to |
| use. |
| |
| (3) The Content-length header is technically unnecessary, since the |
| information it holds (and more) can be found in the |
| Prop-content-length and Text-content-length fields. Though |
| Subversion itself does not make use of the header when reading |
| a dumpfile, we include it for compatibility with generic RFC822 |
| parsers. |
| |
| (4) There are actually 2 types of version 1 dump streams. The |
| regular ones have been generated since r2634 (svn 0.14.0). Older ones |
| also claim to be version 1, but lack the Prop-content-length |
| and Text-content-length fields in the block header. In those |
| days there *always* was a properties block. |
| |
| EXAMPLE: |
| |
| Here's an example of revision 1422, whereby I added a new directory |
| "baz", added a new file "bop" inside it, and modified the file "foo.c": |
| |
| Revision-number: 1422 |
| Prop-content-length: 80 |
| Content-length: 80 |
| |
| K 6 |
| author |
| V 7 |
| sussman |
| K 3 |
| log |
| V 33 |
| Added two files, changed a third. |
| PROPS-END |
| |
| Node-path: bar/baz |
| Node-kind: dir |
| Node-action: add |
| Prop-content-length: 35 |
| Content-length: 35 |
| |
| K 10 |
| svn:ignore |
| V 4 |
| TAGS |
| PROPS-END |
| |
| |
| Node-path: bar/baz/bop |
| Node-kind: file |
| Node-action: add |
| Prop-content-length: 76 |
| Text-content-length: 54 |
| Content-length: 130 |
| |
| K 14 |
| svn:executable |
| V 2 |
| on |
| K 12 |
| svn:keywords |
| V 15 |
| LastChangedDate |
| PROPS-END |
| Here is the text of the newly added 'bop' file. |
| Whee. |
| |
| Node-path: bar/foo.c |
| Node-kind: file |
| Node-action: change |
| Text-content-length: 102 |
| Content-length: 102 |
| |
| Here is the fulltext of my change to an existing /bar/foo.c. |
| Notice that this file has no properties. |
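The Prop-content-length values in the example above can be reproduced with a small serializer. This is only a sketch of the hashdump layout shown in the example; the name `dump_props` is hypothetical, and property names and values are assumed to be bytes.

```python
def dump_props(props):
    """Serialize a {name: value} dict of bytes in the K/V hashdump layout,
    terminated by the PROPS-END line."""
    out = []
    for name, value in props.items():
        out.append(b"K %d\n%s\n" % (len(name), name))
        out.append(b"V %d\n%s\n" % (len(value), value))
    out.append(b"PROPS-END\n")
    return b"".join(out)


revprops = {b"author": b"sussman",
            b"log": b"Added two files, changed a third."}
print(len(dump_props(revprops)))                   # 80, as in the revision record
print(len(dump_props({b"svn:ignore": b"TAGS"})))   # 35, as in the bar/baz record
```

The counts include the "K"/"V" lines, every newline, and the 10 bytes of "PROPS-END\n", which is what "including formatting" means for Prop-content-length.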
| |
| -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- |
| |
| Old discussion: |
| |
| (This file started as a proposal, preserved here for posterity.) |
| |
| A proposal for an svn filesystem dump/restore format. |
| |
| Two problems we want to solve |
| ============================= |
| |
| 1. When we change our node-id schema, we need to migrate all of our |
| data (by dumping and restoring). |
| |
| 2. We need a backup format, one that could be read by other software |
| tools someday. |
| |
| |
| Design Goals |
| ============ |
| |
| A. Written as two new public functions in svn_fs.h. To be invoked |
| by new 'svnadmin' subcommands. |
| |
| B. Format uses only timeless fs concepts. |
| |
| The dump format needs to reference concepts that we *know* are |
| general enough to never change. These concepts must exist |
| independently of any internal node-id schema, or any DB storage |
| backend. In other words, we're talking about the basic ideas in |
| our original "design spec" from May 2000. |
| |
| |
| Format Semantics |
| ================ |
| |
| Here are the timeless semantics of our fs design -- the things that |
| would be stored in our dump format. |
| |
| - A filesystem is an array of trees. |
| Each tree is called a "revision" and has unversioned properties attached. |
| |
| - A revision has a tree of "nodes" hanging off of it. |
| Actually, the nodes in the filesystem form a DAG. A revision |
| always points to an initial node that represents the 'root' of some tree. |
| |
| - The majority of a tree's nodes are hard-links (references) to |
| nodes that were created in earlier trees. |
| |
| - A node contains |
| |
| - versioned text |
| - versioned properties |
| - predecessor history: "which node am I a variant of?" |
| - copy history: "which node am I a copy of?" |
| |
| The history values can be non-existent (meaning the node is |
| completely new), or can have a value of {revision, path}. |
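The semantics listed above map onto a small data model. The Python sketch below is purely illustrative: the class and field names are assumptions made for this example and do not correspond to Subversion's internal structures. Child links between nodes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history value is either absent (None) or a (revision, path) pair.
History = Optional[Tuple[int, str]]


@dataclass
class Node:
    text: bytes = b""                                       # versioned text
    props: Dict[str, bytes] = field(default_factory=dict)   # versioned properties
    predecessor: History = None   # "which node am I a variant of?"
    copied_from: History = None   # "which node am I a copy of?"

    def is_new(self):
        """A node with no history at all is completely new."""
        return self.predecessor is None and self.copied_from is None


@dataclass
class Revision:
    revprops: Dict[str, bytes]    # unversioned properties of the revision
    root: Node                    # the root node of this revision's tree


# A filesystem is an array of trees, indexed by revision number.
Filesystem = List[Revision]
```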
| |
| |
| ------------------------------------------------------------------------ |
| Refinement of proposal #2: (after discussion with gstein) |
| ========================= |
| |
| Each node starts with RFC822-style headers at the top. The final |
| header is a 'Content-length:', followed by the content, so record |
| boundaries can be inferred. |
| |
| The content section has two implicit parts: a property hash, and the |
| fulltext. The division between these two sections is implied by the |
| "PROPS-END\n" tag at the end of the prophash. In the case of a |
| directory node or a revision, only the prophash is present. |
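Because each record is a block of headers followed by Content-length bytes, a reader can walk a stream without understanding the content. The sketch below assumes a binary stream, a blank line between the headers and the content, and treats a missing Content-length as zero; the name `read_record` is hypothetical.

```python
import io


def read_record(stream):
    """Read one record; return (headers, content) or None at end of stream."""
    line = stream.readline()
    while line == b"\n":          # skip blank lines between records
        line = stream.readline()
    if not line:
        return None

    headers = {}
    while line not in (b"\n", b""):
        name, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
        headers[name] = value
        line = stream.readline()

    # The content length tells us exactly where this record ends.
    length = int(headers.get("Content-length", 0))
    content = stream.read(length)
    if len(content) != length:
        raise ValueError("truncated record")
    return headers, content
```

Calling `read_record` repeatedly until it returns None yields every revision and node record in order. Splitting each content block into properties and text is left to the caller.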