| |
| A proposal for an svn filesystem dump/restore format. |
| |
| Two problems we want to solve |
| ============================= |
| |
| 1. When we change our node-id schema, we need to migrate all of our |
| data (by dumping and restoring). |
| |
| 2. We want a backup format, one that other software tools could |
| read someday. |
| |
| |
| Design Goals |
| ============ |
| |
| A. Implemented as two new public functions in svn_fs.h, to be |
| invoked by new 'svnadmin' subcommands. |
| |
| B. Format uses only timeless fs concepts. |
| |
| The dump format needs to reference concepts that we *know* are |
| general enough to never change. These concepts must exist |
| independently of any internal node-id schema, or any DB storage |
| backend. In other words, we're talking about the basic ideas in |
| our original "design spec" from May 2000. |
| |
| |
| Format Semantics |
| ================ |
| |
| Here are the timeless semantics of our fs design -- the things that |
| would be stored in our dump format. |
| |
| - A filesystem is an array of trees. |
| Each tree is called a "revision" and has unversioned properties attached. |
| |
| - A revision has a tree of "nodes" hanging off of it. |
| Actually, the nodes in the filesystem form a DAG. A revision |
| always points to an initial node that represents the 'root' of some tree. |
| |
| - The majority of a tree's nodes are hard-links (references) to |
| nodes that were created in earlier trees. |
| |
| - A node contains |
| |
| - versioned text |
| - versioned properties |
| - predecessor history: "which node am I a variant of?" |
| - copy history: "which node am I a copy of?" |
| |
| The history values can be non-existent (meaning the node is |
| completely new), or can have a value of {revision, path}. |
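| |
| As an illustration only, the semantics above can be modeled with a |
| few Python dataclasses. The names Node, Revision, History, entries |
| and is_new are hypothetical and do not come from svn_fs.h; the |
| sketch just shows history values as optional {revision, path} pairs |
| and a later tree sharing a node with an earlier one. |
| |
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history value is either absent (None) or a {revision, path} pair.
History = Optional[Tuple[int, str]]


@dataclass
class Node:
    kind: str                                              # "file" or "dir"
    props: Dict[str, str] = field(default_factory=dict)    # versioned properties
    text: Optional[bytes] = None                           # versioned text (files only)
    entries: Dict[str, "Node"] = field(default_factory=dict)  # directory entries
    predecessor: History = None                            # "which node am I a variant of?"
    copied_from: History = None                            # "which node am I a copy of?"

    def is_new(self) -> bool:
        """A node with no history at all is completely new."""
        return self.predecessor is None and self.copied_from is None


@dataclass
class Revision:
    root: Node                                             # root of this revision's tree
    props: Dict[str, str] = field(default_factory=dict)    # unversioned properties


# Revision 0 holds one file; revision 1 adds "bop" and reuses the same
# foo.c node object (a hard-link), so the nodes form a DAG, not a tree.
foo = Node(kind="file", text=b"int main(void) { return 0; }\n")
root0 = Node(kind="dir", entries={"foo.c": foo})
bop = Node(kind="file", text=b"Whee.\n")
root1 = Node(kind="dir", entries={"foo.c": foo, "bop": bop}, predecessor=(0, "/"))

# A filesystem is an array of trees, indexed by revision number.
filesystem: List[Revision] = [Revision(root=root0), Revision(root=root1)]
```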
| |
| |
| ------------------------------------------------------------------------ |
| Refinement of proposal #2: (after discussion with gstein) |
| ========================= |
| |
| Each node starts with RFC822-style headers at the top. The final |
| header is a 'Content-length:', followed by the content, so record |
| boundaries can be inferred. |
| |
| The content section has two implicit parts: a property hash, and the |
| fulltext. The division between these two sections is implied by the |
| "PROPS-END\n" tag at the end of the prophash. In the case of a |
| directory node or a revision, only the prophash is present. |
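| |
| A rough Python sketch of how a loader might split a content section |
| into its prophash and fulltext. It assumes (based on the K/V layout |
| used by working copy property files) that the number after "K" or |
| "V" is the byte length of the item on the following line. The names |
| split_content and _read_item are hypothetical. |
| |
```python
from typing import Dict, Tuple

PROPS_END = b"PROPS-END\n"


def _read_item(data: bytes, pos: int, tag: bytes) -> Tuple[bytes, int]:
    """Read one 'K n' or 'V n' item at pos; return (payload, next_pos)."""
    eol = data.index(b"\n", pos)
    kind, length = data[pos:eol].split(b" ")
    if kind != tag:
        raise ValueError(f"expected {tag!r} at offset {pos}, found {kind!r}")
    start = eol + 1
    end = start + int(length)
    if data[end:end + 1] != b"\n":
        raise ValueError(f"item at offset {pos} is not newline-terminated")
    return data[start:end], end + 1


def split_content(content: bytes) -> Tuple[Dict[bytes, bytes], bytes]:
    """Return (props, text) for one record's content section."""
    props: Dict[bytes, bytes] = {}
    pos = 0
    # Items are consumed by length, so a value may itself contain
    # newlines without confusing the search for the terminator.
    while not content.startswith(PROPS_END, pos):
        key, pos = _read_item(content, pos, b"K")
        value, pos = _read_item(content, pos, b"V")
        props[key] = value
    return props, content[pos + len(PROPS_END):]
```
| |
| For a directory node or a revision the returned text is simply |
| empty, since only the prophash is present. |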
| |
| ----------------------------------------------------------------- |
| |
| SVN DUMPFILE VERSION 1 FORMAT |
| |
| The format starts with the version number of the dump format |
| ("SVN-fs-dump-format-version: 1\n"), followed by a series of revision |
| records. Each revision record starts with information about the |
| revision, followed by a variable number of node changes for that |
| revision. Fields in [brackets] are optional, and unknown headers are |
| always ignored, for backwards compatibility. |
| |
| Revision-number: N |
| [Revision-content-md5: blob] |
| Content-length: L |
| |
| ...L bytes of property data. Properties are stored in the same |
| human-readable hashdump format used by working copy property files, |
| except that they end with "PROPS-END\n" for better readability. |
| |
| Node-path: /absolute/path/to/node/in/filesystem |
| Node-kind: file | dir (1) |
| Node-action: change | add | delete | replace |
| [Node-copied-from: X, path ] |
| [Node-copy-source-checksum: blob] (2) |
| [Node-content-md5: blob] |
| Content-length: Y |
| |
| ... Y bytes of content data, divided into 'props' and 'text' |
| sections. The properties come first; their total length (including |
| formatting) is included in Content-length. The "PROPS-END\n" |
| line always terminates the property section; if there are no props, |
| "PROPS-END\n" still signifies the beginning of the node's text |
| content. |
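| |
| One way a loader could walk the records, sketched in Python. The |
| function name read_records is hypothetical. The sketch assumes that |
| a blank line ends every header block (including the version line), |
| that blank lines may separate records, and that a record without a |
| Content-length header has no content. |
| |
```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def read_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content) for each record in a dump stream."""
    while True:
        line = stream.readline()
        # Skip blank separator lines between records.
        while line == b"\n":
            line = stream.readline()
        if not line:
            return  # end of dump

        # Collect "Name: value" headers up to the next blank line.
        headers: Dict[str, str] = {}
        while line not in (b"\n", b""):
            name, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
            headers[name] = value
            line = stream.readline()

        # Content-length is what lets us find the record boundary
        # without inspecting the content itself.
        length = int(headers.get("Content-length", "0"))
        content = stream.read(length)
        if len(content) != length:
            raise ValueError("truncated record")
        yield headers, content
```
| |
| Unknown headers simply land in the dictionary and can be ignored by |
| the caller. |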
| |
| |
| Notes: |
| |
| (1) If the node represents a deletion, this field is optional. |
| |
| (2) This is a checksum of the source of the copy. A loader process |
| can use this checksum to verify that the copyfrom path/rev |
| already present in a filesystem is really the *correct* one to use. |
| |
| |
| |
| |
| ----------------------------------------------------------------- |
| EXAMPLE |
| |
| Here's an example of revision 1422, in which I added a new directory |
| "baz", added a new file "bop" inside it, and modified the file "foo.c": |
| |
| |
| Revision-number: 1422 |
| Content-length: 80 |
| |
| K 6 |
| author |
| V 7 |
| sussman |
| K 3 |
| log |
| V 33 |
| Added two files, changed a third. |
| PROPS-END |
| |
| Node-path: /bar/baz |
| Node-rev: 1422 |
| Node-kind: dir |
| Node-action: add |
| Content-checksum: oj3eu729 |
| Content-length: 35 |
| |
| K 10 |
| svn:ignore |
| V 4 |
| TAGS |
| PROPS-END |
| |
| Node-path: /bar/baz/bop |
| Node-rev: 1422 |
| Node-kind: file |
| Node-action: add |
| Content-checksum: bzz35te7 |
| Content-length: 130 |
| |
| K 12 |
| svn:keywords |
| V 15 |
| LastChangedDate |
| K 14 |
| svn:executable |
| V 2 |
| on |
| PROPS-END |
| Here is the text of the newly added 'bop' file. |
| Whee. |
| |
| Node-path: /bar/foo.c |
| Node-rev: 1422 |
| Node-kind: file |
| Node-action: change |
| Content-checksum: Ae73te7et |
| Content-length: 111 |
| |
| PROPS-END |
| Here is the fulltext of my change to an existing /bar/foo.c. |
| Notice that this file has no properties. |
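| |
| Hand-counting Content-length is error-prone, so a dumper would |
| compute it from the serialized bytes. A minimal Python sketch follows; |
| dump_props and dump_record are hypothetical names, and the trailing |
| blank line after each record is an assumption rather than something |
| the format description requires. |
| |
```python
from typing import Dict, List, Tuple


def dump_props(props: Dict[str, str]) -> bytes:
    """Serialize a property hash in K/V form, ending with PROPS-END."""
    out = bytearray()
    for key, value in props.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        # Lengths are byte counts of the encoded key and value.
        out += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    out += b"PROPS-END\n"
    return bytes(out)


def dump_record(headers: List[Tuple[str, str]],
                props: Dict[str, str],
                text: bytes = b"") -> bytes:
    """Serialize one record; Content-length is always the final header."""
    content = dump_props(props) + text
    lines = [f"{name}: {value}\n" for name, value in headers]
    lines.append(f"Content-length: {len(content)}\n")
    # Blank line after the headers, then the content, then an
    # (assumed) blank separator line before the next record.
    return "".join(lines).encode("utf-8") + b"\n" + content + b"\n"


record = dump_record([("Revision-number", "7")], {"author": "sussman"})
```
| |
| Here the prophash is 33 bytes long (23 bytes for the single K/V pair |
| plus 10 for the terminator), so the record carries |
| "Content-length: 33". |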