A proposal for an svn filesystem dump/restore format.
Two problems we want to solve
=============================
1. When we change our node-id schema, we need to migrate all of our
data (by dumping and restoring).
2. We need a backup format, ideally one that other software tools
could read someday.
Design Goals
============
A. Written as two new public functions in svn_fs.h. To be invoked
by new 'svnadmin' subcommands.
B. Format uses only timeless fs concepts.
The dump format needs to reference concepts that we *know* are
general enough to never change. These concepts must exist
independently of any internal node-id schema, or any DB storage
backend. In other words, we're talking about the basic ideas in
our original "design spec" from May 2000.
Format Semantics
================
Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.
- A filesystem is an array of trees.
Each tree is called a "revision" and has unversioned properties attached.
- A revision has a tree of "nodes" hanging off of it.
Actually, the nodes in the filesystem form a DAG. A revision
always points to an initial node that represents the 'root' of some tree.
- The majority of a tree's nodes are hard-links (references) to
nodes that were created in earlier trees.
- A node contains
- versioned text
- versioned properties
- predecessor history: "which node am I a variant of?"
- copy history: "which node am I a copy of?"
The history values can be non-existent (meaning the node is
completely new), or can have a value of {revision, path}.
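The semantics above can be pictured as a small data model. The
following Python sketch is illustrative only: the class and field
names (Filesystem, Revision, Node, HistoryRef) are invented here and
do not correspond to anything in svn_fs.h.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history value is either absent (None) or a {revision, path} pair.
HistoryRef = Optional[Tuple[int, str]]


@dataclass
class Node:
    kind: str                                    # "file" or "dir"
    props: Dict[str, bytes] = field(default_factory=dict)  # versioned props
    text: bytes = b""                            # versioned text (files)
    predecessor: HistoryRef = None               # "which node am I a variant of?"
    copied_from: HistoryRef = None               # "which node am I a copy of?"
    children: Dict[str, "Node"] = field(default_factory=dict)  # dir entries


@dataclass
class Revision:
    root: Node                                   # initial node of this tree
    revprops: Dict[str, bytes] = field(default_factory=dict)  # unversioned


@dataclass
class Filesystem:
    # Hypothetical convention: list index == revision number.
    revisions: List[Revision] = field(default_factory=list)


# Two revisions that share an unchanged node: the DAG in miniature.
shared = Node(kind="file", text=b"unchanged\n")
r0 = Revision(root=Node(kind="dir", children={"a": shared}))
changed = Node(kind="file", text=b"new\n")
r1 = Revision(root=Node(kind="dir",
                        children={"a": shared, "b": changed},
                        predecessor=(0, "/")))
fs = Filesystem(revisions=[r0, r1])
```

Here revision 1 holds a reference to the very same 'a' node object as
revision 0, which is the "hard-link" sharing described in the list
above.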
------------------------------------------------------------------------
Refinement of proposal #2: (after discussion with gstein)
=========================
Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.
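As a rough illustration of how record boundaries follow from the
final 'Content-length:' header, here is a minimal Python sketch of a
record reader. The function name read_record is hypothetical, and the
sketch makes two assumptions not stated in this proposal: content
begins immediately after the Content-length line, and blank lines
between records are skipped.

```python
import io
from typing import BinaryIO, Dict, Optional, Tuple


def read_record(stream: BinaryIO) -> Optional[Tuple[Dict[str, str], bytes]]:
    """Read headers up to Content-length, then exactly that many bytes."""
    headers: Dict[str, str] = {}
    while True:
        raw = stream.readline()
        if not raw:
            if headers:
                raise ValueError("stream ended inside a header block")
            return None                       # clean end of stream
        line = raw.rstrip(b"\n").decode("utf-8")
        if not line:
            continue                          # assumption: skip blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        name = name.strip()
        headers[name] = value.strip()
        if name.lower() == "content-length":
            length = int(value)               # final header; content follows
            break
    content = stream.read(length)
    if len(content) != length:
        raise ValueError("content shorter than Content-length")
    return headers, content


# Usage: two tiny records back to back, then end of stream.
data = (b"Revision-number: 1\nContent-length: 10\nPROPS-END\n"
        b"Node-path: /a\nNode-kind: file\nContent-length: 13\n"
        b"PROPS-END\nhi\n")
stream = io.BytesIO(data)
first = read_record(stream)
second = read_record(stream)
third = read_record(stream)
```

Because the reader consumes exactly Content-length bytes, it never
needs to inspect the content to find where the next record starts.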
The content section has two implicit parts: a property hash, and the
fulltext. The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.
-----------------------------------------------------------------
SVN DUMPFILE VERSION 1 FORMAT
The format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.
Revision-number: N
[Revision-content-md5: blob]
Content-length: L
... L bytes of property data. Properties are stored in the same
human-readable hashdump format used by working copy property files,
except that they end with "PROPS-END\n" for better readability.
Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir (1)
Node-action: change | add | delete | replace
[Node-copied-from: X, path ]
[Node-copy-source-checksum: blob] (2)
[Node-content-md5: blob]
Content-length: Y
... Y bytes of content data, divided into 'props' and 'text'
sections. The properties come first; their total length (including
formatting) is included in Content-length. The "PROPS-END\n"
line always terminates the property section; if there are no props,
"PROPS-END\n" still signifies the beginning of the node's text
content.
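One way to perform this split is to walk the length-prefixed property
entries rather than search for the terminator string, so that a
property value which happens to contain "PROPS-END" cannot confuse
the reader. The Python sketch below assumes the "K <length>" /
"V <length>" hashdump layout shown in the example later in this
document; the function name split_content is hypothetical.

```python
from typing import Dict, Tuple

TERMINATOR = b"PROPS-END\n"


def split_content(content: bytes) -> Tuple[Dict[str, bytes], bytes]:
    """Parse the leading property hash; return (props, remaining text)."""
    props: Dict[str, bytes] = {}
    pos = 0

    def read_sized(tag: bytes) -> bytes:
        """Read one 'K n' or 'V n' line plus its n bytes of data."""
        nonlocal pos
        end = content.index(b"\n", pos)       # ValueError if truncated
        header = content[pos:end]
        kind, _, size = header.partition(b" ")
        if kind != tag:
            raise ValueError("expected %r line, got %r" % (tag, header))
        n = int(size)
        start = end + 1
        data = content[start:start + n]
        if len(data) != n or content[start + n:start + n + 1] != b"\n":
            raise ValueError("bad length in %r" % header)
        pos = start + n + 1                   # skip data and its newline
        return data

    while True:
        # The terminator is only recognized at an entry boundary.
        if content.startswith(TERMINATOR, pos):
            return props, content[pos + len(TERMINATOR):]
        key = read_sized(b"K")
        props[key.decode("utf-8")] = read_sized(b"V")


# Usage: a directory (props only), a file with no props, and a
# property whose value itself contains the terminator string.
dir_props, dir_text = split_content(
    b"K 10\nsvn:ignore\nV 4\nTAGS\nPROPS-END\n")
file_props, file_text = split_content(b"PROPS-END\nhello\n")
odd_props, odd_text = split_content(
    b"K 1\nk\nV 10\nPROPS-END\n\nPROPS-END\nrest")
```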
Notes:
(1) If the node represents a deletion, this field is optional.
(2) This is a checksum of the source of the copy. A loader process
can use this checksum to verify that the copyfrom path/rev
already present in a filesystem is really the *correct* one to use.
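The check in note (2) might look roughly like the Python sketch below.
This proposal does not say which algorithm Node-copy-source-checksum
uses; the sketch assumes a hex MD5 digest of the source's fulltext, by
analogy with Node-content-md5. The names copy_source_ok and
read_existing are hypothetical, and read_existing stands in for
whatever call fetches the text at a given revision and path in the
target filesystem.

```python
import hashlib
from typing import Callable, Optional


def copy_source_ok(expected_checksum: Optional[str],
                   copy_rev: int,
                   copy_path: str,
                   read_existing: Callable[[int, str], bytes]) -> bool:
    """Return True if the copyfrom source already in the fs matches."""
    if expected_checksum is None:
        return True                  # header is optional; nothing to verify
    existing = read_existing(copy_rev, copy_path)
    actual = hashlib.md5(existing).hexdigest()   # assumption: hex MD5
    return actual == expected_checksum.strip().lower()


# Usage with a dict standing in for the target filesystem.
fake_fs = {(7, "/trunk/foo.c"): b"int main(void) { return 0; }\n"}


def lookup(rev: int, path: str) -> bytes:
    return fake_fs[(rev, path)]


good = hashlib.md5(fake_fs[(7, "/trunk/foo.c")]).hexdigest()
matches = copy_source_ok(good, 7, "/trunk/foo.c", lookup)
mismatch = copy_source_ok("0" * 32, 7, "/trunk/foo.c", lookup)
skipped = copy_source_ok(None, 7, "/trunk/foo.c", lookup)
```

A loader that gets False here would presumably refuse to reuse the
existing copy source rather than silently build on different data.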
-----------------------------------------------------------------
EXAMPLE
Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":
Revision-number: 1422
Content-length: 80
K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END
Node-path: /bar/baz
Node-rev: 1422
Node-kind: dir
Node-action: add
Content-checksum: oj3eu729
Content-length: 35
K 10
svn:ignore
V 4
TAGS
PROPS-END
Node-path: /bar/baz/bop
Node-rev: 1422
Node-kind: file
Node-action: add
Content-checksum: bzz35te7
Content-length: 130
K 12
svn:keywords
V 15
LastChangedDate
K 14
svn:executable
V 2
on
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.
Node-path: /bar/foo.c
Node-rev: 1422
Node-kind: file
Node-action: change
Content-checksum: Ae73te7et
Content-length: 112
PROPS-END
Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.
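Records like these are easiest to get right when Content-length is
computed from the serialized bytes rather than written by hand. The
Python sketch below shows one way a dumper could do that. The function
names dump_props and dump_node are hypothetical, only the mandatory
node headers are emitted, and each text line is assumed to end with a
single "\n".

```python
from typing import Dict


def dump_props(props: Dict[str, bytes]) -> bytes:
    """Serialize a property hash in K/V form, ending with PROPS-END."""
    out = bytearray()
    for key, value in props.items():
        k = key.encode("utf-8")
        out += b"K %d\n%s\n" % (len(k), k)
        out += b"V %d\n%s\n" % (len(value), value)
    out += b"PROPS-END\n"              # always present, even with no props
    return bytes(out)


def dump_node(path: str, kind: str, action: str,
              props: Dict[str, bytes], text: bytes = b"") -> bytes:
    """Build one node record; Content-length covers props plus text."""
    content = dump_props(props) + text
    headers = (
        "Node-path: %s\n"
        "Node-kind: %s\n"
        "Node-action: %s\n"
        "Content-length: %d\n" % (path, kind, action, len(content))
    )
    return headers.encode("utf-8") + content


# Usage: the directory and the property-less file from the example.
dir_record = dump_node("/bar/baz", "dir", "add", {"svn:ignore": b"TAGS"})
file_record = dump_node(
    "/bar/foo.c", "file", "change", {},
    b"Here is the fulltext of my change to an existing /bar/foo.c.\n"
    b"Notice that this file has no properties.\n")
```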