A proposal for an svn filesystem dump/restore format.
Two problems we want to solve
=============================
1. When we change our node-id schema, we need to migrate all of our
data (by dumping and restoring).
2. We need a backup format. Other software tools could read it
someday.
Design Goals
============
A. Written as two new public functions in svn_fs.h. To be invoked
by new 'svnadmin' subcommands.
B. Format uses only timeless fs concepts.
The dump format needs to reference concepts that we *know* are
general enough to never change. These concepts must exist
independently of any internal node-id schema, or any DB storage
backend. In other words, we're talking about the basic ideas in
our original "design spec" from May 2000.
Format Semantics
================
Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.
- A filesystem is an array of trees.
Each tree is called a "revision" and has unversioned properties attached.
- A revision has a tree of "nodes" hanging off of it.
Actually, the nodes in the filesystem form a DAG. A revision
always points to an initial node that represents the 'root' of some tree.
- The majority of a tree's nodes are hard-links (references) to
nodes that were created in earlier trees.
- A node contains
- versioned text
- versioned properties
- predecessor history: "which node am I a variant of?"
- copy history: "which node am I a copy of?"
The history values can be non-existent (meaning the node is
completely new), or can have a value of {revision, path}.
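The semantics above can be sketched as a handful of plain data types.
This is only an illustration, not the real svn_fs structures: the names
Node, Revision, Filesystem and example_fs are hypothetical, and storing
directory entries in a "children" mapping is an assumption made here so
that the hard-link sharing is visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history value is either absent (None) or a {revision, path} pair.
HistoryRef = Optional[Tuple[int, str]]


@dataclass
class Node:
    kind: str                                   # "file" or "dir"
    text: bytes = b""                           # versioned text
    props: Dict[str, str] = field(default_factory=dict)  # versioned properties
    predecessor: HistoryRef = None              # "which node am I a variant of?"
    copied_from: HistoryRef = None              # "which node am I a copy of?"
    # Assumption: directory entries are kept as name -> Node references.
    children: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class Revision:
    root: Node                                  # initial node of this tree
    revprops: Dict[str, str] = field(default_factory=dict)  # unversioned


# A filesystem is an array of trees, indexed by revision number.
Filesystem = List[Revision]


def example_fs() -> Filesystem:
    """Build two revisions that share one unchanged file node."""
    foo = Node(kind="file", text=b"int main;\n")
    root0 = Node(kind="dir", children={"foo.c": foo})
    # Revision 1 adds "bop"; "foo.c" is a hard link to the revision 0 node.
    bop = Node(kind="file", text=b"Whee.\n")
    root1 = Node(kind="dir", children={"foo.c": foo, "bop": bop},
                 predecessor=(0, "/"))
    return [Revision(root0), Revision(root1, {"log": "added bop"})]
```

Because both roots reference the same foo.c object, the nodes form a
DAG rather than two separate trees.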
------------------------------------------------------------------------
Refinement of proposal #2: (after discussion with gstein)
=========================
Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.
The content section has two implicit parts: a property hash, and the
fulltext. The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.
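A reader could find the division by walking the prophash entry by entry
until it reaches the PROPS-END line. The sketch below assumes the
"K len / key / V len / value" hashdump layout used by the version 1
format further down, and assumes the prophash is always present, as
this refinement describes. The function name split_content is
hypothetical.

```python
from typing import Dict, Tuple


def split_content(content: bytes) -> Tuple[Dict[bytes, bytes], bytes]:
    """Parse the leading prophash; return (props, remaining fulltext)."""
    props: Dict[bytes, bytes] = {}
    pos = 0
    while True:
        eol = content.index(b"\n", pos)
        line = content[pos:eol]
        pos = eol + 1
        if line == b"PROPS-END":
            # Everything after the tag is the fulltext (may be empty).
            return props, content[pos:]

        # "K <len>" line, then <len> bytes of key and a newline.
        tag, klen = line.split(b" ")
        if tag != b"K":
            raise ValueError("expected a K line, got %r" % line)
        key = content[pos:pos + int(klen)]
        pos += int(klen) + 1

        # "V <len>" line, then <len> bytes of value and a newline.
        eol = content.index(b"\n", pos)
        tag, vlen = content[pos:eol].split(b" ")
        if tag != b"V":
            raise ValueError("expected a V line after key %r" % key)
        pos = eol + 1
        props[key] = content[pos:pos + int(vlen)]
        pos += int(vlen) + 1
```

Reading values by their declared length, rather than searching for the
tag, means a property value that happens to contain "PROPS-END\n" does
not end the prophash early.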
-----------------------------------------------------------------
SVN DUMPFILE VERSION 1 FORMAT
The format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.
Revision-number: N
[Revision-content-md5: blob]
Prop-content-length: P
Content-length: L
...P bytes of property data. Properties are stored in the same
human-readable hashdump format used by working copy property files,
except that they end with "PROPS-END\n" for better readability.
Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path]
[Node-copy-source-md5: blob] (2)
[Node-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]
Content-length: Y (3)
... Y bytes of content data, divided into P bytes of "property"
data and T bytes of "text" data. The properties come first; their
total length (including formatting) is Prop-content-length, and is
included in Content-length. The "PROPS-END\n" line always
terminates the property section if there are props. The remainder
of the Y bytes (expected to be equivalent to Text-content-length)
represents the contents of the node.
Notes:
(1) If the node represents a deletion, this field is optional.
(2) This is a checksum of the source of the copy. A loader process
can use this checksum to determine that the copyfrom path/rev
already present in a filesystem is really the *correct* one to use.
(3) The Content-length header is technically unnecessary, since the
information it holds (and more) can be found in the
Prop-content-length and Text-content-length fields. Though
Subversion itself does not make use of the header when reading a
dumpfile, we include it for compatibility with generic RFC822
parsers.
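As a rough sketch of how a loader might use the two length fields, the
code below parses a header block and splits the content by
Prop-content-length and Text-content-length, checking Content-length
only when it is present. The function names are hypothetical, and the
header block is assumed to have been separated from the content
already.

```python
from typing import Dict, Tuple


def parse_headers(block: bytes) -> Dict[str, str]:
    """Turn 'Name: value' lines into a dict; malformed lines are skipped."""
    headers: Dict[str, str] = {}
    for line in block.decode("utf-8").splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            headers[name] = value
    return headers


def split_by_lengths(headers: Dict[str, str],
                     content: bytes) -> Tuple[bytes, bytes]:
    """Return (property bytes, text bytes) using the length headers."""
    # Both fields are optional; a missing one means that part is empty.
    p = int(headers.get("Prop-content-length", 0))
    t = int(headers.get("Text-content-length", 0))

    # Content-length is redundant, so only cross-check it if given.
    declared = headers.get("Content-length")
    if declared is not None and int(declared) != p + t:
        raise ValueError("Content-length %s != %d + %d" % (declared, p, t))
    if len(content) < p + t:
        raise ValueError("content is shorter than the declared lengths")
    return content[:p], content[p:p + t]
```

Headers the caller does not recognize simply stay unused in the dict,
which matches the rule that unknown headers are ignored.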
-----------------------------------------------------------------
EXAMPLE
Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":
Revision-number: 1422
Prop-content-length: 80
Content-length: 80
K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END
Node-path: bar/baz
Node-kind: dir
Node-action: add
Prop-content-length: 35
Content-length: 35
K 10
svn:ignore
V 4
TAGS
PROPS-END
Node-path: bar/baz/bop
Node-kind: file
Node-action: add
Prop-content-length: 76
Text-content-length: 54
Content-length: 130
K 14
svn:executable
V 2
on
K 12
svn:keywords
V 15
LastChangedDate
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.
Node-path: bar/foo.c
Node-kind: file
Node-action: change
Text-content-length: 102
Content-length: 102
Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.
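The length fields in this example can be checked mechanically. The
sketch below serializes a property hash in the K/V layout and measures
the result; dump_props is a hypothetical helper, not a Subversion
function, and it assumes keys and values are UTF-8 text.

```python
from typing import Dict


def dump_props(props: Dict[str, str]) -> bytes:
    """Serialize props as K/V entries terminated by PROPS-END."""
    out = []
    for key, value in props.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        # Lengths count the bytes of the key or value only, not the newline.
        out.append(b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v))
    out.append(b"PROPS-END\n")
    return b"".join(out)


rev_props = dump_props({"author": "sussman",
                        "log": "Added two files, changed a third."})
bop_props = dump_props({"svn:executable": "on",
                        "svn:keywords": "LastChangedDate"})
bop_text = b"Here is the text of the newly added 'bop' file.\nWhee.\n"

print(len(rev_props))                   # Prop-content-length of the revision
print(len(bop_props), len(bop_text))    # P and T for bar/baz/bop
print(len(bop_props) + len(bop_text))   # Content-length for bar/baz/bop
```

For the data in this example the three lines print 80, then 76 and 54,
then 130.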