doc/programmer/design/server.texi - subversion - Git at Google

 @node Server
 @chapter Server

 The term ``server'' is ambiguous, because it has at least two different
 meanings: it can refer to a powerful computer which offers services to
 users on a network, or it can refer to a CPU process designed to receive
 network requests.

 In Subversion, however, the @dfn{server} is just a set of libraries that
 implements @dfn{repositories} and makes them available to other
 programs.  No networking is required.

 There are two main libraries: the @dfn{Subversion Filesystem} library,
 and the @dfn{Subversion Repository} library.

 @menu
 * Filesystem::                  The Subversion Filesystem.
 * Repository Library::          The Subversion Repository interface.
 @end menu


 @c ----------------------------------------------------------------

 @node Filesystem
 @section Filesystem

 @menu
 * Filesystem Overview::
 * API::
 * Repository Structure::
 * Implementation::
 @end menu

 @node Filesystem Overview
 @subsection Filesystem Overview

 @itemize @bullet
 @item
   @b{Requires:}
   @itemize @minus
   @item
 some writable disk space
   @item
 (for now) Berkeley DB library
   @end itemize
 @item
   @b{Provides:}
   @itemize @minus
   @item
 a repository for storing files
   @item
 concurrent client transactions
   @item
 enforcement of user & group permissions [someday, not yet]
   @end itemize
 @end itemize

 This library implements a hierarchical filesystem which supports atomic
 changes to directory trees, and records a complete history of the
 changes.  In addition to recording changes to file and directory
 contents, the Subversion Filesystem records changes to file meta-data
 (see discussion of @dfn{properties} in @ref{Model}).

 @node API
 @subsection API

 There are two main files that describe the Subversion filesystem.

 First, read the section below (@xref{Repository Structure}.) for a
 general overview of how the filesystem works.

 Once you've done this, read Jim Blandy's own structural overview, which
 explains how nodes and revisions are organized (among other things) in
 the filesystem implementation: @file{subversion/libsvn_fs/structure}.

 Finally, read the well-documented API in
 @file{subversion/include/svn_fs.h}.


 @c ------------------------------------
 @node Repository Structure
 @subsection Repository Structure

 @menu
 * Schema::
 * Bubble-Up Method::
 * Diffy Storage::
 @end menu

 @node Schema
 @subsubsection Schema

 To begin, please be sure that you're already casually familiar with
 Subversion's ideas of files, directories, and revision histories.  If
 not, @xref{Model}.  We can now offer precise, technical descriptions of
 the terms introduced there.

 @c This is taken from jimb's very first Subversion spec!

 @quotation

 A @dfn{text string} is a string of Unicode characters which is
 canonically decomposed and ordered, according to the rules described in
 the Unicode standard.

 A @dfn{string of bytes} is what you'd expect.

 A @dfn{property list} is an unordered list of properties.  A
 @dfn{property} is a pair @code{(@var{name}, @var{value})}, where
 @var{name} is a text string, and @var{value} is a string of bytes.
 No two properties in a property list have the same name.

 A @dfn{file} is a property list and a string of bytes.

 A @dfn{node} is either a file or a directory.  (We define a directory
 below.)  Nodes are distinguished unions --- you can always tell whether
 a node is a file or a directory.

 A @dfn{node table} is an array mapping some set of positive integers,
 called @dfn{node numbers}, onto @dfn{nodes}.  If a node table maps some
 number @var{i} to some node @var{n}, then @var{i} is a @dfn{valid node
 number} in that table, and @dfn{node @var{i}} is @var{n}.  Otherwise,
 @var{i} is an @dfn{invalid node number} in that table.

 A @dfn{directory entry} is a triple @code{(@var{name}, @var{props},
 @var{node})}, where @var{name} is a text string, @var{props} is a
 property list, and @var{node} is a node number.

 A @dfn{directory} is an unordered list of directory entries, and a
 property list.

 A @dfn{revision} is a node number and a property list.

 A @dfn{history} is an array of revisions, indexed by a contiguous range
 of non-negative integers containing 0.

 A @dfn{repository} consists of node table and a history.

 @end quotation

 @c Some definitions: we say that a node @var{n} is a @dfn{direct child}
 @c of a directory @var{d} iff @var{d} contains a directory entry whose
 @c node number is @var{n}. A node @var{n} is a @dfn{child} of a
 @c directory @var{d} iff @var{n} is a direct child of @var{d}, or if
 @c there exists some directory @var{e} which is a direct child of
 @c @var{d}, and @var{n} is a child of @var{e}. Given this definition of
 @c ``direct child'' and ``child,'' the obvious definitions of ``direct
 @c parent'' and ``parent'' hold.

 @c In these restrictions, let @var{r} be any repository.  When we refer,
 @c implicitly or explicitly, to a node table without further clarification,
 @c we mean @var{r}'s node table.  Thus, if we refer to ``a valid node
 @c number'' without specifying the node table in which it is valid, we mean
 @c ``a valid node number in @var{r}'s node table''.  Similarly for
 @c @var{r}'s history.

 Now that we've explained the form of the data, we make some restrictions
 on that form.

 @b{Every revision has a root directory.}  Every revision's node number is
 a valid node number, and the node it refers to is always a directory.
 We call this the revision's @dfn{root directory}.

 @b{Revision 0 always contains an empty root directory.}  This baseline
 makes it easy to check out whole projects from the repository.

 @b{Directories contain only valid links.}
 Every directory entry's @var{node} is a valid node number.

 @b{Directory entries can be identified by name.}
 For any directory @var{d}, every directory entry in @var{d} has a
 distinct name.

 @b{There are no cycles of directories.}  No node is its own child.

 @b{Directories can have more than one parent.}  The Unix file system
 does not allow more than one hard link to a directory, but Subversion
 does allow the analogous situation.  Thus, the directories in a
 Subversion repository form a directed acyclic graph (@dfn{DAG}), not a
 tree.  However, it would be distracting and unhelpful to replace the
 familiar term ``directory tree'' with the unfamiliar term ``directory
 DAG'', so we still call it a ``directory tree'' here.

 @b{There are no dead nodes.}  Every node is a child of some revision's
 root directory.

 @c  </jimb>


 @node Bubble-Up Method
 @subsubsection Bubble-Up Method

 This section provides a conversational explanation of how the repository
 actually stores and revisions file trees.  It's not critical knowledge
 for a programmer using the Subversion Filesystem API, but most people
 probably still want to know what's going on ``under the hood'' of the
 repository.

 Suppose we have a new project, at revision 1, looking like this (using
 CVS syntax):

 @example
    prompt$ svn checkout myproj
    U myproj/
    U myproj/B
    U myproj/A
    U myproj/A/fish
    U myproj/A/fish/tuna
    prompt$
 @end example

 Only the file @file{tuna} is a regular file, everything else in myproj is
 a directory.

 Let's see what this looks like as an abstract data structure in the
 repository, and how that structure works in various operations (such
 as update, commit, and branch).

 In the diagrams that follow, lines represent parent-to-child connections
 in a directory hierarchy.  Boxes are "nodes".  A node is either a file
 or a directory -- a letter in the upper left indicates which kind.  A
 file node has a byte-string for its content, whereas directory nodes
 have a list of dir_entries, each pointing to another node.

 Parent-child links go both ways (i.e., a child knows who all its parents
 are), but a node's name is stored only in its parent, because a node
 with multiple parents may have different names in different parents.

 At the top of the repository is an array of revision numbers,
 stretching off to infinity.  Since the project is at revision 1, only
 index 1 points to anything; it points to the root node of revision 1 of
 the project:

 @example
 @group
                     ( myproj's revision array )
        ______________________________________________________
       |___1_______2________3________4________5_________6_____...
           |
           |
        ___|_____
       |D        |
       |         |
       |   A     |      /* Two dir_entries, `A' and `B'. */
       |    \    |
       |   B \   |
       |__/___\__|
         /     \
        |       \
        |        \
     ___|___   ___\____
    |D      | |D       |
    |       | |        |
    |       | | fish   |   /* One dir_entry, `fish'. */
    |_______| |___\____|
                   \
                    \
                  ___\____
                 |D       |
                 |        |
                 | tuna   |  /* One dir_entry, `tuna'. */
                 |___\____|
                      \
                       \
                     ___\____
                    |F       |
                    |        |
                    |        |   /* (Contents of tuna not shown.) */
                    |________|

 @end group
 @end example

 What happens when we modify @file{tuna} and commit?  First, we make a
 new @file{tuna} node, containing the latest text.  The new node is not
 connected to anything yet, it's just hanging out there in space:

 @example
 @group
                          ________
                         |F       |
                         |        |
                         |        |
                         |________|
 @end group
 @end example

 Next, we create a @emph{new} revision of its parent directory:

 @example
 @group
                  ________
                 |D       |
                 |        |
                 | tuna   |
                 |___\____|
                      \
                       \
                     ___\____
                    |F       |
                    |        |
                    |        |
                    |________|
 @end group
 @end example

 We continue up the line, creating a new revision of the next parent
 directory:

 @example
 @group
               ________
              |D       |
              |        |
              | fish   |
              |___\____|
                   \
                    \
                  ___\____
                 |D       |
                 |        |
                 | tuna   |
                 |___\____|
                      \
                       \
                     ___\____
                    |F       |
                    |        |
                    |        |
                    |________|
 @end group
 @end example

 Now it gets more tricky: we need to create a new revision of the root
 directory.  This new root directory needs an entry to point to the
 ``new'' directory A, but directory B hasn't changed at all.  Therefore,
 our new root directory also has an entry that still points to the
 @emph{old} directory B node!

 @example
 @group
        ______________________________________________________
       |___1_______2________3________4________5_________6_____...
           |
           |
        ___|_____             ________
       |D        |           |D       |
       |         |           |        |
       |   A     |           |   A    |
       |    \    |           |    \   |
       |   B \   |           |   B \  |
       |__/___\__|           |__/___\_|
         /     \               /     \
        |    ___\_____________/       \
        |   /    \                     \
     ___|__/   ___\____              ___\____
    |D      | |D       |            |D       |
    |       | |        |            |        |
    |       | | fish   |            | fish   |
    |_______| |___\____|            |___\____|
                   \                     \
                    \                     \
                  ___\____              ___\____
                 |D       |            |D       |
                 |        |            |        |
                 | tuna   |            | tuna   |
                 |___\____|            |___\____|
                      \                     \
                       \                     \
                     ___\____              ___\____
                    |F       |            |F       |
                    |        |            |        |
                    |        |            |        |
                    |________|            |________|

 @end group
 @end example

 Finally, after all our new nodes are written, we finish the ``bubble
 up'' process by linking this new tree to the next available revision in
 the history array.  In this case, the new tree becomes revision 2 in the
 repository.

 @example
 @group
        ______________________________________________________
       |___1_______2________3________4________5_________6_____...
           |        \
           |         \__________
        ___|_____             __\_____
       |D        |           |D       |
       |         |           |        |
       |   A     |           |   A    |
       |    \    |           |    \   |
       |   B \   |           |   B \  |
       |__/___\__|           |__/___\_|
         /     \               /     \
        |    ___\_____________/       \
        |   /    \                     \
     ___|__/   ___\____              ___\____
    |D      | |D       |            |D       |
    |       | |        |            |        |
    |       | | fish   |            | fish   |
    |_______| |___\____|            |___\____|
                   \                     \
                    \                     \
                  ___\____              ___\____
                 |D       |            |D       |
                 |        |            |        |
                 | tuna   |            | tuna   |
                 |___\____|            |___\____|
                      \                     \
                       \                     \
                     ___\____              ___\____
                    |F       |            |F       |
                    |        |            |        |
                    |        |            |        |
                    |________|            |________|

 @end group
 @end example

 Generalizing on this example, you can now see that each ``revision'' in
 the repository history represents a root node of a unique tree (and an
 atomic commit to the whole filesystem.)  There are many trees in the
 repository, and many of them share nodes.

 Many nice behaviors come from this model:

 @enumerate
 @item
 @b{Easy reads.}  If a filesystem reader wants to locate revision
 @var{X} of file @file{foo.c}, it need only traverse the repository's
 history, locate revision @var{X}'s root node, then walk down the tree to
 @file{foo.c}.
 @item
 @b{Writers don't interfere with readers.}  Writers can continue to
 create new nodes, bubbling their way up to the top, and concurrent
 readers cannot see the work in progress.  The new tree only becomes
 visible to readers after the writer makes its final ``link'' to the
 repository's history.
 @item
 @b{File structure is versioned.}  Unlike CVS, the very structure of
 each tree is being saved from revision to revision.  File and directory
 renames, additions, and deletions are part of the repository's history.
 @end enumerate

 Let's demonstrate the last point by renaming the @file{tuna} to
 @file{book}.

 We start by creating a new parent ``fish'' directory, except that this
 parent directory has a different dir_entry, one which points the
 @emph{same} old file node, but has a different name:

 @example
 @group
        ______________________________________________________
       |___1_______2________3________4________5_________6_____...
           |        \
           |         \__________
        ___|_____             __\_____
       |D        |           |D       |
       |         |           |        |
       |   A     |           |   A    |
       |    \    |           |    \   |
       |   B \   |           |   B \  |
       |__/___\__|           |__/___\_|
         /     \               /     \
        |    ___\_____________/       \
        |   /    \                     \
     ___|__/   ___\____              ___\____
    |D      | |D       |            |D       |
    |       | |        |            |        |
    |       | | fish   |            | fish   |
    |_______| |___\____|            |___\____|
                   \                     \
                    \                     \
                  ___\____              ___\____      ________
                 |D       |            |D       |    |D       |
                 |        |            |        |    |        |
                 | tuna   |            | tuna   |    | book   |
                 |___\____|            |___\____|    |_/______|
                      \                     \         /
                       \                     \       /
                     ___\____              ___\____ /
                    |F       |            |F       |
                    |        |            |        |
                    |        |            |        |
                    |________|            |________|
 @end group
 @end example

 From here, we finish with the bubble-up process.  We make new parent
 directories up to the top, culminating in a new root directory with two
 dir_entries (one points to the old ``B'' directory node we've had all
 along, the other to the new revision of ``A''), and finally link the new
 tree to the history as revision 3:

 @example
 @group
        ______________________________________________________
       |___1_______2________3________4________5_________6_____...
           |        \        \_________________
           |         \__________               \
        ___|_____             __\_____        __\_____
       |D        |           |D       |      |D       |
       |         |           |        |      |        |
       |   A     |           |   A    |      |   A    |
       |    \    |           |    \   |      |    \   |
       |   B \   |           |   B \  |      |   B \  |
       |__/___\__|           |__/___\_|      |__/___\_|
         /  ___________________/_____\_________/     \
        |  / ___\_____________/       \               \
        | / /    \                     \               \
     ___|/_/   ___\____              ___\____      _____\__
    |D      | |D       |            |D       |    |D       |
    |       | |        |            |        |    |        |
    |       | | fish   |            | fish   |    | fish   |
    |_______| |___\____|            |___\____|    |___\____|
                   \                     \             \
                    \                     \             \
                  ___\____              ___\____      ___\____
                 |D       |            |D       |    |D       |
                 |        |            |        |    |        |
                 | tuna   |            | tuna   |    | book   |
                 |___\____|            |___\____|    |_/______|
                      \                     \         /
                       \                     \       /
                     ___\____              ___\____ /
                    |F       |            |F       |
                    |        |            |        |
                    |        |            |        |
                    |________|            |________|

 @end group
 @end example

 For our last example, we'll demonstrate the way ``tags'' and
 ``branches'' are implemented in the repository.

 In a nutshell, they're one and the same thing.  Because nodes are so
 easily shared, we simply create a @emph{new} directory entry that points
 to an existing directory node.  It's an extremely cheap way of copying a
 tree; we call this new entry a @dfn{clone}, or more colloquially, a
 ``cheap copy''.

 Let's go back to our original tree, assuming that we're at revision 6 to
 begin with:

 @example
 @group
        ______________________________________________________
     ...___6_______7________8________9________10_________11_____...
           |
           |
        ___|_____
       |D        |
       |         |
       |   A     |
       |    \    |
       |   B \   |
       |__/___\__|
         /     \
        |       \
        |        \
     ___|___   ___\____
    |D      | |D       |
    |       | |        |
    |       | | fish   |
    |_______| |___\____|
                   \
                    \
                  ___\____
                 |D       |
                 |        |
                 | tuna   |
                 |___\____|
                      \
                       \
                     ___\____
                    |F       |
                    |        |
                    |        |
                    |________|

 @end group
 @end example

 Let's ``tag'' directory A.  To make the clone, we create a new dir_entry
 @b{T} in our root, pointing to A's node:

 @example
 @group
        ______________________________________________________
       |___6_______7________8________9________10_________11_____...
           |        \
           |         \
        ___|_____   __\______
       |D        | |D        |
       |         | |         |
       |   A     | |    A    |
       |    \    | |    |    |
       |   B \   | |  B |  T |
       |__/___\__| |_/__|__|_|
         /     \    /   |  |
        |    ___\__/   /  /
        |   /    \    /  /
     ___|__/   ___\__/_ /
    |D      | |D       |
    |       | |        |
    |       | | fish   |
    |_______| |___\____|
                   \
                    \
                  ___\____
                 |D       |
                 |        |
                 | tuna   |
                 |___\____|
                      \
                       \
                     ___\____
                    |F       |
                    |        |
                    |        |
                    |________|

 @end group
 @end example

 Now we're all set.  In the future, the contents of directories A and B
 may change quite a lot.  However, assuming we never make any changes
 to directory T, it will @emph{always} point to a particular pristine
 revision of directory A at some point in time.  Thus, T is a tag.

 (In theory, we can use some kind of authorization system to prevent
 anyone from writing to directory T.  In practice, a well-laid out
 repository should encourage ``tag directories'' to live in one place,
 so that it's clear to all users that they're not meant to change.)

 However, if we @emph{do} decide to allow commits in directory T, and
 now our repository tree increments to revision 8, then T becomes a
 branch.  Specifically, it's a branch of directory A which shares
 history with A up to a certain point, and then ``broke off'' from the
 main line at revision 8.

 @node Diffy Storage
 @subsubsection Diffy Storage

 You may have been thinking, ``Gee, this bubble up method seems nice, but
 it sure wastes a lot of space.  Every commit to the repository creates
 an entire line of new directory nodes!''

 Like many other revision control systems, Subversion stores changes as
 differences.  It doesn't make complete copies of nodes; instead, it
 stores the @emph{latest} revision as a full text, and previous revisions
 as a succession of reverse diffs (the word "diff" is used loosely here
 -- for files, it means vdeltas, for directories, it means a format that
 expresses changes to directories).


 @c -----------------------
 @node Implementation
 @subsection Implementation

 For the initial release of Subversion,

 @itemize @bullet
 @item
 The filesystem will be implemented as a library on Unix.
 @item
 The filesystem's data will probably be stored in a collection of .db
 files, using the Berkeley Database library.@footnote{In the future, of
 course, contributors are free modify the Subversion filesystem to
 operate with more powerful SQL database.}  (For more information, see
 @uref{http://www.sleepycat.com, Sleepycat Software}.)
 @end itemize


 @c ----------------------------------------------------------------

 @node Repository Library
 @section Repository Library


 @c Jimb, Karl:  Maybe we should turn this into a discussion about how
 @c the filesystem will use non-historical properties for internal ACLs,
 @c and how people can add "external" ACL systems via historical
 @c properties...?

 A subversion @dfn{repository} is a directory that contains a number of
 components:

 @itemize @bullet
 @item a versioned filesystem (typically a collection of .db files)
 @item some hook scripts (for executing before or after commits)
 @item a locking area (used by Berkeley DB or other processes)
 @item a configuration area (for changing global behaviors)
 @end itemize

 The Subversion filesystem is just that: a filesystem.  But it's also
 useful to provide an API that acts at the level of the repository.
 The repository library (@file{libsvn_repos}) does this.

 In particular, it wraps a few @file{libsvn_fs} routines, such as those
 for beginning and ending commits, so that hook-scripts can run.  A
 pre-commit-hook script might check for a valid log message, and a
 post-commit-hook script might send an email to a mailing list.

 Additionally, the repository library provides convenience routines for
 examining and manipulating the filesystem.  For example, a routine to
 generate a tree-delta by comparing two revisions, routines for
 constructing new transactions, routines for querying log messages, and
 routines for exporting and importing filesystem data.
	@node Server
	@chapter Server

	The term ``server'' is ambiguous, because it has at least two different
	meanings: it can refer to a powerful computer which offers services to
	users on a network, or it can refer to a CPU process designed to receive
	network requests.

	In Subversion, however, the @dfn{server} is just a set of libraries that
	implements @dfn{repositories} and makes them available to other
	programs. No networking is required.

	There are two main libraries: the @dfn{Subversion Filesystem} library,
	and the @dfn{Subversion Repository} library.

	@menu
	* Filesystem:: The Subversion Filesystem.
	* Repository Library:: The Subversion Repository interface.
	@end menu


	@c ----------------------------------------------------------------

	@node Filesystem
	@section Filesystem

	@menu
	* Filesystem Overview::
	* API::
	* Repository Structure::
	* Implementation::
	@end menu

	@node Filesystem Overview
	@subsection Filesystem Overview

	@itemize @bullet
	@item
	@b{Requires:}
	@itemize @minus
	@item
	some writable disk space
	@item
	(for now) Berkeley DB library
	@end itemize
	@item
	@b{Provides:}
	@itemize @minus
	@item
	a repository for storing files
	@item
	concurrent client transactions
	@item
	enforcement of user & group permissions [someday, not yet]
	@end itemize
	@end itemize

	This library implements a hierarchical filesystem which supports atomic
	changes to directory trees, and records a complete history of the
	changes. In addition to recording changes to file and directory
	contents, the Subversion Filesystem records changes to file meta-data
	(see discussion of @dfn{properties} in @ref{Model}).

	@node API
	@subsection API

	There are two main files that describe the Subversion filesystem.

	First, read the section below (@xref{Repository Structure}.) for a
	general overview of how the filesystem works.

	Once you've done this, read Jim Blandy's own structural overview, which
	explains how nodes and revisions are organized (among other things) in
	the filesystem implementation: @file{subversion/libsvn_fs/structure}.

	Finally, read the well-documented API in
	@file{subversion/include/svn_fs.h}.


	@c ------------------------------------
	@node Repository Structure
	@subsection Repository Structure

	@menu
	* Schema::
	* Bubble-Up Method::
	* Diffy Storage::
	@end menu

	@node Schema
	@subsubsection Schema

	To begin, please be sure that you're already casually familiar with
	Subversion's ideas of files, directories, and revision histories. If
	not, @xref{Model}. We can now offer precise, technical descriptions of
	the terms introduced there.

	@c This is taken from jimb's very first Subversion spec!

	@quotation

	A @dfn{text string} is a string of Unicode characters which is
	canonically decomposed and ordered, according to the rules described in
	the Unicode standard.

	A @dfn{string of bytes} is what you'd expect.

	A @dfn{property list} is an unordered list of properties. A
	@dfn{property} is a pair @code{(@var{name}, @var{value})}, where
	@var{name} is a text string, and @var{value} is a string of bytes.
	No two properties in a property list have the same name.

	A @dfn{file} is a property list and a string of bytes.

	A @dfn{node} is either a file or a directory. (We define a directory
	below.) Nodes are distinguished unions --- you can always tell whether
	a node is a file or a directory.

	A @dfn{node table} is an array mapping some set of positive integers,
	called @dfn{node numbers}, onto @dfn{nodes}. If a node table maps some
	number @var{i} to some node @var{n}, then @var{i} is a @dfn{valid node
	number} in that table, and @dfn{node @var{i}} is @var{n}. Otherwise,
	@var{i} is an @dfn{invalid node number} in that table.

	A @dfn{directory entry} is a triple @code{(@var{name}, @var{props},
	@var{node})}, where @var{name} is a text string, @var{props} is a
	property list, and @var{node} is a node number.

	A @dfn{directory} is an unordered list of directory entries, and a
	property list.

	A @dfn{revision} is a node number and a property list.

	A @dfn{history} is an array of revisions, indexed by a contiguous range
	of non-negative integers containing 0.

	A @dfn{repository} consists of node table and a history.

	@end quotation

	@c Some definitions: we say that a node @var{n} is a @dfn{direct child}
	@c of a directory @var{d} iff @var{d} contains a directory entry whose
	@c node number is @var{n}. A node @var{n} is a @dfn{child} of a
	@c directory @var{d} iff @var{n} is a direct child of @var{d}, or if
	@c there exists some directory @var{e} which is a direct child of
	@c @var{d}, and @var{n} is a child of @var{e}. Given this definition of
	@c ``direct child'' and ``child,'' the obvious definitions of ``direct
	@c parent'' and ``parent'' hold.

	@c In these restrictions, let @var{r} be any repository. When we refer,
	@c implicitly or explicitly, to a node table without further clarification,
	@c we mean @var{r}'s node table. Thus, if we refer to ``a valid node
	@c number'' without specifying the node table in which it is valid, we mean
	@c ``a valid node number in @var{r}'s node table''. Similarly for
	@c @var{r}'s history.

	Now that we've explained the form of the data, we make some restrictions
	on that form.

	@b{Every revision has a root directory.} Every revision's node number is
	a valid node number, and the node it refers to is always a directory.
	We call this the revision's @dfn{root directory}.

	@b{Revision 0 always contains an empty root directory.} This baseline
	makes it easy to check out whole projects from the repository.

	@b{Directories contain only valid links.}
	Every directory entry's @var{node} is a valid node number.

	@b{Directory entries can be identified by name.}
	For any directory @var{d}, every directory entry in @var{d} has a
	distinct name.

	@b{There are no cycles of directories.} No node is its own child.

	@b{Directories can have more than one parent.} The Unix file system
	does not allow more than one hard link to a directory, but Subversion
	does allow the analogous situation. Thus, the directories in a
	Subversion repository form a directed acyclic graph (@dfn{DAG}), not a
	tree. However, it would be distracting and unhelpful to replace the
	familiar term ``directory tree'' with the unfamiliar term ``directory
	DAG'', so we still call it a ``directory tree'' here.

	@b{There are no dead nodes.} Every node is a child of some revision's
	root directory.

	@c </jimb>


	@node Bubble-Up Method
	@subsubsection Bubble-Up Method

	This section provides a conversational explanation of how the repository
	actually stores and revisions file trees. It's not critical knowledge
	for a programmer using the Subversion Filesystem API, but most people
	probably still want to know what's going on ``under the hood'' of the
	repository.

	Suppose we have a new project, at revision 1, looking like this (using
	CVS syntax):

	@example
	prompt$ svn checkout myproj
	U myproj/
	U myproj/B
	U myproj/A
	U myproj/A/fish
	U myproj/A/fish/tuna
	prompt$
	@end example

	Only the file @file{tuna} is a regular file, everything else in myproj is
	a directory.

	Let's see what this looks like as an abstract data structure in the
	repository, and how that structure works in various operations (such
	as update, commit, and branch).

	In the diagrams that follow, lines represent parent-to-child connections
	in a directory hierarchy. Boxes are "nodes". A node is either a file
	or a directory -- a letter in the upper left indicates which kind. A
	file node has a byte-string for its content, whereas directory nodes
	have a list of dir_entries, each pointing to another node.

	Parent-child links go both ways (i.e., a child knows who all its parents
	are), but a node's name is stored only in its parent, because a node
	with multiple parents may have different names in different parents.

	At the top of the repository is an array of revision numbers,
	stretching off to infinity. Since the project is at revision 1, only
	index 1 points to anything; it points to the root node of revision 1 of
	the project:

	@example
	@group
	( myproj's revision array )
	______________________________________________________
	\|___1_______2________3________4________5_________6_____...
	\|
	\|
	___\|_____
	\|D \|
	\| \|
	\| A \| /* Two dir_entries, `A' and `B'. */
	\| \ \|
	\| B \ \|
	\|__/___\__\|
	/ \
	\| \
	\| \
	___\|___ ___\____
	\|D \| \|D \|
	\| \| \| \|
	\| \| \| fish \| /* One dir_entry, `fish'. */
	\|_______\| \|___\____\|
	\
	\
	___\____
	\|D \|
	\| \|
	\| tuna \| /* One dir_entry, `tuna'. */
	\|___\____\|
	\
	\
	___\____
	\|F \|
	\| \|
	\| \| /* (Contents of tuna not shown.) */
	\|________\|

	@end group
	@end example

	What happens when we modify @file{tuna} and commit? First, we make a
	new @file{tuna} node, containing the latest text. The new node is not
	connected to anything yet, it's just hanging out there in space:

	@example
	@group
	________
	\|F \|
	\| \|
	\| \|
	\|________\|
	@end group
	@end example

	Next, we create a @emph{new} revision of its parent directory:

	@example
	@group
	________
	\|D \|
	\| \|
	\| tuna \|
	\|___\____\|
	\
	\
	___\____
	\|F \|
	\| \|
	\| \|
	\|________\|
	@end group
	@end example

	We continue up the line, creating a new revision of the next parent
	directory:

	@example
	@group
	________
	\|D \|
	\| \|
	\| fish \|
	\|___\____\|
	\
	\
	___\____
	\|D \|
	\| \|
	\| tuna \|
	\|___\____\|
	\
	\
	___\____
	\|F \|
	\| \|
	\| \|
	\|________\|
	@end group
	@end example

	Now it gets more tricky: we need to create a new revision of the root
	directory. This new root directory needs an entry to point to the
	``new'' directory A, but directory B hasn't changed at all. Therefore,
	our new root directory also has an entry that still points to the
	@emph{old} directory B node!

	@example
	@group
	______________________________________________________
	\|___1_______2________3________4________5_________6_____...
	\|
	\|
	___\|_____ ________
	\|D \| \|D \|
	\| \| \| \|
	\| A \| \| A \|
	\| \ \| \| \ \|
	\| B \ \| \| B \ \|
	\|__/___\__\| \|__/___\_\|
	/ \ / \
	\| ___\_____________/ \
	\| / \ \
	___\|__/ ___\____ ___\____
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| \| \| fish \| \| fish \|
	\|_______\| \|___\____\| \|___\____\|
	\ \
	\ \
	___\____ ___\____
	\|D \| \|D \|
	\| \| \| \|
	\| tuna \| \| tuna \|
	\|___\____\| \|___\____\|
	\ \
	\ \
	___\____ ___\____
	\|F \| \|F \|
	\| \| \| \|
	\| \| \| \|
	\|________\| \|________\|

	@end group
	@end example

	Finally, after all our new nodes are written, we finish the ``bubble
	up'' process by linking this new tree to the next available revision in
	the history array. In this case, the new tree becomes revision 2 in the
	repository.

	@example
	@group
	______________________________________________________
	\|___1_______2________3________4________5_________6_____...
	\| \
	\| \__________
	___\|_____ __\_____
	\|D \| \|D \|
	\| \| \| \|
	\| A \| \| A \|
	\| \ \| \| \ \|
	\| B \ \| \| B \ \|
	\|__/___\__\| \|__/___\_\|
	/ \ / \
	\| ___\_____________/ \
	\| / \ \
	___\|__/ ___\____ ___\____
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| \| \| fish \| \| fish \|
	\|_______\| \|___\____\| \|___\____\|
	\ \
	\ \
	___\____ ___\____
	\|D \| \|D \|
	\| \| \| \|
	\| tuna \| \| tuna \|
	\|___\____\| \|___\____\|
	\ \
	\ \
	___\____ ___\____
	\|F \| \|F \|
	\| \| \| \|
	\| \| \| \|
	\|________\| \|________\|

	@end group
	@end example

	Generalizing on this example, you can now see that each ``revision'' in
	the repository history represents a root node of a unique tree (and an
	atomic commit to the whole filesystem.) There are many trees in the
	repository, and many of them share nodes.

	Many nice behaviors come from this model:

	@enumerate
	@item
	@b{Easy reads.} If a filesystem reader wants to locate revision
	@var{X} of file @file{foo.c}, it need only traverse the repository's
	history, locate revision @var{X}'s root node, then walk down the tree to
	@file{foo.c}.
	@item
	@b{Writers don't interfere with readers.} Writers can continue to
	create new nodes, bubbling their way up to the top, and concurrent
	readers cannot see the work in progress. The new tree only becomes
	visible to readers after the writer makes its final ``link'' to the
	repository's history.
	@item
	@b{File structure is versioned.} Unlike CVS, the very structure of
	each tree is being saved from revision to revision. File and directory
	renames, additions, and deletions are part of the repository's history.
	@end enumerate

	Let's demonstrate the last point by renaming the @file{tuna} to
	@file{book}.

	We start by creating a new parent ``fish'' directory, except that this
	parent directory has a different dir_entry, one which points the
	@emph{same} old file node, but has a different name:

	@example
	@group
	______________________________________________________
	\|___1_______2________3________4________5_________6_____...
	\| \
	\| \__________
	___\|_____ __\_____
	\|D \| \|D \|
	\| \| \| \|
	\| A \| \| A \|
	\| \ \| \| \ \|
	\| B \ \| \| B \ \|
	\|__/___\__\| \|__/___\_\|
	/ \ / \
	\| ___\_____________/ \
	\| / \ \
	___\|__/ ___\____ ___\____
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| \| \| fish \| \| fish \|
	\|_______\| \|___\____\| \|___\____\|
	\ \
	\ \
	___\____ ___\____ ________
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| tuna \| \| tuna \| \| book \|
	\|___\____\| \|___\____\| \|_/______\|
	\ \ /
	\ \ /
	___\____ ___\____ /
	\|F \| \|F \|
	\| \| \| \|
	\| \| \| \|
	\|________\| \|________\|
	@end group
	@end example

	From here, we finish with the bubble-up process. We make new parent
	directories up to the top, culminating in a new root directory with two
	dir_entries (one points to the old ``B'' directory node we've had all
	along, the other to the new revision of ``A''), and finally link the new
	tree to the history as revision 3:

	@example
	@group
	______________________________________________________
	\|___1_______2________3________4________5_________6_____...
	\| \ \_________________
	\| \__________ \
	___\|_____ __\_____ __\_____
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| A \| \| A \| \| A \|
	\| \ \| \| \ \| \| \ \|
	\| B \ \| \| B \ \| \| B \ \|
	\|__/___\__\| \|__/___\_\| \|__/___\_\|
	/ ___________________/_____\_________/ \
	\| / ___\_____________/ \ \
	\| / / \ \ \
	___\|/_/ ___\____ ___\____ _____\__
	\|D \| \|D \| \|D \| \|D \|
	\| \| \| \| \| \| \| \|
	\| \| \| fish \| \| fish \| \| fish \|
	\|_______\| \|___\____\| \|___\____\| \|___\____\|
	\ \ \
	\ \ \
	___\____ ___\____ ___\____
	\|D \| \|D \| \|D \|
	\| \| \| \| \| \|
	\| tuna \| \| tuna \| \| book \|
	\|___\____\| \|___\____\| \|_/______\|
	\ \ /
	\ \ /
	___\____ ___\____ /
	\|F \| \|F \|
	\| \| \| \|
	\| \| \| \|
	\|________\| \|________\|

	@end group
	@end example

	For our last example, we'll demonstrate the way ``tags'' and
	``branches'' are implemented in the repository.

	In a nutshell, they're one and the same thing. Because nodes are so
	easily shared, we simply create a @emph{new} directory entry that points
	to an existing directory node. It's an extremely cheap way of copying a
	tree; we call this new entry a @dfn{clone}, or more colloquially, a
	``cheap copy''.

	Let's go back to our original tree, assuming that we're at revision 6 to
	begin with:

	@example
	@group
	______________________________________________________
	...___6_______7________8________9________10_________11_____...
	\|
	\|
	___\|_____
	\|D \|
	\| \|
	\| A \|
	\| \ \|
	\| B \ \|
	\|__/___\__\|
	/ \
	\| \
	\| \
	___\|___ ___\____
	\|D \| \|D \|
	\| \| \| \|
	\| \| \| fish \|
	\|_______\| \|___\____\|
	\
	\
	___\____
	\|D \|
	\| \|
	\| tuna \|
	\|___\____\|
	\
	\
	___\____
	\|F \|
	\| \|
	\| \|
	\|________\|

	@end group
	@end example

	Let's ``tag'' directory A. To make the clone, we create a new dir_entry
	@b{T} in our root, pointing to A's node:

	@example
	@group
	______________________________________________________
	\|___6_______7________8________9________10_________11_____...
	\| \
	\| \
	___\|_____ __\______
	\|D \| \|D \|
	\| \| \| \|
	\| A \| \| A \|
	\| \ \| \| \| \|
	\| B \ \| \| B \| T \|
	\|__/___\__\| \|_/__\|__\|_\|
	/ \ / \| \|
	\| ___\__/ / /
	\| / \ / /
	___\|__/ ___\__/_ /
	\|D \| \|D \|
	\| \| \| \|
	\| \| \| fish \|
	\|_______\| \|___\____\|
	\
	\
	___\____
	\|D \|
	\| \|
	\| tuna \|
	\|___\____\|
	\
	\
	___\____
	\|F \|
	\| \|
	\| \|
	\|________\|

	@end group
	@end example

	Now we're all set. In the future, the contents of directories A and B
	may change quite a lot. However, assuming we never make any changes
	to directory T, it will @emph{always} point to a particular pristine
	revision of directory A at some point in time. Thus, T is a tag.

	(In theory, we can use some kind of authorization system to prevent
	anyone from writing to directory T. In practice, a well-laid out
	repository should encourage ``tag directories'' to live in one place,
	so that it's clear to all users that they're not meant to change.)

	However, if we @emph{do} decide to allow commits in directory T, and
	now our repository tree increments to revision 8, then T becomes a
	branch. Specifically, it's a branch of directory A which shares
	history with A up to a certain point, and then ``broke off'' from the
	main line at revision 8.

	@node Diffy Storage
	@subsubsection Diffy Storage

	You may have been thinking, ``Gee, this bubble up method seems nice, but
	it sure wastes a lot of space. Every commit to the repository creates
	an entire line of new directory nodes!''

	Like many other revision control systems, Subversion stores changes as
	differences. It doesn't make complete copies of nodes; instead, it
	stores the @emph{latest} revision as a full text, and previous revisions
	as a succession of reverse diffs (the word "diff" is used loosely here
	-- for files, it means vdeltas, for directories, it means a format that
	expresses changes to directories).


	@c -----------------------
	@node Implementation
	@subsection Implementation

	For the initial release of Subversion,

	@itemize @bullet
	@item
	The filesystem will be implemented as a library on Unix.
	@item
	The filesystem's data will probably be stored in a collection of .db
	files, using the Berkeley Database library.@footnote{In the future, of
	course, contributors are free modify the Subversion filesystem to
	operate with more powerful SQL database.} (For more information, see
	@uref{http://www.sleepycat.com, Sleepycat Software}.)
	@end itemize


	@c ----------------------------------------------------------------

	@node Repository Library
	@section Repository Library


	@c Jimb, Karl: Maybe we should turn this into a discussion about how
	@c the filesystem will use non-historical properties for internal ACLs,
	@c and how people can add "external" ACL systems via historical
	@c properties...?

	A subversion @dfn{repository} is a directory that contains a number of
	components:

	@itemize @bullet
	@item a versioned filesystem (typically a collection of .db files)
	@item some hook scripts (for executing before or after commits)
	@item a locking area (used by Berkeley DB or other processes)
	@item a configuration area (for changing global behaviors)
	@end itemize

	The Subversion filesystem is just that: a filesystem. But it's also
	useful to provide an API that acts at the level of the repository.
	The repository library (@file{libsvn_repos}) does this.

	In particular, it wraps a few @file{libsvn_fs} routines, such as those
	for beginning and ending commits, so that hook-scripts can run. A
	pre-commit-hook script might check for a valid log message, and a
	post-commit-hook script might send an email to a mailing list.

	Additionally, the repository library provides convenience routines for
	examining and manipulating the filesystem. For example, a routine to
	generate a tree-delta by comparing two revisions, routines for
	constructing new transactions, routines for querying log messages, and
	routines for exporting and importing filesystem data.