| @node Server |
| @chapter Server |
| |
| The term ``server'' is ambiguous, because it has at least two different |
| meanings: it can refer to a powerful computer which offers services to |
| users on a network, or it can refer to a CPU process designed to receive |
| network requests. |
| |
| In Subversion, however, the @dfn{server} is just a set of libraries that |
| implements @dfn{repositories} and makes them available to other |
| programs. No networking is required. |
| |
| There are two main libraries: the @dfn{Subversion Filesystem} library, |
| and the @dfn{Subversion Repository} library. |
| |
| @menu |
| * Filesystem:: The Subversion Filesystem. |
| * Repository Library:: The Subversion Repository interface. |
| @end menu |
| |
| |
| @c ---------------------------------------------------------------- |
| |
| @node Filesystem |
| @section Filesystem |
| |
| @menu |
| * Filesystem Overview:: |
| * API:: |
| * Repository Structure:: |
| * Implementation:: |
| @end menu |
| |
| @node Filesystem Overview |
| @subsection Filesystem Overview |
| |
| @itemize @bullet |
| @item |
| @b{Requires:} |
| @itemize @minus |
| @item |
| some writable disk space |
| @item |
| (for now) Berkeley DB library |
| @end itemize |
| @item |
| @b{Provides:} |
| @itemize @minus |
| @item |
| a repository for storing files |
| @item |
| concurrent client transactions |
| @item |
| enforcement of user & group permissions [someday, not yet] |
| @end itemize |
| @end itemize |
| |
| This library implements a hierarchical filesystem which supports atomic |
| changes to directory trees, and records a complete history of the |
| changes. In addition to recording changes to file and directory |
| contents, the Subversion Filesystem records changes to file meta-data |
| (see discussion of @dfn{properties} in @ref{Model}). |
| |
| @node API |
| @subsection API |
| |
| There are two main files that describe the Subversion filesystem. |
| |
| First, read the section below (@xref{Repository Structure}.) for a |
| general overview of how the filesystem works. |
| |
| Once you've done this, read Jim Blandy's own structural overview, which |
| explains how nodes and revisions are organized (among other things) in |
| the filesystem implementation: @file{subversion/libsvn_fs/structure}. |
| |
| Finally, read the well-documented API in |
| @file{subversion/include/svn_fs.h}. |
| |
| |
| @c ------------------------------------ |
| @node Repository Structure |
| @subsection Repository Structure |
| |
| @menu |
| * Schema:: |
| * Bubble-Up Method:: |
| * Diffy Storage:: |
| @end menu |
| |
| @node Schema |
| @subsubsection Schema |
| |
| To begin, please be sure that you're already casually familiar with |
| Subversion's ideas of files, directories, and revision histories. If |
| not, @xref{Model}. We can now offer precise, technical descriptions of |
| the terms introduced there. |
| |
| @c This is taken from jimb's very first Subversion spec! |
| |
| @quotation |
| |
| A @dfn{text string} is a string of Unicode characters which is |
| canonically decomposed and ordered, according to the rules described in |
| the Unicode standard. |
| |
| A @dfn{string of bytes} is what you'd expect. |
| |
| A @dfn{property list} is an unordered list of properties. A |
| @dfn{property} is a pair @code{(@var{name}, @var{value})}, where |
| @var{name} is a text string, and @var{value} is a string of bytes. |
| No two properties in a property list have the same name. |
| |
| A @dfn{file} is a property list and a string of bytes. |
| |
| A @dfn{node} is either a file or a directory. (We define a directory |
| below.) Nodes are distinguished unions --- you can always tell whether |
| a node is a file or a directory. |
| |
| A @dfn{node table} is an array mapping some set of positive integers, |
| called @dfn{node numbers}, onto @dfn{nodes}. If a node table maps some |
| number @var{i} to some node @var{n}, then @var{i} is a @dfn{valid node |
| number} in that table, and @dfn{node @var{i}} is @var{n}. Otherwise, |
| @var{i} is an @dfn{invalid node number} in that table. |
| |
| A @dfn{directory entry} is a triple @code{(@var{name}, @var{props}, |
| @var{node})}, where @var{name} is a text string, @var{props} is a |
| property list, and @var{node} is a node number. |
| |
| A @dfn{directory} is an unordered list of directory entries, and a |
| property list. |
| |
| A @dfn{revision} is a node number and a property list. |
| |
| A @dfn{history} is an array of revisions, indexed by a contiguous range |
| of non-negative integers containing 0. |
| |
| A @dfn{repository} consists of node table and a history. |
| |
| @end quotation |
| |
| @c Some definitions: we say that a node @var{n} is a @dfn{direct child} |
| @c of a directory @var{d} iff @var{d} contains a directory entry whose |
| @c node number is @var{n}. A node @var{n} is a @dfn{child} of a |
| @c directory @var{d} iff @var{n} is a direct child of @var{d}, or if |
| @c there exists some directory @var{e} which is a direct child of |
| @c @var{d}, and @var{n} is a child of @var{e}. Given this definition of |
| @c ``direct child'' and ``child,'' the obvious definitions of ``direct |
| @c parent'' and ``parent'' hold. |
| |
| @c In these restrictions, let @var{r} be any repository. When we refer, |
| @c implicitly or explicitly, to a node table without further clarification, |
| @c we mean @var{r}'s node table. Thus, if we refer to ``a valid node |
| @c number'' without specifying the node table in which it is valid, we mean |
| @c ``a valid node number in @var{r}'s node table''. Similarly for |
| @c @var{r}'s history. |
| |
| Now that we've explained the form of the data, we make some restrictions |
| on that form. |
| |
| @b{Every revision has a root directory.} Every revision's node number is |
| a valid node number, and the node it refers to is always a directory. |
| We call this the revision's @dfn{root directory}. |
| |
| @b{Revision 0 always contains an empty root directory.} This baseline |
| makes it easy to check out whole projects from the repository. |
| |
| @b{Directories contain only valid links.} |
| Every directory entry's @var{node} is a valid node number. |
| |
| @b{Directory entries can be identified by name.} |
| For any directory @var{d}, every directory entry in @var{d} has a |
| distinct name. |
| |
| @b{There are no cycles of directories.} No node is its own child. |
| |
| @b{Directories can have more than one parent.} The Unix file system |
| does not allow more than one hard link to a directory, but Subversion |
| does allow the analogous situation. Thus, the directories in a |
| Subversion repository form a directed acyclic graph (@dfn{DAG}), not a |
| tree. However, it would be distracting and unhelpful to replace the |
| familiar term ``directory tree'' with the unfamiliar term ``directory |
| DAG'', so we still call it a ``directory tree'' here. |
| |
| @b{There are no dead nodes.} Every node is a child of some revision's |
| root directory. |
| |
| @c </jimb> |
| |
| |
| @node Bubble-Up Method |
| @subsubsection Bubble-Up Method |
| |
| This section provides a conversational explanation of how the repository |
| actually stores and revisions file trees. It's not critical knowledge |
| for a programmer using the Subversion Filesystem API, but most people |
| probably still want to know what's going on ``under the hood'' of the |
| repository. |
| |
| Suppose we have a new project, at revision 1, looking like this (using |
| CVS syntax): |
| |
| @example |
| prompt$ svn checkout myproj |
| U myproj/ |
| U myproj/B |
| U myproj/A |
| U myproj/A/fish |
| U myproj/A/fish/tuna |
| prompt$ |
| @end example |
| |
| Only the file @file{tuna} is a regular file, everything else in myproj is |
| a directory. |
| |
| Let's see what this looks like as an abstract data structure in the |
| repository, and how that structure works in various operations (such |
| as update, commit, and branch). |
| |
| In the diagrams that follow, lines represent parent-to-child connections |
| in a directory hierarchy. Boxes are "nodes". A node is either a file |
| or a directory -- a letter in the upper left indicates which kind. A |
| file node has a byte-string for its content, whereas directory nodes |
| have a list of dir_entries, each pointing to another node. |
| |
| Parent-child links go both ways (i.e., a child knows who all its parents |
| are), but a node's name is stored only in its parent, because a node |
| with multiple parents may have different names in different parents. |
| |
| At the top of the repository is an array of revision numbers, |
| stretching off to infinity. Since the project is at revision 1, only |
| index 1 points to anything; it points to the root node of revision 1 of |
| the project: |
| |
| @example |
| @group |
| ( myproj's revision array ) |
| ______________________________________________________ |
| |___1_______2________3________4________5_________6_____... |
| | |
| | |
| ___|_____ |
| |D | |
| | | |
| | A | /* Two dir_entries, `A' and `B'. */ |
| | \ | |
| | B \ | |
| |__/___\__| |
| / \ |
| | \ |
| | \ |
| ___|___ ___\____ |
| |D | |D | |
| | | | | |
| | | | fish | /* One dir_entry, `fish'. */ |
| |_______| |___\____| |
| \ |
| \ |
| ___\____ |
| |D | |
| | | |
| | tuna | /* One dir_entry, `tuna'. */ |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |F | |
| | | |
| | | /* (Contents of tuna not shown.) */ |
| |________| |
| |
| @end group |
| @end example |
| |
| What happens when we modify @file{tuna} and commit? First, we make a |
| new @file{tuna} node, containing the latest text. The new node is not |
| connected to anything yet, it's just hanging out there in space: |
| |
| @example |
| @group |
| ________ |
| |F | |
| | | |
| | | |
| |________| |
| @end group |
| @end example |
| |
| Next, we create a @emph{new} revision of its parent directory: |
| |
| @example |
| @group |
| ________ |
| |D | |
| | | |
| | tuna | |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |F | |
| | | |
| | | |
| |________| |
| @end group |
| @end example |
| |
| We continue up the line, creating a new revision of the next parent |
| directory: |
| |
| @example |
| @group |
| ________ |
| |D | |
| | | |
| | fish | |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |D | |
| | | |
| | tuna | |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |F | |
| | | |
| | | |
| |________| |
| @end group |
| @end example |
| |
| Now it gets more tricky: we need to create a new revision of the root |
| directory. This new root directory needs an entry to point to the |
| ``new'' directory A, but directory B hasn't changed at all. Therefore, |
| our new root directory also has an entry that still points to the |
| @emph{old} directory B node! |
| |
| @example |
| @group |
| ______________________________________________________ |
| |___1_______2________3________4________5_________6_____... |
| | |
| | |
| ___|_____ ________ |
| |D | |D | |
| | | | | |
| | A | | A | |
| | \ | | \ | |
| | B \ | | B \ | |
| |__/___\__| |__/___\_| |
| / \ / \ |
| | ___\_____________/ \ |
| | / \ \ |
| ___|__/ ___\____ ___\____ |
| |D | |D | |D | |
| | | | | | | |
| | | | fish | | fish | |
| |_______| |___\____| |___\____| |
| \ \ |
| \ \ |
| ___\____ ___\____ |
| |D | |D | |
| | | | | |
| | tuna | | tuna | |
| |___\____| |___\____| |
| \ \ |
| \ \ |
| ___\____ ___\____ |
| |F | |F | |
| | | | | |
| | | | | |
| |________| |________| |
| |
| @end group |
| @end example |
| |
| Finally, after all our new nodes are written, we finish the ``bubble |
| up'' process by linking this new tree to the next available revision in |
| the history array. In this case, the new tree becomes revision 2 in the |
| repository. |
| |
| @example |
| @group |
| ______________________________________________________ |
| |___1_______2________3________4________5_________6_____... |
| | \ |
| | \__________ |
| ___|_____ __\_____ |
| |D | |D | |
| | | | | |
| | A | | A | |
| | \ | | \ | |
| | B \ | | B \ | |
| |__/___\__| |__/___\_| |
| / \ / \ |
| | ___\_____________/ \ |
| | / \ \ |
| ___|__/ ___\____ ___\____ |
| |D | |D | |D | |
| | | | | | | |
| | | | fish | | fish | |
| |_______| |___\____| |___\____| |
| \ \ |
| \ \ |
| ___\____ ___\____ |
| |D | |D | |
| | | | | |
| | tuna | | tuna | |
| |___\____| |___\____| |
| \ \ |
| \ \ |
| ___\____ ___\____ |
| |F | |F | |
| | | | | |
| | | | | |
| |________| |________| |
| |
| @end group |
| @end example |
| |
| Generalizing on this example, you can now see that each ``revision'' in |
| the repository history represents a root node of a unique tree (and an |
| atomic commit to the whole filesystem.) There are many trees in the |
| repository, and many of them share nodes. |
| |
| Many nice behaviors come from this model: |
| |
| @enumerate |
| @item |
| @b{Easy reads.} If a filesystem reader wants to locate revision |
| @var{X} of file @file{foo.c}, it need only traverse the repository's |
| history, locate revision @var{X}'s root node, then walk down the tree to |
| @file{foo.c}. |
| @item |
| @b{Writers don't interfere with readers.} Writers can continue to |
| create new nodes, bubbling their way up to the top, and concurrent |
| readers cannot see the work in progress. The new tree only becomes |
| visible to readers after the writer makes its final ``link'' to the |
| repository's history. |
| @item |
| @b{File structure is versioned.} Unlike CVS, the very structure of |
| each tree is being saved from revision to revision. File and directory |
| renames, additions, and deletions are part of the repository's history. |
| @end enumerate |
| |
| Let's demonstrate the last point by renaming the @file{tuna} to |
| @file{book}. |
| |
| We start by creating a new parent ``fish'' directory, except that this |
| parent directory has a different dir_entry, one which points the |
| @emph{same} old file node, but has a different name: |
| |
| @example |
| @group |
| ______________________________________________________ |
| |___1_______2________3________4________5_________6_____... |
| | \ |
| | \__________ |
| ___|_____ __\_____ |
| |D | |D | |
| | | | | |
| | A | | A | |
| | \ | | \ | |
| | B \ | | B \ | |
| |__/___\__| |__/___\_| |
| / \ / \ |
| | ___\_____________/ \ |
| | / \ \ |
| ___|__/ ___\____ ___\____ |
| |D | |D | |D | |
| | | | | | | |
| | | | fish | | fish | |
| |_______| |___\____| |___\____| |
| \ \ |
| \ \ |
| ___\____ ___\____ ________ |
| |D | |D | |D | |
| | | | | | | |
| | tuna | | tuna | | book | |
| |___\____| |___\____| |_/______| |
| \ \ / |
| \ \ / |
| ___\____ ___\____ / |
| |F | |F | |
| | | | | |
| | | | | |
| |________| |________| |
| @end group |
| @end example |
| |
| From here, we finish with the bubble-up process. We make new parent |
| directories up to the top, culminating in a new root directory with two |
| dir_entries (one points to the old ``B'' directory node we've had all |
| along, the other to the new revision of ``A''), and finally link the new |
| tree to the history as revision 3: |
| |
| @example |
| @group |
| ______________________________________________________ |
| |___1_______2________3________4________5_________6_____... |
| | \ \_________________ |
| | \__________ \ |
| ___|_____ __\_____ __\_____ |
| |D | |D | |D | |
| | | | | | | |
| | A | | A | | A | |
| | \ | | \ | | \ | |
| | B \ | | B \ | | B \ | |
| |__/___\__| |__/___\_| |__/___\_| |
| / ___________________/_____\_________/ \ |
| | / ___\_____________/ \ \ |
| | / / \ \ \ |
| ___|/_/ ___\____ ___\____ _____\__ |
| |D | |D | |D | |D | |
| | | | | | | | | |
| | | | fish | | fish | | fish | |
| |_______| |___\____| |___\____| |___\____| |
| \ \ \ |
| \ \ \ |
| ___\____ ___\____ ___\____ |
| |D | |D | |D | |
| | | | | | | |
| | tuna | | tuna | | book | |
| |___\____| |___\____| |_/______| |
| \ \ / |
| \ \ / |
| ___\____ ___\____ / |
| |F | |F | |
| | | | | |
| | | | | |
| |________| |________| |
| |
| @end group |
| @end example |
| |
| For our last example, we'll demonstrate the way ``tags'' and |
| ``branches'' are implemented in the repository. |
| |
| In a nutshell, they're one and the same thing. Because nodes are so |
| easily shared, we simply create a @emph{new} directory entry that points |
| to an existing directory node. It's an extremely cheap way of copying a |
| tree; we call this new entry a @dfn{clone}, or more colloquially, a |
| ``cheap copy''. |
| |
| Let's go back to our original tree, assuming that we're at revision 6 to |
| begin with: |
| |
| @example |
| @group |
| ______________________________________________________ |
| ...___6_______7________8________9________10_________11_____... |
| | |
| | |
| ___|_____ |
| |D | |
| | | |
| | A | |
| | \ | |
| | B \ | |
| |__/___\__| |
| / \ |
| | \ |
| | \ |
| ___|___ ___\____ |
| |D | |D | |
| | | | | |
| | | | fish | |
| |_______| |___\____| |
| \ |
| \ |
| ___\____ |
| |D | |
| | | |
| | tuna | |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |F | |
| | | |
| | | |
| |________| |
| |
| @end group |
| @end example |
| |
| Let's ``tag'' directory A. To make the clone, we create a new dir_entry |
| @b{T} in our root, pointing to A's node: |
| |
| @example |
| @group |
| ______________________________________________________ |
| |___6_______7________8________9________10_________11_____... |
| | \ |
| | \ |
| ___|_____ __\______ |
| |D | |D | |
| | | | | |
| | A | | A | |
| | \ | | | | |
| | B \ | | B | T | |
| |__/___\__| |_/__|__|_| |
| / \ / | | |
| | ___\__/ / / |
| | / \ / / |
| ___|__/ ___\__/_ / |
| |D | |D | |
| | | | | |
| | | | fish | |
| |_______| |___\____| |
| \ |
| \ |
| ___\____ |
| |D | |
| | | |
| | tuna | |
| |___\____| |
| \ |
| \ |
| ___\____ |
| |F | |
| | | |
| | | |
| |________| |
| |
| @end group |
| @end example |
| |
| Now we're all set. In the future, the contents of directories A and B |
| may change quite a lot. However, assuming we never make any changes |
| to directory T, it will @emph{always} point to a particular pristine |
| revision of directory A at some point in time. Thus, T is a tag. |
| |
| (In theory, we can use some kind of authorization system to prevent |
| anyone from writing to directory T. In practice, a well-laid out |
| repository should encourage ``tag directories'' to live in one place, |
| so that it's clear to all users that they're not meant to change.) |
| |
| However, if we @emph{do} decide to allow commits in directory T, and |
| now our repository tree increments to revision 8, then T becomes a |
| branch. Specifically, it's a branch of directory A which shares |
| history with A up to a certain point, and then ``broke off'' from the |
| main line at revision 8. |
| |
| @node Diffy Storage |
| @subsubsection Diffy Storage |
| |
| You may have been thinking, ``Gee, this bubble up method seems nice, but |
| it sure wastes a lot of space. Every commit to the repository creates |
| an entire line of new directory nodes!'' |
| |
| Like many other revision control systems, Subversion stores changes as |
| differences. It doesn't make complete copies of nodes; instead, it |
| stores the @emph{latest} revision as a full text, and previous revisions |
| as a succession of reverse diffs (the word "diff" is used loosely here |
| -- for files, it means vdeltas, for directories, it means a format that |
| expresses changes to directories). |
| |
| |
| @c ----------------------- |
| @node Implementation |
| @subsection Implementation |
| |
| For the initial release of Subversion, |
| |
| @itemize @bullet |
| @item |
| The filesystem will be implemented as a library on Unix. |
| @item |
| The filesystem's data will probably be stored in a collection of .db |
| files, using the Berkeley Database library.@footnote{In the future, of |
| course, contributors are free modify the Subversion filesystem to |
| operate with more powerful SQL database.} (For more information, see |
| @uref{http://www.sleepycat.com, Sleepycat Software}.) |
| @end itemize |
| |
| |
| @c ---------------------------------------------------------------- |
| |
| @node Repository Library |
| @section Repository Library |
| |
| |
| @c Jimb, Karl: Maybe we should turn this into a discussion about how |
| @c the filesystem will use non-historical properties for internal ACLs, |
| @c and how people can add "external" ACL systems via historical |
| @c properties...? |
| |
| A subversion @dfn{repository} is a directory that contains a number of |
| components: |
| |
| @itemize @bullet |
| @item a versioned filesystem (typically a collection of .db files) |
| @item some hook scripts (for executing before or after commits) |
| @item a locking area (used by Berkeley DB or other processes) |
| @item a configuration area (for changing global behaviors) |
| @end itemize |
| |
| The Subversion filesystem is just that: a filesystem. But it's also |
| useful to provide an API that acts at the level of the repository. |
| The repository library (@file{libsvn_repos}) does this. |
| |
| In particular, it wraps a few @file{libsvn_fs} routines, such as those |
| for beginning and ending commits, so that hook-scripts can run. A |
| pre-commit-hook script might check for a valid log message, and a |
| post-commit-hook script might send an email to a mailing list. |
| |
| Additionally, the repository library provides convenience routines for |
| examining and manipulating the filesystem. For example, a routine to |
| generate a tree-delta by comparing two revisions, routines for |
| constructing new transactions, routines for querying log messages, and |
| routines for exporting and importing filesystem data. |
| |
| |
| |