blob: ef1fbc4bf5263d95834c57be575b51815d3047d0 [file]
@node Directory Versioning
@chapter Directory Versioning
@quotation
@emph{The three cardinal virtues of a master technologist are: laziness,
impatience, and hubris." -- Larry Wall}
@end quotation
This section describes some of the pitfalls around the (possibly
arrogant) notion that one can simply version directories just as one
versions files.
@menu
* Revisions:: Extending revisions to directories.
* The Lagging Directory:: When directory revisions fall behind.
* The Overeager Directory:: When directory revisions jump ahead.
* User impact:: How these problems affect the user.
@end menu
@c ------------------------------------------------------------------
@node Revisions
@section Revisions
To begin, recall that the Subversion repository is an array of trees.
Each tree represents the application of a new atomic commit, and is
called a @dfn{revision}. This is very different from a CVS repository,
which stores file histories in a collection of RCS files (and doesn't
track tree-structure.)
So when we refer to "revision 4 of foo.c" (written @dfn{foo.c:4}) in
CVS, this means the fourth distinct version of @file{foo.c} -- but in
Subversion this means "the version of foo.c in the fourth revision
(tree)". It's quite possible that @file{foo.c} has never changed at all
since revision 1! In other words, in Subversion, different revision
numbers of the same versioned item do @emph{not} imply different
contents.
Nevertheless, the contents of @samp{foo.c:4} is still well-defined. The
file @file{foo.c} in revision 4 has a specific text and properties.
Suppose, now, that we extend this concept to directories. If we have a
directory @file{DIR}, define @dfn{DIR:N} to be "the directory DIR in the
fourth revision." The contents are defined to be a particular set of
directory entries (@dfn{dirents}) and properties.
So far, so good. The concept of versioning directories seems fine in
the repository -- the repository is very theoretically pure anyway.
However, because working copies allow mixed revisions, it's easy to
create problematic use-cases.
@c ------------------------------------------------------------------
@node The Lagging Directory
@section The Lagging Directory
@subsection Problem
@c This is the first part of of the "Greg Hudson" problem, so named
@c because he was the first one to bring it up and define it well. :-)
Suppose our working copy has directory @samp{DIR:1} containing file
@samp{foo:1}, along with some other files. We remove @file{foo} and
commit.
Already, we have a problem: our working copy still claims to have
@samp{DIR:1}. But on the repository, revision 1 of DIR is
@emph{defined} to contain @samp{foo} -- and our working copy DIR clearly
does not have it anymore. How can we truthfully say that we still have
@samp{DIR:1}?
One answer is to force DIR to be updated when we commit foo's deletion.
Assuming that our commit created revision 2, we would immediately update
our working copy to @samp{DIR:2}. Then the client and server would both
agree that @samp{DIR:2} does not contain foo, and that @samp{DIR:2} is
indeed exactly what is in the working copy.
This solution has nasty, un-user-friendly side effects, though. It's
likely that other people may have committed before us, possibly adding
new properties to DIR, or adding a new file @file{bar}. Now pretend our
committed deletion creates revision 5 in the repository. If we
instantly update our local DIR to 5, that means unexpectedly receiving a
copy of @file{bar} and some new propchanges. This clearly violates a UI
principle: "the client will never change your working copy until you ask
it to." Committing changes to the repository is a server-write
operation only; it should @emph{not} modify your working data!
Another solution is to do the naive thing: after committing the
deletion of @file{foo}, simply stop tracking the file in the @file{.svn}
administrative directory. The client then loses all knowledge of the
file.
But this doesn't work either: if we now update our working copy, the
communication between client and server is incorrect. The client still
believes that it has @samp{DIR:1} -- which is false, since a "true"
@samp{DIR:1} contains @file{foo}. The client gives this incorrect
report to the repository, and the repository decides that in order to
update to revision 2, @file{foo} must be deleted. Thus the repository
sends a bogus (or at least unnecessary) deletion command.
@subsection Solution
This problem is solved through tricky administrative tracking in the
client.
After deleting @file{foo} and committing, the file is @emph{not}
totally forgotten by the @file{.svn} directory. While the file is no
longer considered to be under revision control, it is still secretly
remembered as having been `deleted'.
When the user updates the working copy, the client correctly informs the
server that the file is already missing from its local @samp{DIR:1};
therefore the repository doesn't try to re-delete it when patching the
client up to revision 2.
@c Notes, for coders, about how the `deleted' flag works under the hood:
@c * the `svn status' command won't display a deleted item, unless
@c you make the deleted item the specific target of status.
@c
@c * when a deleted item's parent is updated, one of two things will happen:
@c
@c (1) the repository will re-add the item, thereby overwriting
@c the entire entry. (no more `deleted' flag)
@c
@c (2) the repository will say nothing about the item, which means
@c that it's fully aware that your item is gone, and this is
@c the correct state to be in. In this case, the entire entry
@c is removed. (no more `deleted' flag)
@c
@c * if a user schedules an item for addition that has the same name
@c as a `deleted' entry, then entry will have both flags
@c simultaneously. This is perfectly fine:
@c
@c * the commit-crawler will notice both flags and do a delete()
@c and then an add(). This ensures that the transaction is
@c built correctly. (without the delete(), the add() would be
@c on top of an already-existing item.)
@c
@c * when the commit completes, the client rewrites the entry as
@c normal. (no more `deleted' flag)
@c ------------------------------------------------------------------
@node The Overeager Directory
@section The Overeager Directory
@c This is the 2nd part of the "Greg Hudson" problem.
@subsection Problem
Again, suppose our working copy has directory @samp{DIR:1} containing
file @samp{foo:1}, along with some other files.
Now, unbeknownst to us, somebody else adds a new file @file{bar} to this
directory, creating revision 2 (and @samp{DIR:2}).
Now we add a property to @file{DIR} and commit, which creates revision
3. Our working-copy @file{DIR} is now marked as being at revision 3.
Of course, this is false; our working copy does @emph{not} have
@samp{DIR:3}, because the "true" @samp{DIR:3} on the repository contains
the new file @file{bar}. Our working copy has no knowledge of
@file{bar} at all.
Again, we can't follow our commit of @file{DIR} with an automatic update
(and addition of @file{bar}). As mentioned previously, commits are a
one-way write operation; they must not change working copy data.
@subsection Solution
Let's enumerate exactly those times when a directory's local revision
number changes:
@itemize @bullet
@item
@b{when a directory is updated}: if the directory is either the direct
target of an update command, or is a child of an updated directory, it
will be bumped (along with many other siblings and children) to a
uniform revision number.
@item
@b{when a directory is committed}: a directory can only be considered a
"committed object" if it has a new property change. (Otherwise, to
"commit a directory" really implies that its modified children are being
committed, and only such children will have local revisions bumped.)
@end itemize
In this light, it's clear that our "overeager directory" problem only
happens in the second situation -- those times when we're committing
directory propchanges.
Thus the answer is simply not to allow property-commits on directories
that are out-of-date. It sounds a bit restrictive, but there's no other
way to keep directory revisions accurate.
@c Note to developers: this restriction is enforced by the filesystem
@c merge() routine.
@c Once merge() has established that {ancestor, source, target} are all
@c different node-rev-ids, it examines the property-keys of ancestor
@c and target. If they're *different*, it returns a conflict error.
@c ------------------------------------------------------------------
@node User impact
@section User impact
Really, the Subversion client seems to have two difficult---almost
contradictory---goals.
First, it needs to make the user experience friendly, which generally
means being a bit "sloppy" about deciding what a user can or cannot do.
This is why it allows mixed-revision working copies, and why it tries to
let users execute local tree-changing operations (delete, add, move,
copy) in situations that aren't always perfectly, theoretically "safe"
or pure.
Second, the client tries to keep the working copy in correctly in sync
with the repository using as little communication as possible. Of
course, this is made much harder by the first goal!
So in the end, there's a tension here, and the resolutions to problems
can vary. In one case (the "lagging directory"), the problem can be
solved through secret, complex tracking in the client. In the other
case ("the overeager directory"), the only solution is to restrict some
of the theoretical laxness allowed by the client.