| @node Directory Versioning |
| @chapter Directory Versioning |
| |
| |
| |
| @quotation |
| @emph{The three cardinal virtues of a master technologist are: laziness, |
| impatience, and hubris." -- Larry Wall} |
| @end quotation |
| |
| |
| |
| This section describes some of the pitfalls around the (possibly |
| arrogant) notion that one can simply version directories just as one |
| versions files. |
| |
| @menu |
| * Revisions:: Extending revisions to directories. |
| * The Lagging Directory:: When directory revisions fall behind. |
| * The Overeager Directory:: When directory revisions jump ahead. |
| * User impact:: How these problems affect the user. |
| @end menu |
| |
| |
| @c ------------------------------------------------------------------ |
| @node Revisions |
| @section Revisions |
| |
| To begin, recall that the Subversion repository is an array of trees. |
| Each tree represents the application of a new atomic commit, and is |
| called a @dfn{revision}. This is very different from a CVS repository, |
| which stores file histories in a collection of RCS files (and doesn't |
| track tree-structure.) |
| |
| So when we refer to "revision 4 of foo.c" (written @dfn{foo.c:4}) in |
| CVS, this means the fourth distinct version of @file{foo.c} -- but in |
| Subversion this means "the version of foo.c in the fourth revision |
| (tree)". It's quite possible that @file{foo.c} has never changed at all |
| since revision 1! In other words, in Subversion, different revision |
| numbers of the same versioned item do @emph{not} imply different |
| contents. |
| |
| Nevertheless, the contents of @samp{foo.c:4} is still well-defined. The |
| file @file{foo.c} in revision 4 has a specific text and properties. |
| |
| Suppose, now, that we extend this concept to directories. If we have a |
| directory @file{DIR}, define @dfn{DIR:N} to be "the directory DIR in the |
| fourth revision." The contents are defined to be a particular set of |
| directory entries (@dfn{dirents}) and properties. |
| |
| So far, so good. The concept of versioning directories seems fine in |
| the repository -- the repository is very theoretically pure anyway. |
| However, because working copies allow mixed revisions, it's easy to |
| create problematic use-cases. |
| |
| |
| @c ------------------------------------------------------------------ |
| @node The Lagging Directory |
| @section The Lagging Directory |
| |
| |
| @subsection Problem |
| |
| @c This is the first part of of the "Greg Hudson" problem, so named |
| @c because he was the first one to bring it up and define it well. :-) |
| |
| Suppose our working copy has directory @samp{DIR:1} containing file |
| @samp{foo:1}, along with some other files. We remove @file{foo} and |
| commit. |
| |
| Already, we have a problem: our working copy still claims to have |
| @samp{DIR:1}. But on the repository, revision 1 of DIR is |
| @emph{defined} to contain @samp{foo} -- and our working copy DIR clearly |
| does not have it anymore. How can we truthfully say that we still have |
| @samp{DIR:1}? |
| |
| One answer is to force DIR to be updated when we commit foo's deletion. |
| Assuming that our commit created revision 2, we would immediately update |
| our working copy to @samp{DIR:2}. Then the client and server would both |
| agree that @samp{DIR:2} does not contain foo, and that @samp{DIR:2} is |
| indeed exactly what is in the working copy. |
| |
| This solution has nasty, un-user-friendly side effects, though. It's |
| likely that other people may have committed before us, possibly adding |
| new properties to DIR, or adding a new file @file{bar}. Now pretend our |
| committed deletion creates revision 5 in the repository. If we |
| instantly update our local DIR to 5, that means unexpectedly receiving a |
| copy of @file{bar} and some new propchanges. This clearly violates a UI |
| principle: "the client will never change your working copy until you ask |
| it to." Committing changes to the repository is a server-write |
| operation only; it should @emph{not} modify your working data! |
| |
| Another solution is to do the naive thing: after committing the |
| deletion of @file{foo}, simply stop tracking the file in the @file{.svn} |
| administrative directory. The client then loses all knowledge of the |
| file. |
| |
| But this doesn't work either: if we now update our working copy, the |
| communication between client and server is incorrect. The client still |
| believes that it has @samp{DIR:1} -- which is false, since a "true" |
| @samp{DIR:1} contains @file{foo}. The client gives this incorrect |
| report to the repository, and the repository decides that in order to |
| update to revision 2, @file{foo} must be deleted. Thus the repository |
| sends a bogus (or at least unnecessary) deletion command. |
| |
| |
| @subsection Solution |
| |
| This problem is solved through tricky administrative tracking in the |
| client. |
| |
| After deleting @file{foo} and committing, the file is @emph{not} |
| totally forgotten by the @file{.svn} directory. While the file is no |
| longer considered to be under revision control, it is still secretly |
| remembered as having been `deleted'. |
| |
| When the user updates the working copy, the client correctly informs the |
| server that the file is already missing from its local @samp{DIR:1}; |
| therefore the repository doesn't try to re-delete it when patching the |
| client up to revision 2. |
| |
| @c Notes, for coders, about how the `deleted' flag works under the hood: |
| |
| @c * the `svn status' command won't display a deleted item, unless |
| @c you make the deleted item the specific target of status. |
| @c |
| @c * when a deleted item's parent is updated, one of two things will happen: |
| @c |
| @c (1) the repository will re-add the item, thereby overwriting |
| @c the entire entry. (no more `deleted' flag) |
| @c |
| @c (2) the repository will say nothing about the item, which means |
| @c that it's fully aware that your item is gone, and this is |
| @c the correct state to be in. In this case, the entire entry |
| @c is removed. (no more `deleted' flag) |
| @c |
| @c * if a user schedules an item for addition that has the same name |
| @c as a `deleted' entry, then entry will have both flags |
| @c simultaneously. This is perfectly fine: |
| @c |
| @c * the commit-crawler will notice both flags and do a delete() |
| @c and then an add(). This ensures that the transaction is |
| @c built correctly. (without the delete(), the add() would be |
| @c on top of an already-existing item.) |
| @c |
| @c * when the commit completes, the client rewrites the entry as |
| @c normal. (no more `deleted' flag) |
| |
| |
| @c ------------------------------------------------------------------ |
| @node The Overeager Directory |
| @section The Overeager Directory |
| |
| |
| @c This is the 2nd part of the "Greg Hudson" problem. |
| |
| @subsection Problem |
| |
| Again, suppose our working copy has directory @samp{DIR:1} containing |
| file @samp{foo:1}, along with some other files. |
| |
| Now, unbeknownst to us, somebody else adds a new file @file{bar} to this |
| directory, creating revision 2 (and @samp{DIR:2}). |
| |
| Now we add a property to @file{DIR} and commit, which creates revision |
| 3. Our working-copy @file{DIR} is now marked as being at revision 3. |
| |
| Of course, this is false; our working copy does @emph{not} have |
| @samp{DIR:3}, because the "true" @samp{DIR:3} on the repository contains |
| the new file @file{bar}. Our working copy has no knowledge of |
| @file{bar} at all. |
| |
| Again, we can't follow our commit of @file{DIR} with an automatic update |
| (and addition of @file{bar}). As mentioned previously, commits are a |
| one-way write operation; they must not change working copy data. |
| |
| |
| @subsection Solution |
| |
| Let's enumerate exactly those times when a directory's local revision |
| number changes: |
| |
| @itemize @bullet |
| @item |
| @b{when a directory is updated}: if the directory is either the direct |
| target of an update command, or is a child of an updated directory, it |
| will be bumped (along with many other siblings and children) to a |
| uniform revision number. |
| @item |
| @b{when a directory is committed}: a directory can only be considered a |
| "committed object" if it has a new property change. (Otherwise, to |
| "commit a directory" really implies that its modified children are being |
| committed, and only such children will have local revisions bumped.) |
| @end itemize |
| |
| In this light, it's clear that our "overeager directory" problem only |
| happens in the second situation -- those times when we're committing |
| directory propchanges. |
| |
| Thus the answer is simply not to allow property-commits on directories |
| that are out-of-date. It sounds a bit restrictive, but there's no other |
| way to keep directory revisions accurate. |
| |
| @c Note to developers: this restriction is enforced by the filesystem |
| @c merge() routine. |
| |
| @c Once merge() has established that {ancestor, source, target} are all |
| @c different node-rev-ids, it examines the property-keys of ancestor |
| @c and target. If they're *different*, it returns a conflict error. |
| |
| |
| @c ------------------------------------------------------------------ |
| @node User impact |
| @section User impact |
| |
| |
| Really, the Subversion client seems to have two difficult---almost |
| contradictory---goals. |
| |
| First, it needs to make the user experience friendly, which generally |
| means being a bit "sloppy" about deciding what a user can or cannot do. |
| This is why it allows mixed-revision working copies, and why it tries to |
| let users execute local tree-changing operations (delete, add, move, |
| copy) in situations that aren't always perfectly, theoretically "safe" |
| or pure. |
| |
| Second, the client tries to keep the working copy in correctly in sync |
| with the repository using as little communication as possible. Of |
| course, this is made much harder by the first goal! |
| |
| So in the end, there's a tension here, and the resolutions to problems |
| can vary. In one case (the "lagging directory"), the problem can be |
| solved through secret, complex tracking in the client. In the other |
| case ("the overeager directory"), the only solution is to restrict some |
| of the theoretical laxness allowed by the client. |
| |
| |
| |