 @node Directory Versioning
 @chapter Directory Versioning


 @quotation
 @emph{"The three cardinal virtues of a master technologist are: laziness,
 impatience, and hubris." -- Larry Wall}
 @end quotation


 This chapter describes some of the pitfalls around the (possibly
 arrogant) notion that one can simply version directories just as one
 versions files.

 @menu
 * Revisions::                   Extending revisions to directories.
 * The Lagging Directory::       When directory revisions fall behind.
 * The Overeager Directory::     When directory revisions jump ahead.
 * User impact::                 How these problems affect the user.
 @end menu


 @c ------------------------------------------------------------------
 @node Revisions
 @section Revisions

 To begin, recall that the Subversion repository is an array of trees.
 Each tree represents the application of a new atomic commit, and is
 called a @dfn{revision}.  This is very different from a CVS repository,
 which stores file histories in a collection of RCS files (and doesn't
 track tree structure).

 So when we refer to "revision 4 of foo.c" (written @dfn{foo.c:4}) in
 CVS, this means the fourth distinct version of @file{foo.c} -- but in
 Subversion this means "the version of foo.c in the fourth revision
 (tree)".  It's quite possible that @file{foo.c} has never changed at all
 since revision 1!  In other words, in Subversion, different revision
 numbers of the same versioned item do @emph{not} imply different
 contents.

 Nevertheless, the contents of @samp{foo.c:4} are still well defined.  The
 file @file{foo.c} in revision 4 has a specific text and properties.

 Suppose, now, that we extend this concept to directories.  If we have a
 directory @file{DIR}, define @dfn{DIR:N} to be "the directory DIR in the
 Nth revision."  The contents are defined to be a particular set of
 directory entries (@dfn{dirents}) and properties.
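
 The "array of trees" idea can be sketched in a few lines of Python.
 This is only a toy model under stated assumptions, not Subversion's
 actual storage format: the names @code{commit}, @code{lookup} and
 @code{dirents} are invented for illustration, and each tree is a plain
 mapping from a path to a (text, properties) pair.

 @example
```python
# Toy model: a repository is a list of trees, one tree per revision.
# Each tree maps a path to a (text, props) pair.  All names are illustrative.

def commit(repo, changes):
    """Append a tree made from the newest tree plus `changes`.

    `changes` is a list of (path, node) pairs; a node of None deletes
    the path.  Returns the new revision number.
    """
    tree = dict(repo[-1])            # copy the previous tree
    for path, node in changes:
        if node is None:
            tree.pop(path, None)
        else:
            tree[path] = node
    repo.append(tree)
    return len(repo) - 1


def lookup(repo, path, rev):
    """Return path:rev, meaning `path` as found in tree number `rev`."""
    return repo[rev][path]


def dirents(repo, directory, rev):
    """Return the sorted names directly inside `directory` in tree `rev`."""
    prefix = directory + "/"
    names = [p[len(prefix):] for p in repo[rev] if p.startswith(prefix)]
    return sorted(n for n in names if "/" not in n)


repo = [dict()]                                   # revision 0: empty tree
commit(repo, [("DIR/foo.c", ("int x;", ()))])     # revision 1 adds foo.c
commit(repo, [("DIR/bar.c", ("int y;", ()))])     # revision 2 adds bar.c
commit(repo, [("DIR/baz.c", ("int z;", ()))])     # revision 3 adds baz.c
commit(repo, [("DIR/bar.c", None)])               # revision 4 removes bar.c

# foo.c:4 and foo.c:1 name the same contents: foo.c never changed.
same = lookup(repo, "DIR/foo.c", 4) == lookup(repo, "DIR/foo.c", 1)
# DIR:4 is simply the set of entries DIR has in tree number 4.
listing = dirents(repo, "DIR", 4)
```
 @end example

 In this sketch @code{same} is true even though the revision numbers
 differ, and @code{listing} holds the dirents of @samp{DIR:4}.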

 So far, so good.  The concept of versioning directories seems fine in
 the repository -- the repository is very theoretically pure anyway.
 However, because working copies allow mixed revisions, it's easy to
 create problematic use-cases.


 @c ------------------------------------------------------------------
 @node The Lagging Directory
 @section The Lagging Directory


 @subsection Problem

 @c This is the first part of the "Greg Hudson" problem, so named
 @c because he was the first one to bring it up and define it well.  :-)

 Suppose our working copy has directory @samp{DIR:1} containing file
 @samp{foo:1}, along with some other files.  We remove @file{foo} and
 commit.

 Already, we have a problem: our working copy still claims to have
 @samp{DIR:1}.  But on the repository, revision 1 of DIR is
 @emph{defined} to contain @samp{foo} -- and our working copy DIR clearly
 does not have it anymore.  How can we truthfully say that we still have
 @samp{DIR:1}?

 One answer is to force DIR to be updated when we commit foo's deletion.
 Assuming that our commit created revision 2, we would immediately update
 our working copy to @samp{DIR:2}.  Then the client and server would both
 agree that @samp{DIR:2} does not contain foo, and that @samp{DIR:2} is
 indeed exactly what is in the working copy.

 This solution has nasty, un-user-friendly side effects, though.  Other
 people may well have committed before us, possibly adding new
 properties to DIR, or adding a new file @file{bar}.  Now pretend our
 committed deletion creates revision 5 in the repository.  If we
 instantly update our local DIR to 5, that means unexpectedly receiving a
 copy of @file{bar} and some new propchanges.  This clearly violates a UI
 principle: "the client will never change your working copy until you ask
 it to."  Committing changes to the repository is a server-write
 operation only; it should @emph{not} modify your working data!

 Another solution is to do the naive thing:  after committing the
 deletion of @file{foo}, simply stop tracking the file in the @file{.svn}
 administrative directory.  The client then loses all knowledge of the
 file.

 But this doesn't work either: if we now update our working copy, the
 communication between client and server is incorrect.  The client still
 believes that it has @samp{DIR:1} -- which is false, since a "true"
 @samp{DIR:1} contains @file{foo}.  The client gives this incorrect
 report to the repository, and the repository decides that in order to
 update to revision 2, @file{foo} must be deleted.  Thus the repository
 sends a bogus (or at least unnecessary) deletion command.


 @subsection Solution

 This problem is solved through tricky administrative tracking in the
 client.

 After deleting @file{foo} and committing, the file is @emph{not}
 totally forgotten by the @file{.svn} directory.  While the file is no
 longer considered to be under revision control, it is still secretly
 remembered as having been `deleted'.

 When the user updates the working copy, the client correctly informs the
 server that the file is already missing from its local @samp{DIR:1};
 therefore the repository doesn't try to re-delete it when patching the
 client up to revision 2.
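
 The difference between the naive approach and the tracked approach can
 be sketched as follows.  This is a simplified model, not the real
 client/server protocol: @code{server_update}, the @code{missing} list,
 and the shape of @code{entries} are all assumptions made for
 illustration.

 @example
```python
# Toy model of the update report.  All names are illustrative assumptions.

def server_update(repo, base_rev, missing, target_rev):
    """Compute the commands that bring a client directory to target_rev.

    The client claims to hold the directory at `base_rev`, minus the
    entry names listed in `missing`.
    """
    have = set(repo[base_rev]) - set(missing)
    want = set(repo[target_rev])
    commands = [("delete", name) for name in sorted(have - want)]
    commands += [("add", name) for name in sorted(want - have)]
    return commands


# Entry names of DIR in revisions 0, 1 and 2; revision 2 is our own
# committed deletion of foo.
repo = [[], ["foo", "other"], ["other"]]

# Naive client: it forgot foo entirely, so it reports a plain DIR:1
# and the server answers with a needless deletion.
naive = server_update(repo, 1, [], 2)

# Tracking client: the entry for foo survives with a `deleted' flag,
# so the report says that foo is already gone.
entries = dict(foo=dict(deleted=True), other=dict(deleted=False))
missing = [name for name, entry in entries.items() if entry["deleted"]]
tracked = server_update(repo, 1, missing, 2)
```
 @end example

 Here @code{naive} contains a deletion command for @file{foo}, while
 @code{tracked} is empty: the server has nothing to send about a file
 the client has already reported as missing.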

 @c Notes, for coders, about how the `deleted' flag works under the hood:

 @c   * the `svn status' command won't display a deleted item, unless
 @c     you make the deleted item the specific target of status.
 @c
 @c   * when a deleted item's parent is updated, one of two things will happen:
 @c
 @c       (1) the repository will re-add the item, thereby overwriting
 @c           the entire entry.  (no more `deleted' flag)
 @c
 @c       (2) the repository will say nothing about the item, which means
 @c           that it's fully aware that your item is gone, and this is
 @c           the correct state to be in.  In this case, the entire entry
 @c           is removed.  (no more `deleted' flag)
 @c
 @c   * if a user schedules an item for addition that has the same name
 @c     as a `deleted' entry, then the entry will have both flags
 @c     simultaneously.  This is perfectly fine:
 @c
 @c         * the commit-crawler will notice both flags and do a delete()
 @c           and then an add().  This ensures that the transaction is
 @c           built correctly. (without the delete(), the add() would be
 @c           on top of an already-existing item.)
 @c
 @c         * when the commit completes, the client rewrites the entry as
 @c           normal.  (no more `deleted' flag)


 @c ------------------------------------------------------------------
 @node The Overeager Directory
 @section The Overeager Directory


 @c This is the 2nd part of the "Greg Hudson" problem.

 @subsection Problem

 Again, suppose our working copy has directory @samp{DIR:1} containing
 file @samp{foo:1}, along with some other files.

 Now, unbeknownst to us, somebody else adds a new file @file{bar} to this
 directory, creating revision 2 (and @samp{DIR:2}).

 Now we add a property to @file{DIR} and commit, which creates revision
 3.  Our working-copy @file{DIR} is now marked as being at revision 3.

 Of course, this is false; our working copy does @emph{not} have
 @samp{DIR:3}, because the "true" @samp{DIR:3} on the repository contains
 the new file @file{bar}.  Our working copy has no knowledge of
 @file{bar} at all.

 Again, we can't follow our commit of @file{DIR} with an automatic update
 (and addition of @file{bar}).  As mentioned previously, commits are a
 one-way write operation; they must not change working copy data.


 @subsection Solution

 Let's enumerate exactly those times when a directory's local revision
 number changes:

 @itemize @bullet
 @item
 @b{when a directory is updated}:  if the directory is either the direct
 target of an update command, or is a child of an updated directory, it
 will be bumped (along with many other siblings and children) to a
 uniform revision number.
 @item
 @b{when a directory is committed}: a directory can only be considered a
 "committed object" if it has a new property change.  (Otherwise, to
 "commit a directory" really implies that its modified children are being
 committed, and only such children will have local revisions bumped.)
 @end itemize

 In this light, it's clear that our "overeager directory" problem only
 happens in the second situation -- those times when we're committing
 directory propchanges.

 Thus the answer is simply not to allow property-commits on directories
 that are out-of-date.  It sounds a bit restrictive, but there's no other
 way to keep directory revisions accurate.
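
 The restriction can be sketched as an out-of-date check made at commit
 time.  This is a deliberately simplified stand-in and an assumption
 about the general shape of the check, not the real implementation:
 @code{commit_dir_props} and @code{OutOfDateError} are invented names,
 and the sketch merely compares whole directory snapshots.

 @example
```python
# Toy model: `repo` lists the snapshots of DIR, one per revision.
# Each snapshot is an (entry_names, props) pair.  Names are illustrative.

class OutOfDateError(Exception):
    """Raised when a directory propchange is based on a stale revision."""


def commit_dir_props(repo, base_rev, new_props):
    """Commit new properties on DIR, or refuse if base_rev is stale.

    Returns the new revision number on success.
    """
    head = len(repo) - 1
    if repo[head] != repo[base_rev]:
        # DIR changed after base_rev, so the working copy could not
        # truthfully claim to hold the revision this commit would create.
        raise OutOfDateError("DIR changed after revision %d" % base_rev)
    entries, _ = repo[head]
    repo.append((entries, new_props))
    return len(repo) - 1


# Revision 1 has foo; somebody else adds bar in revision 2.
repo = [((), ()), (("foo",), ()), (("bar", "foo"), ())]

# Our working copy still has DIR:1, so the propchange is refused.
try:
    commit_dir_props(repo, 1, (("owner", "us"),))
    rejected = False
except OutOfDateError:
    rejected = True

# After an update to DIR:2 the same commit succeeds and creates revision 3.
new_rev = commit_dir_props(repo, 2, (("owner", "us"),))
```
 @end example

 In this sketch the first attempt is rejected and the second one, made
 from an up-to-date directory, yields a @samp{DIR:3} that really does
 contain @file{bar}.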

 @c  Note to developers:  this restriction is enforced by the filesystem
 @c  merge() routine.

 @c  Once merge() has established that {ancestor, source, target} are all
 @c  different node-rev-ids, it examines the property-keys of ancestor
 @c  and target.  If they're *different*, it returns a conflict error.


 @c ------------------------------------------------------------------
 @node User impact
 @section User impact


 Really, the Subversion client seems to have two difficult---almost
 contradictory---goals.

 First, it needs to make the user experience friendly, which generally
 means being a bit "sloppy" about deciding what a user can or cannot do.
 This is why it allows mixed-revision working copies, and why it tries to
 let users execute local tree-changing operations (delete, add, move,
 copy) in situations that aren't always perfectly, theoretically "safe"
 or pure.

 Second, the client tries to keep the working copy correctly in sync
 with the repository using as little communication as possible.  Of
 course, this is made much harder by the first goal!

 So in the end, there's a tension here, and the resolutions to problems
 can vary.  In one case (the "lagging directory"), the problem can be
 solved through secret, complex tracking in the client.  In the other
 case (the "overeager directory"), the only solution is to restrict some
 of the theoretical laxness allowed by the client.