notes/branch_brainstorm.txt - subversion - Git at Google


          ***  Branches and Tags -- Brainstorm ***
                           or
                  M7:  The Final Frontier


 First, let's define what it means to create branches/tags in a
 repository.  Simply put, the repository creates a new dirent which is
 just an "fs_link" to an existent directory node-rev-id.  It's the same
 process that happens when we build transactions.

 That reminds me of an important point; sometimes the user wants to
 make a "true" copy of something (svn_fs_copy), and sometimes the user
 just wants to make a "cheap" copy of something (svn_fs_link).

 [svn_fs_copy]

 * Use case: the user has a file or dir that she wants to duplicate.

    Instead, the user would type 'svn cp foo bar', and bar would
    automatically be scheduled for addition (and have other ancestry
    info recorded).

       * do a system copy of foo to bar, including props and text-base
       * schedule 'foo' for deletion in its original parent
       * schedule 'bar' for addition -- with ancestry -- in a new parent

    The commit would supply "copyfrom" args to the editor, which would
    then run 'svn_fs_copy'.  The new node-rev-id for bar would then
    contain special copy-info in its skel.

    This works as of M6, for files at least.


 [svn_fs_rename]

 * Use case:  the user wants to rename/move a file or dir.

    In M6, this is 'svn mv src dst' is just a wrapper (literally!)
    around 'svn cp src dst; svn rm src'.

    Here's the problem:  why on earth wouldn't the commit-process just
    run 'svn_fs_copy' when it sees the editor's "copyfrom" args on bar?
    We want 'svn_fs_rename' to run instead, which means that the editor
    somehow needs to realize that the deletion and addition are
    connected.

    Cmpilato sez:  svn_fs_rename exists only to "fill out" the
    filesystem API.  But we never expect an svn client to call it;  our
    editor model has already defined that a 'move' is an 'add with
    history and a delete."  So we're fine.


 [svn_fs_link]

 * Use case:  the user wants to create a tag or branch.

    Because the user may not have a large-enough tree checked out, I
    think it's best that the 'svn branch' and 'svn tag' subcommands
    take two URLs as "source" and "dest" arguments.  This even allows
    someone to make tags/branches without a working copy.

    (Eventually, I suppose, we could loosen the restriction and allow
    people to refer to paths in their working copy.  As with 'svn log',
    the wc paths are just converted to repository URLs anyway.)

    So a user might type

     % svn tag http://svn.collab.net/repos/svn/trunk \
               http://svn.collab.net/repos/svn/branches/one

    And this command would cause a simple 'svn_fs_link' in the
    repository.  (### RA mechanism for this? )

    To remove the tag, a user could check out the repository's root and
    just delete the tags/m5 directory and commit.  Or maybe a different
    subcommand could make this an easier task -- one which wouldn't
    require a big working copy.

    [ Nothing could be easier:

        svn remove http://svn.collab.net/repos/svn/branches/one

                                                                --xbc ]


 [dir_delta updates]

 * Use case:  the user wants to move their existing working copy onto a
   tag or branch.

    I guess this is just a plain old update, except that dir_delta
    isn't going to be called with identical path arguments (like it
    usually is.)

    dir_delta notices a relationship between the two paths, and
    therefore won't do anything stupid like delete and re-checkout your
    whole working copy.  Instead, it sends patches as needed, based on
    relationships between node-rev-ids.  (At least, according to
    cmpilato, dir_deltas already *should* behave this way.)

    As usual with updates, local mods will be preserved, merged, or
    come into conflict.  If the user really wanted a perfectly clean
    branch-move, they shouldn't have local mods lying around... or they
    should just checkout the branch directly!

    THE REAL WORK:  instead of running log_do_committed during the update,
    the editor should be running a *new* log command that tweaks the
    'ancestry' URLs in the entries files.  The entries files need to
    point to the new branch URL.  [Ben sez:  this same URL-tweaking
    functionality is needed when running 'svn cp dir1 dir2'.]


 [MERGING]

    Assume that our working copy corresponds to /branches/one in the
    repository.   /branches/one started out as a cheap copy of /trunk
    some weeks ago, but both directories have received commits since
    then.

    Now we want to absorb any new changes from /trunk into our working
    copy branch.

    Here's what needs to happen:

      * somebody calls dir_delta with two special arguments.  The first
        argument is the common ancestor of /branches/one and /trunk; in
        other words: "/branches/one in the revision where it first came
        into existence".  The second argument is "/trunk in the head
        revision."

      * Now the changes start coming down to the working copy.
        Unfortunately, the working copy can't just use its vanilla
        update editor to apply them.  Why not?  Because dir_deltas
        thinks that the working copy is different than it really is --
        it thinks the working copy looks like the very origin of
        /branches/one!

        Therefore, one of two things need to happen:

          * dir_delta is told to send fulltext (because any svndiff
            patches *wouldn't* correctly apply to pristine files!)  The
            client then does a diff between the fulltext and the
            pristine file, and applies that patch to the working file.
            The fulltext is then deleted, and the pristine copy is NOT
            touched!  (Otherwise our next commit would be screwy!)

          * dir_delta sends a flexible, *contextual* diffs against the
            head of /trunk.  We then apply these diffs directly to the
            working files (again, leaving the pristine files
            untouched.)


 [GENETIC MERGING]

     In theory, every time we do a merge, we should *store* the second
     argument to dir_delta in the working copy somewhere.  In other
     words, we remember exactly the "tip" of trunk from which changes
     were merged.

     Now, the next time you merge, this "tip" becomes the *first*
     argument (base) to dir_delta.  This prevents you from re-merging
     patches you already have, and solves a major CVS annoyance.

     Really, I'm not sure we ever need to track each merged changeset
     one by one.  That might be unnecessary here.

	* Branches and Tags -- Brainstorm *
	or
	M7: The Final Frontier


	First, let's define what it means to create branches/tags in a
	repository. Simply put, the repository creates a new dirent which is
	just an "fs_link" to an existent directory node-rev-id. It's the same
	process that happens when we build transactions.

	That reminds me of an important point; sometimes the user wants to
	make a "true" copy of something (svn_fs_copy), and sometimes the user
	just wants to make a "cheap" copy of something (svn_fs_link).

	[svn_fs_copy]

	* Use case: the user has a file or dir that she wants to duplicate.

	Instead, the user would type 'svn cp foo bar', and bar would
	automatically be scheduled for addition (and have other ancestry
	info recorded).

	* do a system copy of foo to bar, including props and text-base
	* schedule 'foo' for deletion in its original parent
	* schedule 'bar' for addition -- with ancestry -- in a new parent

	The commit would supply "copyfrom" args to the editor, which would
	then run 'svn_fs_copy'. The new node-rev-id for bar would then
	contain special copy-info in its skel.

	This works as of M6, for files at least.


	[svn_fs_rename]

	* Use case: the user wants to rename/move a file or dir.

	In M6, this is 'svn mv src dst' is just a wrapper (literally!)
	around 'svn cp src dst; svn rm src'.

	Here's the problem: why on earth wouldn't the commit-process just
	run 'svn_fs_copy' when it sees the editor's "copyfrom" args on bar?
	We want 'svn_fs_rename' to run instead, which means that the editor
	somehow needs to realize that the deletion and addition are
	connected.

	Cmpilato sez: svn_fs_rename exists only to "fill out" the
	filesystem API. But we never expect an svn client to call it; our
	editor model has already defined that a 'move' is an 'add with
	history and a delete." So we're fine.


	[svn_fs_link]

	* Use case: the user wants to create a tag or branch.

	Because the user may not have a large-enough tree checked out, I
	think it's best that the 'svn branch' and 'svn tag' subcommands
	take two URLs as "source" and "dest" arguments. This even allows
	someone to make tags/branches without a working copy.

	(Eventually, I suppose, we could loosen the restriction and allow
	people to refer to paths in their working copy. As with 'svn log',
	the wc paths are just converted to repository URLs anyway.)

	So a user might type

	% svn tag http://svn.collab.net/repos/svn/trunk \
	http://svn.collab.net/repos/svn/branches/one

	And this command would cause a simple 'svn_fs_link' in the
	repository. (### RA mechanism for this? )

	To remove the tag, a user could check out the repository's root and
	just delete the tags/m5 directory and commit. Or maybe a different
	subcommand could make this an easier task -- one which wouldn't
	require a big working copy.

	[ Nothing could be easier:

	svn remove http://svn.collab.net/repos/svn/branches/one

	--xbc ]


	[dir_delta updates]

	* Use case: the user wants to move their existing working copy onto a
	tag or branch.

	I guess this is just a plain old update, except that dir_delta
	isn't going to be called with identical path arguments (like it
	usually is.)

	dir_delta notices a relationship between the two paths, and
	therefore won't do anything stupid like delete and re-checkout your
	whole working copy. Instead, it sends patches as needed, based on
	relationships between node-rev-ids. (At least, according to
	cmpilato, dir_deltas already should behave this way.)

	As usual with updates, local mods will be preserved, merged, or
	come into conflict. If the user really wanted a perfectly clean
	branch-move, they shouldn't have local mods lying around... or they
	should just checkout the branch directly!

	THE REAL WORK: instead of running log_do_committed during the update,
	the editor should be running a new log command that tweaks the
	'ancestry' URLs in the entries files. The entries files need to
	point to the new branch URL. [Ben sez: this same URL-tweaking
	functionality is needed when running 'svn cp dir1 dir2'.]


	[MERGING]

	Assume that our working copy corresponds to /branches/one in the
	repository. /branches/one started out as a cheap copy of /trunk
	some weeks ago, but both directories have received commits since
	then.

	Now we want to absorb any new changes from /trunk into our working
	copy branch.

	Here's what needs to happen:

	* somebody calls dir_delta with two special arguments. The first
	argument is the common ancestor of /branches/one and /trunk; in
	other words: "/branches/one in the revision where it first came
	into existence". The second argument is "/trunk in the head
	revision."

	* Now the changes start coming down to the working copy.
	Unfortunately, the working copy can't just use its vanilla
	update editor to apply them. Why not? Because dir_deltas
	thinks that the working copy is different than it really is --
	it thinks the working copy looks like the very origin of
	/branches/one!

	Therefore, one of two things need to happen:

	* dir_delta is told to send fulltext (because any svndiff
	patches wouldn't correctly apply to pristine files!) The
	client then does a diff between the fulltext and the
	pristine file, and applies that patch to the working file.
	The fulltext is then deleted, and the pristine copy is NOT
	touched! (Otherwise our next commit would be screwy!)

	* dir_delta sends a flexible, contextual diffs against the
	head of /trunk. We then apply these diffs directly to the
	working files (again, leaving the pristine files
	untouched.)


	[GENETIC MERGING]

	In theory, every time we do a merge, we should store the second
	argument to dir_delta in the working copy somewhere. In other
	words, we remember exactly the "tip" of trunk from which changes
	were merged.

	Now, the next time you merge, this "tip" becomes the first
	argument (base) to dir_delta. This prevents you from re-merging
	patches you already have, and solves a major CVS annoyance.

	Really, I'm not sure we ever need to track each merged changeset
	one by one. That might be unnecessary here.