www/merge-tracking/func-spec.html - subversion - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 <style type="text/css"> /* <![CDATA[ */
   @import "../branding/css/tigris.css";
   @import "../branding/css/inst.css";
   /* ]]> */</style>
 <link rel="stylesheet" type="text/css" media="print"
   href="../branding/css/print.css"/>
 <script type="text/javascript" src="../branding/scripts/tigris.js"></script>
 <title>Merge Tracking Functional Specification</title>
 </head>

 <body>
 <div class="h1">
 <h1>Merge Tracking Functional Specification</h1>

 <p style="color: red">*** UNDER CONSTRUCTION ***</p>

 <p><a href="index.html">Merge tracking</a> functional
 specification.</p>

 <p style="color: red">TODO: Incorporate <a
 href="summit.html">CollabNet Summit findings</a>.  Describe how each
 <a href="requirements.html">requirement</a> will actually function for
 Subversion.  Remove redundancies.</p>

 <div class="h2" id="diff-status">
 <h2>Diff/Status operations</h2>

 <p>Output is shown the same as pre-Merge Tracking, except for:</p>

 <ul>
   <li>Diffs pretty-print changes to merge info in an easily
     human-readable form.</li>

   <li>Status represents changes to the merge info for the root of a
     tree as a property change.</li>
 </ul>

 </div>  <!-- diff-status -->


 <div class="h2" id="copy-move">
 <h2>Copy/Move operations</h2>

 <p>Copy and move operations handle two types of merge info:</p>

 <dl>
   <dt>Explicit</dt>
   <dd>The pre-existing value of the <code>svn:mergeinfo</code>
     property on the source path.</dd>

   <dt>Implicit</dt>
   <dd>All revisions represented by the object at the source path (from
     its "appeared in" revision to its current revision).</dd>
 </dl>

 <div class="h3" id="ra-copy-move">
 <h3>Repository Access operation</h3>

 <p>Copy/move operations which contact the repository include:</p>

 <ul>
   <li>WC to URL</li>
   <li>URL to WC</li>
   <li>URL to URL</li>
 </ul>

 <p>These operations always propogate both explicit and implicit merge
 info.  Other than the inclusion of merge info, operation is
 effectively the same as pre-Merge Tracking.</p>

 </div>  <!-- ra-copy-move -->

 <div class="h3" id="wc-wc-copy-move">
 <h3>Working Copy to Working Copy operation</h3>

 <p>Pre-Merge Tracking, WC to WC operations occurred offline (e.g. with
 no repository access).  This is a typical behavior of refactoring
 tools (e.g. IDEs like Eclipse), and is very useful when offline
 (e.g. on an airplane or subway, or at a cafe).</p>

 <p>However, to propogate merge info during copy/move operations,
 access to both a path's comprehensive merge info and its history is
 necessary.  To preserve offline operation, the Merge Tracking
 implementation supports two modes:</p>

 <ul>
   <li>A compatibility mode, which neither contacts the repository, nor
     does any merge info propogation (unless a copy source's merge info
     has been locally modified, in which its value is propogated the as
     any Subversion revision property).</li>

   <li>A mode which requires repository access (e.g. isn't offline),
     but which propogates all merge info from source path to
     destination.</li>
 </ul>

 <p>This behavior is comparable to the difference between <code>svn
 status</code> and <code>svn status -u</code>.</p>

 <p>While some state indicating delayed merge info retrieval and
 handling could instead be stored in WC to preserve offline operation,
 there are complications with this when subsequent uncommited revert
 operations should change the merge info (we'd have to store negative
 merge info in the WC).</p>

 </div>  <!-- wc-wc-copy-move -->

 </div>  <!-- copy-move -->

 <div class="h2" id="repeated-merge">
 <h2>Repeated Merge</h2>

 <p>There are two general schemes for solving the <a
 href="requirements.html#repeated-merge">repeated merge</a>
 problem.</p>

 <div class="h3" id="mrca-merge">
 <h3>The Most Recent Common Ancestor approach (MRCA)</h3>

 <p>In this scheme, you record an optional set of merge sources in each
 node-revision.  When asked to do a merge with only one source (that
 is, just "svn merge URL", with no second argument), you compute the
 most recent ancestor and do a three-way merge between the common
 ancestor, the given URL, and the WC.</p>

 <p>To compute the most recent ancestor, you chain off the immediate
 predecessors of each node-revision.  The immediate predecessors are
 the direct predecessor (the most recent node-revision within the node)
 and the merge sources.  An interleaved breadth-first search should
 find the most recent common ancestor.</p>

 </div>  <!-- mrca-merge -->

 <div class="h3" id="as-merge">
 <h3>The Ancestry Set approach (AS)</h3>

 <p>In this scheme, you record the full ancestry set for each
 node-revision -- that is, the set of all changes which are accounted
 for in that node-revision.  (How you store this ancestry set is
 unimportant; the point is, you need a reasonably efficient way of
 determining it when asked.)  If you are asked to "svn merge URL", you
 apply the changes present in URL's ancestry but absent in WC's
 ancestry.  Note that this is not a single three-way merge; you may
 have to apply a large number of disjoint changes to the WC.</p>

 <p>For a longer description of this approach, see the <a
 href="/design.html#model.merging-and-ancestry">"Merging and Ancestry"
 section</a> of the original <a href="/design.html">design doc</a>.</p>

 <div class="h4" id="aslb-merge">
 <h4>Ancestry-Sensitive Line-Based Merge</h4>

 <p>Make 'hunks' of contextually-merged text sensitive to ancestry.</p>

 <p>A high-resolution version of <a
 href="requirements.html#repeated-merge">repeated merge</a>.  Rather
 than tracking whole changesets, we track the lineage of specific lines
 of code within a file.  The basic idea is that when re-merging a
 particular hunk of code, the contextual-merging process is aware that
 certain lines of code already represent the merging of particular
 lines of development.  Jack Repenning has a great example of this from
 ClearCase (see ASCII diagram below).</p>

 <p>See the <a href="../variance-adjusted-patching.html">variance
 adjusted patching</a> document for an extended discussion of how to
 implement this by composing diffs; see <a
 href="http://svn.collab.net/svn-doxygen/svn__diff_8h.html#a11"
 ><code>svn_diff_diff4()</code></a> for an implementation of same.  We
 may be closer to ancestry-sensitive merging than we think.</p>

 <p>Here's an example demonstrating how individual lines of code can be
 tracked.  In this diagram, we're drawing the lineage of a single file,
 with time flowing downwards.  The file begins life with three lines of
 text, "1\n2\n\3\n".  The file then splits into two lines of
 development.</p>

 <pre>
                     1
                     2
                     3
                   /   \
                  /     \
                 /       \
             one           1
             two           2.5
             three         3
              |     \      |
              |      \     |
              |       \    |
              |        \   |
              |         \ one                ## This node is a human's
              |           two-point-five     ## merge of two sides.
              |           three
              |            |
              |            |
              |            |
             one          one
             Two          two-point-five
             three        newline
                \         three
                 \         |
                  \        |
                   \       |
                    \      |
                     \     |
                      \    |
                       \   |
                        \  |
                          one                ## This node is a human's
                          Two-point-five     ## merge of the changes
                          newline            ## since the last merge.
                          three
 </pre>

 <p>It's the second merge that's important here.</p>

 <p>In a system like Subversion, the second merge of the left branch to
 the right will fail miserably: the whole file's contents will be
 placed within conflict markers.  That's because it's trying to dumbly
 apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the
 target file has no matching lines at all.</p>

 <p>A smarter system (like Clearcase) would remember that the previous
 merge had happened, and specifically notice that the lines "one" and
 "three" are the results of that previous merge.  Therefore, it would
 ask the human only to deal with the "Two" versus "two-point-five"
 conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would
 already be accounted for.</p>

 </div>  <!-- aslb-merge -->

 </div>  <!-- as-merge -->

 <div class="h3">
 <h3>Comparisons, Arguments, and Questions</h3>

 <p>AS allows you to merge changes from a branch out of order, without
 doing any bookkeeping.  MRCA requires you to merge changes from a
 branch in order.</p>

 <p>At first look, MRCA may be simpler to implement, since it results
 in a single three-way merge.  However, it likely does not handle all
 edge cases.  For instance, it may break down faster if the merging
 topology is not hierarchical.</p>

 <p>MRCA may be easier for users to understand, even though AS is
 probably simpler to a mathematician.</p>

 <p>Consistency with other modern version controls systems is
 desirable.</p>

 <p>If a user asks to merge a directory, should we apply MRCA or AS to
 each subdirectory and file to determine what ancestor(s) to use?  Or
 should we apply MRCA or AS just once, to the directory itself?  The
 latter approach seems simpler and more efficient, but will break down
 quickly if the user wants to merge subdirectories of a branch in
 advance of merging in the whole thing.</p>

 </div>  <!-- h3 -->

 </div>  <!-- repeated-merge -->


 <div class="h2" id="related-documents">
 <h2>Related Documents and Discussion</h2>

 <ul>
   <li><a href="http://www.codeville.org/">Codeville</a> is reputed to
   excel both in its usefulness of storage of line-history in the
   <em>weave</em> format, and a corresponding merge algorithm:

   <ul>
     <li><a href="http://revctrl.org/PreciseCodevilleMerge">"Precise
     Codeville merge"</a> algorithm and Python implementation.  The
     algorithm takes into account line history, history points where
     they came from, the ability to retrieve ancestors' text as needed,
     and a snapshot of the current file.  It purports accuracy where
     <a href="http://revctrl.org/CodevilleMerge">other algorithms fall
     down</a>.</li>

     <li>Bram Cohen describes <a
     href="http://thread.gmane.org/gmane.comp.version-control.revctrl/2"
     >the merge algorithm</a> (May 2005)</li>
   </ul>
   </li>

   <li><a
   href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_base/notes/structure">Structure
   of the Subversion FS BerkeleyDB backend</a></li>

 </ul>

 </div>  <!-- related-documents -->

 </div>  <!-- h1 -->
 </body>
 </html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
	<style type="text/css"> /* <![CDATA[ */
	@import "../branding/css/tigris.css";
	@import "../branding/css/inst.css";
	/* ]]> */</style>
	<link rel="stylesheet" type="text/css" media="print"
	href="../branding/css/print.css"/>
	<script type="text/javascript" src="../branding/scripts/tigris.js"></script>
	<title>Merge Tracking Functional Specification</title>
	</head>

	<body>
	<div class="h1">
	<h1>Merge Tracking Functional Specification</h1>

	<p style="color: red">* UNDER CONSTRUCTION *</p>

	<p><a href="index.html">Merge tracking</a> functional
	specification.</p>

	<p style="color: red">TODO: Incorporate <a
	href="summit.html">CollabNet Summit findings</a>. Describe how each
	<a href="requirements.html">requirement</a> will actually function for
	Subversion. Remove redundancies.</p>

	<div class="h2" id="diff-status">
	<h2>Diff/Status operations</h2>

	<p>Output is shown the same as pre-Merge Tracking, except for:</p>

	<ul>
	<li>Diffs pretty-print changes to merge info in an easily
	human-readable form.</li>

	<li>Status represents changes to the merge info for the root of a
	tree as a property change.</li>
	</ul>

	</div> <!-- diff-status -->


	<div class="h2" id="copy-move">
	<h2>Copy/Move operations</h2>

	<p>Copy and move operations handle two types of merge info:</p>

	<dl>
	<dt>Explicit</dt>
	<dd>The pre-existing value of the <code>svn:mergeinfo</code>
	property on the source path.</dd>

	<dt>Implicit</dt>
	<dd>All revisions represented by the object at the source path (from
	its "appeared in" revision to its current revision).</dd>
	</dl>

	<div class="h3" id="ra-copy-move">
	<h3>Repository Access operation</h3>

	<p>Copy/move operations which contact the repository include:</p>

	<ul>
	<li>WC to URL</li>
	<li>URL to WC</li>
	<li>URL to URL</li>
	</ul>

	<p>These operations always propogate both explicit and implicit merge
	info. Other than the inclusion of merge info, operation is
	effectively the same as pre-Merge Tracking.</p>

	</div> <!-- ra-copy-move -->

	<div class="h3" id="wc-wc-copy-move">
	<h3>Working Copy to Working Copy operation</h3>

	<p>Pre-Merge Tracking, WC to WC operations occurred offline (e.g. with
	no repository access). This is a typical behavior of refactoring
	tools (e.g. IDEs like Eclipse), and is very useful when offline
	(e.g. on an airplane or subway, or at a cafe).</p>

	<p>However, to propogate merge info during copy/move operations,
	access to both a path's comprehensive merge info and its history is
	necessary. To preserve offline operation, the Merge Tracking
	implementation supports two modes:</p>

	<ul>
	<li>A compatibility mode, which neither contacts the repository, nor
	does any merge info propogation (unless a copy source's merge info
	has been locally modified, in which its value is propogated the as
	any Subversion revision property).</li>

	<li>A mode which requires repository access (e.g. isn't offline),
	but which propogates all merge info from source path to
	destination.</li>
	</ul>

	<p>This behavior is comparable to the difference between <code>svn
	status</code> and <code>svn status -u</code>.</p>

	<p>While some state indicating delayed merge info retrieval and
	handling could instead be stored in WC to preserve offline operation,
	there are complications with this when subsequent uncommited revert
	operations should change the merge info (we'd have to store negative
	merge info in the WC).</p>

	</div> <!-- wc-wc-copy-move -->

	</div> <!-- copy-move -->

	<div class="h2" id="repeated-merge">
	<h2>Repeated Merge</h2>

	<p>There are two general schemes for solving the <a
	href="requirements.html#repeated-merge">repeated merge</a>
	problem.</p>

	<div class="h3" id="mrca-merge">
	<h3>The Most Recent Common Ancestor approach (MRCA)</h3>

	<p>In this scheme, you record an optional set of merge sources in each
	node-revision. When asked to do a merge with only one source (that
	is, just "svn merge URL", with no second argument), you compute the
	most recent ancestor and do a three-way merge between the common
	ancestor, the given URL, and the WC.</p>

	<p>To compute the most recent ancestor, you chain off the immediate
	predecessors of each node-revision. The immediate predecessors are
	the direct predecessor (the most recent node-revision within the node)
	and the merge sources. An interleaved breadth-first search should
	find the most recent common ancestor.</p>

	</div> <!-- mrca-merge -->

	<div class="h3" id="as-merge">
	<h3>The Ancestry Set approach (AS)</h3>

	<p>In this scheme, you record the full ancestry set for each
	node-revision -- that is, the set of all changes which are accounted
	for in that node-revision. (How you store this ancestry set is
	unimportant; the point is, you need a reasonably efficient way of
	determining it when asked.) If you are asked to "svn merge URL", you
	apply the changes present in URL's ancestry but absent in WC's
	ancestry. Note that this is not a single three-way merge; you may
	have to apply a large number of disjoint changes to the WC.</p>

	<p>For a longer description of this approach, see the <a
	href="/design.html#model.merging-and-ancestry">"Merging and Ancestry"
	section</a> of the original <a href="/design.html">design doc</a>.</p>

	<div class="h4" id="aslb-merge">
	<h4>Ancestry-Sensitive Line-Based Merge</h4>

	<p>Make 'hunks' of contextually-merged text sensitive to ancestry.</p>

	<p>A high-resolution version of <a
	href="requirements.html#repeated-merge">repeated merge</a>. Rather
	than tracking whole changesets, we track the lineage of specific lines
	of code within a file. The basic idea is that when re-merging a
	particular hunk of code, the contextual-merging process is aware that
	certain lines of code already represent the merging of particular
	lines of development. Jack Repenning has a great example of this from
	ClearCase (see ASCII diagram below).</p>

	<p>See the <a href="../variance-adjusted-patching.html">variance
	adjusted patching</a> document for an extended discussion of how to
	implement this by composing diffs; see <a
	href="http://svn.collab.net/svn-doxygen/svn__diff_8h.html#a11"
	><code>svn_diff_diff4()</code></a> for an implementation of same. We
	may be closer to ancestry-sensitive merging than we think.</p>

	<p>Here's an example demonstrating how individual lines of code can be
	tracked. In this diagram, we're drawing the lineage of a single file,
	with time flowing downwards. The file begins life with three lines of
	text, "1\n2\n\3\n". The file then splits into two lines of
	development.</p>

	<pre>
	1
	2
	3
	/ \
	/ \
	/ \
	one 1
	two 2.5
	three 3
	\| \ \|
	\| \ \|
	\| \ \|
	\| \ \|
	\| \ one ## This node is a human's
	\| two-point-five ## merge of two sides.
	\| three
	\| \|
	\| \|
	\| \|
	one one
	Two two-point-five
	three newline
	\ three
	\ \|
	\ \|
	\ \|
	\ \|
	\ \|
	\ \|
	\ \|
	\ \|
	one ## This node is a human's
	Two-point-five ## merge of the changes
	newline ## since the last merge.
	three
	</pre>

	<p>It's the second merge that's important here.</p>

	<p>In a system like Subversion, the second merge of the left branch to
	the right will fail miserably: the whole file's contents will be
	placed within conflict markers. That's because it's trying to dumbly
	apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the
	target file has no matching lines at all.</p>

	<p>A smarter system (like Clearcase) would remember that the previous
	merge had happened, and specifically notice that the lines "one" and
	"three" are the results of that previous merge. Therefore, it would
	ask the human only to deal with the "Two" versus "two-point-five"
	conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would
	already be accounted for.</p>

	</div> <!-- aslb-merge -->

	</div> <!-- as-merge -->

	<div class="h3">
	<h3>Comparisons, Arguments, and Questions</h3>

	<p>AS allows you to merge changes from a branch out of order, without
	doing any bookkeeping. MRCA requires you to merge changes from a
	branch in order.</p>

	<p>At first look, MRCA may be simpler to implement, since it results
	in a single three-way merge. However, it likely does not handle all
	edge cases. For instance, it may break down faster if the merging
	topology is not hierarchical.</p>

	<p>MRCA may be easier for users to understand, even though AS is
	probably simpler to a mathematician.</p>

	<p>Consistency with other modern version controls systems is
	desirable.</p>

	<p>If a user asks to merge a directory, should we apply MRCA or AS to
	each subdirectory and file to determine what ancestor(s) to use? Or
	should we apply MRCA or AS just once, to the directory itself? The
	latter approach seems simpler and more efficient, but will break down
	quickly if the user wants to merge subdirectories of a branch in
	advance of merging in the whole thing.</p>

	</div> <!-- h3 -->

	</div> <!-- repeated-merge -->


	<div class="h2" id="related-documents">
	<h2>Related Documents and Discussion</h2>

	<ul>
	<li><a href="http://www.codeville.org/">Codeville</a> is reputed to
	excel both in its usefulness of storage of line-history in the
	<em>weave</em> format, and a corresponding merge algorithm:

	<ul>
	<li><a href="http://revctrl.org/PreciseCodevilleMerge">"Precise
	Codeville merge"</a> algorithm and Python implementation. The
	algorithm takes into account line history, history points where
	they came from, the ability to retrieve ancestors' text as needed,
	and a snapshot of the current file. It purports accuracy where
	<a href="http://revctrl.org/CodevilleMerge">other algorithms fall
	down</a>.</li>

	<li>Bram Cohen describes <a
	href="http://thread.gmane.org/gmane.comp.version-control.revctrl/2"
	>the merge algorithm</a> (May 2005)</li>
	</ul>
	</li>

	<li><a
	href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_base/notes/structure">Structure
	of the Subversion FS BerkeleyDB backend</a></li>

	</ul>

	</div> <!-- related-documents -->

	</div> <!-- h1 -->
	</body>
	</html>