blob: 221c07e70742fba6de949cf700258790c1d6340f [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
@import "../branding/css/tigris.css";
@import "../branding/css/inst.css";
/* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
href="../branding/css/print.css"/>
<script type="text/javascript" src="../branding/scripts/tigris.js"></script>
<title>Merge Tracking Functional Specification</title>
</head>
<body>
<div class="h1">
<h1>Merge Tracking Functional Specification</h1>
<p style="color: red">*** UNDER CONSTRUCTION ***</p>
<p><a href="index.html">Merge tracking</a> functional
specification.</p>
<p style="color: red">TODO: Incorporate <a
href="summit.html">CollabNet Summit findings</a>. Describe how each
<a href="requirements.html">requirement</a> will actually function for
Subversion. Remove redundancies.</p>
<div class="h2" id="diff-status">
<h2>Diff/Status operations</h2>
<p>Output is shown the same as pre-Merge Tracking, except for:</p>
<ul>
<li>Diffs pretty-print changes to merge info in an easily
human-readable form.</li>
<li>Status represents changes to the merge info for the root of a
tree as a property change.</li>
</ul>
</div> <!-- diff-status -->
<div class="h2" id="copy-move">
<h2>Copy/Move operations</h2>
<p>Copy and move operations handle two types of merge info:</p>
<dl>
<dt>Explicit</dt>
<dd>The pre-existing value of the <code>svn:mergeinfo</code>
property on the source path.</dd>
<dt>Implicit</dt>
<dd>All revisions represented by the object at the source path (from
its "appeared in" revision to its current revision).</dd>
</dl>
<div class="h3" id="ra-copy-move">
<h3>Repository Access operation</h3>
<p>Copy/move operations which contact the repository include:</p>
<ul>
<li>WC to URL</li>
<li>URL to WC</li>
<li>URL to URL</li>
</ul>
<p>These operations always propogate both explicit and implicit merge
info. Other than the inclusion of merge info, operation is
effectively the same as pre-Merge Tracking.</p>
</div> <!-- ra-copy-move -->
<div class="h3" id="wc-wc-copy-move">
<h3>Working Copy to Working Copy operation</h3>
<p>Pre-Merge Tracking, WC to WC operations occurred offline (e.g. with
no repository access). This is a typical behavior of refactoring
tools (e.g. IDEs like Eclipse), and is very useful when offline
(e.g. on an airplane or subway, or at a cafe).</p>
<p>However, to propogate merge info during copy/move operations,
access to both a path's comprehensive merge info and its history is
necessary. To preserve offline operation, the Merge Tracking
implementation supports two modes:</p>
<ul>
<li>A compatibility mode, which neither contacts the repository, nor
does any merge info propogation (unless a copy source's merge info
has been locally modified, in which its value is propogated the as
any Subversion revision property).</li>
<li>A mode which requires repository access (e.g. isn't offline),
but which propogates all merge info from source path to
destination.</li>
</ul>
<p>This behavior is comparable to the difference between <code>svn
status</code> and <code>svn status -u</code>.</p>
<p>While some state indicating delayed merge info retrieval and
handling could instead be stored in WC to preserve offline operation,
there are complications with this when subsequent uncommited revert
operations should change the merge info (we'd have to store negative
merge info in the WC).</p>
</div> <!-- wc-wc-copy-move -->
</div> <!-- copy-move -->
<div class="h2" id="repeated-merge">
<h2>Repeated Merge</h2>
<p>There are two general schemes for solving the <a
href="requirements.html#repeated-merge">repeated merge</a>
problem.</p>
<div class="h3" id="mrca-merge">
<h3>The Most Recent Common Ancestor approach (MRCA)</h3>
<p>In this scheme, you record an optional set of merge sources in each
node-revision. When asked to do a merge with only one source (that
is, just "svn merge URL", with no second argument), you compute the
most recent ancestor and do a three-way merge between the common
ancestor, the given URL, and the WC.</p>
<p>To compute the most recent ancestor, you chain off the immediate
predecessors of each node-revision. The immediate predecessors are
the direct predecessor (the most recent node-revision within the node)
and the merge sources. An interleaved breadth-first search should
find the most recent common ancestor.</p>
</div> <!-- mrca-merge -->
<div class="h3" id="as-merge">
<h3>The Ancestry Set approach (AS)</h3>
<p>In this scheme, you record the full ancestry set for each
node-revision -- that is, the set of all changes which are accounted
for in that node-revision. (How you store this ancestry set is
unimportant; the point is, you need a reasonably efficient way of
determining it when asked.) If you are asked to "svn merge URL", you
apply the changes present in URL's ancestry but absent in WC's
ancestry. Note that this is not a single three-way merge; you may
have to apply a large number of disjoint changes to the WC.</p>
<p>For a longer description of this approach, see the <a
href="/design.html#model.merging-and-ancestry">"Merging and Ancestry"
section</a> of the original <a href="/design.html">design doc</a>.</p>
<div class="h4" id="aslb-merge">
<h4>Ancestry-Sensitive Line-Based Merge</h4>
<p>Make 'hunks' of contextually-merged text sensitive to ancestry.</p>
<p>A high-resolution version of <a
href="requirements.html#repeated-merge">repeated merge</a>. Rather
than tracking whole changesets, we track the lineage of specific lines
of code within a file. The basic idea is that when re-merging a
particular hunk of code, the contextual-merging process is aware that
certain lines of code already represent the merging of particular
lines of development. Jack Repenning has a great example of this from
ClearCase (see ASCII diagram below).</p>
<p>See the <a href="../variance-adjusted-patching.html">variance
adjusted patching</a> document for an extended discussion of how to
implement this by composing diffs; see <a
href="http://svn.collab.net/svn-doxygen/svn__diff_8h.html#a11"
><code>svn_diff_diff4()</code></a> for an implementation of same. We
may be closer to ancestry-sensitive merging than we think.</p>
<p>Here's an example demonstrating how individual lines of code can be
tracked. In this diagram, we're drawing the lineage of a single file,
with time flowing downwards. The file begins life with three lines of
text, "1\n2\n\3\n". The file then splits into two lines of
development.</p>
<pre>
1
2
3
/ \
/ \
/ \
one 1
two 2.5
three 3
| \ |
| \ |
| \ |
| \ |
| \ one ## This node is a human's
| two-point-five ## merge of two sides.
| three
| |
| |
| |
one one
Two two-point-five
three newline
\ three
\ |
\ |
\ |
\ |
\ |
\ |
\ |
\ |
one ## This node is a human's
Two-point-five ## merge of the changes
newline ## since the last merge.
three
</pre>
<p>It's the second merge that's important here.</p>
<p>In a system like Subversion, the second merge of the left branch to
the right will fail miserably: the whole file's contents will be
placed within conflict markers. That's because it's trying to dumbly
apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the
target file has no matching lines at all.</p>
<p>A smarter system (like Clearcase) would remember that the previous
merge had happened, and specifically notice that the lines "one" and
"three" are the results of that previous merge. Therefore, it would
ask the human only to deal with the "Two" versus "two-point-five"
conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would
already be accounted for.</p>
</div> <!-- aslb-merge -->
</div> <!-- as-merge -->
<div class="h3">
<h3>Comparisons, Arguments, and Questions</h3>
<p>AS allows you to merge changes from a branch out of order, without
doing any bookkeeping. MRCA requires you to merge changes from a
branch in order.</p>
<p>At first look, MRCA may be simpler to implement, since it results
in a single three-way merge. However, it likely does not handle all
edge cases. For instance, it may break down faster if the merging
topology is not hierarchical.</p>
<p>MRCA may be easier for users to understand, even though AS is
probably simpler to a mathematician.</p>
<p>Consistency with other modern version controls systems is
desirable.</p>
<p>If a user asks to merge a directory, should we apply MRCA or AS to
each subdirectory and file to determine what ancestor(s) to use? Or
should we apply MRCA or AS just once, to the directory itself? The
latter approach seems simpler and more efficient, but will break down
quickly if the user wants to merge subdirectories of a branch in
advance of merging in the whole thing.</p>
</div> <!-- h3 -->
</div> <!-- repeated-merge -->
<div class="h2" id="related-documents">
<h2>Related Documents and Discussion</h2>
<ul>
<li><a href="http://www.codeville.org/">Codeville</a> is reputed to
excel both in its usefulness of storage of line-history in the
<em>weave</em> format, and a corresponding merge algorithm:
<ul>
<li><a href="http://revctrl.org/PreciseCodevilleMerge">"Precise
Codeville merge"</a> algorithm and Python implementation. The
algorithm takes into account line history, history points where
they came from, the ability to retrieve ancestors' text as needed,
and a snapshot of the current file. It purports accuracy where
<a href="http://revctrl.org/CodevilleMerge">other algorithms fall
down</a>.</li>
<li>Bram Cohen describes <a
href="http://thread.gmane.org/gmane.comp.version-control.revctrl/2"
>the merge algorithm</a> (May 2005)</li>
</ul>
</li>
<li><a
href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_base/notes/structure">Structure
of the Subversion FS BerkeleyDB backend</a></li>
</ul>
</div> <!-- related-documents -->
</div> <!-- h1 -->
</body>
</html>