<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
@import "../branding/css/tigris.css";
@import "../branding/css/inst.css";
/* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
href="../branding/css/print.css"/>
<script type="text/javascript" src="../branding/scripts/tigris.js"></script>
<title>Merge Tracking Functional Specification</title>
</head>
<body>
<div class="h1">
<h1>Merge Tracking Functional Specification</h1>
<p style="color: red">*** UNDER CONSTRUCTION ***</p>
<p><a href="index.html">Merge tracking</a> functional specification.
Describes Subversion 1.5.0, except where noted as
<i>unimplemented.</i></p>
<p style="color: red">TODO: Describe how each <a
href="requirements.html">requirement</a> will actually function for
Subversion. Remove redundancies.</p>
<div class="h2" id="diff-status">
<h2>Diff/Status operations</h2>
<p>Output is shown the same as pre-Merge Tracking, except for:</p>
<ul>
<li>Diffs pretty-print changes to merge info in an easily
human-readable form.</li>
<li>Diffs sometimes report spurious property changes from merge info
(bug?).</li>
<li>Status represents changes to the merge info for the root of a
tree as a property change.</li>
</ul>
</div> <!-- diff-status -->
<div class="h2" id="copy-move">
<h2>Copy/Move operations</h2>
<p>Copy and move operations handle two types of merge info:</p>
<dl>
<dt>Explicit</dt>
<dd>The pre-existing value of the <code>svn:mergeinfo</code>
property on the source path.</dd>
<dt>Implicit</dt>
<dd>All revisions represented by the object at the source path (from
its "appeared in" revision to its current revision).</dd>
</dl>
<div class="h3" id="ra-copy-move">
<h3>Repository Access operation</h3>
<p>Copy/move operations which contact the repository include:</p>
<ul>
<li>WC to URL (<i>code in progress, tests complete</i>, copy test
#11 still failing over ra_dav)</li>
<li>URL to WC</li>
<li>URL to URL</li>
</ul>
<p>These operations always propagate both explicit and implicit merge
info. Other than the inclusion of merge info, operation is
effectively the same as pre-Merge Tracking.</p>
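<p>As a rough illustration of how explicit and implicit merge info
combine on a copy, consider the sketch below. It is not Subversion's
implementation: the name <code>copy_mergeinfo</code>, the dictionary
representation of merge info, and the choice to treat the "appeared in"
revision as inclusive are all assumptions made for this example.</p>
<pre>
```python
def copy_mergeinfo(source_path, explicit, appeared_in, current_rev):
    """Return the merge info a copy destination would start with.

    explicit: dict mapping merge source path to a set of revision
              numbers (a parsed svn:mergeinfo value on the copy source).
    appeared_in, current_rev: bounds of the source's own history, which
              forms the implicit merge info (assumed inclusive here).
    """
    # Start from a copy of the explicit merge info.
    combined = {path: set(revs) for path, revs in explicit.items()}
    # Implicit merge info: every revision the source object represents.
    implicit = set(range(appeared_in, current_rev + 1))
    combined.setdefault(source_path, set()).update(implicit)
    return combined


info = copy_mergeinfo("/trunk", {"/branches/feature": {12, 13}}, 10, 15)
print(sorted(info["/trunk"]))  # prints [10, 11, 12, 13, 14, 15]
```
</pre>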
</div> <!-- ra-copy-move -->
<div class="h3" id="wc-wc-copy-move">
<h3>Working Copy to Working Copy operation</h3>
<p>Pre-Merge Tracking, WC to WC operations occurred offline (i.e. with
no repository access). Such copies and moves are typical of refactoring
tools (e.g. IDEs like Eclipse), and offline operation is very useful
when no network is available (e.g. on an airplane or subway, or at a
cafe).</p>
<p>However, to propagate merge info during copy/move operations,
access to both a path's comprehensive merge info and its history is
necessary. To preserve offline operation, the Merge Tracking
implementation supports two modes:</p>
<ul>
<li>A compatibility mode, which neither contacts the repository nor
does any merge info propagation (unless a copy source's merge info
has been locally modified, in which case its value is propagated the
same way as any other versioned property).</li>
<li>A mode which requires repository access (i.e. isn't offline),
but which propagates all merge info from source path to
destination (<i>unimplemented</i>, start with copy test #31).</li>
</ul>
<p>This behavior is comparable to the difference between <code>svn
status</code> and <code>svn status -u</code>.</p>
<p>While some state indicating delayed merge info retrieval and
handling could instead be stored in the WC to preserve offline
operation, this becomes complicated when subsequent uncommitted revert
operations should change the merge info (we'd have to store negative
merge info in the WC).</p>
</div> <!-- wc-wc-copy-move -->
</div> <!-- copy-move -->
<div class="h2" id="meta-data">
<h2>Merge-related Meta Data</h2>
<p>Merge Tracking meta data is stored in housekeeping properties
(e.g. <code>svn:mergeinfo</code>).</p>
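<p>To make the later examples concrete, here is a minimal sketch of
reading such a property value. It assumes the simple textual form of one
merge source per line, written as <code>/path:N-M,K</code>, and ignores
any other syntax. The function name <code>parse_mergeinfo</code> is
hypothetical and is not part of any Subversion API.</p>
<pre>
```python
def parse_mergeinfo(value):
    """Parse a merge info value into {source path: set of revisions}.

    Assumes one "/path:ranges" entry per line, where ranges is a
    comma-separated list of single revisions or inclusive N-M ranges.
    """
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so that paths containing ':' survive.
        path, _, ranges = line.rpartition(":")
        revs = result.setdefault(path, set())
        for item in ranges.split(","):
            item = item.strip()
            if "-" in item:
                start, end = item.split("-")
                revs.update(range(int(start), int(end) + 1))
            else:
                revs.add(int(item))
    return result


parsed = parse_mergeinfo("/trunk:3-5,9\n/branches/b:7")
print(sorted(parsed["/trunk"]))  # prints [3, 4, 5, 9]
```
</pre>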
<div class="h3" id="meta-data-mainpulation">
<h3>Meta Data Manipulation</h3>
<p>While direct manipulation of housekeeping properties can be used to
change merge info, commands to manipulate this information have been
provided. Either style of operation supports adjustment of merge info
when <a href="requirements.html#manual-merge">manual merges</a> occur,
and can also be used to <a
href="requirements.html#revision-blocking">block changes undesired for
merge</a> (later, this might be better addressed by a separate
housekeeping property).</p>
<ul>
<li><code>merge --record-only</code> adds (or subtracts, if a
reversed revision range is supplied) merge info for a path
<i>without performing the actual merge</i>.</li>
<li><code>propedit</code>/<code>propset</code> changes merge info
for a path.</li>
<li><code>propdel</code> removes merge info for a path.</li>
</ul>
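<p>The effect of <code>merge --record-only</code> on merge info can be
modelled roughly as follows. This is a sketch under stated assumptions,
not the client's code: merge info is represented as a dictionary of
revision sets, the hypothetical function <code>record_only</code> stands
in for the command, and a range <code>-r N:M</code> is taken to cover
revisions N+1 through M (or M+1 through N when reversed).</p>
<pre>
```python
def record_only(mergeinfo, source, start, end):
    """Model `merge --record-only -r start:end` on parsed merge info.

    mergeinfo: dict mapping merge source path to a set of revisions.
    A forward range (start lower than end) adds revisions start+1..end;
    a reversed range (start higher than end) subtracts end+1..start.
    No file contents are touched; only the bookkeeping changes.
    """
    updated = {path: set(revs) for path, revs in mergeinfo.items()}
    revs = updated.setdefault(source, set())
    if end > start:
        revs.update(range(start + 1, end + 1))
    else:
        revs.difference_update(range(end + 1, start + 1))
    if not revs:
        # Drop sources that no longer have any recorded revisions.
        del updated[source]
    return updated


recorded = record_only({}, "/trunk", 10, 12)
print(recorded)  # prints {'/trunk': {11, 12}}
```
</pre>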
</div> <!-- meta-data-mainpulation -->
<div class="h3" id="meta-data-audit">
<h3>Meta Data Audit and Query</h3>
<p>These features may or may not be completed for 1.5.0.</p>
<ul>
<li>Change Set Merge Availability (TODO)</li>
<li>Find Change Set (TODO)</li>
<li><a href="#commutative-author-and-rev">Commutative Author and Revision
Reporting</a></li>
</ul>
<div class="h4" id="commutative-author-and-rev">
<h4>Commutative Author and Revision Auditing</h4>
<div class="h5" id="auditing-scope">
<h5>Scope</h5>
<p>Most commands which show username and revision information should also
respect merge information and support <a
href="requirements.html#commutative-author-and-rev">Commutative
Auditing</a>. These commands, collectively referred to as <em>auditing
commands</em>, are:</p>
<ul>
<li><code>svn log</code></li>
<li><code>svn blame</code></li>
<li><code>svn status --show-updates</code></li>
</ul>
<p><code>svn info</code> is purposely not included in this list, on
the grounds that one would typically need more information than it can
reasonably provide.</p>
<p>A new switch, <code>--merge-sensitive</code>, along with a corresponding
single-character shortcut, will be introduced for the auditing commands.
Using it will enable these commands to show the additional information gleaned
from parsing and processing the merge info on the targets in question. This
switch will also work with <code>--xml</code> to include additional merge
information. The new functionality added by <code>--merge-sensitive</code> is
as follows.</p>
<dl>
<dt><code>svn log</code></dt>
<dd><p>The original log message, in the current format, with the
addition of a list of revisions and merge source paths that have
been merged into the target. The output for <code>log</code> should
be consistent with the <code>diff</code> output for the
<code>svn:mergeinfo</code> property.</p>
<p>The <code>--verbose</code> switch will output the log information
for the merged revisions as well. This output may be in the style
of <code>svnmerge.py</code>: the primary log message, followed by
each of the original log messages indented with separators between
them.</p>
</dd>
<dt><code>svn blame</code></dt>
<dd>Two additional columns for each line, with the original revision
and author of that line. Unlike other commands, we do not need to
worry about multiple source revisions, because each line can have at
most one author.</dd>
<dt><code>svn status --show-updates</code></dt>
<dd>Add additional columns, reflecting the last original authors and
revisions.</dd>
</dl>
</div> <!-- auditing-scope -->
<div class="h5" id="auditing-questions">
<h5>Pending Questions</h5>
<ul>
<li>How will <code>--merge-sensitive</code> behave for commits which remove
merge info (e.g. reverts)?</li>
<li>In the case of <code>svn log</code>, would the user be better served if we
just included the original revision logs in line with the logs (i.e., no
special indentation, etc.)?</li>
<li>What about <code>svn ls --verbose</code>, which also shows revisions and
usernames?</li>
</ul>
</div> <!-- auditing-questions -->
<div class="h5" id="auditing-extra-credit">
<h5>Additional Features</h5>
<p>Although not part of the initial implementation, additional features have
been suggested:</p>
<ul>
<li>A configuration option to always enable <code>--merge-sensitive</code>.
</li>
</ul>
</div> <!-- auditing-extra-credit -->
</div> <!-- commutative-author-and-rev -->
</div> <!-- meta-data-audit -->
</div> <!-- meta-data -->
<div class="h2" id="repeated-merge">
<h2>Repeated Merge</h2>
<p>There are two general schemes for solving the <a
href="requirements.html#repeated-merge">repeated merge</a> problem.
Subversion 1.5 uses the <a href="#mrca-merge">Most Recent Common
Ancestor (MRCA)</a> approach. If a later version of Subversion
(e.g. 2.0) overhauls the Merge Tracking implementation, it'll likely
use the <a href="#as-merge">Ancestry Set (AS)</a> approach.</p>
<p>Either solution also supports the <a
href="requirements.html#cherry-picking">cherry picking</a>, <a
href="requirements.html#rollback-merge">rollback</a>, and <a
href="requirements.html#properties">property merging</a> use cases. A
<a href="requirements.html#merge-previews">merge preview</a> which is
lighter-weight than an uncommitted merge into a WC is not
supported.</p>
<div class="h3" id="mrca-merge">
<h3>The Most Recent Common Ancestor approach</h3>
<p>In this scheme, an optional set of merge sources is stored in each
node-revision. When asked to do a merge with only one source (that
is, just <code>svn merge URL</code>, with no second argument), you
compute the most recent common ancestor and do a three-way merge
between the common ancestor, the given URL, and the WC.</p>
<p>To compute the most recent common ancestor, you chain off the immediate
predecessors of each node-revision. The immediate predecessors are
the direct predecessor (the most recent node-revision within the node)
and the merge sources. An interleaved breadth-first search should
find the most recent common ancestor.</p>
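<p>The interleaved breadth-first search can be sketched as below. The
graph representation (a dictionary from a node-revision id to its
immediate predecessors), the ids such as <code>trunk@6</code>, and the
function name <code>find_common_ancestor</code> are assumptions made for
illustration. Note that a plain breadth-first search returns the
ancestor reached in the fewest steps, which is only an approximation of
"most recent" in unusual topologies.</p>
<pre>
```python
from collections import deque


def find_common_ancestor(predecessors, a, b):
    """Interleaved breadth-first search for a common ancestor of a and b.

    predecessors: dict mapping a node-revision id to the list of its
    immediate predecessors (direct predecessor plus merge sources).
    Returns the first node reached from both sides, or None.
    """
    if a == b:
        return a
    seen_from = {"a": {a}, "b": {b}}
    queues = {"a": deque([a]), "b": deque([b])}
    while queues["a"] or queues["b"]:
        # Alternate sides, expanding one node from each per round.
        for side, other in (("a", "b"), ("b", "a")):
            if not queues[side]:
                continue
            node = queues[side].popleft()
            for pred in predecessors.get(node, []):
                if pred in seen_from[other]:
                    return pred
                if pred not in seen_from[side]:
                    seen_from[side].add(pred)
                    queues[side].append(pred)
    return None


history = {
    "trunk@2": ["trunk@1"],
    "trunk@3": ["trunk@2"],
    "branch@4": ["trunk@2"],              # branch created from trunk@2
    "branch@5": ["branch@4"],
    "trunk@6": ["trunk@3", "branch@4"],   # branch@4 merged into trunk
    "branch@7": ["branch@5"],
}
print(find_common_ancestor(history, "trunk@6", "branch@7"))  # prints branch@4
```
</pre>
<p>Because the merge source recorded on <code>trunk@6</code> is followed
like any other predecessor, the search stops at <code>branch@4</code>
instead of walking back to the original branch point.</p>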
</div> <!-- mrca-merge -->
<div class="h3" id="as-merge">
<h3>The Ancestry Set approach</h3>
<p>In this scheme, you record the full ancestry set for each
node-revision -- that is, the set of all changes which are accounted
for in that node-revision. (How you store this ancestry set is
unimportant; the point is, you need a reasonably efficient way of
determining it when asked.) If you are asked to "svn merge URL", you
apply the changes present in URL's ancestry but absent in WC's
ancestry. Note that this is not a single three-way merge; you may
have to apply a large number of disjoint changes to the WC.</p>
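<p>In its simplest form, the ancestry set computation is a set
difference. The sketch below assumes that each change can be identified
by a revision number and that both ancestry sets are already available;
the function name <code>changes_to_apply</code> is hypothetical.</p>
<pre>
```python
def changes_to_apply(source_ancestry, target_ancestry):
    """Return the changes in the source's ancestry that the target lacks."""
    return sorted(set(source_ancestry) - set(target_ancestry))


branch = {1, 2, 3, 4, 6, 7, 9}   # changes accounted for in URL
working_copy = {1, 2, 3, 6, 8}   # changes accounted for in the WC
print(changes_to_apply(branch, working_copy))  # prints [4, 7, 9]
```
</pre>
<p>The result here is three separate changes rather than one contiguous
range, which is why this approach cannot be reduced to a single
three-way merge.</p>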
<p>For a longer description of this approach, see the <a
href="/design.html#model.merging-and-ancestry">"Merging and Ancestry"
section</a> of the original <a href="/design.html">design doc</a>.</p>
<div class="h4" id="aslb-merge">
<h4>Ancestry-Sensitive Line-Based Merge</h4>
<p>Make 'hunks' of contextually-merged text sensitive to ancestry.</p>
<p>A high-resolution version of <a
href="requirements.html#repeated-merge">repeated merge</a>. Rather
than tracking whole changesets, we track the lineage of specific lines
of code within a file. The basic idea is that when re-merging a
particular hunk of code, the contextual-merging process is aware that
certain lines of code already represent the merging of particular
lines of development. Jack Repenning has a great example of this from
ClearCase (see ASCII diagram below).</p>
<p>See the <a href="../variance-adjusted-patching.html">variance
adjusted patching</a> document for an extended discussion of how to
implement this by composing diffs; see <a
href="http://svn.collab.net/svn-doxygen/svn__diff_8h.html#a11"
><code>svn_diff_diff4()</code></a> for an implementation of same. We
may be closer to ancestry-sensitive merging than we think.</p>
<p>Here's an example demonstrating how individual lines of code can be
tracked. In this diagram, we're drawing the lineage of a single file,
with time flowing downwards. The file begins life with three lines of
text, "1\n2\n3\n". The file then splits into two lines of
development.</p>
<pre>
          1
          2
          3
         / \
        /   \
       /     \
    one         1
    two         2.5
    three       3
     |    \     |
     |     \    |
     |      \   |
     |       \  |
     |        \ one              ## This node is a human's
     |          two-point-five   ## merge of two sides.
     |          three
     |          |
     |          |
     |          |
    one         one
    Two         two-point-five
    three       newline
      \         three
       \        |
        \       |
         \      |
          \     |
           \    |
            \   |
             \  |
              \ |
                one              ## This node is a human's
                Two-point-five   ## merge of the changes
                newline          ## since the last merge.
                three
</pre>
<p>It's the second merge that's important here.</p>
<p>In a system like Subversion, the second merge of the left branch to
the right will fail miserably: the whole file's contents will be
placed within conflict markers. That's because it's trying to dumbly
apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the
target file has no matching lines at all.</p>
<p>A smarter system (like ClearCase) would remember that the previous
merge had happened, and specifically notice that the lines "one" and
"three" are the results of that previous merge. Therefore, it would
ask the human only to deal with the "Two" versus "two-point-five"
conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would
already be accounted for.</p>
</div> <!-- aslb-merge -->
</div> <!-- as-merge -->
<div class="h3">
<h3>Comparisons, Arguments, and Questions</h3>
<p>AS allows you to merge changes from a branch out of order, without
doing any bookkeeping. MRCA requires you to merge changes from a
branch in order.</p>
<p>MRCA is simpler to implement, since it results in a three-way merge
(which is well-understood by Subversion). However, it may not handle
all edge cases. For instance, it may break down faster if the merging
topology is not hierarchical.</p>
<p>MRCA may be easier for users to understand, even though AS is
probably simpler to a mathematician.</p>
<p>Consistency with other modern version control systems is
desirable.</p>
<p>If a user asks to merge a directory, should we apply MRCA or AS to
each subdirectory and file to determine what ancestor(s) to use? Or
should we apply MRCA or AS just once, to the directory itself? The
latter approach seems simpler and more efficient, but will break down
quickly if the user wants to merge subdirectories of a branch in
advance of merging in the whole thing.</p>
</div> <!-- h3 -->
</div> <!-- repeated-merge -->
<div class="h2" id="conflict-resolution">
<h2>Merge Conflict Resolution</h2>
<p>Merging inevitably produces conflicts which cannot be resolved by
an algorithm alone. In such a case, human intervention is required to
resolve the conflicts. The merge algorithm used by Subversion's Merge
Tracking implementation makes this problem worse, since it breaks a
requested merge range into several merges to avoid <a
href="requirements.html#repeated-merge">repeating merges</a> which
have already been applied to a merge target or its children.</p>
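<p>The way a requested range is broken up can be pictured with the
sketch below. It is a simplification that assumes the requested
revisions are given as an inclusive range and the already-merged
revisions as a set; the function name <code>remaining_ranges</code> is
hypothetical, and the real implementation must also account for the
merge info of child paths.</p>
<pre>
```python
def remaining_ranges(start, end, already_merged):
    """Split the requested revisions start..end (inclusive) into the
    sub-ranges that are not yet recorded in already_merged."""
    ranges = []
    run_start = None
    for rev in range(start, end + 1):
        if rev in already_merged:
            # Close the current run, if any, just before this revision.
            if run_start is not None:
                ranges.append((run_start, rev - 1))
                run_start = None
        elif run_start is None:
            run_start = rev
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges


print(remaining_ranges(10, 20, {12, 13, 17}))
# prints [(10, 11), (14, 16), (18, 20)]
```
</pre>
<p>Each resulting sub-range is applied as its own merge, so a conflict
can arise in any of them.</p>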
<p>To help alleviate the pain of conflict resolution, a merge conflict
resolution callback can be employed by Subversion clients
(<i>unimplemented</i>). This callback is invoked whenever merge
conflicts are encountered, and can take steps like launching a
graphical merge tool (for interactive conflict resolution), or
following a pre-specified directive like "always use the version from
my merge source". This last implementation can be used to support the
<a href="requirements.html#automated-merge">SCM automated merge</a>
use case.</p>
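<p>Since the callback is unimplemented, the following is only a sketch
of the general shape such a mechanism might take. Every name in it
(<code>resolve_conflicts</code>, <code>take_merge_source</code>,
<code>take_working_copy</code>, and the dictionary keys) is hypothetical
and does not reflect the proposed Subversion API.</p>
<pre>
```python
def take_merge_source(conflict):
    """Directive: always use the version from the merge source."""
    return conflict["theirs"]


def take_working_copy(conflict):
    """Directive: always keep the working copy's version."""
    return conflict["mine"]


def resolve_conflicts(conflicts, callback):
    """Invoke the callback once per conflict; return {path: chosen text}."""
    resolved = {}
    for conflict in conflicts:
        resolved[conflict["path"]] = callback(conflict)
    return resolved


conflicts = [
    {"path": "README", "mine": "local text", "theirs": "merged text"},
    {"path": "Makefile", "mine": "CC=gcc", "theirs": "CC=cc"},
]
print(resolve_conflicts(conflicts, take_merge_source))
```
</pre>
<p>An interactive client would pass a callback that launches a merge
tool instead of one of the fixed directives shown here.</p>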
<p>In a future release, the command-line client may supply a
merge conflict resolution callback which behaves much like
<em>svk</em>: in interactive mode, it displays some context for
each conflict and prompts for how to resolve it; in
non-interactive mode, it takes directives beforehand
(<i>unimplemented</i>).</p>
<p>Related discussion from the dev@ mailing list can be found
here:</p>
<ul>
<li><a
href="http://subversion.tigris.org/servlets/ReadMsg?listName=dev&amp;msgNo=121756">
Feedback solicited from IDE developers</a></li>
<li><a
href="http://subversion.tigris.org/servlets/ReadMsg?listName=dev&amp;msgNo=121263"
>Original API proposal</a> (likely requires changes)</li>
</ul>
<p><a href="http://subversion.tigris.org/issues/show_bug.cgi?id=2022"
>Issue #2022</a> is loosely related.</p>
<div class="h3" id="distributable-resolution">
<h3>Distribution of Conflict Resolution</h3>
<p>No explicit facility is provided for distribution of conflict
resolution. To support this use case, developers can co-ordinate with
each other to resolve merge conflicts on portions of a tree, and trade
patches.</p>
</div> <!-- distributable-resolution -->
</div> <!-- conflict-resolution -->
<div class="h2" id="migration-and-interoperability">
<h2>Migration and Interoperability</h2>
<div class="h3" id="migration">
<h3>Migration</h3>
<p>No explicit steps are necessary to migrate the content of a
pre-Merge Tracking repository. Only an upgrade to Subversion 1.5.0 is
necessary.</p>
<p>TODO: Merge meta data from svnmerge.py. Dan Berlin has written
Python code to perform this migration; it needs to be made available
in the <code>tools/server-side/</code> area of the distribution.</p>
</div> <!-- migration -->
<div class="h3" id="interoperability">
<h3>Interoperability</h3>
<p>Executive summary for client/repository inter-op:</p>
<ul>
<li>Older Subversion clients may <a href="requirements.html#compatibility"
>interact with a 1.5.x+ Subversion repository</a>, but will continue
to lack Merge Tracking functionality for:
<ul>
<li>Recording meta data about any merges performed.</li>
<li>Using merge meta data to avoid <a
href="requirements.html#repeated-merge">repeated merging</a>.</li>
</ul>
</li>
<li>1.5.x+ Subversion clients may interact with older Subversion
repositories, with Merge Tracking functionality effectively
neutralized.</li>
</ul>
<p>Gory detail for client/repository inter-op:</p>
<ul>
<li>A 1.4.x- repository doesn't provide any way to retrieve
inherited merge info for a path (regardless of client version). For
a 1.5.x+ client which could theoretically make use of any merge info
available to it, this will typically neutralize its Merge Tracking
functionality. The one case where merge info might come into play
is when the merge info for a path is available locally (e.g. in the
client's WC); in this case, repeated merges may be avoided.</li>
<li>A 1.5.x client will record merge tracking meta data for merges
performed, regardless of repository version. However, a 1.4.x-
repository won't know to do anything special with this merge info. When
the repository is upgraded to 1.5.x+, this merge info is retained
in the <code>svn:mergeinfo</code> property, but it is not yet clear what
will happen to the SQLite merge info index. We may need some sort of
upgrade path here, but don't have one yet, and aren't promising
one.</li>
</ul>
<p>Subversion <a href="requirements.html#dump-load">dump files
continue to be fully portable</a> between pre- and post-Merge Tracking
versions of Subversion.</p>
</div> <!-- interoperability -->
</div> <!-- migration-and-interoperability -->
<div class="h2" id="related-documents">
<h2>Related Documents and Discussion</h2>
<ul>
<li><a href="http://subversion.tigris.org/merge-tracking/summit.html"
>CollabNet customer Merge Tracking Summit</a></li>
<li><a href="http://www.codeville.org/">Codeville</a> is reputed to
excel both in its storage of line history in the
<em>weave</em> format and in its corresponding merge algorithm:
<ul>
<li><a href="http://revctrl.org/PreciseCodevilleMerge">"Precise
Codeville merge"</a> algorithm and Python implementation. The
algorithm takes into account line history, history points where
they came from, the ability to retrieve ancestors' text as needed,
and a snapshot of the current file. It purports accuracy where
<a href="http://revctrl.org/CodevilleMerge">other algorithms fall
down</a>.</li>
<li>Bram Cohen describes <a
href="http://thread.gmane.org/gmane.comp.version-control.revctrl/2"
>the merge algorithm</a> (May 2005)</li>
</ul>
</li>
<li><a
href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_base/notes/structure">Structure
of the Subversion FS BerkeleyDB backend</a></li>
</ul>
</div> <!-- related-documents -->
<p>$Date$</p>
</div> <!-- h1 -->
</body>
</html>