|  | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" | 
|  | "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | 
|  | <html xmlns="http://www.w3.org/1999/xhtml"> | 
|  | <head> | 
|  | <style type="text/css"> /* <![CDATA[ */ | 
|  | @import "branding/css/tigris.css"; | 
|  | @import "branding/css/inst.css"; | 
|  | /* ]]> */</style> | 
|  | <link rel="stylesheet" type="text/css" media="print" | 
|  | href="branding/css/print.css"/> | 
|  | <script type="text/javascript" src="branding/scripts/tigris.js"></script> | 
|  | <title>Variance-Adjusted Patching for Divergent Texts</title> | 
|  | </head> | 
|  |  | 
|  | <body> | 
|  | <div class="app"> | 
|  |  | 
|  | <h2>Variance-Adjusted Patching for Divergent Texts</h2> | 
|  | <p>Karl Fogel <tt><kfogel@collab.net></tt></p> | 
|  |  | 
|  | <p>Date of First Publication: 22 May, 2001</p> | 
|  |  | 
|  | <p>$LastChangedDate$<br /> | 
|  | Canonical URL: <a href="http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html">http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html</a></p> | 
|  |  | 
|  | <hr />  <!-- ====================================================== --> | 
|  |  | 
|  | <p> | 
|  | Version control systems that use traditional context diff / unidiff | 
|  | format for branch merging tend to fail spuriously in high variance | 
|  | situations.  A "high variance" situation is one where the branch text | 
|  | differs from the source text by more than roughly 5-10%, counting | 
|  | lines changed, deleted, and added against the source.  This appendix | 
|  | describes a technique called "variance-adjusted patching", by which | 
|  | such applications can be made to succeed, when a common ancestor of | 
|  | the source and destination texts is available. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | The problem with straight patch application is that it depends on the | 
|  | affected text hunks and their context not having changed significantly | 
|  | since source.  They may "float", that is, text may have been inserted | 
|  | or deleted around them, thus changing their locations in the target | 
|  | file.  But they themselves may not have changed, except for whitespace | 
|  | adjustments or other trivial (i.e., automatically ignorable) | 
|  | modifications.  If they have changed, the patch application will fail. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Variance adjustment is the process of transforming a patch to account | 
|  | for differences that have arisen in the target text since the patch | 
|  | was generated, so that the new patch will be applicable to the new | 
|  | target text and yet still have the "same" effect as the old patch.  It | 
|  | is, in essence, patching a patch.  For example, suppose a branch B | 
|  | diverges from a trunk T, and an engineer wants to merge the change | 
|  | from revision T:18 to T:19 into B:10, creating B:11.  If B:10 is too | 
|  | dissimilar to T:18, the merge may fail due to textual conflicts, even | 
|  | though there is no logical conflict.  Variance adjustment is about | 
|  | changing the diff T:18-T:19 so that it applies cleanly to B:10. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | This is possible because for any given line that a patch hunk expects | 
|  | to see in the target text (e.g., in a context diff, these would be the | 
|  | context lines, plus the "before" version of affected lines), the | 
|  | version control system can know that line's history in both source | 
|  | and target, if there is a common ancestor to examine.  The process is | 
|  | very similar to the calculations done by "<tt>cvs annotate</tt>". | 
|  | Informally speaking, one of the following is true for each such line, | 
|  | ignoring floating due to out-of-bounds insertion and deletion: | 
|  | </p> | 
|  |  | 
|  | <ol style="list-style-type: lower-alphs"> | 
|  | <li><p><b>The line exists unchanged in both source and target</b></p> | 
|  | <p>If the line is still present in target and source, that means it | 
|  | has not been removed from either since the common ancestor, or | 
|  | if it was removed, it has been restored.  Nothing need be done | 
|  | to the diff.</p> | 
|  | </li> | 
|  | <li><p><b>The line has been changed in source but not in target</b></p> | 
|  | <p>In the diff hunk, the line should be changed to reflect the | 
|  | target's version of the line, so that the diff will be | 
|  | applicable to the target.</p> | 
|  | <p> | 
|  | <i>When we say that the line "has been changed", it just means | 
|  | that a different line appears where this line once appeared. | 
|  | Whether this is because the line has been edited, or because a | 
|  | new line has been inserted in target, does not matter.  (Another | 
|  | way to say it is that it need not matter whether the old version | 
|  | of the line still appears somewhere nearby in target.)  The | 
|  | important thing is that new text appears where the hunk is | 
|  | expecting old text, and that we have a way to change the hunk to | 
|  | expect the new text instead.</i></p> | 
|  | </li> | 
|  | <li><p><b>The line has been changed in target but not in source</b></p> | 
|  | <p>Same as above: in the diff hunk, the line should be changed to | 
|  | reflect the target's version of the line, so that the diff will | 
|  | be applicable to the target.  If this seems counterintuitive, | 
|  | drink the Kool-Aid and read again.</p> | 
|  | </li> | 
|  | <li><p><b>The line does not exist in target</b></p> | 
|  | <p>This could be for one of two reasons:</p> | 
|  | <ol> | 
|  | <li>the line is new in source, having been added after | 
|  | target branched, or</li> | 
|  | <li>the line was removed from target after the branch | 
|  | happened</li> | 
|  | </ol> | 
|  | <p>Either way, the line needs to be changed to the "corresponding" | 
|  | line in the target text -- that is, the line that, in target, | 
|  | now occupies the place of the obsolete expected line.  The | 
|  | version control system has enough information to determine | 
|  | which of the two cases above applies, and can use that to | 
|  | decide which of the lines currently in target is the best | 
|  | candidate to substitute for the missing line.</p> | 
|  | </li> | 
|  | </ol> | 
|  |  | 
|  | <p> | 
|  | Note that the case of the line not existing in source is impossible: | 
|  | if it doesn't exist in the current source, it certainly can't appear | 
|  | in the expected-text portions of the diff. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Although the patch program can adjust for floating at application | 
|  | time, the version control system can also adjust the line numbers in | 
|  | hunks to compensate for any insertions or deletions that have | 
|  | happened, in either source or target, outside the areas covered by the | 
|  | hunks.  This can result in a patch that applies perfectly, without any | 
|  | offset adjustment, to the target as it is known to the revision | 
|  | control system.  Of course, uncommitted local edits to the target | 
|  | cannot be compensated for in the diff -- we must still rely on the | 
|  | patch program's own offset adjustment and fuzz factor to handle those. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | The above rules are an informal explanation of how variance adjustment | 
|  | works.  Below, the algorithm is described somewhat more formally, to | 
|  | show how the version control system would do variance adjustment in | 
|  | any situation.  Note that the algorithm is actually <i>too</i> | 
|  | powerful -- if taken to its logical limits, it can generate patches | 
|  | that apply cleanly even when the user would almost certainly prefer a | 
|  | conflict.  Thus it's necessary to offer an adjustment selectivity | 
|  | level; a good default selectivity would probably allow compensation | 
|  | for context variance, but not for variance in expected target lines. | 
|  | Anyway, the complete, unselective algorithm is described below, as it | 
|  | should be obvious where to attach the selectivity knobs if desired. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | <strong>The Variance Adjustment Algorithm.</strong> | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Some terminology: | 
|  | </p> | 
|  |  | 
|  | <dl> | 
|  | <dt><i>insertion</i></dt> | 
|  | <dd>A new line that has been added in either source or | 
|  | target since the common ancestor.</dd> | 
|  | <dt><i>deletion</i></dt> | 
|  | <dd>A line that has been deleted from either source or target | 
|  | since the common ancestor.</dd> | 
|  | <dt><i>edit</i></dt> | 
|  | <dd>A line that has been changed in either source or target | 
|  | since the common ancestor.</dd> | 
|  | <dt><i>out-range</i></dt> | 
|  | <dd>An event that happened to either source or target since the | 
|  | common ancestor, but to a line that is not covered in any of | 
|  | the diff hunks in the patch undergoing variance adjustment.  For | 
|  | example, an <i>out-range insertion</i> means a line was added in | 
|  | some region not directly touched by any of the hunks in the | 
|  | patch; an <i>out-range deletion</i> means a line was deleted | 
|  | from some region not in any hunk.</dd> | 
|  | <dt><i>in-range</i></dt> | 
|  | <dd>An event that happened to either source or target since the | 
|  | common ancestor, to a line covered in one of the diff hunks in | 
|  | the patch undergoing variance adjustment.  For example, an | 
|  | <i>in-range insertion</i> means a line was added in a region | 
|  | affected by one of the hunks; an <i>in-range deletion</i> means | 
|  | a line that was deleted from some region affected by a hunk.</dd> | 
|  | <dt><i>context</i></dt> | 
|  | <dd>A line included in a hunk for context only -- such lines are | 
|  | not affected by the patch.</dd> | 
|  | <dt><i>destination line, affected line</i></dt> | 
|  | <dd>(Used interchangeably).  A line that is actually affected by | 
|  | a patch.  For lines being edited or deleted, the line is already | 
|  | present in the target text; for lines being added, the line will | 
|  | only be present after the patch is applied.  In all cases it is | 
|  | present in a hunk in the patch.</dd> | 
|  | </dl> | 
|  |  | 
|  | <p> | 
|  | Description of the scenario: | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Although there is no requirement that there be a "trunk"/"branch" | 
|  | relationship between the two lines, we will use that terminology below | 
|  | to make the description easier to understand.  In reality, there are | 
|  | simply two lines of development with a common ancestory. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Assume that branch B is rooted in trunk T at T:8; thus, | 
|  | T:8 == B:1.  The youngest trunk revision is T:20, and the | 
|  | youngest branch revision is B:15.  A user with working copy on branch | 
|  | B wants to merge in the change T:19-T:20.  For brevity, we will call | 
|  | this change P, for "patch".  (We may speak, without loss of | 
|  | generality, as if only one file is being patched, as the same | 
|  | algorithm holds for every file involved in the merge.) | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | There are two problems that prevent the direct application of the | 
|  | unadjusted P to B:15.  P, being based on source text T:19, | 
|  | </p> | 
|  |  | 
|  | <ol> | 
|  | <li>contains certain changes not present in B:15, namely, changes T:9 | 
|  | through T:19. | 
|  | </li> | 
|  | <li>does not contain certain changes present in B:15, namely, | 
|  | changes B:2 through B:15.</li> | 
|  | </ol> | 
|  |  | 
|  | <p> | 
|  | To adjust P, we first walk over the diff T:8-T:19, adjusting hunks as | 
|  | we go.  Each line in T:8-T:19 falls into one of the following | 
|  | categories: | 
|  | </p> | 
|  |  | 
|  | <ol> | 
|  | <li><i>Out-range added line</i>: decrement the line numbers in | 
|  | every hunk in P that comes after the addition.  This undoes the | 
|  | effect of the add, since the add never happened in B.</li> | 
|  | <li><i>Out-range deleted line</i>: increment the line numbers in | 
|  | every hunk in P that comes after the deletion.  This undoes the | 
|  | effect of the deletion, since the deletion never happened in | 
|  | B.</li> | 
|  | <li><i>Out-range edited line</i>: do nothing.  Out-range edits | 
|  | are irrelevant to P.</li> | 
|  | <li><i>Added line in context range in P</i>: remove the | 
|  | corresponding line from the context, optionally replacing it | 
|  | with new context <i>based on that region in B:15</i>, and | 
|  | adjust line numbers and mappings appropriately.</li> | 
|  | <li><i>Added line in affected text range in P</i>: this is a | 
|  | dependency problem -- part of the change T:18-T:19 depends on | 
|  | changes introduced to T after B branched.  There are several | 
|  | possible behaviors, depending on what the user wants.  One is | 
|  | to generate an informative error, stating that T:18-T:19 | 
|  | depends on some other change (T:N-T:M, where N>=8, M<=18, and | 
|  | M-N == 1); the exact revisions can be discovered | 
|  | automatically using the same process as "cvs annotate", though | 
|  | it may take some time to do so.  Another option is to include | 
|  | the change in P, as an insertion of the "after" version of the | 
|  | text, and adjust line numbers and mappings accordingly.  (And | 
|  | if all this isn't sounding a lot like a directory merge | 
|  | algorithm, try drinking more of the Kool-Aid.)  A third option | 
|  | is to include it as an insertion, but with metadata (such as | 
|  | CVS-style conflict markers) indicating that the line attempting | 
|  | to be patched does not exist in B.</li> | 
|  | <li><i>Deleted line that is in-range in P</i>: request another | 
|  | universe -- this situation can't happen in ours.</li> | 
|  | <li><i>In-range edited line</i>: reverse that edit in the "before" | 
|  | version of the corresponding line in the appropriate hunk in | 
|  | P, to obtain the version of the line that will be found in | 
|  | B when P is applied.</li> | 
|  | </ol> | 
|  |  | 
|  | <p> | 
|  | Now walk over the diff B:1-B:15, further adjusting P: | 
|  | </p> | 
|  |  | 
|  | <ol> | 
|  | <li><i>Out-range added line</i>: increment the line numbers in | 
|  | every hunk in P that comes after the addition.</li> | 
|  | <li><i>Out-range deleted line</i>: decrement the line numbers in | 
|  | every hunk in P that comes after the deletion.</li> | 
|  | <li><i>Out-range edited line</i>: do nothing.  Out-range edits | 
|  | are irrelevant to P.</li> | 
|  | <li><i>Added line in context range in P</i>: add that line to both | 
|  | sides of the context in the appropriate hunk, adjust line | 
|  | numbers in that hunk and all following hunks.</li> | 
|  | <li><i>Added line in affected text range in P</i>: add that line to | 
|  | the context in P and adjust numbers appropriately.</li> | 
|  | <li><i>Deleted line in-range in P</i>: dependency problem; see | 
|  | "<i>Added line in affected text range in P</i>" in the first | 
|  | pass above.</li> | 
|  | <li><i>In-range edited line</i>: apply that edit to the "before" | 
|  | version of the corresponding line in the appropriate hunk in | 
|  | P.</li> | 
|  | </ol> | 
|  |  | 
|  | <p> | 
|  | Now P is ready to apply to B:15, and even has enough context to apply | 
|  | over local edits. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Some of the steps in both loops can be compressed.  For example, when | 
|  | the number of added and deleted lines in a hunk is balanced, line | 
|  | number adjustment in later hunks is unnecessary; and balance can be | 
|  | quickly detected by inspecting only the line numbers of the hunk in | 
|  | question. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | The algorithm handles inherent conflicts flexibly without losing | 
|  | information.  An "inherent" conflict is one where changes in T:8-T:19 | 
|  | overlap with changes in B:1-B:15.  They can be either papered over by | 
|  | adjusting context and "before" lines as enthusiastically as possible; | 
|  | reported as errors, with dependency information included; or both | 
|  | sides of the overlap can be handed to the user, surrounded by | 
|  | CVS-style conflict markers.  Or if conflict markers are not the | 
|  | desired interface, the patch can hold the conflict in some other | 
|  | metadata markup (using the patch format comment space), and other | 
|  | tools can report on and help resolve it. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Let's take a brief look at what adjustment does to some patches. | 
|  | Here's a simple case, arising from an attempt to port an individual | 
|  | change from trunk to branch.  On trunk T, revision 1 looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | A branch B also exists, rooted in the trunk at revision T:1.  On | 
|  | branch B, we commit a change to the context (we replace the words with | 
|  | numbers), so that B:2 looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line -5 of context */ | 
|  | /* line -4 of context */ | 
|  | /* line -3 of context */ | 
|  | /* line -2 of context */ | 
|  | /* line -1 of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* line +1 of context */ | 
|  | /* line +2 of context */ | 
|  | /* line +3 of context */ | 
|  | /* line +4 of context */ | 
|  | /* line +5 of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Meanwhile on the trunk, we change only the middle line, so that T:2 is | 
|  | committed as: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Good-bye, cruel world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Using a traditional context-based approach, an attempt to patch B:2 | 
|  | with the difference T:1->T:2 will fail, with a reject file if | 
|  | vanilla `<tt>patch</tt>' was used, or with conflict markers if | 
|  | `<tt>rcsmerge</tt>' was used: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | $ cvs up -j 1.1 -j 1.2 foo.c | 
|  | ### Purists will note that -j HEAD would have been sufficient; however, ### | 
|  | ### two -j's more clearly reflects the intent of the example.           ### | 
|  | RCS file: /usr/local/cvs/vap0/foo.c,v | 
|  | retrieving revision 1.1 | 
|  | retrieving revision 1.2 | 
|  | Merging differences between 1.1 and 1.2 into foo.c | 
|  | rcsmerge: warning: conflicts during merge | 
|  | $ cat foo.c | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | <<<<<<< foo.c | 
|  | /* line -5 of context */ | 
|  | /* line -4 of context */ | 
|  | /* line -3 of context */ | 
|  | /* line -2 of context */ | 
|  | /* line -1 of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* line +1 of context */ | 
|  | /* line +2 of context */ | 
|  | /* line +3 of context */ | 
|  | /* line +4 of context */ | 
|  | /* line +5 of context */ | 
|  | ======= | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Good-bye, cruel world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | >>>>>>> 1.2 | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Because the context has changed, the change to the middle line cannot | 
|  | be found.  If we adjust the patch T:1->T:2, it would look like | 
|  | this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | *** foo.c	2001/05/18 08:15:28	1.1 | 
|  | --- foo.c	2001/05/18 08:20:59	1.2 | 
|  | *************** | 
|  | *** 5,11 **** | 
|  | /* line -3 of context */ | 
|  | /* line -2 of context */ | 
|  | /* line -1 of context */ | 
|  | !   printf ("Hello, world!\n"); | 
|  | /* line +1 of context */ | 
|  | /* line +2 of context */ | 
|  | /* line +3 of context */ | 
|  | --- 5,11 ---- | 
|  | /* line -3 of context */ | 
|  | /* line -2 of context */ | 
|  | /* line -1 of context */ | 
|  | !   printf ("Goodbye, cruel world!\n"); | 
|  | /* line +1 of context */ | 
|  | /* line +2 of context */ | 
|  | /* line +3 of context */ | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | and would now apply perfectly to B:2, even though this diff never | 
|  | existed between any two committed revisions. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Let's try a slightly more complex case, starting with the same T:1 and | 
|  | B:1: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | T:2 has some inserted lines near the top: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | #include <stdio.h> | 
|  |  | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | And then another change was committed on the trunk, editing the middle | 
|  | line, so that T:3 looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | #include <stdio.h> | 
|  |  | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | printf ("Good-bye, cruel world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Meanwhile, one complex change has been committed on the branch, so B:2 | 
|  | looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | int main (int argc, char **argv) | 
|  | { | 
|  | /* line minus-five of context */ | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line -1 of context */ | 
|  | printf ("Hello, world!\n"); | 
|  | /* newly inserted line of context */ | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | /* line plus-four of context */ | 
|  | /* line plus-five of context */ | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Line minus-two of context was removed, line minus-one was changed to | 
|  | "-1", and a new line of context was inserted farther down.  Now the | 
|  | branch developer wants to merge in the change T:2-T:3.  Unadjusted, | 
|  | the patch looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | Index: foo.c | 
|  | =================================================================== | 
|  | RCS file: /usr/local/cvs/vap1/foo.c,v | 
|  | retrieving revision 1.2 | 
|  | retrieving revision 1.3 | 
|  | diff -c -r1.2 -r1.3 | 
|  | *** foo.c	2001/05/22 21:17:14	1.2 | 
|  | --- foo.c	2001/05/22 21:19:42	1.3 | 
|  | *************** | 
|  | *** 7,13 **** | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | !   printf ("Hello, world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | --- 7,13 ---- | 
|  | /* line minus-three of context */ | 
|  | /* line minus-two of context */ | 
|  | /* line minus-one of context */ | 
|  | !   printf ("Good-bye, cruel world!\n"); | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | /* line plus-three of context */ | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Following the adjustment algorithm, it looks like this: | 
|  | </p> | 
|  |  | 
|  | <pre> | 
|  | Index: foo.c | 
|  | =================================================================== | 
|  | RCS file: /usr/local/cvs/vap1/foo.c,v | 
|  | retrieving revision 1.2 | 
|  | retrieving revision 1.3 | 
|  | diff -c -r1.2 -r1.3 | 
|  | *** foo.c	2001/05/22 21:17:14	1.2 | 
|  | --- foo.c	2001/05/22 21:19:42	1.3 | 
|  | *************** | 
|  | *** 5,11 **** | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line -1 of context */ | 
|  | !   printf ("Hello, world!\n"); | 
|  | /* newly inserted line of context */ | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | --- 5,11 ---- | 
|  | /* line minus-four of context */ | 
|  | /* line minus-three of context */ | 
|  | /* line -1 of context */ | 
|  | !   printf ("Good-bye, cruel world!\n"); | 
|  | /* newly inserted line of context */ | 
|  | /* line plus-one of context */ | 
|  | /* line plus-two of context */ | 
|  | </pre> | 
|  |  | 
|  | <p> | 
|  | Note how the line numbers have been adjusted to compensate for the | 
|  | absence of change T:1-T:2 (since the patch we're applying only | 
|  | concerns T:2-T:3), and the context has been changed to include the | 
|  | difference B:1-B:2. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Again, this patch never existed between any two revisions, but it | 
|  | applies cleanly to B:2. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | The cost of the adjustment algorithm is proportional to the size of | 
|  | the diff from branch point to trunk head plus the diff from branch | 
|  | point to branch head, plus the size (for time, number of hunks) in the | 
|  | diff being adjusted.  This is comparable to the cost of using | 
|  | reverse-delta storage in the first place.  Also, the user consuming | 
|  | the final patch does not need access to all the intermediate changes | 
|  | -- she only needs permissions on the endpoints of the diff, the common | 
|  | ancestor of the two lines, and the revision to which the patch is | 
|  | being applied.  Intermediate revisions need not be accessible by the | 
|  | user. | 
|  | </p> | 
|  |  | 
|  | <p> | 
|  | Through variance-adjusted patching, any repository that stores | 
|  | successive revisions can apply patches even between highly divergent | 
|  | branches, as long as there is a common ancestor. | 
|  | </p> | 
|  |  | 
|  | <h4>Addendum</h4> | 
|  |  | 
|  | <p>For clarity, I have used line-based diffs to describe the process | 
|  | of variance adjustment.  The algorithm applies equally well, of | 
|  | course, to other kinds of diffs, including arbitrary binary diffs.  In | 
|  | fact, | 
|  | <a href="http://www.red-bean.com/pipermail/changesets/2003-April/000097.html" | 
|  | >Dr. William Uther wrote</a>:</p> | 
|  |  | 
|  | <blockquote> | 
|  | <p> | 
|  | <em>I actually gave an example of a binary version of the VAP algorithm on | 
|  | the svn-dev list a while back.  It used VAP at the byte level, instead | 
|  | of at the line level.  It was similar to your suggested VAP algorithm | 
|  | in that you could adjust the 'soft-conflict' zone around the changes. | 
|  | </em> | 
|  | </p> | 
|  | </blockquote> | 
|  |  | 
|  | <p>The message he's referring to appears to be this one:</p> | 
|  |  | 
|  | <pre> | 
|  | <a href="http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgId=162878">http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgId=162878</a> | 
|  | From: William Uther | 
|  | To: dev@subversion.tigris.org | 
|  | Subject: Re: Merging | 
|  | Date: Sat, 16 Feb 2002 21:19:43 -0500 | 
|  | Message-ID: <B8947D6F.787C%will=subversion@cs.cmu.edu> | 
|  | In-Reply-To: <51C7002B020CD411824E009027C469F782259A@cldxch01.hifn.com> | 
|  | </pre> | 
|  |  | 
|  | <p>I mention this here in order to help anyone using this algorithm as | 
|  | prior art in a patent challenge.  I doubt I was the first person to | 
|  | describe variance-adjusted patching, but at the very least we can know | 
|  | that the method was described as of 22 May 2001 — and | 
|  | that both I and someone else "proficient in the art" (William Uther) | 
|  | thought its applicability to binary diffs to be pretty obvious.</p> | 
|  |  | 
|  | </div> | 
|  | </body> | 
|  | </html> |