| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" |
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <style type="text/css"> /* <![CDATA[ */ |
| @import "../branding/css/tigris.css"; |
| @import "../branding/css/inst.css"; |
| /* ]]> */</style> |
| <link rel="stylesheet" type="text/css" media="print" |
| href="../branding/css/print.css"/> |
| <script type="text/javascript" src="../branding/scripts/tigris.js"></script> |
| <title>Merge Tracking Design</title> |
| |
| <style type="text/css"> |
| .question { color: grey; } |
| .answer { } |
| </style> |
| </head> |
| |
| <body> |
| <div class="h1"> |
| <h1>Merge Tracking Design</h1> |
| |
| <p style="color: red">*** UNDER CONSTRUCTION ***</p> |
| |
| <p>Subversion's <a href="index.html">merge tracking</a> uses a layered |
| design, with the user-visible operations based primarily on the |
| information from the <a href="#merge-history">merge history</a>.</p> |
| |
| <ul> |
| <li><a href="#merge-history">Merge History</a></li> |
| <li>Merge Operations (TODO)</li> |
| <li>Audit Operations (TODO)</li> |
| <li>Other Operations (TODO)</li> |
| </ul> |
| |
| <div class="h2" id="merge-history"> |
| <h2>Merge History</h2> |
| |
| <p>Or, <em>Tracking What Revisions Have Been Merged Where</em> |
| provides the information used by Subversion's merge tracking-related |
| capabilities (history sensitive merging, etc.). The design is based |
| on Dan Berlin's <a |
| href="http://svn.haxx.se/dev/archive-2006-04/0916.shtml">proposal</a> |
| <small>(Message-Id: |
| <1146242044.23718.1.camel@dberlin0-corp.corp.google.com>)</small>.</p> |
| |
| <p style="color: red">The following content has been temporarily |
| superseded by <a |
| href="http://svn.collab.net/repos/svn/branches/merge-tracking/notes/merge-tracking.txt">notes |
| on the development branch</a> until the design completely |
| stabilizes.</p> |
| |
| <div class="h3"> |
| <h3>Goals</h3> |
| |
| <p>The goal of the Merge History portion of the design is to track the |
| information needed by the operations outlined by the <a |
| href="requirements.html">use cases</a> (e.g. the revision numbers |
| being merged by a merge operation), and keeping this information in |
| the right places as various operations (<code>copy</code>, |
| <code>delete</code>, <code>add</code>, etc.) are performed. This |
| portion of the design does <em>not</em> encompass the operations |
| themselves.</p> |
| |
| <p>The goals:</p> |
| |
| <ul> |
| <li>To be able to track this down to what files in a working copy |
| and be able to determine what files have had what revisions merged |
| into them.</li> |
| |
| <li>To not need to contact the server more than we already do now to |
| determine which revisions have been merged in a file or directory |
| (ie some contact is acceptable, asking the server about each file is |
| not).</li> |
| |
| <li>To be able to edit merge information in a human editable |
| form.</li> |
| |
| <li>For the information to be stored in a space efficient manner, |
| and to be able to determine the revisions merged into a given |
| file/director in a time efficient manner.</li> |
| |
| <li>Still getting a conservatively correct answer (not worse than |
| what we have now) when no merge info is specified.</li> |
| |
| <li>To be able to collect, transmit, and keep this information up to |
| date as much as possible on the client side.</li> |
| |
| <li>To be able to index this information in the future order to |
| answer queries.</li> |
| </ul> |
| |
| <p><em>Pre-notes:</em> Whether to store info in revprops, or just on |
| dirs and files, is still an open question. It makes certain semantics |
| clearer on what operations do below and how to proceed easier. The |
| question is whether it is efficient enough time wise when we go to |
| retrieve merge info, and whether it complicates what merge has to do |
| too much. It also removes all of the listed <a |
| href="#merge-history-prereqs">prerequisites</a>.</p> |
| |
| </div> <!-- goals --> |
| |
| <div class="h3" id="storage"> |
| <h3>Information Storage</h3> |
| |
| <p>The first question that many people ask is "where should we store |
| the merge information?" (what we store will be covered next). A merge |
| history property, named <code>SVN_MERGE_PROPERTY</code> |
| (e.g. <code>svn:mergeinfo</code>) stored in the revision properties, |
| directory properties, and file properties. Each will store the |
| <em>full</em>, <em>complete</em> list of current merged in changes, as |
| far as it knows. This ensures that the merge algorithm and other |
| consumers do not have to walk back revisions in order to get the |
| transitive closure of the revision list.</p> |
| |
| <p>The way we choose which of file, dir, revprop merge info to use in |
| case of conflicts simple system of inheritance <a |
| href="#merge-history-footnotes">[1]</a> where the "most specific" |
| place wins. This means that if the property is set on a file, that |
| completely overrides the directory and revision level properties.</p> |
| |
| <p>The way we choose which to store to depends on how much and where |
| you merge, and will be covered in the semantics.</p> |
| |
| <p>The reasoning for this system is to avoid having to either copy |
| info everywhere, or crawl everywhere, in order to determine which |
| revisions have been applied. At the same time, we want to be space |
| and time efficient, so we can't just store the entire revision list |
| everywhere.</p> |
| |
| <p>As for what is stored:</p> |
| |
| <p>A survey of the community shows a slight preference for a human |
| editable storage format along the lines of how svnmerge stores its |
| merge info (e.g. path name and list of revisions). Binary storage of |
| such information would buy, on average, a 2-3 byte decrease per |
| revision/range in size over ASCII <a |
| href="merge-history-footnotes">[2]</a>, while making it not directly |
| human readable/editable.</p> |
| |
| <p>The revisions we have merged <em>into</em> something are |
| represented as a path, a colon, and then a comma separated revision |
| list, containing one or more revision or revision ranges. Revision |
| range end and beginning points are separated by "-".</p> |
| |
| <div class="h4" id="merge-info-grammar"> |
| <h4>Grammar</h4> |
| <table> |
| <tr> |
| <th align="left">Token</th> |
| <th align="left">Definition</th> |
| </tr> |
| <tr> |
| <td>revisionrange</td> |
| <td>REVISION "-" REVISION</td> |
| </tr> |
| <tr> |
| <td>revisioneelement</td> |
| <td>revisionrange | REVISION</td> |
| </tr> |
| <tr> |
| <td>revisionlist</td> |
| <td>revisioneelement (COMMA revisioneelement)*</td> |
| </tr> |
| <tr> |
| <td>revisionline</td> |
| <td>PATHNAME COLON revisionlist</td> |
| </tr> |
| <tr> |
| <td>top</td> |
| <td>revisionline (NEWLINE revisionline)*</td> |
| </tr> |
| </table> |
| </div> <!-- merge-info-grammar --> |
| |
| <p>This list will <em>not</em> be stored in a canonicalized minimal |
| form for a path (e.g. it may contain single revision numbers that |
| could fall into ranges). This is chiefly because the benefit of such |
| a canonical format (slightly easier for <em>comparison</em>, but not |
| indexing) is outweighed by the fact that generating a canonical form |
| may require groveling through a lot of information to determine what |
| that minimal canonical form is. In particular, it may be that the |
| revision list "5,7,9" is, in minimal canonical form, "5-9", because 6 |
| and 8 do not have any affect on the path name that 5 and 9 are from. |
| Canonicalization could be done as a server side post pass because the |
| information is stored in properties.</p> |
| |
| <p>Note that this revision format will not scale on its own if you |
| have a list of million revisions. None will easily. However, because |
| it is stored in properties, one can change the WC and FS backends to |
| simply do something different with this single property if they wanted |
| to. Given the rates of change of various very active repositories, |
| this will not be a problem we need to solve for many many years.</p> |
| |
| <p>The current thinking is that the payload of |
| <code>SVN_MERGE_PROPERTY</code> will be stored in an index separate |
| from the FS which is created during <code>svnadmin create</code>. |
| This index will support fast querying, be populated during a merge or |
| <code>svnadmin load</code>, and cough up its contents during a |
| <code>svn propget SVN_MERGE_PROPERTY</code> or <code>svnadmin |
| dump</code>. The contents of <code>SVN_MERGE_PROPERTY</code> will not |
| be stored redundantly in the FS (only in the index). Dan Berlin is |
| prototyping this index using sqlite3, and David James has a (generic) |
| schema design underway.</p> |
| |
| </div> <!-- storage --> |
| |
| |
| <div class="h3"> |
| <h3>Information Updating</h3> |
| |
| <p>Each operation you can perform may update or copy the merge info |
| associated with a path, file, or revision.</p> |
| |
| <p><code>svn add</code>: No change to merge info</p> |
| |
| <p><code>svn delete</code>: No direct change to merge info |
| (indirectly, because the props go away, so does the merge info for the |
| file)</p> |
| |
| <p><code>svn rename</code>: No change to merge info</p> |
| |
| <p><code>svn copy</code>: Copies the merge info from the source path |
| to the destination path, if any.</p> |
| |
| <p>This includes copying info from revprops, if necessary, by |
| determining if the merge info exists in a revprop for the last changed |
| commit for the source path, and copying it to the new revprop if it |
| does (someone probably needs to check if this is the right semantic |
| :P)</p> |
| |
| <p>All copies are full-copies of the merge information.</p> |
| |
| <p><code>svn merge</code>: Adds or subtracts to the merge info, |
| according to the following:</p> |
| |
| <ul> |
| <li>Where to put the info: |
| <ol> |
| <li>If the merge target is a single file, the merge info goes to |
| the property <code>SVN_MERGE_PROPERTY</code> set on that |
| file.</li> |
| |
| <li>If the merge target is a non-wc-root directory, the merge info |
| goes to the property <code>SVN_MERGE_PROPERTY</code> set on the |
| directory.</li> |
| |
| <li>If the merge target is a wc-root directory, the merge info |
| goes to the property <code>SVN_MERGE_PROPERTY</code> set on the |
| revprop.</li> |
| </ol> |
| </li> |
| |
| <li>What info is put: |
| <ol> |
| <li>If you are merging in reverse, revisions are subtracted from |
| the revision lines, but we never write out anti-revisions. Thus, |
| if you subtract all the merged revisions, you just get an empty |
| list, and if you do a reverse merge from there, you still get an |
| empty list.</li> |
| |
| <li>If you are merging forward, the revision(s) you are merging is |
| added to the revision line in sorted order (such that all |
| revisions and revision ranges in the list are monotonically |
| increasing from left to right). The exact details of how the |
| range is represented in terms of a list of single revs, or a |
| revision range, is left as a quality of implementation detail. |
| The only requirement is that the range be correct.</li> |
| |
| <li>The path (known as PATHNAME in the grammar) used as the key to |
| determine which revision line to change is the subdirectory path |
| being merged from, relative to the repo root, with the repo url |
| stripped from it.</li> |
| </ol> |
| </li> |
| </ul> |
| |
| <p>Thus a merge of revisions 1-9 from |
| http://foo.bar.com/reposroot/trunk would produce "/trunk:1-9"</p> |
| |
| <p>cross-repo merging is a bridge we can cross if we ever get there :).</p> |
| |
| </div> <!-- h3 --> |
| |
| <div class="h3" id="merge-history-prereqs"> |
| <h3>Prerequisites</h3> |
| |
| <ul> |
| <li>Need to be able to set a revprop to be stored on commit (see <a |
| href="http://subversion.tigris.org/issues/show_bug.cgi?id=1976">issue |
| 1976</a>).</li> |
| |
| <li>Need to be able to say to copy a revprop from a particular |
| revision and only contact the server at commit time.</li> |
| |
| <li>Need to be able to have auth treat |
| <code>SVN_MERGE_PROPERTY</code> revprop differently from other |
| revprops (either by special casing the cases users do care about |
| controlling, or special casing props users don't care about |
| controlling, etc.) so that people who don't have access to the |
| revprops can still do history sensitive merges of directories they |
| do have access to.</li> |
| </ul> |
| |
| </div> <!-- merge-history-pre-reqs --> |
| |
| |
| <div class="h3" id="merge-history-faq"> |
| <h3>FAQ</h3> |
| |
| <p class="question">What happens if someone commits a merge with a |
| non-merge tracking client?</p> |
| |
| <p class="answer">It simply means the next time you merge, you may |
| receive conflicts that you would have received if you were using a |
| non-history-sensitive client.</p> |
| |
| <p class="question">Can we do without the revprop portion of this |
| design?</p> |
| |
| <p class="answer">Technically yes, but it may require more crawling |
| and querying at merge time.</p> |
| |
| <p class="question">Can we do history sensitive WC-to-WC merges |
| without contacting the server?</p> |
| |
| <p class="answer">No. But you probably couldn't anyway, even if the |
| "revprop not being stored locally" issue were not here.</p> |
| |
| <p class="question">What happens if the merge history is not there?</p> |
| |
| <p class="answer">The same thing that happens if the merge history is |
| not there now.</p> |
| |
| <p class="question">What happens if a user edits merge history |
| incorrectly?</p> |
| |
| <p class="answer">They get the results specified by their merge |
| history.</p> |
| |
| <p class="question">How does the revprop stay up to date?</p> |
| |
| <p class="answer">We copy it from revision to revision.</p> |
| |
| <p class="question">What happens if a user manually edits a file and |
| unmerges a revision (e.g. not using a "reverse merge" command), but |
| doesn't update the merge info to match?</p> |
| |
| <p class="answer">The merge info will believe the change has still |
| been merged. This is a similar effect to performing a <a |
| href="requirements.html#manual-merge">manual merge</a>.</p> |
| |
| <p class="question">What happens if I <code>svn |
| move</code>/<code>rename</code> a directory, and then merge it |
| somewhere?</p> |
| |
| <p class="answer">This doesn't change history, only the future, thus |
| we will simply add the merge info for that directory as if it was a |
| new directory. We will not do something like attempt to modify all |
| merge info to specify the new directory, as that would be wrong.</p> |
| |
| <p class="question">I don't think only that copying info on <code>svn |
| copy</code> is correct. What if you copy a dir with merge info into a |
| dir where the dir has merge info -- won't it get the info wrong |
| now?</p> |
| |
| <div class="answer"> |
| <p>No. Let's say you have:</p> |
| |
| <pre> |
| a/foo (merge info: /trunk:5-9 |
| a/branches/bar (merge info: /trunk:1-4) |
| </pre> |
| |
| <p>If you copy a/foo into a/branches/bar, we now have:</p> |
| |
| <pre> |
| a/branches/bar (merge info: /trunk:1-4) |
| a/branches/bar/foo (merge info: /trunk:5-9) |
| </pre> |
| |
| <p>This is strictly correct. The only changes which have been merged |
| into a/branches/bar/foo, are still 5-9. The only changes which have |
| been merged into /branches/bar are 1-4. No merges have been performed |
| by your copy, only copies have been performed. If you perform a merge |
| of revisions 1-9 into bar, the results one would expect that the |
| history sensitive merge algorithm will skip revisions 5-9 for |
| a/branches/bar/foo, and skip revisions 1-4 for a/branches/bar. The |
| above information gives the algorithm the information necessary to do |
| this. |
| |
| So if you want to argue svn copy has the wrong merge info semantics, |
| it's not because of the above, AFAIK :)</p> |
| </div> |
| |
| </div> <!-- merge-history-faq --> |
| |
| <div class="h3" id="merge-history-footnotes"> |
| <h3>Footnotes</h3> |
| |
| <ol> |
| <li>This is not going to be a full blown design for property |
| inheritance, nor should this design depend on such a system being |
| implemented.</li> |
| |
| <li>Assuming 4 byte revision numbers, and repos with revisions |
| numbering in the hundreds of thousands. You could do slightly |
| better by variable length encoding of integers, but even that will |
| generally be 4 bytes for hundreds of thousands of revs. Thus, we |
| have strings like "102341" vs 4 byte numbers, meaning you save about |
| 2 bytes for a 4 byte integer. Range lists in binary would need a |
| distinguisher from single revisions, adding a single bit to both |
| (meaning you'd get 31 bit integers), and thus, would require 8 bytes |
| per range vs 12 bytes per range. While 30% is normally nothing to |
| sneeze at space wise, it's also not significantly more efficient in |
| time, as most of the time will not be spent parsing revision lists, |
| but doing something with them. The space efficiency therefore does |
| not seem to justify the cost you pay in not making them easily |
| editable.</li> |
| </ol> |
| |
| </div> <!-- merge-history-footnotes --> |
| |
| </div> <!-- merge-history --> |
| |
| |
| </div> <!-- h1 --> |
| </body> |
| </html> |