| |
| SUBVERSION "MERGE" and "SWITCH" FEATURES |
| Slated for 0.9 (M9) |
| |
| 1st draft writ by Karl & Ben, |
| after much discussion with CMike & Greg. |
| |
| |
| This is primarily a description of the semantics of merge and switch, |
| that is, Subversion's user-visible behavior in these operations. It |
| also discusses some implementation issues. |
| |
| Definitions: |
| |
| * Merging is like "cvs update -j -j". I.e., take the difference |
| between two trees in the repository, and apply it diffily to the |
| working copy. |
| |
| * Switching means to switch the working copy from one line of |
| development over to another, like "cvs update -r <TAG|BRANCH>". |
| Of course, Subversion doesn't really have the concept of lines |
| of development; it just has copies. But if a working directory |
| is based on repository tree T, and you "switch" it to be based on |
| repository tree S, where T and S are similar (related) in some |
| way, that's effectively the same as what CVS does. |
| |
| |
| The General Theory of Updating, Merging, and Switching |
| ====================================================== |
| |
| Updating, merging, and switching are all very similar operations; each |
| command is a request to have the server modify the working copy in |
| some way. Each of these subcommands begins with the client describing |
| the "state" of the working copy to the server, and ends with the |
| server comparing trees and sending back tree-delta(s) to the client. |
| |
| Here's the easiest way to understand the three operations: assume that |
| X:PATH1 and Y:PATH2 are paths within two repository revisions X and Y, |
| which are possibly the same revision. The server compares X:PATH1 |
| with Y:PATH2 and sends the difference to the client. |
| |
| * In an update, PATH1 == PATH2 always, and after the tree-delta is |
| applied, the working copy metadata is changed (specifically, |
| revisions are bumped). |
| |
| * In a merge, PATH1 does not necessarily equal PATH2, and we don't |
| touch metadata (except maybe for "genetic" merging properties |
| someday). In other words, the applied changes end up looking like |
| local modifications. |
| |
| [Actually, in a merge PATH1 usually does equal PATH2 -- in fact, |
| that's how it always is in CVS, in a sense. So I think |
| supporting the PATH1 != PATH2 case in merge should not be a high |
| priority. -kff] |
| |
| * In a switch, PATH1 does not necessarily equal PATH2, and we *do* |
| rewrite the working copy metadata (specifically, revisions are |
| bumped and URLs are changed). |
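The three cases above can be summarized in a small table. The following is only a sketch of the semantics as described here; the names `Semantics`, `OPERATIONS`, `is_valid_request` and `leaves_local_mods` are made up for illustration and are not part of any Subversion API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Semantics:
    """Hypothetical summary of one operation; not a Subversion data structure."""
    paths_may_differ: bool  # may PATH1 differ from PATH2?
    bumps_revisions: bool   # are working-copy revisions bumped afterwards?
    rewrites_urls: bool     # are working-copy URLs rewritten afterwards?


OPERATIONS = {
    "update": Semantics(paths_may_differ=False, bumps_revisions=True, rewrites_urls=False),
    "merge": Semantics(paths_may_differ=True, bumps_revisions=False, rewrites_urls=False),
    "switch": Semantics(paths_may_differ=True, bumps_revisions=True, rewrites_urls=True),
}


def is_valid_request(operation, path1, path2):
    """Return True if comparing X:path1 with Y:path2 fits the operation."""
    semantics = OPERATIONS[operation]
    return path1 == path2 or semantics.paths_may_differ


def leaves_local_mods(operation):
    """A delta applied without touching metadata looks like local modifications."""
    semantics = OPERATIONS[operation]
    return not (semantics.bumps_revisions or semantics.rewrites_urls)
```

Under this model only a merge leaves its result looking like local modifications, and only an update insists that the two paths be identical.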
| |
| When doing a merge or switch, the user needs to specify at least one |
| of the two paths. There's a risk that the requested path may be |
| completely unrelated to the path represented by the working copy -- |
| and thus might result in seemingly random diffs and conflicts |
| everywhere (or in the worst case, a complete deletion and re-checkout |
| of the working copy!) Our plan is to add a heuristic to Subversion |
| that asks the question "are these two paths related in some way?" If |
| the test fails, the command aborts and the user receives a friendly |
| message: "PATH1 and PATH2 have no common ancestry. Are you *sure* |
| you want to apply this delta? If so, re-run the command with the |
| --force option." |
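The heuristic itself is not specified in these notes. As one possible reading, the sketch below treats two paths as related when their copy histories share any path. The `copied_from` mapping, the function names and the exception class are all assumptions made for illustration; the real check would have to consult the repository.

```python
class UnrelatedPathsError(Exception):
    """Raised when two paths share no ancestry and --force was not given."""


def ancestors(path, copied_from):
    """Return path plus every path it was (transitively) copied from."""
    chain = []
    seen = set()
    while path is not None and path not in seen:
        seen.add(path)
        chain.append(path)
        path = copied_from.get(path)  # None once we reach an original path
    return chain


def related(path1, path2, copied_from):
    """Heuristic: the two copy histories have at least one path in common."""
    return bool(set(ancestors(path1, copied_from)) & set(ancestors(path2, copied_from)))


def check_ancestry(path1, path2, copied_from, force=False):
    """Abort with the friendly message unless the paths are related or forced."""
    if force or related(path1, path2, copied_from):
        return
    raise UnrelatedPathsError(
        f"{path1} and {path2} have no common ancestry. Are you *sure* you "
        "want to apply this delta? If so, re-run the command with the "
        "--force option."
    )


# Hypothetical copy history: both release branches were copied from trunk.
history = {"/branches/rel_1": "/trunk", "/branches/rel_2": "/trunk"}
check_ancestry("/branches/rel_1", "/branches/rel_2", history)  # related, no error
```

With this history, asking about "/trunk" and an unrelated "/vendor" path raises the error unless `force=True` is passed.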
| |
| |
| Merging |
| ======= |
| |
| Merge is a special case of update, or rather, update is a special case |
| of merge. Simplifying things a bit: when we update, we take the |
| differences between path P at revision X versus P at revision Y, and |
| apply that difference to the working copy. Note that since X:P |
| reflects the working copy text bases exactly, the server can send |
| contextless diffs to bring the working copy to Y:P. (The |
| simplification here is that X:P is really a transaction reflecting the |
| working copy's revision mixture, and not necessarily corresponding |
| precisely to any single revision tree.) |
| |
| When we merge, we take the differences between path P at revision X |
| (X:P) versus path Q at revision Y (Y:Q), and apply them to the working |
| copy. |
| |
| Thus, what distinguishes a merge from an update is that P need not |
| equal Q (is there a symbol for "need not equal"? Maybe "P ?= Q"...). |
| For that matter, X ?= Y. |
| |
| X:P and Y:Q are two distinct trees, but in practice, they share a |
| common ancestor, so using the difference between them is not a |
| ridiculous idea. But note that svn_repos_dir_delta() is perfectly |
| content to express the difference between any two trees, related or |
| not. |
| |
| It is possible, indeed likely, that neither X:P nor Y:Q is an exact |
| reflection of the working copy bases; therefore context diffs are used |
| to facilitate merging. |
| |
| *** Implementation details *** |
| |
| Heh, two completely different possibilities here: |
| |
| 1. Only the Subversion client generates context diffs and applies them |
| (right now by running 'diff' and 'patch' externally.) Therefore, |
| the objective is to create *two* sets of fulltext files in some |
| client-side temporary area. The first fulltext set represents X:P, |
| and the second fulltext set represents Y:Q. The client then |
| compares the two sets, generates context diffs, and applies the |
| context diffs to the working copy's working files. |
| |
| The naive approach would be to just directly ask the server for |
| both sets of fulltexts. (We still consider this an option!) |
| |
| A more complex approach (which we'll attempt) is a network |
| optimization -- it's a way of creating both sets of fulltexts on |
| the client using minimal network traffic: |
| |
| * The client builds a transaction on the server that is a |
| "reflection" of the working-copy, mixed revisions and all. |
| |
| * The server sends a tree-delta between the reflection and X:P; |
| the client then applies these binary diffs to copies of the |
| working-copy's text-bases in order to reconstruct the fulltexts |
| of X:P. |
| |
| * The server sends a tree-delta between X:P and Y:Q; the client |
| then applies these binary diffs to copies of the X:P fulltexts |
| in order to reconstruct the fulltexts of Y:Q. |
| |
| And that's it! We have both sets of fulltexts. The client |
| generates context diffs between them and patches the working copy. |
| |
| As mentioned earlier, this process doesn't touch any working-copy |
| metadata in .svn/. Only the working files are patched, so the |
| differences appear as local modifications. At that point, the user |
| manually resolves any conflicts. |
| |
| |
| 2. What is the difference between these two commands? |
| |
| svn merge -rX:Y <URL> |
| svn diff -rX:Y <URL> | patch |
| |
| :-) ? If we have an extended patch format, supporting copies, |
| renames, deletes, and properties (like we've been planning), then |
| there isn't formally even any need for a "merge" command -- it's a |
| trivial wrapper around "svn diff" and patch. |
| |
| In other words, much of the work described in Plan 1 above has |
| already been done by Philip Martin in his diff editors. Maybe we |
| should just take advantage of that? There's still the issue of |
| recording metadata about the merge, but presumably that would come |
| from the extended patch format. |
| |
| Random thoughts from Karl: |
| |
| I do wonder if it's always desirable to merge properties anyway. |
| Most of the properties we have are subversion-specific, and when I |
| think of the kinds of merges I've done in the past, I can't think |
| of a case where having the property changes merge would be |
| desirable. Ooooh, but when we use the properties to record what |
| has been previously merged, then having them travel *with* the |
| changes is useful. For example: |
| |
| $ svn merge -r18:20 http://svn.collab.net/repos/branches/rel_1 |
| $ svn ci |
| ===> produces .../trunk/whatever/blah, revision 100 |
| |
| Then the next week: |
| |
| $ svn switch http://svn.collab.net/repos/branches/rel_2 |
| $ svn merge -r97:153 http://svn.collab.net/repos/trunk/whatever/blah |
| |
| In a situation like that, you want the rel_1 branch merge into |
| trunk to travel with the trunk changes you're now merging into |
| rel_2. |
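To make the network optimization of Plan 1 more concrete, here is a rough sketch of its data flow. It is heavily simplified: a "tree-delta" is modeled as a plain mapping from path to new fulltext (or None for a deletion) rather than real binary diffs, and the function names are invented for this example. Only `difflib.context_diff` is a real library call.

```python
import difflib


def apply_tree_delta(tree, delta):
    """Apply a toy tree-delta (path -> new fulltext, or None to delete) to a copy."""
    result = dict(tree)  # never modify the text-bases themselves
    for path, text in delta.items():
        if text is None:
            result.pop(path, None)
        else:
            result[path] = text
    return result


def context_diffs(old_tree, new_tree):
    """Return path -> context diff for every file that differs between the trees."""
    diffs = {}
    for path in sorted(set(old_tree) | set(new_tree)):
        old = old_tree.get(path, "").splitlines(keepends=True)
        new = new_tree.get(path, "").splitlines(keepends=True)
        if old != new:
            diffs[path] = "".join(
                difflib.context_diff(old, new, fromfile="X:P/" + path, tofile="Y:Q/" + path)
            )
    return diffs


# Copies of the working copy's text-bases (what the reflection describes).
text_bases = {"foo.c": "a\nb\n", "bar.c": "x\n"}

# Step 1: delta from the reflection to X:P reconstructs the X:P fulltexts.
x_p = apply_tree_delta(text_bases, {"foo.c": "a\nb\nc\n"})

# Step 2: delta from X:P to Y:Q reconstructs the Y:Q fulltexts.
y_q = apply_tree_delta(x_p, {"foo.c": "a\nB\nc\n"})

# Step 3: the client compares both fulltext sets and gets context diffs,
# which it would then apply to the working files (e.g. with 'patch').
diffs = context_diffs(x_p, y_q)
```

Applying the resulting diffs to the working files is left out here, since that step is delegated to an external 'patch' in the plan above.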
| |
| |
| Switching |
| ========= |
| |
| Switching is a more general case of update: instead of comparing the |
| working-copy "reflection" to an identical path in some revision, the |
| server compares the reflection to some *arbitrary* path in some |
| revision. The user specifies the new path. |
| |
| The result of the operation is to effectively morph the working copy |
| into representing a different location in the tree. In theory, there |
| should be no way to tell the difference between a fresh checkout of |
| PATH2 and a working copy that was "switched" to PATH2. |
| |
| *** Implementation details *** |
| |
| As in update operations, the client begins by building a reflection of |
| working-copy state on the server. The client then specifies a new |
| path/revision pair as the target of the tree-delta. |
| |
| After the client finishes applying the delta, it needs to do a little |
| more work than update: besides bumping all working revisions to some |
| uniform value, it needs to rewrite all of the metadata URL ancestry as |
| well. |
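The extra metadata step might look roughly like this. The layout of `entries` (working-copy-relative directory path mapped to a URL and revision pair) and the function name are assumptions for illustration only; they do not describe the real .svn/entries format.

```python
def rewrite_entries(entries, new_url, new_rev):
    """Return entries rewritten to a uniform revision under new_url.

    entries maps a wc-relative directory path ('' for the root) to a
    (url, revision) pair.
    """
    base = new_url.rstrip("/")
    rewritten = {}
    for relpath in entries:
        url = base if relpath == "" else base + "/" + relpath
        rewritten[relpath] = (url, new_rev)
    return rewritten


# A mixed-revision working copy of trunk, switched to a branch at revision 9.
before = {
    "": ("http://svn.collab.net/repos/trunk", 5),
    "subversion": ("http://svn.collab.net/repos/trunk/subversion", 7),
}
after = rewrite_entries(before, "http://svn.collab.net/repos/branches/blue", 9)
```

After the rewrite every directory carries the same revision and a URL derived from the new location, which is what a fresh checkout of the new path would record.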
| |
| ----------------------------------------------------------------------- |
| |
| |
| Interactions: A Brave New World |
| ================================ |
| |
| With the `svn switch' feature, we now have the potential to have |
| working copies with "disjoint" subdirs, that is, subdirs whose |
| repository url is not simply the subdir's parent's url plus the |
| subdir's entry name in the parent. For example: |
| |
| $ svn checkout http://svn.collab.net/repos/trunk -d svn |
| A ... |
| A svn/subversion/libsvn_wc |
| A svn/subversion/libsvn_fs |
| A svn/subversion/libsvn_repos |
| A svn/subversion/libsvn_delta |
| A ... |
| $ cd svn/subversion/libsvn_fs |
| $ svn switch http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs |
| [...] |
| $ |
| |
| While svn/subversion/.svn/entries still has an entry for "libsvn_fs", |
| if you go into libsvn_fs and look at its own directory url, it is not |
| simply a child of the `subversion' directory url, but rather a |
| completely different url. We call this directory "disjoint". |
| |
| Commits, updates, merges, and further switch commands all need to deal |
| sanely with this scenario. |
| |
| We can assume that even disjoint urls are still all within the same |
| repository, because the parent of a disjoint child still has an entry |
| for that child, and all working copy walks are guided by entries. In |
| cases where there are wc subdirs from completely different |
| repositories, there is unlikely to be such entry linkage. [NOTE: We |
| will still be adding some extra information to the wc to make it |
| possible to check for the rare circumstance where the parent has an |
| entry for a subdir which (for whatever reason) is the result of a |
| checkout from a different repository. More on that later.] |
| |
| |
| Changes To The Commit Process: |
| ============================== |
| |
| Currently, the commit editor driver crawls the working copy, and sends |
| local modifications through the editor as it finds them. But we now |
| have to deal with disjoint urls in the working copy. Because editors |
| must be driven depth-first, we cannot send changes against these |
| disjoint urls as they are found -- instead, we must begin the edit |
| based on a common parent of all the urls involved in the commit. So |
| we must do a preliminary scan of the working copy, discovering all |
| local mods, collecting the urls for the mods, and then calculating the |
| common path prefix on which to base the edit. |
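Calculating the common prefix could be sketched as follows. The function name `commit_anchor` is hypothetical, and anchoring at the parent directory of each modified item is an assumption of this sketch (so that a commit of a single file is still based on a directory). `posixpath.commonpath`, `urlsplit` and `urlunsplit` are standard library calls.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def commit_anchor(modified_urls):
    """Return the deepest directory URL containing every modified item."""
    parts = [urlsplit(url) for url in modified_urls]
    roots = {(p.scheme, p.netloc) for p in parts}
    if len(roots) != 1:
        raise ValueError("commit targets must all live on one server")
    scheme, netloc = roots.pop()
    # Compare whole path components, not characters, so that
    # /repos/trunk and /repos/trunk2 do not yield /repos/trunk.
    parent_dirs = [posixpath.dirname(p.path) for p in parts]
    common = posixpath.commonpath(parent_dirs)
    return urlunsplit((scheme, netloc, common, "", ""))


# Local mods collected by the preliminary scan, including a disjoint subdir.
mods = [
    "http://svn.collab.net/repos/trunk/subversion/libsvn_wc/adm_ops.c",
    "http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs/tree.c",
]
anchor = commit_anchor(mods)
```

For these two modifications the edit would be based at http://svn.collab.net/repos, the closest directory that contains both the trunk and the branch change.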
| |
| [NOTE: this increases the memory usage of commits by a small amount. |
| We formerly interleaved the discovering and sending of local mods, but |
| now discovery will happen first and produce a list of changed paths, |
| and then sending the changes will happen entirely after that. The |
| benefit is that we preserve commit atomicity even when branches are |
| present in the working copy... which is very important!] |
| |
| |
| Changes To The Update Process: |
| ============================== |
| |
| Currently, update builds a reflection of the working copy's state on |
| the server (the reflection is a Subversion transaction). Then the |
| server sends back a tree delta between the reflection and the desired |
| revision tree (usually the head revision, but whatever). The tree |
| delta is expressed by driving an svn_delta_edit_fns_t editor on the |
| client side. |
| |
| If there are disjoint subdirs in the working copy, the reflection |
| must, uh, reflect this. That's pretty easy: that subtree of the |
| transaction will simply point to the appropriate revision subtree |
| (implementation note: we'll need to add a new function to |
| svn_ra_reporter_t, allowing us to link arbitrary path/rev pairs into |
| the transaction.) |
| |
| But getting the reflection right isn't enough. The revision tree |
| we're comparing the reflection with doesn't have the special disjoint |
| subtree, so a lot of spurious differences would be sent to the client, |
| which the client would then have to ignore, presumably making a |
| separate connection later to update the disjoint subdir. This way |
| lies madness... or at least inefficiency. |
| |
| So instead, we'll create a *second* transaction, representing the |
| target of the update. In the plain update case, this transaction is |
| an exact copy of the revision (and perhaps we'll optimize out the txn |
| and just use the revision tree after all). But in the disjoint subdir |
| case, this second txn will also reflect the disjointedness. In other |
| words, when a disjoint directory D is discovered, it will be linked |
| into both txn trees -- in the reflection txn, D will be at whatever |
| revision(s) it is in the working copy, and in the target txn, it will |
| be at the target revision of the update. This way, the delta between |
| the reflection and target txns will apply cleanly to the working copy |
| (i.e., svn_repos_dir_delta() will just Do The Right Thing when invoked |
| on the two txns). Voila. |
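A toy model may help show why the second transaction matters. Trees are plain dictionaries from path to file contents, `link` stands in for linking a path/rev pair into a transaction, and `dir_delta` is only a stand-in for svn_repos_dir_delta(); none of this is the real API, and the revision numbers and file contents are invented.

```python
def link(txn, mount, subtree):
    """Return a copy of txn with everything under mount replaced by subtree."""
    prefix = mount.rstrip("/") + "/"
    result = {p: t for p, t in txn.items() if not p.startswith(prefix)}
    for relpath, text in subtree.items():
        result[prefix + relpath] = text
    return result


def dir_delta(source, target):
    """Toy tree comparison: path -> new contents, or None for a deletion."""
    delta = {}
    for path in set(source) | set(target):
        if source.get(path) != target.get(path):
            delta[path] = target.get(path)
    return delta


# Invented repository contents: trunk and the "blue" branch at r5 and r8.
trunk_r5 = {"libsvn_wc/wc.c": "wc r5", "libsvn_fs/fs.c": "trunk fs r5"}
trunk_r8 = {"libsvn_wc/wc.c": "wc r8", "libsvn_fs/fs.c": "trunk fs r8"}
blue_fs_r5 = {"fs.c": "blue fs r5"}
blue_fs_r8 = {"fs.c": "blue fs r8"}

# Reflection txn: trunk at r5, with libsvn_fs switched to the blue branch.
reflection = link(trunk_r5, "libsvn_fs", blue_fs_r5)

# Comparing against the plain revision tree drags libsvn_fs back to trunk.
naive_delta = dir_delta(reflection, trunk_r8)

# Target txn: trunk at r8 with the same disjoint subdir linked in at r8.
target = link(trunk_r8, "libsvn_fs", blue_fs_r8)
delta = dir_delta(reflection, target)
```

In this model `naive_delta` would overwrite the switched subdirectory with trunk's contents, while `delta` updates it along its own branch.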
| |
| |
| Changes to Switch and Merge Process: |
| ==================================== |
| |
| The switch process still needs to build a working-copy reflection that |
| contains possible "disjointed" subtrees. However, the second |
| target-transaction isn't needed at all. The server can send a delta |
| between the reflection and a "pure" path in some revision (presumably |
| the path that we're switching to.) |
| |
| If the disjointed subtree and the target path both happen to be part |
| of the same branch, then svn_repos_dir_delta() won't notice any |
| differences at all. Otherwise, the user should expect to have the |
| disjointed section of the working copy be "converted" to a new URL, |
| just like the rest of the working copy. |
| |
| In the case of merges, we continue to build a reflection that contains |
| disjointed subtrees. Again, no need for a second transaction. |
| Remember that the reflection is only being built as a shortcut to |
| cheaply construct fulltexts of X:P in the client. The structure of |
| the reflection is irrelevant; *any* reflection can be used as a basis |
| for sending a tree-delta that constructs X:P, no matter what |
| disjointed sections it has. (Although some reflections may be more |
| useful than others! In the worst case, if the reflection is |
| completely unrelated to X:P, then svn_repos_dir_delta() regresses into |
| sending fulltexts anyway.) |
| |