|  | Last updated: $Date: 2001/07/02 21:46:52 $ | 
|  |  | 
|  | Three things that need to happen in the filesystem: | 
|  |  | 
|  | a) Switch to a sane node key system (issue #654). | 
|  |  | 
|  | b) We need to operate on file contents without holding the entire | 
|  | contents in RAM.  Berkeley DB gives us tools to operate on | 
|  | regions within a value, we just need to use them. | 
|  |  | 
|  | c) We need to do reverse-delta storage in the filesystem (with | 
|  | checksums). | 
|  |  | 
|  | d) Record (potentially) multiple parents per change. | 
|  |  | 
|  | e) Implement atomic renames. | 
|  |  | 
|  | Some thoughts on them: | 
|  |  | 
|  |  | 
|  | a) Switch to a sane node key system (issue #654) | 
|  | ================================================ | 
|  |  | 
|  | For more background, read the archived dev list thread with subject | 
|  | "Maintaining NodeID sanity": | 
|  |  | 
|  | http://subversion.tigris.org/servlets/ReadMsg?msgId=72265&listName=dev | 
|  |  | 
|  | Here's the plan, mostly by Bill Tutt and Branko, with assists from | 
|  | Karl and Mike: | 
|  |  | 
|  | Note: | 
|  |  | 
|  | - This is described in terms of a BDB implementation.  Translate to | 
|  | RDB-speak in your head as necessary :-). | 
|  |  | 
|  | - This proposal supports one copy (a.k.a. branch) operation.  You | 
|  | can call it anything you want: "copy", "branch", "split", | 
|  | "pumpkin", whatever.  We're calling it "copy" :-).  It is the SCM | 
|  | branch operation. | 
|  |  | 
|  | First, a node's key consists of three parts: | 
|  |  | 
|  | nodeID.copyID.txnID | 
|  |  | 
|  | The "txnID" is really just a unique identifier, but we happened to | 
|  | inherit it from the fs txn, and we needed a name for that portion, | 
|  | so... :-) Also, the copyID could theoretically live in the node's | 
|  | value instead of in its key, but it feels right to put it in the | 
|  | key.  (How's that for a powerful argument?)  For nodes that are not | 
|  | copies, the copyID is just "0" or some other special value. | 
|  |  | 
|  | There are no more mutability flags -- mutability is determined by | 
|  | examining whether the node key's txnID matches the txn in question. | 
|  | Therefore, there is no stabilization walk at commit time. | 
|  |  | 
|  | When we commit a change to a node, the nodeID and copyID stay the | 
|  | same, only the txnID changes (actually there is a circumstance where | 
|  | the copyID can change, but more on that later).  The new txnID is not | 
|  | necessarily greater than the old one -- sometimes txns get committed | 
|  | out of order! -- but anyway it's different from the old txnID, and the | 
|  | same new txnID is used for all other changes in that commit. | 
|  |  | 
|  | [Greg and Karl disagree on whether to use integer types or `char *' | 
|  | for the parsed representation of IDs.  See the dev list thread | 
|  | that starts here for the details of this: | 
|  | http://subversion.tigris.org/servlets/ReadMsg?msgId=72277&listName=dev | 
|  | ] | 
|  |  | 
|  | After a commit, the txn record in the transactions table does not go | 
|  | away; instead, it is updated so it now maps the txnID to the new | 
|  | revision.  This allows us to determine the revision a node was | 
|  | committed in, in constant time, given the node's key. | 
|  |  | 
|  | Each new version of a node stores the node's predecessor (and does not | 
|  | store copyform history).  When node "5.0.fzb" is committed as a | 
|  | successor to "5.0.qnr", the new node's value stores a reference to | 
|  | "5.0.qnr". | 
|  |  | 
|  | What about copies? | 
|  |  | 
|  | As in the current fs, copies are shallow.  The top of the copied tree | 
|  | gets a new node, but the nodes below it are shared with the copy | 
|  | source.  The new node key keeps the same nodeID, but gets a new txnID, | 
|  | and gets the next unique copyID (replacing the current copyID, if | 
|  | any). | 
|  |  | 
|  | In the example below, we copy `A' to `Y'.  Node keys for `A', `Y', and | 
|  | `bar' are given in parentheses: | 
|  |  | 
|  | BEFORE THE COPY               AFTER THE COPY | 
|  | <root>                         <root> | 
|  | /  |                           /  |  \ | 
|  | /    |                         /    |    \ | 
|  | /      |                       /      |      \ | 
|  | A(3.0.m) B                     A(3.0.m) B       Y(3.jfb.p) | 
|  | / \      |                     / \      |      / \ | 
|  | foo  bar   qux                 foo  bar   qux   foo  bar | 
|  | (5.0.abc)                       (5.0.abc)        (5.0.abc) | 
|  |  | 
|  | Let's flesh out the example with some commits under A and Y.  To save | 
|  | space, the colons represent time flow, not directory hierarchy -- | 
|  | imagine they're the Z axis coming out of the screen or something :-). | 
|  |  | 
|  | <root> | 
|  | /  |  \ | 
|  | /    |    \ | 
|  | /      |      \ | 
|  | A(3.0.m)  B       Y(3.jfb.p) | 
|  | / \      |      / \ | 
|  | foo  bar   qux   foo  bar | 
|  | (5.0.abc)         (5.0.abc) | 
|  | :                 : | 
|  | :                 : | 
|  | (5.0.ejk)         (5.jfb.mht) | 
|  | :                 : | 
|  | :                 : | 
|  | (5.0.xyz)         (5.jfb.uuu) | 
|  | :                 : | 
|  | :                 : | 
|  | (5.0.r2d2)        (5.jfb.2pz) | 
|  | :                 : | 
|  | :                 : | 
|  | (5.0.c3po)        (5.jfb.rdt) | 
|  |  | 
|  | Let's see how easy it is to answer various questions in this system: | 
|  |  | 
|  | Obviously, is_related() is simple -- just check that the nodeID | 
|  | portion is the same.  You may not know if the relationship is cousins | 
|  | vs ancestor/descendant, but you know whether or not they're related. | 
|  |  | 
|  | Asking is_predecessor(A,B) is also easy.  Just fetch the predecessor | 
|  | pointer from B and see if it's A. | 
|  |  | 
|  | Finding out what revisions a node changed in is proportional to the | 
|  | number of changes the node has undergone: start at the node, walk back | 
|  | through its predecessors, and for each txnID, look up the revision | 
|  | number via the transactions table (as described earlier). | 
|  |  | 
|  | During this walk, you can always tell when you encounter a node that | 
|  | results from a copy, because the copyID portion will either change or | 
|  | disappear entirely.  When this happens, you know one of two things is | 
|  | true: either the previous node in the walk was the top of a copied | 
|  | tree, or *this* node (the one with the different copyID) was one of | 
|  | the unchanged nodes down inside a copied tree. | 
|  |  | 
|  | One might think "Oh, we'll distinguish between these cases by walking | 
|  | up the parents of the node, and seeing if we eventually encounter the | 
|  | old copyID in one of the parents.  That would tell us that we're in | 
|  | the second case.  If we never encounter it, that tells us we're in the | 
|  | first." | 
|  |  | 
|  | Not so fast, Spidey.  We don't have parent pointers -- this is a | 
|  | predecessor walk by node keys; we can't just walk up the parent path | 
|  | like that.  Fortunately, copyIDs are generated from a new `copies' | 
|  | table, which maps unique copyIDs onto (REV COPY_DST_PATH | 
|  | COPY_DST_NODEKEY).  We look up the rev/path for the old copyID, | 
|  | convert it to a node key, and compare it to the node key we're | 
|  | currently on.  VoilĂ !  Actually, we're not sure we'll store all of | 
|  | those in the copies table, it may boil down to just the DST_NODEKEY or | 
|  | just the other two, we'll see. | 
|  |  | 
|  | Writing those predecessor walk loops well is left as an exercise for | 
|  | the reader (umm, more likely for the writers, heh), but you can see | 
|  | that the necessary questions can be answered efficiently. | 
|  |  | 
|  | Note that, like txnIDs, copyIDs are just unique numbers.  They may be | 
|  | increasing monotonically in the `copies' table, but (due to the fact | 
|  | that txn A may be started before txn B yet be committed afterwards) | 
|  | it's quite possible that a higher copyID will become visible in the | 
|  | revision history before a lower one. | 
|  |  | 
|  | The one thing we can know is that a lower copyID can never be a | 
|  | branchwise descendant of a lower copyID, since the lower one must have | 
|  | been committed before any of its descendants txns were started, of | 
|  | course.  I'm not sure this minimal inference will ever be useful, but | 
|  | anyway it's all we've got.  Anyway, right now, we only ever need ask | 
|  | if two copyIDs are equal -- we don't order them. | 
|  |  | 
|  | Okay, now what if A already had copied trees underneath it when we | 
|  | copied it to Y?  Suppose `bar' is down in one of those subdirectories? | 
|  |  | 
|  | Then when we're committing on /Y/.../bar, we watch for copyIDs as we | 
|  | walk down from root, like usual, and if we're in a copy underneath a | 
|  | copy, we bubble down a _new_ copyID, distinct from both Y's and B's, | 
|  | starting from that point.  Justification: a branch of a branch is | 
|  | still a branch, so it gets its own copyID. | 
|  |  | 
|  | At this point, I'm going to hand-wave on describing the | 
|  | copy-under-copy behavior any further.  I think the above is enough to | 
|  | see that there are no insurmountable problems here, and that the | 
|  | filesystem will now have node keys that [lazily] reflect actual | 
|  | branching patterns. | 
|  |  | 
|  | A few more notes: | 
|  |  | 
|  | The is_ancestor(X,Y) question is really only asked during merge(), | 
|  | that is, during commits.  If entry "somedir/blah" has NodeID Y in the | 
|  | txn being committed, and NodeID X in head tree being merged into that | 
|  | txn, then we need to make sure X is an ancestor of Y, so that when the | 
|  | commit replaces X with Y, we know we're not losing any history. | 
|  |  | 
|  | Therefore, we think we can get away with storing the ancestors in a | 
|  | distributed fashion, as a chain: each node knows its immediate | 
|  | predecessor (or "precessors", in the future), and you walk back until | 
|  | you have your answer.  In real life, we won't usually be walking back | 
|  | too far, and the search can be further bounded by the revision range | 
|  | implied by the two nodes.  If profiling proves these walks to be a | 
|  | bottleneck, we can start caching the results of such walks in a new | 
|  | table whose keys are node keys, and values are _full_ ancestor lists. | 
|  |  | 
|  |  | 
|  |  | 
|  | b) Operate on portions of files efficiently. | 
|  | ============================================ | 
|  |  | 
|  | [still pondering this section] | 
|  |  | 
|  | We should take advantage of Berkeley DB's partial record stuff, all | 
|  | the way up to the top of the svn fs interfaces. | 
|  |  | 
|  | - dag_node_t gets two new fields: contents_offset and contents_len. | 
|  | They apply to the node's cache of the contents, not the header or | 
|  | proplist. | 
|  |  | 
|  | - svn_fs__get_node_rev_contents() takes offset and len arguments, | 
|  | fetches only that data.  The dag_node_t will remember the offset | 
|  | and len. | 
|  |  | 
|  | - svn_fs__put_node_rev_contents() takes offset and len args as | 
|  | well. | 
|  |  | 
|  | - change set_node_revision() accordingly. | 
|  |  | 
|  | - ... todo thinking here ... | 
|  |  | 
|  | So now, whenever you read or write a node revision, you are operating | 
|  | on a range.  There will be some way to say "I mean the whole thing", | 
|  | of course, so it won't be necessary to know the size in advance. | 
|  |  | 
|  | Thought: possibly we should stop storing data in the dag_node_t | 
|  | itself, and just return the data in a void pointer passed to | 
|  | svn_fs__get_node_rev_contents().  Still pondering. | 
|  |  | 
|  |  | 
|  |  | 
|  | c) Reverse-delta storage. | 
|  | ========================= | 
|  |  | 
|  | The naive way to recover an old text is: | 
|  |  | 
|  | retrieve_node_rev (N) | 
|  | { | 
|  | grab_node_revision (&N); | 
|  |  | 
|  | if (is_fulltext (N)) | 
|  | return N; | 
|  | else if (is_shared (N)) | 
|  | return retrieve_node_rev (get_sharee (N)); | 
|  | else if (is_svndiff (N)) | 
|  | return svnpatch (get_svndiff (N), retrieve_node_rev (get_base (N))) | 
|  | } | 
|  |  | 
|  | (Loose pseudo-code, obviously, and the recursion could be a loop, but | 
|  | you get the idea.) | 
|  |  | 
|  | The trouble with this is that it constructs and patches each | 
|  | intermediate revision.  That'll probably be i/o bound, and anyway much | 
|  | of the intermediate material may not end up in the final target, in | 
|  | which case reconstructing it was a waste of time. | 
|  |  | 
|  | What we really want is a way to compose a bunch of svndiffs, and then | 
|  | apply that composition to the current head, to give us the older | 
|  | revision in one step (well, one application anyway).  Both the | 
|  | composition and the final application need to happen in a streamy or | 
|  | windowed fashion -- we shouldn't have to hold the entire diff in | 
|  | memory, nor the entire source, nor the target. | 
|  |  | 
|  | Here's a way to do this: | 
|  |  | 
|  | An svndiff is a series of instructions that are followed to | 
|  | reconstruct the target.  There are three possible instructions: | 
|  |  | 
|  | a) Insert X bytes of new text into the TARGET, and I'm giving you | 
|  | those bytes right here. | 
|  | b) Copy N bytes, starting from offset F in the SOURCE, into the | 
|  | TARGET. | 
|  | c) Copy N bytes, starting from offset F in the TARGET, into the | 
|  | TARGET. | 
|  |  | 
|  | (Note that (c) can actually run past the current end of the target, as | 
|  | long by the time you get there, the target is longer.) | 
|  |  | 
|  | To compose two svndiffs... | 
|  |  | 
|  | ...and I hate to tantalize you, but I'm late and have to run now, | 
|  | so I'll try to finish this tomorrow... crimminy... The quick | 
|  | summary is, we build a new svndiff (the composition of all the | 
|  | intermediates), and as it gets too large, we windowify as we go | 
|  | and put each window temporarily in the database; this makes the | 
|  | composition as a whole less efficient, but means that at any | 
|  | given time we don't have to have the whole thing in memory.  The | 
|  | arithmetic for offset-adjustment is fairly straightforward even | 
|  | when one has to offload windows, I believe.  It's nice that the | 
|  | source is already in the db and we get offset+length style | 
|  | operations from Berkeley naturally anyway.  Branko or anyone, | 
|  | feel free to continue this recipe and see if you can take it | 
|  | somewhere before I get in tomorrow morning...  -kff | 
|  |  | 
|  |  | 
|  | ------------------------------------------------------------------------ | 
|  | Notes from JimB about optimizing the in-repository delta generation | 
|  | to make deltas that can be composed more quickly: | 
|  |  | 
|  | I talked about this with Karl on the phone, and gave pretty bad | 
|  | explanations of my thinking; I'll try to do better today.  This will | 
|  | also provide some background for other readers. | 
|  |  | 
|  | I'm told that RCS reconstructs older file revisions by walking the | 
|  | list of diffs, ignoring the actual text, and simply recording the line | 
|  | numbers and sizes of the substitutions.  Then, it can run over that | 
|  | data and do arithmetic to construct a single `composed' diff against | 
|  | the youngest revision would reconstructs the older revision.  Then, | 
|  | you apply that composed diff to get the revision you wanted. | 
|  |  | 
|  | For example, if your fulltext is: | 
|  |  | 
|  | rev 3:  abcdefghijklmnopqrst | 
|  |  | 
|  | and your deltas are: | 
|  |  | 
|  | rev 2:  replace 3--6 with "howdy"    (yielding abchowdyghijklmnopqrst) | 
|  | rev 1:  replace 6--8 with " are you" (yielding abchow are youghijklmnopqrst) | 
|  |  | 
|  | then the RCS algorithm would gather this info: | 
|  |  | 
|  | rev 2:  replace 3--6 with 5 chars (call them X) | 
|  | rev 1:  replace 6--8 with 8 chars (call them Y) | 
|  |  | 
|  | Now, without looking at any of the text, it can compose the two deltas | 
|  | to get the following delta, yielding rev 1: | 
|  |  | 
|  | replace 3--6 with range 0--3 from X | 
|  | replace 6--6 with range 0--8 from Y (i.e. insert Y at 6) | 
|  |  | 
|  | If we then apply this `composed' delta to the original text, we get: | 
|  |  | 
|  | abchow are youghijklmnopqrst | 
|  |  | 
|  | The point of all this is that you don't need to mess around with | 
|  | actual text until the very end.  Until that point, the amount of work | 
|  | depends on the amount of change, not on the amount of text involved. | 
|  | And when you do get around to actually assembling text, the amount of | 
|  | work depends on the size of the output file --- because you're only | 
|  | touching each piece of text once --- and only weakly on the amount of | 
|  | change. | 
|  |  | 
|  | Our current svndiff format frustrates this somewhat by compressing the | 
|  | new text, as well as pulling in pieces of the original.  The | 
|  | compression process produces a lot of delta ops that deal with small | 
|  | pieces of text.  (Or at least, I expect it does...)  So even if the | 
|  | change is something simple --- replacing a single block of text in the | 
|  | middle of the file, say --- you end up with an svndiff with a lot of | 
|  | ops, mostly concerned with building the replacement text from pieces | 
|  | of itself.  This is great, except that having lots of small ops | 
|  | increases the amount of work the delta composition phase needs to do. | 
|  | In fact, if the ops usually deal with really small pieces of text --- | 
|  | a few dozen bytes or so --- I expect it'd be faster to just throw the | 
|  | actual text around.  Memcpy is pretty optimized on real systems; you | 
|  | could copy a lot of bytes in the time it would take to do funky | 
|  | intersections and adjustments on a list of ops talking about those | 
|  | bytes, so those ops had better refer to large blocks of bytes. | 
|  |  | 
|  | I'm not sure what to do with that.  It almost seems like we want the | 
|  | text delta computation algorithm to optimize deltas for network | 
|  | transmission and deltas for storage differently. | 
|  |  | 
|  |  | 
|  | ------------------------------------------------------------------------ | 
|  | Notes from Brane: Delta composition | 
|  |  | 
|  | Despite JimB's concerns, it turns out that delta composition is | 
|  | straight-forward. The basic idea is that combining two deltas is | 
|  | equivalent to applying the second delta to a representation of the | 
|  | first delta's result. Bear with me while I shamelessly abuse what | 
|  | little I remember about linear algebra. | 
|  |  | 
|  | Given two deltas, A and B, and the source stream S, we want to | 
|  | construct a composed delta, AB, that converts S to the target stream | 
|  | T. Let's borrow notation from linear algebra and write this | 
|  | transformation like this: | 
|  |  | 
|  | T = AB(S) | 
|  |  | 
|  | Abusing algebra some more, I'll assume that a delta behaves like any | 
|  | other linear transformatsion; therefore, | 
|  |  | 
|  | AB(S) = B(A(S)) | 
|  |  | 
|  | and since I'm not about to develop any rigorous proofs here, I'll | 
|  | just say that it follows from the above that | 
|  |  | 
|  | T = B(S'), where S' = A(S) | 
|  |  | 
|  | A small note here: Every delta is actually an n-tuple of delta | 
|  | opserations, represented by what I'll call the delta operators [n] | 
|  | (for new data), [s] (for source copies) and [t] (for target | 
|  | copies). [s] and [t] create bits of the target stream by operating on | 
|  | contiguous parts of the source and (existing) target stream, | 
|  | respectively; while [n] does the same by operating on raw data. | 
|  |  | 
|  |  | 
|  | Now, what we actually want to do is derive some form of AB (which, by | 
|  | the way, does not have a unique representation, sice we're not trying | 
|  | to find the optimal ("normalized") transform) that doesn't in any way | 
|  | rely on the value of S'. We do that by building a representation of S' | 
|  | that relies only on S, and any new data introduced by the [n] | 
|  | operators in A. That's possible because any [t] ops in A merely copy | 
|  | parts of S' that have been previously defined by [s] and [n] | 
|  | ops. Transforming A by (recursively) replacing all [t] ops with the | 
|  | equivalent [s] and [n] ops gives us exactly such a representation, | 
|  | which I'll call A'. [*] | 
|  |  | 
|  | Building AB from B and A' is trivial: traversing the list of delta ops | 
|  | in B, copy all [n] and [t] ops into the result; for [s] ops, copy the | 
|  | range of ops from A' that define the appropriate range in S'. For some | 
|  | of the copies, the first or last op from the range in A' will have to | 
|  | be split, and the first op in the copy range can sometimes be merged | 
|  | with the previous op in AB. | 
|  |  | 
|  |  | 
|  | Of course, stopping here could give a very sub-optimal AB, because it | 
|  | could contain many duplicate copies of the same op ranges from A'. We | 
|  | fix this by doing exactly the opposite transformation than A->A': by | 
|  | transforming [s] ops from B into [t] ops. We do this by recording the | 
|  | source and target of each copy from A' to AB and, whenever the [s] | 
|  | from B describes a range in T that was already defined, converting | 
|  | that into a [t] instead [**]. Unlike the A->A' transform, we can't remove | 
|  | all copies from A' to AB (we can only do that when AB doesn't refer to | 
|  | S at all), but we can significantly reduce the number of such copies. | 
|  |  | 
|  | The resulting AB will usually not be the optimal delta from S to T, | 
|  | because we will never actually look at S while constructing AB. | 
|  |  | 
|  |  | 
|  | Summarizing the above, we get the following delta composition | 
|  | algorithm: | 
|  |  | 
|  | ;; X is the set of byte ranges in T defined by copies from S' | 
|  | ;; Y is the current offset in T, defined by the ops in AB | 
|  | foreach OP in B: | 
|  | if (OP = [t]) or (OP = [n]): | 
|  | copy OP to AB | 
|  | else | 
|  | R = (set of ops from A that define S'[OP.offset, OP.length]) | 
|  | [**]    using X, find the optimal set O of [t] ops to replace part of R | 
|  | foreach OP' in R: | 
|  | if OP' is in (R - O): | 
|  | if (OP' = [t]): | 
|  | [*]           replace OP' with equivalent [s] and [n] ops | 
|  | copy OP' to AB | 
|  | else | 
|  | [**]        copy OP's equivalent in O to AB | 
|  | insert S'[OP.offset, OP.length]->[Y, OP.length] into X | 
|  |  | 
|  |  | 
|  | This algorithm ignores details such as op splitting and merging, | 
|  | ensuring every op from O gets copied exactly once, helper indexes, | 
|  | etc..., but those are implementation details. | 
|  |  | 
|  |  | 
|  | ------------------------------------------------------------------------ | 
|  | Notes from Brane: Delta storage | 
|  |  | 
|  | O.K., now that delta composition is out of the way, let's talk a bit | 
|  | about storing and retrieving deltified text from the filesystem. I'll | 
|  | just jot down a few thoughts about how this could be done, assuming | 
|  | one of two possible models of storage management in the filesystem: | 
|  |  | 
|  | a) all data store and retrieve operations are synchronous; or, | 
|  | b) deltification can run in the background. | 
|  |  | 
|  |  | 
|  | 1) Storing new data | 
|  |  | 
|  | New data arrives either as fulltext (from add) or as a delta from an | 
|  | existing version (copy, modify). | 
|  |  | 
|  | a) if the new data is a delta, convert it to fulltext (possibly | 
|  | combining several existing deltas in the process). Store the | 
|  | fulltext in the tip, and replace the previous tip (which, by | 
|  | induction, contains fulltext) with a delta from the current tip. | 
|  |  | 
|  | b) store the fulltext or delta in the tip and mark it for async | 
|  | modification. Do a) in the background. | 
|  |  | 
|  | 2) Retrieving old data | 
|  |  | 
|  | a) If the data is a delta, follow the delte references (combining | 
|  | the deltas) until a fulltext is found; apply the combined delta | 
|  | to get the required fulltext. | 
|  |  | 
|  | If the combined delta reduces to a no-op (the two fulltexts are | 
|  | the same), store the fulltext in the younger of the two nodes and | 
|  | replace the older node's data with a "same" note. | 
|  |  | 
|  | b) Same as para 1 of a), then mark the node for async | 
|  | modification. In the background, find the diff between the two | 
|  | fulltexts. If they're equal, do para 2 of a). Otherwise, if the | 
|  | diff is smaller than the current diff in the node, replace the | 
|  | current representation. ("Smaller" could be construed as "more | 
|  | optimal" -- it would make sense to take into account the number | 
|  | of delta combinations that could be skipped by replacing the | 
|  | current representation when comparing sizes.) | 
|  |  | 
|  |  | 
|  | d) Multiple parents per change | 
|  | ============================== | 
|  |  | 
|  | This is necessary for -- at least -- accurrate Merge Tracking, to | 
|  | allow for accurate calculation of change set graph.  Use cases | 
|  | include: | 
|  |  | 
|  | 1) Avoiding repeated merges when performing cyclic merging | 
|  | (e.g branch A -> B -> A). | 
|  |  | 
|  | 2) Sussing explicit merge info when a change in merge info occurs | 
|  | during a copy operation (e.g. to avoid paying attention to implied | 
|  | merge info from the copy source). | 
|  |  | 
|  | Mercurial (hg) does this. | 
|  |  | 
|  |  | 
|  | e) Atomic renames | 
|  | ================= | 
|  |  | 
|  | This may just be a means to an end?  Mercurial (hg) gets by without | 
|  | this, but this may be due to its distributed repository implementation. | 
|  |  | 
|  | It seems we can handle a lot of the desired use cases (see | 
|  | notes/tree-conflicts.txt) without true renames. | 
|  |  | 
|  | Exploratory work has been started on the fs-atomic-renames branch. |