Last updated: $Date: 2001/07/02 21:46:52 $

Two things that need to happen in the filesystem:

a) We need to operate on file contents without holding the entire
   contents in RAM.  Berkeley DB gives us tools to operate on
   regions within a value, we just need to use them.

b) We need to do reverse-delta storage in the filesystem (with
   checksums).

Some thoughts on them:

a) Operate on portions of files efficiently.
============================================

[still pondering this section]

We should take advantage of Berkeley DB's partial record stuff, all
the way up to the top of the svn fs interfaces.

  - dag_node_t gets two new fields: contents_offset and contents_len.
    They apply to the node's cache of the contents, not the header or
    proplist.

  - svn_fs__get_node_rev_contents() takes offset and len arguments,
    fetches only that data.  The dag_node_t will remember the offset
    and len.

  - svn_fs__put_node_rev_contents() takes offset and len args as
    well.

  - change set_node_revision() accordingly.

  - ... todo thinking here ...

So now, whenever you read or write a node revision, you are operating
on a range.  There will be some way to say "I mean the whole thing",
of course, so it won't be necessary to know the size in advance.
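
For concreteness, here is a toy model of the kind of ranged interface
this implies.  It is only a sketch: the names other than the functions
mentioned above are invented, and a malloc'd buffer stands in for what
would really be Berkeley DB partial-record access (DB_DBT_PARTIAL),
svn error returns, and pools.

/* Toy sketch of ranged node-contents access.  Illustrative only; not
   the real libsvn_fs types or interface. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WHOLE_CONTENTS ((size_t) -1)   /* "I mean the whole thing" */

typedef struct {
  const char *stored;       /* stands in for the record in Berkeley DB */
  size_t stored_len;

  char *contents;           /* cached window of the contents ...       */
  size_t contents_offset;   /* ... and the range that window covers    */
  size_t contents_len;
} toy_node;

/* Fetch LEN bytes of NODE's contents starting at OFFSET, caching the
   window in the node.  LEN == WHOLE_CONTENTS means "to the end". */
static int
toy_get_contents (toy_node *node, size_t offset, size_t len,
                  const char **result)
{
  if (offset > node->stored_len)
    return -1;
  if (len == WHOLE_CONTENTS || offset + len > node->stored_len)
    len = node->stored_len - offset;

  free (node->contents);
  node->contents = malloc (len + 1);
  memcpy (node->contents, node->stored + offset, len);
  node->contents[len] = '\0';
  node->contents_offset = offset;
  node->contents_len = len;

  *result = node->contents;
  return 0;
}

int
main (void)
{
  toy_node node = { "abcdefghijklmnopqrst", 20, NULL, 0, 0 };
  const char *window;

  toy_get_contents (&node, 3, 5, &window);
  printf ("bytes 3..7:  %s\n", window);             /* defgh */
  toy_get_contents (&node, 0, WHOLE_CONTENTS, &window);
  printf ("everything:  %s\n", window);
  free (node.contents);
  return 0;
}

A writer (svn_fs__put_node_rev_contents) would do the mirror-image
thing, pushing a window of bytes back at a given offset.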

Thought: possibly we should stop storing data in the dag_node_t
itself, and just return the data in a void pointer passed to
svn_fs__get_node_rev_contents().  Still pondering.


b) Reverse-delta storage.
=========================

The naive way to recover an old text is:

   retrieve_node_rev (N)
   {
     grab_node_revision (&N);

     if (is_fulltext (N))
       return N;                                    /* stored whole; done */
     else if (is_shared (N))
       return retrieve_node_rev (get_sharee (N));   /* follow the share   */
     else if (is_svndiff (N))
       return svnpatch (get_svndiff (N),            /* patch the base text */
                        retrieve_node_rev (get_base (N)));
   }

(Loose pseudo-code, obviously, and the recursion could be a loop, but
you get the idea.)

The trouble with this is that it constructs and patches each
intermediate revision.  That'll probably be i/o bound, and anyway much
of the intermediate material may not end up in the final target, in
which case reconstructing it was a waste of time.

What we really want is a way to compose a bunch of svndiffs, and then
apply that composition to the current head, to give us the older
revision in one step (well, one application anyway).  Both the
composition and the final application need to happen in a streamy or
windowed fashion -- we shouldn't have to hold the entire diff in
memory, nor the entire source, nor the target.

Here's a way to do this:

An svndiff is a series of instructions that are followed to
reconstruct the target.  There are three possible instructions:

  a) Insert X bytes of new text into the TARGET, and I'm giving you
     those bytes right here.
  b) Copy N bytes, starting from offset F in the SOURCE, into the
     TARGET.
  c) Copy N bytes, starting from offset F in the TARGET, into the
     TARGET.

(Note that (c) can actually run past the current end of the target, as
long as, by the time you get there, the target is long enough.)
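
To pin down what those three instructions do, here is a minimal,
self-contained sketch of one way to represent and apply them.  The
types and names are invented for this note (the real svndiff format
also has windows, variable-length integer encoding, and so on, none of
which appears here):

/* Toy representation of the three svndiff instructions; illustrative
   only, not the real libsvn_delta types. */
#include <stdio.h>
#include <string.h>

enum op_kind { OP_NEW, OP_SOURCE_COPY, OP_TARGET_COPY };

struct op {
  enum op_kind kind;
  size_t offset;      /* source or target offset (unused for OP_NEW) */
  size_t length;      /* number of bytes to insert or copy */
  const char *data;   /* new text (only for OP_NEW) */
};

/* Apply OPS to SOURCE, writing the result into TARGET.  Target copies
   are done byte by byte, so they may extend past the current end of
   the target as long as the target has grown enough by the time each
   byte is read.  Returns the target length. */
static size_t
apply_ops (const struct op *ops, size_t nops,
           const char *source, char *target)
{
  size_t tlen = 0, i, j;

  for (i = 0; i < nops; i++)
    switch (ops[i].kind)
      {
      case OP_NEW:
        memcpy (target + tlen, ops[i].data, ops[i].length);
        tlen += ops[i].length;
        break;
      case OP_SOURCE_COPY:
        memcpy (target + tlen, source + ops[i].offset, ops[i].length);
        tlen += ops[i].length;
        break;
      case OP_TARGET_COPY:
        for (j = 0; j < ops[i].length; j++)
          target[tlen + j] = target[ops[i].offset + j];
        tlen += ops[i].length;
        break;
      }
  return tlen;
}

int
main (void)
{
  /* Turn "aaabbb" into "bbbxyzxyzaaa", using all three instructions. */
  static const struct op ops[] = {
    { OP_SOURCE_COPY, 3, 3, NULL },   /* "bbb" from the source         */
    { OP_NEW,         0, 3, "xyz" },  /* brand new text                */
    { OP_TARGET_COPY, 3, 3, NULL },   /* repeat the "xyz" just written */
    { OP_SOURCE_COPY, 0, 3, NULL },   /* "aaa" from the source         */
  };
  char target[64];
  size_t len = apply_ops (ops, 4, "aaabbb", target);

  printf ("%.*s\n", (int) len, target);   /* prints bbbxyzxyzaaa */
  return 0;
}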

To compose two svndiffs...

...and I hate to tantalize you, but I'm late and have to run now,
so I'll try to finish this tomorrow...  crimminy...  The quick
summary is, we build a new svndiff (the composition of all the
intermediates), and as it gets too large, we windowify as we go
and put each window temporarily in the database; this makes the
composition as a whole less efficient, but means that at any
given time we don't have to have the whole thing in memory.  The
arithmetic for offset-adjustment is fairly straightforward even
when one has to offload windows, I believe.  It's nice that the
source is already in the db and we get offset+length style
operations from Berkeley naturally anyway.  Branko or anyone,
feel free to continue this recipe and see if you can take it
somewhere before I get in tomorrow morning...  -kff


------------------------------------------------------------------------
Notes from JimB about optimizing the in-repository delta generation
to make deltas that can be composed more quickly:

I talked about this with Karl on the phone, and gave pretty bad
explanations of my thinking; I'll try to do better today.  This will
also provide some background for other readers.
I'm told that RCS reconstructs older file revisions by walking the
list of diffs, ignoring the actual text, and simply recording the line
numbers and sizes of the substitutions.  Then, it can run over that
data and do arithmetic to construct a single `composed' diff against
the youngest revision that reconstructs the older revision.  Then,
you apply that composed diff to get the revision you wanted.

For example, if your fulltext is:

   rev 3: abcdefghijklmnopqrst

and your deltas are:

   rev 2: replace 3--6 with "howdy" (yielding abchowdyghijklmnopqrst)
   rev 1: replace 6--8 with " are you" (yielding abchow are youghijklmnopqrst)

then the RCS algorithm would gather this info:

   rev 2: replace 3--6 with 5 chars (call them X)
   rev 1: replace 6--8 with 8 chars (call them Y)

Now, without looking at any of the text, it can compose the two deltas
to get the following delta, yielding rev 1:

   replace 3--6 with range 0--3 from X
   replace 6--6 with range 0--8 from Y (i.e. insert Y at 6)

If we then apply this `composed' delta to the original text, we get:

   abchow are youghijklmnopqrst
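
Just to check that arithmetic, here is a tiny standalone program that
applies the two deltas one at a time, and also applies the composed
delta directly to the rev 3 fulltext; both give the same string.  It
is plain string splicing, nothing svn-specific:

/* Verify the worked example above: the composed delta gives the same
   result as applying the two deltas one after another. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the range [from, to) of SRC with REP, writing into DST. */
static void
splice (char *dst, const char *src, size_t from, size_t to, const char *rep)
{
  size_t n = strlen (src), r = strlen (rep);
  memcpy (dst, src, from);
  memcpy (dst + from, rep, r);
  memcpy (dst + from + r, src + to, n - to);
  dst[from + r + (n - to)] = '\0';
}

int
main (void)
{
  const char *rev3 = "abcdefghijklmnopqrst";
  const char *X = "howdy", *Y = " are you";
  char rev2[64], rev1[64], composed[64];

  /* One delta at a time. */
  splice (rev2, rev3, 3, 6, X);                  /* rev 2 */
  splice (rev1, rev2, 6, 8, Y);                  /* rev 1 */

  /* The composed delta, applied directly to rev 3:
     rev3[0..3) + X[0..3) + Y + rev3[6..)  */
  snprintf (composed, sizeof composed, "%.3s%.3s%s%s", rev3, X, Y, rev3 + 6);

  assert (strcmp (rev1, "abchow are youghijklmnopqrst") == 0);
  assert (strcmp (rev1, composed) == 0);
  printf ("%s\n", rev1);
  return 0;
}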

The point of all this is that you don't need to mess around with
actual text until the very end.  Until that point, the amount of work
depends on the amount of change, not on the amount of text involved.
And when you do get around to actually assembling text, the amount of
work depends on the size of the output file --- because you're only
touching each piece of text once --- and only weakly on the amount of
change.

Our current svndiff format frustrates this somewhat by compressing the
new text, as well as pulling in pieces of the original.  The
compression process produces a lot of delta ops that deal with small
pieces of text.  (Or at least, I expect it does...)  So even if the
change is something simple --- replacing a single block of text in the
middle of the file, say --- you end up with an svndiff with a lot of
ops, mostly concerned with building the replacement text from pieces
of itself.  This is great, except that having lots of small ops
increases the amount of work the delta composition phase needs to do.
In fact, if the ops usually deal with really small pieces of text ---
a few dozen bytes or so --- I expect it'd be faster to just throw the
actual text around.  Memcpy is pretty optimized on real systems; you
could copy a lot of bytes in the time it would take to do funky
intersections and adjustments on a list of ops talking about those
bytes, so those ops had better refer to large blocks of bytes.

I'm not sure what to do with that.  It almost seems like we want the
text delta computation algorithm to optimize deltas for network
transmission and deltas for storage differently.


------------------------------------------------------------------------
Notes from Brane: Delta composition

Despite JimB's concerns, it turns out that delta composition is
straightforward.  The basic idea is that combining two deltas is
equivalent to applying the second delta to a representation of the
first delta's result.  Bear with me while I shamelessly abuse what
little I remember about linear algebra.

Given two deltas, A and B, and the source stream S, we want to
construct a composed delta, AB, that converts S to the target stream
T.  Let's borrow notation from linear algebra and write this
transformation like this:

   T = AB(S)

Abusing algebra some more, I'll assume that a delta behaves like any
other linear transformation; therefore,

   AB(S) = B(A(S))

and since I'm not about to develop any rigorous proofs here, I'll
just say that it follows from the above that

   T = B(S'), where S' = A(S)

A small note here: Every delta is actually an n-tuple of delta
operations, represented by what I'll call the delta operators [n]
(for new data), [s] (for source copies) and [t] (for target
copies).  [s] and [t] create bits of the target stream by operating on
contiguous parts of the source and (existing) target stream,
respectively; while [n] does the same by operating on raw data.
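
In this notation (writing [s offset,length] and so on), the composed
delta from JimB's example above (the one that rebuilds rev 1 from the
rev 3 fulltext) would be

   [s 0,3] [n "how"] [n " are you"] [s 6,14]

or, with the two adjacent [n] ops merged,

   [s 0,3] [n "how are you"] [s 6,14]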


Now, what we actually want to do is derive some form of AB (which, by
the way, does not have a unique representation, since we're not trying
to find the optimal ("normalized") transform) that doesn't in any way
rely on the value of S'.  We do that by building a representation of S'
that relies only on S, and any new data introduced by the [n]
operators in A.  That's possible because any [t] ops in A merely copy
parts of S' that have been previously defined by [s] and [n]
ops.  Transforming A by (recursively) replacing all [t] ops with the
equivalent [s] and [n] ops gives us exactly such a representation,
which I'll call A'. [*]

Building AB from B and A' is trivial: traversing the list of delta ops
in B, copy all [n] and [t] ops into the result; for [s] ops, copy the
range of ops from A' that define the appropriate range in S'.  For some
of the copies, the first or last op from the range in A' will have to
be split, and the first op in the copy range can sometimes be merged
with the previous op in AB.


Of course, stopping here could give a very sub-optimal AB, because it
could contain many duplicate copies of the same op ranges from A'.  We
fix this by doing exactly the opposite of the A->A' transformation: by
transforming [s] ops from B into [t] ops.  We do this by recording the
source and target of each copy from A' to AB and, whenever the [s]
from B describes a range in T that was already defined, converting
that into a [t] instead [**].  Unlike the A->A' transform, we can't remove
all copies from A' to AB (we can only do that when AB doesn't refer to
S at all), but we can significantly reduce the number of such copies.

The resulting AB will usually not be the optimal delta from S to T,
because we will never actually look at S while constructing AB.


Summarizing the above, we get the following delta composition
algorithm:

   ;; X is the set of byte ranges in T defined by copies from S'
   ;; Y is the current offset in T, defined by the ops in AB
   foreach OP in B:
     if (OP = [t]) or (OP = [n]):
       copy OP to AB
     else
       R = (set of ops from A that define S'[OP.offset, OP.length])
       [**] using X, find the optimal set O of [t] ops to replace part of R
       foreach OP' in R:
         if OP' is in (R - O):
           if (OP' = [t]):
             [*] replace OP' with equivalent [s] and [n] ops
           copy OP' to AB
         else
           [**] copy OP's equivalent in O to AB
       insert S'[OP.offset, OP.length]->[Y, OP.length] into X


This algorithm ignores details such as op splitting and merging,
ensuring every op from O gets copied exactly once, helper indexes,
etc..., but those are implementation details.
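
One of those details, the [*] step (turning a [t] op into the [s] and
[n] ops that defined that stretch of the target), is worth sketching,
because it shows why no real recursion is needed: if ops are expanded
in order, the already-emitted part of A' contains only [s] and [n] ops
by the time a [t] op refers back to it.  The code below is a
self-contained illustration with invented types (not the real
libsvn_delta ones), and it ignores self-overlapping target copies (the
"run past the end" trick) to stay short:

/* Sketch of the [*] step: rewrite a delta A into A' by replacing each
   target-copy op with the source-copy/new-data ops that produced that
   part of the target.  Illustrative only. */
#include <stddef.h>
#include <stdio.h>

enum kind { NEW, SCOPY, TCOPY };

struct op {
  enum kind kind;
  size_t offset, length;    /* source/target offset; bytes produced */
  const char *data;         /* new text, for NEW ops only */
};

/* Ops of A' emitted so far, and the target offset where each starts. */
static struct op out[256];
static size_t out_start[256];
static size_t out_n, target_len;

static void
emit (struct op op)
{
  out_start[out_n] = target_len;
  out[out_n++] = op;
  target_len += op.length;
}

/* Emit the slice of out[i] that covers [from, from + len) of the target. */
static void
emit_slice (size_t i, size_t from, size_t len)
{
  struct op op = out[i];
  size_t skip = from - out_start[i];

  op.length = len;
  if (op.kind == NEW)
    op.data += skip;         /* skip into the new text */
  else
    op.offset += skip;       /* SCOPY: shift the source offset */
  emit (op);
}

/* Replace a target copy of [off, off + len) with the already-emitted
   ops that defined that range, splitting at the boundaries. */
static void
expand_tcopy (size_t off, size_t len)
{
  size_t i;
  for (i = 0; len > 0 && i < out_n; i++)
    {
      size_t end = out_start[i] + out[i].length;
      size_t take;
      if (end <= off)
        continue;            /* this op ends before the range starts */
      take = end - off;
      if (take > len)
        take = len;
      emit_slice (i, off, take);
      off += take;
      len -= take;
    }
}

int
main (void)
{
  /* A = [s 0,3] [n "xy"] [t 3,2] [s 5,4]; the [t] op copies the "xy"
     that the [n] op just produced, so A' replaces it with another [n]. */
  static const struct op A[] = {
    { SCOPY, 0, 3, 0 }, { NEW, 0, 2, "xy" },
    { TCOPY, 3, 2, 0 }, { SCOPY, 5, 4, 0 },
  };
  size_t i;

  for (i = 0; i < sizeof A / sizeof A[0]; i++)
    if (A[i].kind == TCOPY)
      expand_tcopy (A[i].offset, A[i].length);
    else
      emit (A[i]);

  for (i = 0; i < out_n; i++)
    if (out[i].kind == NEW)
      printf ("[n] \"%.*s\"\n", (int) out[i].length, out[i].data);
    else
      printf ("[s] offset=%lu length=%lu\n",
              (unsigned long) out[i].offset, (unsigned long) out[i].length);
  return 0;
}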


------------------------------------------------------------------------
Notes from Brane: Delta storage

O.K., now that delta composition is out of the way, let's talk a bit
about storing and retrieving deltified text from the filesystem.  I'll
just jot down a few thoughts about how this could be done, assuming
one of two possible models of storage management in the filesystem:

  a) all data store and retrieve operations are synchronous; or,
  b) deltification can run in the background.


1) Storing new data

New data arrives either as fulltext (from add) or as a delta from an
existing version (copy, modify).

  a) if the new data is a delta, convert it to fulltext (possibly
     combining several existing deltas in the process).  Store the
     fulltext in the tip, and replace the previous tip (which, by
     induction, contains fulltext) with a delta from the current tip.
     (Sketched below.)

  b) store the fulltext or delta in the tip and mark it for async
     modification.  Do a) in the background.
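
In the same loose pseudo-code style as the retrieve_node_rev sketch
near the top of this file (none of these helper names are real), the
synchronous variant 1a) looks something like:

   store_node_rev (NEW)
   {
     if (is_svndiff (NEW))
       NEW = fulltext_of (NEW);    /* combine/apply deltas as needed */

     OLD_TIP = get_tip ();
     write_fulltext (NEW);         /* the tip always holds fulltext */
     if (OLD_TIP exists)
       /* re-deltify the old tip against the new one */
       replace_contents (OLD_TIP, svndiff (NEW, OLD_TIP));
     set_tip (NEW);
   }

Variant 1b) just writes NEW as it arrived, marks the node, and lets a
background task do the conversion and re-deltification later.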

2) Retrieving old data

  a) If the data is a delta, follow the delta references (combining
     the deltas) until a fulltext is found; apply the combined delta
     to get the required fulltext.  (Sketched below.)

     If the combined delta reduces to a no-op (the two fulltexts are
     the same), store the fulltext in the younger of the two nodes and
     replace the older node's data with a "same" note.

  b) Same as para 1 of a), then mark the node for async
     modification.  In the background, find the diff between the two
     fulltexts.  If they're equal, do para 2 of a).  Otherwise, if the
     diff is smaller than the current diff in the node, replace the
     current representation.  ("Smaller" could be construed as "more
     optimal" -- it would make sense to take into account the number
     of delta combinations that could be skipped by replacing the
     current representation when comparing sizes.)
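
And the retrieval side of 2a), again as loose pseudo-code with made-up
helper names (compose_deltas being the composition described in the
previous section):

   retrieve_old_node_rev (N)
   {
     DELTA = identity;
     while (is_svndiff (N))
       {
         DELTA = compose_deltas (DELTA, get_svndiff (N)); /* combine, don't apply */
         N = get_base (N);
       }

     /* N now holds fulltext.  If DELTA turned out to be a no-op, this
        is where the "same" handling from 2a) would kick in. */
     return apply_delta (DELTA, get_fulltext (N));
   }

The important property is that no intermediate fulltext is ever
materialized; only the composed delta and the one fulltext at the end
of the chain are touched.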