|  |  | 
|  | Description of the NODES table | 
|  | ============================== | 
|  |  | 
|  |  | 
|  | * Introduction | 
|  | * Inclusion of BASE nodes | 
|  | * Rows to store state | 
|  | * Ordering rows into layers | 
|  | * Visibility of multiple op_depth rows | 
|  | * Restructuring the tree means adding rows | 
|  | * Copies of mixed-revision subtrees become multiple layers | 
|  | * In a deleted subtree, all nodes get marked deleted explicitly | 
|  | * Nodes in a replaced subtree can have different presence values | 
|  | * Presence values of nodes in partially overlapping replacements | 
|  | * Status needs to consult the *two* topmost layers - sometimes | 
|  |  | 
|  |  | 
|  | Introduction | 
|  | ------------ | 
|  |  | 
|  | The entire original design of wc-ng evolves around the notion that | 
|  | there are a number of states in a working copy, each of which needs | 
|  | to be managed.  All operations - excluding merge - operate on three | 
|  | trees: BASE, WORKING and ACTUAL. | 
|  |  | 
|  | For an in-depth description of what each means, the reader is referred | 
|  | to other documentation, also in the notes/ directory.  In short, BASE | 
|  | is what was checked out from the repository; WORKING includes | 
|  | tree restructuring; while ACTUAL also includes changes to properties | 
|  | and file contents. | 
|  |  | 
|  | The idea that there are three trees works - mostly. There is no need | 
|  | for more trees outside the area of the metadata administration and even | 
|  | then three trees got us pretty far.  The problem starts when one realizes | 
|  | tree modifications can be overlapping or layered. Imagine a tree with | 
|  | a replaced subtree.  It's possible to replace a subtree within the | 
|  | replacement.  Imagine that happened and that the user wants to revert | 
|  | one of the replacements.  Given a 'flat' system, with just enough columns | 
|  | in the database to record the 'old' and 'new' information per node, a single | 
|  | revert can be supported.  However, in the example with the double | 
|  | replacement above, that would mean it's impossible to revert one of the | 
|  | two replacements: either there's not enough information in the deepest | 
|  | replacement to execute the highest level replacement or vice versa | 
|  | - depending on which information was selected to be stored in the "new" | 
|  | columns. | 
|  |  | 
|  | The NODES table is the answer to this problem: instead of having a single | 
|  | row in a table with WORKING nodes with just enough columns to record | 
|  | (as per the example) a replacement, the solution is to record different | 
|  | layers of tree restructuring by having multiple rows. | 
|  |  | 
|  |  | 
|  |  | 
|  | Inclusion of BASE nodes | 
|  | ----------------------- | 
|  |  | 
|  | The original technical design of wc-ng included a WORKING_NODE and a | 
|  | BASE_NODE table.  As described in the introduction, the WORKING_NODE | 
|  | table was replaced with NODES.  However, the BASE_NODE table stores | 
|  | roughly the same state information that WORKING_NODE did.  Additionally, | 
|  | in a number of situations, the system isn't interested in the type of | 
|  | state it gets returned (BASE or WORKING) - it just wants the latest. | 
|  |  | 
|  | As a result the BASE_NODE table has been integrated into the NODES | 
|  | table. | 
|  |  | 
|  | The main difference between the WORKING_NODE and BASE_NODE tables was | 
|  | that the BASE_NODE table contained a few caching fields which are | 
|  | not relevant to WORKING_NODE.  Moving those to a separate table was | 
|  | determined to be wasteful because the primary key of that table | 
|  | would be much larger than any information stored in it in the first | 
|  | place. | 
|  |  | 
|  |  | 
|  |  | 
|  | Rows to store state | 
|  | ------------------- | 
|  |  | 
|  | Rows of the NODES table store state of nodes in the BASE tree | 
|  | and the layers in the WORKING tree.  Note that these nodes do not | 
|  | need to exist in the working copy presented to the user: they may | 
|  | be 'absent', 'not-present' or just removed (rm) without using | 
|  | Subversion commands. | 
|  |  | 
|  | A row contains information linking to the repository, if the node | 
|  | was received from a repository.  This reference may be a link to | 
|  | the original nodes for copied or moved nodes, but for rows designating | 
|  | BASE state, they refer to the repository location which was checked | 
|  | out from. | 
|  |  | 
|  | Additionally, the rows contain information about local modifications | 
|  | such as copy, move or delete operations. | 
|  |  | 
|  |  | 
|  |  | 
|  | Ordering rows into layers | 
|  | ------------------------- | 
|  |  | 
|  | Since the table might contain more than one row per (wc_id, local_relpath) | 
|  | combination, an ordering mechanism needs to be added.  To that effect | 
|  | the 'op_depth' value has been devised.  The op_depth is an integer | 
|  | indicating the depth of the operation which modified the tree in order | 
|  | for the node to enter the state indicated in the row. | 
|  |  | 
|  | Every row for the (wc_id, local_relpath) combination must have a unique | 
|  | op_depth associated with it.  The value of op_depth is related to the | 
|  | top-most node being modified in the given tree-restructuring | 
|  | operation (operation root or oproot).  E.g. upon deletion of a subtree, | 
|  | every node in the subtree will have a row in the table with the same | 
|  | op_depth, that being the depth of the top directory of the subtree. | 
|  |  | 
|  | The op_depth is calculated by taking the number of path components in | 
|  | the local_relpath of the oproot. The unmodified tree (BASE) is identified | 
|  | by rows with an op_depth value 0. | 
|  |  | 
|  | By having multiple restructuring operations on the same path in a modified | 
|  | subtree (most notably replacements), the table may end up with multiple rows | 
|  | with an op_depth bigger than 0. | 
|  |  | 
|  |  | 
|  |  | 
|  | Visibility of multiple op_depth rows | 
|  | ------------------------------------ | 
|  |  | 
|  | As stated in the introduction, there's no need to leak the concept of | 
|  | multiple op_depth rows out of the meta data store - apart from the BASE | 
|  | and WORKING trees. | 
|  |  | 
|  | As described before, the BASE tree is defined by op_depth == 0. WORKING as | 
|  | visible outside the metadata store maps back to those rows where | 
|  | op_depth == MAX(op_depth) for each (wc_id, local_relpath) combination. | 
|  |  | 
|  |  | 
|  |  | 
|  | Restructuring the tree means adding rows | 
|  | ---------------------------------------- | 
|  |  | 
|  | The base idea behind the NODES table is that every tree restructuring | 
|  | operation causes nodes to be added to the table in order to best support | 
|  | the reversal process: in that case a revert simply means deletion of rows | 
|  | and bringing the subtree back into sync with the metadata. | 
|  |  | 
|  | There's one exception: When a delete is followed by a copy or move to | 
|  | the deleted location - causing a replacement - a row with the right | 
|  | op_depth may already exist, due to the delete.  If so, it needs to be | 
|  | modified.  On revert, the modified nodes need to be restored to 'deleted' | 
|  | state, which itself can be reverted during the next revert.  (If the row did | 
|  | not exist with the right op_depth, then this copy or move is being performed | 
|  | at some greater depth than the delete, and then this copy or move will | 
|  | simply create rows at a new op_depth.) | 
|  |  | 
|  | ### JAF: I don't think a replacement should be reverted in two stages, even | 
|  | though it was created in two stages.  I think 'revert' should restore the | 
|  | previous existing node, just like it does in WC-1.  A partial revert of | 
|  | this state is not a particularly helpful or frequent use case. | 
|  |  | 
|  | GJS: I believe that wc_db should enable the individual reverts. The | 
|  | first revert will undo the add/copy, and the second revert will undo | 
|  | the delete. The UI (or the next level up in libsvn_wc) can collapse | 
|  | those into a single user action. This leaves us the future | 
|  | possibility of finer-grained reverts. | 
|  |  | 
|  | ### EHU: The statement above probably means that *all* nodes in the subtree | 
|  | need to be rewritten: they all have a deleted state with the affected | 
|  | op_depth, meaning they probably need a 'replaced/copied-to' state with | 
|  | the same op_depth... | 
|  |  | 
|  | GJS: not all nodes. The newly-arriving copy/move may have | 
|  | new/disjoint nodes that were not part of the deleted set. We will | 
|  | simply add new rows for these arriving nodes. Similarly, the | 
|  | arriving subtree may NOT have a similar node, so the deleted node | 
|  | remains untouched. | 
|  |  | 
|  |  | 
|  | Copies of mixed-revision subtrees become multiple layers | 
|  | -------------------------------------------------------- | 
|  |  | 
|  | In the design, every node which is not a child of its parent implies a | 
|  | tree restructuring operation having taken place.  When committing a | 
|  | mixed-revision subtree, the commit should mirror the actual mixed state | 
|  | of the tree. | 
|  |  | 
|  | A mixed-revision tree which came about in the usual process of committing | 
|  | content changes - ie one without tree modifications - differs exactly in | 
|  | that respect:  the tree in the repository doesn't need to mirror the mixed | 
|  | revision state in the working copy. | 
|  |  | 
|  | The idea is that every tree restructuring operation takes place on the | 
|  | oproot.  When a node or subtree within the copied tree isn't a direct | 
|  | child of its parent, most notably because it's at a different revision, | 
|  | that's a tree restructuring: a node of the same revision has been | 
|  | replaced by a node (of the same name) from another rev. | 
|  |  | 
|  | By strict application of the design rule, all nodes and subtrees at | 
|  | different revision levels than their parents within the copied subtree, | 
|  | become an op_depth layer of their own. | 
|  |  | 
|  | ### JAF: We don't have the info in the WC to be able to fill in the lower | 
|  | layers of this tree for the copy, if we are copying from BASE, because | 
|  | BASE is stored flat.  Therefore we won't be able to revert/delete/replace | 
|  | the different-revision sub-trees of this copied tree.  Therefore I think | 
|  | we have to store the mixed-rev copy flat (single op_depth) and modify the | 
|  | commit rules to act on revision-number changes within this flat tree. | 
|  |  | 
|  | GJS: correct. Consider a root of the subtree at r10, and a | 
|  | descendent is at r12. We cannot create one layer at r10, and another | 
|  | at r12 because we do not have the descendent@r10 to place into the | 
|  | first layer. Thus, we need to use a single op_depth layer for this | 
|  | operation. At commit time, one copy will be me for the subtree from | 
|  | r10, a deletion will be made for the descendent, and then another | 
|  | copy performed for the r12 descendent. | 
|  |  | 
|  | ### GJS: in the above scenario, we do not know if the descendent | 
|  | existed in r10, so the deletion may not be necessary (and could even | 
|  | throw an error!). I do not recall if our copy's destination is | 
|  | allowed to exist (ie. we have implied overwrite semantics in the | 
|  | repository). | 
|  |  | 
|  | PM: Yes, we have overwrite sematics.  The FS layer on the server has | 
|  | magic that converts the copy of the r12 descendant into a replace if | 
|  | the descendant exists in r10.  The client does not send a delete. | 
|  |  | 
|  | This magic applies to copies, not deletes, so there is a problem | 
|  | when the descendant is deleted in the mixed-revision copy in the | 
|  | working copy.  When faced with a copy of the subtree at r10 and a | 
|  | delete of a descendant at r12 the commit doesn't work at present. | 
|  | Deleting the descendant is wrong if it does not exist in r10, but | 
|  | not deleting it is wrong if it does exist.  I suppose the client | 
|  | could ask the server, or perhaps use multiple layers of BASE to | 
|  | track mixed-revisions (argh!). | 
|  |  | 
|  | In a deleted subtree, all nodes get marked deleted explicitly | 
|  | ------------------------------------------------------------- | 
|  |  | 
|  | All nodes in a deleted subtree get marked 'deleted' explicitly in order | 
|  | to be able to query on a single node and find in its topmost layer | 
|  | that the node that might have ever existed at the given path does not | 
|  | exist there anymore. | 
|  |  | 
|  |  | 
|  | Presence values of nodes in partially overlapping replacements | 
|  | -------------------------------------------------------------- | 
|  |  | 
|  | Replacement - being a two-step operation consisting of a delete and an | 
|  | add/copy - causes all rows of the deleted subtree to be added with a | 
|  | new op_depth and presence value 'deleted'.  So far so good. | 
|  |  | 
|  | Adding a tree on top of the same oproot will cause the oproot - | 
|  | and all overlapping children! - to switch their presence value to | 
|  | 'normal'.  When a node replaces a deleted node it hides any deleted | 
|  | children of the previously deleted node, and may come with children of | 
|  | its own.  Some of the new children may have the same names as some of | 
|  | the deleted children, but these overlapping children should not be | 
|  | considered restructuring replacements.  Only the parent, with op_depth | 
|  | equal to the tree depth, is a restructuring replacement. | 
|  |  | 
|  |  | 
|  | Status needs to consult the *two* topmost layers - sometimes | 
|  | ------------------------------------------------------------ | 
|  |  | 
|  | As discussed before, every tree restructuring operation becomes an | 
|  | oproot, causing rows to be added with a new op_depth value. | 
|  |  | 
|  | Status wants to report the oproots, making a clear distinction between | 
|  | adds and replacements.  However, both added and replaced nodes have | 
|  | the presence value 'normal'.  In order to make the distinction, status | 
|  | needs to determine if there was a node in the layer below the | 
|  | restructured layer.  In case there is, it must be a replacement, | 
|  | otherwise, it must be an addition. | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | TODO: | 
|  |  | 
|  | GJS: yup. tho it will complicate a revert of a copy/move-here | 
|  | since we will need to perform a query to see whether we should | 
|  | convert the copy/move into a deleted node, or whether to simply | 
|  | remove the node entirely. | 
|  |  | 
|  | GJS: and yes, if wc_db performed a double-operation revert, | 
|  | then we wouldn't have to do this. arguably, we could push the | 
|  | 2-op revert to a future release when we also choose to alter | 
|  | the higher layers to bring in the finer-grained control. (or | 
|  | maybe we have to look at prior nodes regardless, so checking | 
|  | for "replace with a deleted node" comes for no additional | 
|  | cost). | 
|  |  | 
|  | * Document states of the table and their meaning (including values | 
|  | of the relevant columns) |