Description of the NODES table
==============================
* Introduction
* Inclusion of BASE nodes
* Rows to store state
* Ordering rows into layers
* Visibility of multiple op_depth rows
* Restructuring the tree means adding rows
* Copies of mixed-revision subtrees become multiple layers
* In a deleted subtree, all nodes get marked deleted explicitly
* Nodes in a replaced subtree can have different presence values
* Presence values of nodes in partially overlapping replacements
* Status needs to consult the *two* topmost layers - sometimes
Introduction
------------
The entire original design of wc-ng revolves around the notion that
there are a number of states in a working copy, each of which needs
to be managed. All operations - excluding merge - operate on three
trees: BASE, WORKING and ACTUAL.
For an in-depth description of what each means, the reader is referred
to other documentation, also in the notes/ directory. In short, BASE
is what was checked out from the repository; WORKING includes
tree restructuring; while ACTUAL also includes changes to properties
and file contents.
The idea that there are three trees works - mostly. There is no need
for more trees outside the area of the metadata administration and even
then three trees got us pretty far. The problem starts when one realizes
tree modifications can be overlapping or layered. Imagine a tree with
a replaced subtree. It's possible to replace a subtree within the
replacement. Imagine that happened and that the user wants to revert
one of the replacements. Given a 'flat' system, with just enough columns
in the database to record the 'old' and 'new' information per node, a single
revert can be supported. However, in the example with the double
replacement above, that would mean it's impossible to revert one of the
two replacements: either there's not enough information in the deepest
replacement to execute the highest level replacement or vice versa
- depending on which information was selected to be stored in the "new"
columns.
The NODES table is the answer to this problem: instead of having a single
row in a table with WORKING nodes with just enough columns to record
(as per the example) a replacement, the solution is to record different
layers of tree restructuring by having multiple rows.
Inclusion of BASE nodes
-----------------------
The original technical design of wc-ng included a WORKING_NODE and a
BASE_NODE table. As described in the introduction, the WORKING_NODE
table was replaced with NODES. However, the BASE_NODE table stores
roughly the same state information that WORKING_NODE did. Additionally,
in a number of situations, the system isn't interested in the type of
state it gets returned (BASE or WORKING) - it just wants the latest.
As a result the BASE_NODE table has been integrated into the NODES
table.
The main difference between the WORKING_NODE and BASE_NODE tables was
that the BASE_NODE table contained a few caching fields which are
not relevant to WORKING_NODE. Moving those to a separate table was
determined to be wasteful because the primary key of that table
would be much larger than any information stored in it in the first
place.
Rows to store state
-------------------
Rows of the NODES table store state of nodes in the BASE tree
and the layers in the WORKING tree. Note that these nodes do not
need to exist in the working copy presented to the user: they may
be 'absent', 'not-present' or just removed (rm) without using
Subversion commands.
A row contains information linking to the repository, if the node
was received from a repository. This reference may be a link to
the original nodes for copied or moved nodes, but for rows designating
BASE state, they refer to the repository location which was checked
out from.
Additionally, the rows contain information about local modifications
such as copy, move or delete operations.
Ordering rows into layers
-------------------------
Since the table might contain more than one row per (wc_id, local_relpath)
combination, an ordering mechanism needs to be added. To that effect
the 'op_depth' value has been devised. The op_depth is an integer
indicating the depth of the operation which modified the tree in order
for the node to enter the state indicated in the row.
Every row for the (wc_id, local_relpath) combination must have a unique
op_depth associated with it. The value of op_depth is related to the
top-most node being modified in the given tree-restructuring
operation (operation root or oproot). E.g. upon deletion of a subtree,
every node in the subtree will have a row in the table with the same
op_depth, that being the depth of the top directory of the subtree.
The op_depth is calculated by taking the number of path components in
the local_relpath of the oproot. The unmodified tree (BASE) is identified
by rows with an op_depth value 0.
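The calculation above can be sketched in a few lines of Python. This is
only an illustration: the function name is hypothetical, and it assumes
that local_relpath uses '/' as separator and that the empty string
denotes the working copy root.

```python
def op_depth_for_oproot(oproot_relpath: str) -> int:
    """Return the number of path components in the oproot's local_relpath.

    Assumption: components are separated by '/', and the empty string
    stands for the working copy root (zero components).
    """
    if oproot_relpath == "":
        return 0
    return len(oproot_relpath.split("/"))


# Deleting the subtree rooted at "A/B" gives every row written for that
# operation ("A/B", "A/B/f", ...) the op_depth of the oproot "A/B".
depth_of_delete = op_depth_for_oproot("A/B")
```

Note that the depth is taken from the oproot, not from the node the row
describes: a row for "A/B/f" written by the delete of "A/B" carries the
op_depth of "A/B".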
By having multiple restructuring operations on the same path in a modified
subtree (most notably replacements), the table may end up with multiple rows
with an op_depth bigger than 0.
Visibility of multiple op_depth rows
------------------------------------
As stated in the introduction, there's no need to leak the concept of
multiple op_depth rows out of the metadata store - apart from the BASE
and WORKING trees.
As described before, the BASE tree is defined by op_depth == 0. WORKING as
visible outside the metadata store maps back to those rows where
op_depth == MAX(op_depth) for each (wc_id, local_relpath) combination.
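As a rough sketch of the two views, the following uses Python's sqlite3
module with a heavily simplified, hypothetical table: only the columns
named in this document (wc_id, local_relpath, op_depth, presence) are
modelled, and the helper names are made up for the example. It is not
the actual wc_db schema or query set.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified NODES table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE nodes (
               wc_id         INTEGER NOT NULL,
               local_relpath TEXT    NOT NULL,
               op_depth      INTEGER NOT NULL,
               presence      TEXT    NOT NULL,
               PRIMARY KEY (wc_id, local_relpath, op_depth))"""
    )
    return db


def base_rows(db, wc_id):
    """BASE: the rows with op_depth == 0."""
    return db.execute(
        "SELECT local_relpath, presence FROM nodes"
        " WHERE wc_id = ? AND op_depth = 0 ORDER BY local_relpath",
        (wc_id,),
    ).fetchall()


def topmost_rows(db, wc_id):
    """WORKING as seen from outside: per path, the row with MAX(op_depth)."""
    return db.execute(
        """SELECT n.local_relpath, n.op_depth, n.presence
             FROM nodes AS n
            WHERE n.wc_id = ?
              AND n.op_depth = (SELECT MAX(m.op_depth)
                                  FROM nodes AS m
                                 WHERE m.wc_id = n.wc_id
                                   AND m.local_relpath = n.local_relpath)
            ORDER BY n.local_relpath""",
        (wc_id,),
    ).fetchall()


db = make_db()
db.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        (1, "A", 0, "normal"),
        (1, "A/f", 0, "normal"),
        (1, "B", 0, "normal"),
        # The subtree "A" was deleted: one extra layer at op_depth 1.
        (1, "A", 1, "deleted"),
        (1, "A/f", 1, "deleted"),
    ],
)
```

With this data, base_rows still reports all three paths as 'normal',
while topmost_rows reports "A" and "A/f" as 'deleted' at op_depth 1 and
"B" unchanged at op_depth 0.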
Restructuring the tree means adding rows
----------------------------------------
The base idea behind the NODES table is that every tree restructuring
operation causes nodes to be added to the table in order to best support
the reversal process: in that case a revert simply means deletion of rows
and bringing the subtree back into sync with the metadata.
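A minimal in-memory sketch of this idea, assuming a dict keyed by
(local_relpath, op_depth) stands in for the table. All names are
hypothetical, and the sketch ignores the on-disk side of a revert as
well as the case where a row at the same op_depth already exists.

```python
def depth_of(relpath):
    """Number of path components; the empty relpath is the WC root."""
    return len(relpath.split("/")) if relpath else 0


def in_subtree(path, oproot):
    """True if path is the oproot itself or lies below it."""
    return path == oproot or path.startswith(oproot + "/")


def delete_subtree(nodes, oproot):
    """Record a delete by adding one 'deleted' row per node in the subtree."""
    depth = depth_of(oproot)
    affected = {path for (path, _depth) in nodes if in_subtree(path, oproot)}
    for path in affected:
        nodes[(path, depth)] = "deleted"


def revert_operation(nodes, oproot):
    """Undo the operation rooted at oproot by dropping its layer of rows."""
    depth = depth_of(oproot)
    doomed = [key for key in nodes
              if key[1] == depth and in_subtree(key[0], oproot)]
    for key in doomed:
        del nodes[key]


nodes = {("A", 0): "normal", ("A/B", 0): "normal", ("A/B/f", 0): "normal"}
before = dict(nodes)

delete_subtree(nodes, "A/B")      # adds rows at op_depth 2
after_delete = dict(nodes)

revert_operation(nodes, "A/B")    # removes exactly those rows again
```

The lower layers are never touched by either step, which is what makes
the revert a plain row deletion in this model.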
There's one exception: When a delete is followed by a copy or move to
the deleted location - causing a replacement - a row with the right
op_depth may already exist, due to the delete. If so, it needs to be
modified. On revert, the modified nodes need to be restored to 'deleted'
state, which itself can be reverted during the next revert. (If the row did
not exist with the right op_depth, then this copy or move is being performed
at some greater depth than the delete, and then this copy or move will
simply create rows at a new op_depth.)
### JAF: I don't think a replacement should be reverted in two stages, even
though it was created in two stages. I think 'revert' should restore the
previous existing node, just like it does in WC-1. A partial revert of
this state is not a particularly helpful or frequent use case.
GJS: I believe that wc_db should enable the individual reverts. The
first revert will undo the add/copy, and the second revert will undo
the delete. The UI (or the next level up in libsvn_wc) can collapse
those into a single user action. This leaves us the future
possibility of finer-grained reverts.
### EHU: The statement above probably means that *all* nodes in the subtree
need to be rewritten: they all have a deleted state with the affected
op_depth, meaning they probably need a 'replaced/copied-to' state with
the same op_depth...
GJS: not all nodes. The newly-arriving copy/move may have
new/disjoint nodes that were not part of the deleted set. We will
simply add new rows for these arriving nodes. Similarly, the
arriving subtree may NOT have a similar node, so the deleted node
remains untouched.
Copies of mixed-revision subtrees become multiple layers
--------------------------------------------------------
In the design, every node which is not a direct child of its parent implies a
tree restructuring operation having taken place. When committing a
mixed-revision subtree, the commit should mirror the actual mixed state
of the tree.
A mixed-revision tree which came about in the usual process of committing
content changes - i.e. one without tree modifications - differs exactly in
that respect: the tree in the repository doesn't need to mirror the mixed
revision state in the working copy.
The idea is that every tree restructuring operation takes place on the
oproot. When a node or subtree within the copied tree isn't a direct
child of its parent, most notably because it's at a different revision,
that's a tree restructuring: a node of the same revision has been
replaced by a node (of the same name) from another rev.
By strict application of the design rule, all nodes and subtrees at
different revision levels than their parents within the copied subtree,
become an op_depth layer of their own.
### JAF: We don't have the info in the WC to be able to fill in the lower
layers of this tree for the copy, if we are copying from BASE, because
BASE is stored flat. Therefore we won't be able to revert/delete/replace
the different-revision sub-trees of this copied tree. Therefore I think
we have to store the mixed-rev copy flat (single op_depth) and modify the
commit rules to act on revision-number changes within this flat tree.
GJS: correct. Consider a root of the subtree at r10, and a
descendant is at r12. We cannot create one layer at r10, and another
at r12 because we do not have the descendant@r10 to place into the
first layer. Thus, we need to use a single op_depth layer for this
operation. At commit time, one copy will be made for the subtree from
r10, a deletion will be made for the descendant, and then another
copy performed for the r12 descendant.
### GJS: in the above scenario, we do not know if the descendant
existed in r10, so the deletion may not be necessary (and could even
throw an error!). I do not recall if our copy's destination is
allowed to exist (i.e. we have implied overwrite semantics in the
repository).
PM: Yes, we have overwrite semantics. The FS layer on the server has
magic that converts the copy of the r12 descendant into a replace if
the descendant exists in r10. The client does not send a delete.
This magic applies to copies, not deletes, so there is a problem
when the descendant is deleted in the mixed-revision copy in the
working copy. When faced with a copy of the subtree at r10 and a
delete of a descendant at r12 the commit doesn't work at present.
Deleting the descendant is wrong if it does not exist in r10, but
not deleting it is wrong if it does exist. I suppose the client
could ask the server, or perhaps use multiple layers of BASE to
track mixed-revisions (argh!).
In a deleted subtree, all nodes get marked deleted explicitly
-------------------------------------------------------------
All nodes in a deleted subtree get marked 'deleted' explicitly. That way,
a query on a single node finds in its topmost layer that whatever node
may once have existed at the given path no longer exists there.
Presence values of nodes in partially overlapping replacements
--------------------------------------------------------------
Replacement - being a two-step operation consisting of a delete and an
add/copy - causes all rows of the deleted subtree to be added with a
new op_depth and presence value 'deleted'. So far so good.
Adding a tree on top of the same oproot will cause the oproot -
and all overlapping children! - to switch their presence value to
'normal'. When a node replaces a deleted node it hides any deleted
children of the previously deleted node, and may come with children of
its own. Some of the new children may have the same names as some of
the deleted children, but these overlapping children should not be
considered restructuring replacements. Only the parent, with op_depth
equal to the tree depth, is a restructuring replacement.
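The effect on presence values can be sketched as follows, again with a
dict keyed by (local_relpath, op_depth) standing in for the table. The
function and variable names are hypothetical, and 'incoming' is assumed
to be the set of paths of the arriving tree, including the oproot.

```python
def depth_of(relpath):
    """Number of path components; the empty relpath is the WC root."""
    return len(relpath.split("/")) if relpath else 0


def replace_subtree(nodes, oproot, incoming):
    """Delete the subtree at oproot, then add 'incoming' on top of it."""
    depth = depth_of(oproot)
    prefix = oproot + "/"

    # Step 1: the delete marks every existing node in the subtree.
    existing = {path for (path, _depth) in nodes
                if path == oproot or path.startswith(prefix)}
    for path in existing:
        nodes[(path, depth)] = "deleted"

    # Step 2: the add/copy switches the oproot and overlapping children
    # to 'normal' and adds rows for children that did not exist before.
    for path in incoming:
        nodes[(path, depth)] = "normal"


nodes = {("A", 0): "normal", ("A/f", 0): "normal", ("A/g", 0): "normal"}
replace_subtree(nodes, "A", {"A", "A/f", "A/h"})
```

After the call, "A" and the overlapping child "A/f" are 'normal' at
op_depth 1, the non-overlapping "A/g" stays 'deleted' at op_depth 1,
and the new child "A/h" has a single 'normal' row at op_depth 1. Only
"A" has an op_depth equal to its own tree depth, so only "A" counts as
the restructuring replacement.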
Status needs to consult the *two* topmost layers - sometimes
------------------------------------------------------------
As discussed before, every tree restructuring operation becomes an
oproot, causing rows to be added with a new op_depth value.
Status wants to report the oproots, making a clear distinction between
adds and replacements. However, both added and replaced nodes have
the presence value 'normal'. In order to make the distinction, status
needs to determine if there was a node in the layer below the
restructured layer. In case there is, it must be a replacement,
otherwise, it must be an addition.
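The rule can be sketched as a small function. This is an illustration
under stated assumptions, not the status implementation: 'layers' is a
hypothetical dict mapping op_depth to presence for a single path, the
returned strings are made-up labels, and the rule is applied literally,
so any row in a lower layer counts as "a node in the layer below"
regardless of its presence value.

```python
def depth_of(relpath):
    """Number of path components; the empty relpath is the WC root."""
    return len(relpath.split("/")) if relpath else 0


def classify(relpath, layers):
    """Classify one path from its rows, given as {op_depth: presence}."""
    top = max(layers)
    if top == 0:
        return "unmodified"          # only a BASE row exists
    if layers[top] != "normal":
        return layers[top]           # e.g. 'deleted'
    if top != depth_of(relpath):
        return "child of oproot"     # status reports the oproot instead
    # Added and replaced oproots both have presence 'normal'; only the
    # existence of a lower layer tells them apart.
    has_lower_layer = any(depth < top for depth in layers)
    return "replaced" if has_lower_layer else "added"


replaced = classify("A/B", {0: "normal", 2: "normal"})
added = classify("A/B", {2: "normal"})
```

Here 'replaced' evaluates to "replaced" because a BASE row sits below
the op_depth 2 row, while 'added' evaluates to "added" because the
op_depth 2 row is the only row for the path.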
TODO:
GJS: yup. tho it will complicate a revert of a copy/move-here
since we will need to perform a query to see whether we should
convert the copy/move into a deleted node, or whether to simply
remove the node entirely.
GJS: and yes, if wc_db performed a double-operation revert,
then we wouldn't have to do this. arguably, we could push the
2-op revert to a future release when we also choose to alter
the higher layers to bring in the finer-grained control. (or
maybe we have to look at prior nodes regardless, so checking
for "replace with a deleted node" comes for no additional
cost).
* Document states of the table and their meaning (including values
of the relevant columns)