| -*- Text -*- |
| |
| Content |
| ======= |
| |
| * Context |
| * Requirements |
| * Would-be-very-nice-to-haves |
| * Non-goals |
| * Open items / discussion points |
| * Problems in wc-1.0 |
| * Possible solutions |
| * Prerequisites for a good wc implementation |
| * Modularization |
| * Implementation proposals for |
| - metadata storage/access abstraction |
| - BASE tree storage/access abstraction |
| - WORKING tree storage/access abstraction |
| - TARGET & MERGE-END tree storage/access abstraction |
| - transactional manipulation API proposal |
| - delta-application algorithm |
| (in light of metadata, tree and textual conflicts) |
| |
| |
| Context |
| ======= |
| |
| The working copy library has traditionally been a complex piece of |
| machinery, and libsvn_wc-1.0 (wc-1.0 hereafter) was more a result of |
| evolution than of design. This is not so much anybody's fault as a |
| consequence of the developers at the time being unaware of the |
| problems inherent in versioning trees rather than individual files |
| (the usual context in CVS). As a result, the WC has been one of the |
| most fragile areas of the Subversion versioning model. |
| |
| The WC is where a large number of issues come together that can be |
| treated as separate issues elsewhere in the system, or that have no |
| effect on the rest of the system at all. The following come to mind: |
| |
| * Different behaviours required by different use cases (users). |
| For example: some users want mtimes at checkout time |
| to be the checkout time, some want them to be the historical |
| value at check-in time (and others want different variants). |
| * Different filesystems behave differently, yet Subversion |
| is a cross-platform tool and tries to behave the same on all |
| filesystems (timestamp resolution is one example of this). |
| |
| When considering the wc-1.0 design, one finds that there are a lot |
| of situations where the exact state of the versioned tree isn't |
| defined. When explicitly considering which trees relate to the |
| working copy at one time or another, the following trees can be |
| found: |
| |
| * BASE: The tree as it was in unmodified form |
| * WORKING: The tree as it is in modified form, based on the |
| administrative information recorded by the transforming |
| 'svn ..' commands |
| Note: This tree will, as far as text bases go, generally |
| overlap with BASE, but isn't required to; |
| e.g. "add-with-history" |
| * ACTUAL: The tree as it is in modified form on the local disk. |
| This tree may differ from WORKING when having been modified |
| with non-Subversion transforming commands (such as plain 'rm'). |
| |
| In the context of the 'svn update' command: |
| |
| * BASE-TARGET: The tree to which BASE is being updated and for |
| which the changes w.r.t. BASE are integrated into |
| WORKING and ACTUAL |
| * WORKING-TARGET, ACTUAL-TARGET: Trees in which the above mentioned |
| changes have been integrated, but which haven't "gone live" yet; |
| these trees generally represent "in transition" or "intermediary" |
| state with the intent to become the final tree. |
| |
| Additionally, two more trees may be related to the working copy |
| when considering the 'svn merge' command: |
| |
| * START: The tree used as the base state for the 'merge' command |
| * END: The tree used as the ending state for the 'merge' command |
| The difference between these trees will be merged into the |
| WORKING and ACTUAL trees. |
| |
| In the following example 10 == START and 15 == END: |
| $ svn merge -r10:15 http://svn.example.com/svn/ . |
| |
| Please note that the WORKING-TARGET and ACTUAL-TARGET trees also |
| apply to 'svn merge' as they can result in 'add with history' schedules, |
| which will place text bases in the WORKING-TARGET tree. Also note |
| that, since merge is by definition an 'edit' operation, the BASE and |
| BASE-TARGET trees are not concerned with a merge. |
| |
| ###EHU: To which trees do BASE and TARGET refer when we're in a subdir |
| of a replaced tree? And which trees do they refer to in a subdir of |
| a replaced tree which itself is replaced? (Preliminary answer: the |
| base in a replaced subdir should probably be the base as defined by |
| the parent which got copied in, not the base as was deleted, because |
| otherwise it won't be possible to delete files from the replaced subdir: |
| there would be no way to express a deletion against the new dir.) |
| |
| |
| |
| Requirements |
| ============ |
| |
| * Developer sanity |
| From this requirement, a number of additional ones follow: |
| - Very explicit tree state management; clear difference between |
| each of the 5 states we may be looking at |
| - It must be "fun" to code wc-ng enhancements |
| * Speed |
| (Note: a trade-off may be required for 'checkout' vs 'status' speed) |
| * Cross-node-type working copy changes |
| * Flexibility |
| The model should make it easy to support |
| - central vs local metadata storage |
| - Last modified timestamp behaviours |
| - .svn-less working copy subtrees |
| - different file-changed detection schemes |
| (e.g. full tree scan as in wc-1.0 as well as 'p4 edit') |
| * Graceful (defined) fallback for non-supported operations |
| When a checkout tries to create a symlink on an OS which supports |
| symlinks, but on a filesystem which doesn't, we should cope without |
| aborting the whole checkout. The same goes for marking metadata read-only. |
| * Gracefully handle symlinks in relation to any special-handling of |
| files (don't special-handle symlinks!) |
| * Clear/reparable tree state |
| Unlike our current loggy system, which returns the working copy to a |
| defined (but, to the user, unknown) state, I mean here: "there is a |
| command by which the user can restart the command he/she last issued, |
| and Subversion will help complete that command". |
| * Transactional/repairable tree state (by which I mean something that |
| achieves the same as our loggy system, but better). |
| * Case sensitive filesystem aware / resilient |
| * Working copy stability; a number of scenarios with switch and |
| update obstructions used to leave the working copy unrecoverable |
| * Client side 'true renames' support where one side can't be committed |
| without the other (relates to issue #876) |
| * Change detection should become entirely internal to libsvn_wc (referring |
| to the fact that libsvn_client currently calls svn_wait_for_timestamps()), |
| especially since under 'use-commit-times=yes' this waiting is |
| completely useless. |
| * Last-modified recording as a preparation for solving issue #1256 and |
| as defined in this mail, also linked from the issue: |
| http://svn.haxx.se/dev/archive-2006-10/0193.shtml |
| * Representing "this node is part of a replaced-with-history tree and |
| I'm *not* in the replacement tree" as well as "... and I'm deleted |
| from the replacement tree" [issues #1962 and #2690] |
| |
| |
| Would-be-very-nice-to-haves |
| =========================== |
| |
| * Multiple users with a single working copy (aka shared working copy) |
| * Ending up with an implementation which can use current WCs |
| (without conversion) |
| * Working copies/metadata storage without local storage of text-bases |
| (other than a few cached ones) |
| |
| |
| Non-goals |
| ========= |
| |
| * Off-line commits |
| * Distributed VC |
| |
| Open items / discussion points |
| ============================== |
| |
| * Files changed during the window "sent as part of commit" to |
| "post commit wc processing"; these are currently explicitly |
| supported. Do we want to keep this support (at the cost of speed)? |
| * Single working copy lock. Should we have one lock which locks the |
| entire working copy, disabling any parallel actions on disjoint |
| parts of the working copy? |
| * Metadata physical read-only marking (as in wc-1.0). Is it still |
| required, or should it become advisory (i.e. ignore errors on failure)? |
| * Is issue #1599 a real use case we need to address? |
| (Losing and regaining authz access with updates in between) |
| |
| |
| Problems in wc-1.0 |
| ================== |
| |
| * There's no way to clear unused parts of the entries cache |
| * The code is littered with path calculations in order |
| to access different parts of the working copy (incl. admin areas) |
| * The code is littered with direct accesses to both wc files and |
| admin area files |
| * It's not always clear at which time log files are being processed |
| (i.e. when transactions are being committed), so it's not always |
| clear which version of a tree one is looking at: the pre- or |
| post-transformation version |
| * There's no support for nested transactions (even though some |
| functions want to start a new transaction, regardless of whether one |
| was already started) |
| * It's very hard to determine when an action needs to be written |
| to a transaction or needs to be executed directly |
| * All code assumes local access to admin (meta)data |
| * The transaction system contains non-runnable commands |
| * It's possible to generate combinations of commands, each of which |
| is runnable, but the series isn't |
| * Long if() blocks to sort through all possible states of |
| WORKING, ACTUAL and BASE, without naming them as such |
| * Large if() blocks dealing with the difference between file and |
| directory nodes |
| * Many special-handling if()s for svn:special files |
| * Manipulation of paths, URLs and base-text paths in 1 function |
| * 'Switchedness' of subdirectories has to be derived from the |
| URLs of the parent and the child, but copied nodes also have |
| non-parent-child source URLs... (confusing) |
| * Duplication of data: a 'copied' boolean and a 'copy_source' URL field |
| * Checkouts fail when checking out files of different casing to a case |
| insensitive filesystem |
| * Checkouts fail when marking working copy admin data as read-only |
| is a non-supported FS operation (VFAT or Samba mounts on Linux have |
| this behaviour) |
| * Obstructed updates leave operations half done; in case of a switch, |
| it's not always possible to switch back (because the switch itself |
| may have left now-unversioned items behind) |
| * Directories which have their own children merged into them (which happens |
| when merging a directory-add) won't correctly fold the children into |
| schedule==normal, but instead leave them as schedule==add, resulting in |
| a double commit (through HTTP, other RA layers fold the double add, but |
| that's not the point) [see issue #1962] |
| * transaction files (i.e. log files) are XML files, requiring correct |
| encoding of characters and other values; given the short expected |
| life-time of a log file and the fact that we're almost completely sure |
| the log file is going to be read by the WC library anyway (no interchange |
| problems), this is a waste of processing time |
| * No strict separation between public and internal APIs: many public |
| APIs are also used internally, growing arguments which *should* only |
| matter for internal use |
| |
| |
| Possible solutions |
| ================== |
| |
| Developer sanity |
| ---------------- |
| Strict separation between modules should help keep code focused on one |
| task. Probably some of the required user-specific behaviours can (and |
| should) be hidden behind vtables; for example, whether a file's |
| timestamp is set to the commit time or the last recorded time, or left |
| at the current time, should be abstracted away. |
| |
| Access to 'text bases' is another one of these areas: most routines in |
| wc-1.0 don't actually need access to a file (a stream would be fine as |
| well), but since the files are there, availability is assumed. |
| When all access is abstracted into streams, the actual administration |
| of the BASE tree can be hidden as well: for all we know, the 'tree |
| storage module' may be reading the stream directly off the repository server. |
| [The only module in wc-1.0 which *requires* access to the files is |
| the diff/merge library, because it rewinds to the start of the file |
| during its processing; an operation not supported by streams... and even |
| then, if these routines are passed file handles, they'll be quite |
| happy, meaning they still don't need to know where the text base / |
| source file is...] |
| |
| ###GJS: the APIs should use streams so that we can decompress as the |
| stream is being read. the diff library will need a callback of some |
| kind to perform the rewind, which will effectively just close and |
| reopen the stream. if it rewinds *multiple* times, then we may want |
| to cache the decompressed version of the file. I'll |
| investigate. Given our metadata/base-text storage system, I suspect |
| it will be very easy to cache decompressed copies for a while. |
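A minimal C sketch of the rewind-by-reopen idea from the note above, assuming a hypothetical `reopen_fn` callback supplied by whoever owns the text base. `rewindable_stream`, `rs_rewind()`, `rs_read()`, `rs_close()` and `reopen_plain_file()` are illustrative names, not Subversion APIs; plain stdio `FILE *` stands in for a real stream type.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback: the owner of the text base (plain file,
   compressed store, or something remote) returns a fresh stream
   positioned at offset 0, or NULL on failure. */
typedef FILE *(*reopen_fn)(void *baton);

typedef struct rewindable_stream {
  FILE *fp;          /* currently open stream, NULL if none */
  reopen_fn reopen;  /* how to obtain a new stream at offset 0 */
  void *baton;       /* opaque context passed to REOPEN */
} rewindable_stream;

/* Simplest REOPEN: the text base is a plain file whose path is the
   baton. A decompressing store would restart its decompressor here. */
FILE *reopen_plain_file(void *baton)
{
  return fopen((const char *)baton, "rb");
}

/* "Rewind" without seeking: close the current stream and reopen it. */
int rs_rewind(rewindable_stream *rs)
{
  if (rs->fp)
    fclose(rs->fp);
  rs->fp = rs->reopen(rs->baton);
  return rs->fp ? 0 : -1;
}

/* Sequential read; returns the number of bytes read (0 at EOF). */
size_t rs_read(rewindable_stream *rs, char *buf, size_t len)
{
  return rs->fp ? fread(buf, 1, len, rs->fp) : 0;
}

void rs_close(rewindable_stream *rs)
{
  if (rs->fp)
    fclose(rs->fp);
  rs->fp = NULL;
}
```

A compressed or remote text-base store would only need its own reopen callback; the diff code would call `rs_rewind()` and never learn where the bytes come from.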
| |
| In order to keep developers sane, it should be extremely clear at any |
| one time - when operating on a tree - which tree is being operated upon. |
| |
| One way to avoid the lengthy 'if()' blocks currently in wc-1.0 would be |
| to design a dispatch mechanism based on the path-state in WORKING/BASE and the |
| required transformation, dispatching to (small) functions which perform |
| solely that specific task. |
| #####XBC Do please note that this suggests yet another instance of |
| pure polymorphism coded in C. This runs contrary to the |
| developer sanity requirement. |
| ###GJS: agreed with XBC. |
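For concreteness only (the comments above argue against this approach), a dispatch table could look like the following C sketch. The four node states, the two incoming operations and the outcome chosen for each cell are hypothetical simplifications, not defined wc-ng behaviour.

```c
#include <stddef.h>

/* Hypothetical, deliberately tiny state space: how a node looks when
   BASE and WORKING are compared. */
typedef enum {
  NODE_UNVERSIONED,  /* in neither BASE nor WORKING */
  NODE_NORMAL,       /* in BASE, unchanged in WORKING */
  NODE_ADDED,        /* in WORKING only */
  NODE_DELETED,      /* in BASE, deleted in WORKING */
  NODE_STATE_COUNT
} node_state;

/* Hypothetical incoming change from an update. */
typedef enum { OP_ADD, OP_DELETE, OP_COUNT } incoming_op;

enum { RESULT_NOOP, RESULT_ADDED, RESULT_DELETED, RESULT_TREE_CONFLICT };

typedef int (*op_handler)(const char *path);

/* Each handler does exactly one job; real ones would modify the tree. */
static int handle_noop(const char *path) { (void)path; return RESULT_NOOP; }
static int handle_add(const char *path) { (void)path; return RESULT_ADDED; }
static int handle_delete(const char *path) { (void)path; return RESULT_DELETED; }
static int handle_conflict(const char *path) { (void)path; return RESULT_TREE_CONFLICT; }

/* The whole decision lives in one table instead of nested if()s. */
static op_handler const dispatch[NODE_STATE_COUNT][OP_COUNT] = {
  [NODE_UNVERSIONED] = { [OP_ADD] = handle_add,      [OP_DELETE] = handle_noop },
  [NODE_NORMAL]      = { [OP_ADD] = handle_conflict, [OP_DELETE] = handle_delete },
  [NODE_ADDED]       = { [OP_ADD] = handle_conflict, [OP_DELETE] = handle_conflict },
  [NODE_DELETED]     = { [OP_ADD] = handle_conflict, [OP_DELETE] = handle_delete },
};

int apply_incoming(node_state state, incoming_op op, const char *path)
{
  return dispatch[state][op](path);
}
```

The table makes every state/operation pair explicit, but it is still hand-rolled polymorphism in C: adding a state means revisiting every row, which is the cost XBC objects to.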
| |
| |
| Speed |
| ----- |
| wc-1.0 assumes the WORKING tree and the ACTUAL tree match, but then |
| goes out of its way to assure they actually do when deemed important. |
| The result is a library which calls stat() a lot more often than need be. |
| |
| One of the possible improvements would be to make wc-ng read all of |
| the ACTUAL state (concentrated in one place, using apr_stat()), keeping |
| it around as long as required, matching it with the WORKING state before |
| operating on either (not only when deemed important!). |
| |
| ###GJS: working copy file counts are unbounded, so we need to be |
| careful about keeping "all" stat results in memory. I'll certainly |
| keep this in mind, however. |
| |
| Working from the ACTUAL tree will also prove to be a step toward clarity |
| regarding the exact tree which is being operated upon. |
| |
| [This suggestion from wc-improvements also applies to wc-ng:] |
| Most operations are I/O bound and have CPU to spare. Consider the virtue |
| of compressed text bases in order to reduce the amount of I/O required. |
| |
| Another idea to reduce I/O is to eliminate atomic-rename-into-place for |
| the metadata part of the working copy: if a file is completely written, |
| store the name of the base-text/prop-text in the entries file, which gets |
| rewritten on most wc-transformations anyway. |
| |
| |
| Cross node type change representation |
| ------------------------------------- |
| ####EHU To be done |
| |
| Flexibility of metadata storage |
| ------------------------------- |
| There are 3 known models for storing metadata as requested by different |
| groups of users: |
| |
| - in-subtree metadata storage (.svn subdir model, as in wc-1.0) |
| ###GJS: euh... aren't we axing this? who has *requested* this? |
| - in-'tree root' metadata storage (working copy central) |
| - detached metadata storage (user-central) |
| - in $HOME/.subversion/ |
| - in arbitrary location (e.g. $HOME is a (slow) NFS mount, and we |
| want the metadata on a local drive, such as /var/...) |
| |
| One way to implement each of these behaviours, and so satisfy the wide |
| range of use cases they address, would be to define a module interface |
| and implement it three times (possibly using vtables). |
| |
| Note that using within-module vtables should be less problematic than our |
| post-1.0 experiences with public vtables (such as the ra-layer vtable): |
| implementation details are allowed to differ between releases (even patch |
| releases). |
| |
| ###GJS: note that we are talking about both metadata AND base-text |
| content. (and yeah, optional and compressed base-texts can be done |
| during this rewrite) Also note that we might be able to share |
| base-text content across working copies if they are all keyed by |
| the MD5 hash into storage directories (under the user-central model) |
| |
| ###GJS: I don't think vtables are needed here. This is simply altering |
| the base location, not a whole new implementation. My plan is to |
| default to the "tree root" model with a .svn subdirectory. If a |
| .svn subdir is not found, then we fall back to looking in the |
| $HOME/.subversion/ directory (some subdir under there). If we |
| *still* don't find it, then some config options will point us to |
| the metadata/base-text location. |
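The lookup order in GJS's plan could be sketched in C roughly as follows. `locate_metadata()` and the `wc-metadata` subdirectory name are hypothetical (the note leaves the exact subdirectory under $HOME/.subversion/ open); the directory probe is passed in as a function pointer so that it can be replaced in tests.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/stat.h>

typedef int (*dir_exists_fn)(const char *path);

/* Default probe: does PATH name an existing directory? */
int real_dir_exists(const char *path)
{
  struct stat sb;
  return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/* Copies CANDIDATE into OUT if it exists; returns 1 on a hit. */
static int try_candidate(const char *candidate, dir_exists_fn exists,
                         char *out, size_t out_len)
{
  if (!exists(candidate))
    return 0;
  snprintf(out, out_len, "%s", candidate);
  return 1;
}

/* Tree-root .svn first, then a directory under $HOME/.subversion,
   then a configured location. Returns 0 with OUT filled in, or -1. */
int locate_metadata(const char *wc_root, const char *home,
                    const char *configured, dir_exists_fn exists,
                    char *out, size_t out_len)
{
  char candidate[4096];

  snprintf(candidate, sizeof candidate, "%s/.svn", wc_root);
  if (try_candidate(candidate, exists, out, out_len))
    return 0;

  if (home) {
    /* "wc-metadata" is a made-up name for illustration only. */
    snprintf(candidate, sizeof candidate, "%s/.subversion/wc-metadata", home);
    if (try_candidate(candidate, exists, out, out_len))
      return 0;
  }

  if (configured && try_candidate(configured, exists, out, out_len))
    return 0;

  return -1;
}
```

Because only the base location changes, a single implementation with an ordered list of candidates suffices here, which is GJS's point about not needing vtables.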
| |
| ###GJS: my plan is to upgrade the working copy if we find a pre-1.6 |
| working copy. all the data will be lifted from the multiple .svn |
| subdirectories, and relocated to the "proper" storage location. |
| This will be a non-reversible upgrade, and will preclude pre-1.6 |
| clients from using that working copy again. |
| Note: because of the "destructive" nature of this upgrade, and the |
| expected duration, we may want to require the user to perform an |
| explicit action in order to complete the upgrade. However, 1.6 will |
| not be able to *modify* wc-1.0 metadata -- just read it in order to |
| upgrade it to the new storage system. |
| |
| |
| Transaction duration / memory management |
| ---------------------------------------- |
| The current pool-based memory management system is very good at managing |
| memory in a transaction-based processing model. In the wc library, a |
| 'transaction' often spans more than one call into the library. We either |
| need a sane way to handle this kind of situation using pools, or we may |
| need a different memory management strategy in wc-ng. |
| |
| Working copy stability |
| ---------------------- |
| In light of obstructed updates it may not always be desirable to be able |
| to resume the current operation (as currently is the case): in some cases |
| the user may want to abort the operation, in other cases the user may |
| want to resolve the obstruction before re-executing the operation. |
| |
| The solution to this problem could be 'atomic updates': receiving the |
| full working copy transformation, verifying prerequisites, creating |
| replacement files and directories and when all that succeeds, update |
| the working copy. |
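A minimal two-phase C sketch of such an 'atomic update', under simplifying assumptions: the transformation is a list of whole-file replacements (`staged_file` and `apply_update_atomically()` are hypothetical), and "verifying prerequisites" is reduced to successfully writing every replacement to a temporary name before anything in the working copy is touched.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of one incoming file replacement. */
typedef struct staged_file {
  const char *final_path;  /* where the file must end up */
  const char *temp_path;   /* scratch name written during phase 1 */
  const char *contents;    /* new file contents */
} staged_file;

/* Writes one replacement to its temporary name; returns 0 on success. */
static int stage_one(const staged_file *f)
{
  FILE *fp = fopen(f->temp_path, "wb");
  if (!fp)
    return -1;
  int write_ok = fputs(f->contents, fp) >= 0;
  if (fclose(fp) != 0 || !write_ok) {
    remove(f->temp_path);
    return -1;
  }
  return 0;
}

/* Phase 1 builds every replacement; only if all succeed does phase 2
   touch the working copy. Returns 0 on success, -1 if phase 1 failed
   (working copy untouched), -2 if a rename failed in phase 2. */
int apply_update_atomically(const staged_file *files, size_t n)
{
  size_t staged;
  for (staged = 0; staged < n; staged++)
    if (stage_one(&files[staged]) != 0)
      break;

  if (staged < n) {
    for (size_t i = 0; i < staged; i++)
      remove(files[i].temp_path);  /* discard partial work */
    return -1;
  }

  for (size_t i = 0; i < n; i++)
    if (rename(files[i].temp_path, files[i].final_path) != 0)
      return -2;
  return 0;
}
```

Phase 2 is still a series of independent rename() calls, so it is not atomic across files; it narrows the failure window rather than closing it.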
| |
| Full workin' copy unit tests: |
| Exactly because the working copy is such an important part of the |
| Subversion experience *and* because of the 'reputation' of wc-1.0, |
| we need a way to ensure wc-ng completely performs according to our |
| expectations. *The* way to ensure we're able to test the most contrived |
| edge-cases is to develop a full unit testing test-suite while developing |
| wc-ng. This will be a measure to ensure both working copy stability |
| and developer sanity: in the early stages of the wc-ng development |
| process, we'll be able to assess how well the design holds up |
| under more difficult 'weather'. |
| |
| ###GJS: agreed. as much as possible, when I (re)implement the old APIs |
| in terms of the new APIs, then I'll write a whitebox test. we'll |
| see how long I keep that up :-P |
| |
| Transactional updates |
| --------------------- |
| |
| .. where 'update' is meant as 'user command', not 'svn update' per se. |
| |
| When applied to files, this can be summarized as: |
| |
| * Receive transformations (update, delete, add) from |
| the server, |
| |
| |
| Prerequisites for a good wc implementation |
| ========================================== |
| |
| These prerequisites are to be addressed, either as definitions |
| in this document, or elsewhere in the Subversion (source) tree: |
| * Well defined behaviour for cross-node type updates/merges/.. |
| (tree conflicts in particular) |
| * Well defined behaviour for special file handling |
| * Well defined behaviour for operations on locally missing items |
| (see issue #1082) |
| * Well defined change detection scheme for each of the different |
| last-modified handling strategies |
| * No special handling of symlinks: they are first class versioned objects |
| * Well defined behaviour for property changes on updates/merges/... |
| (this is a problem which may resemble tree conflicts!), |
| including 'svn:' special properties |
| * File name manipulation routines (availability) |
| * File name comparison routines (!) (availability; which compensate |
| for the different ways Unicode characters can be represented |
| [re: NFC/NFD Unicode issue]) |
| * URL manipulation routines (availability) |
| * URL comparison routines (availability; which compensate for |
| different ways the same URL can be encoded; see issue #2490) |
| * Modularization |
| * Agree on a UI to pull in other parts of the same repository |
| (NOT svn:externals) [relates to issue #1167] |
| #####XBC I submit this is a server-side feature that the client |
| (i.e. the WC library) should not know about. |
| * Agree on behaviour for update on moved items (relates to issue #1736) |
| * Case-sensitivity detection code to probe working copy filesystem |
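For the URL comparison item, a C sketch of one well-defined part of the problem: RFC 3986 (section 6.2.2) treats percent-encoded unreserved characters as equivalent to the characters themselves, and hex digits in escapes as case-insensitive. `urls_equivalent()` is a hypothetical helper; it does not handle scheme/host case folding or dot-segment removal, and whether this covers what issue #2490 needs is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters, checked as ASCII (locale-independent). */
static int is_unreserved(int c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

static int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Writes a normalized copy of URL into OUT (room for strlen(URL) + 1
   bytes): escaped unreserved characters are decoded, and the hex
   digits of all remaining escapes are upper-cased. */
void normalize_url_escapes(const char *url, char *out)
{
  static const char hex[] = "0123456789ABCDEF";
  while (*url) {
    int hi = url[0] == '%' ? hex_value((unsigned char)url[1]) : -1;
    int lo = hi >= 0 ? hex_value((unsigned char)url[2]) : -1;
    if (lo >= 0) {
      int c = hi * 16 + lo;
      if (is_unreserved(c)) {
        *out++ = (char)c;
      } else {
        *out++ = '%';
        *out++ = hex[c >> 4];
        *out++ = hex[c & 0xF];
      }
      url += 3;
    } else {
      *out++ = *url++;  /* ordinary character or malformed escape */
    }
  }
  *out = '\0';
}

/* Returns 1 if A and B are equal after normalization, else 0
   (also 0 on allocation failure). */
int urls_equivalent(const char *a, const char *b)
{
  char *na = malloc(strlen(a) + 1);
  char *nb = malloc(strlen(b) + 1);
  int equal = 0;
  if (na && nb) {
    normalize_url_escapes(a, na);
    normalize_url_escapes(b, nb);
    equal = strcmp(na, nb) == 0;
  }
  free(na);
  free(nb);
  return equal;
}
```

Reserved characters such as '/' are deliberately left escaped, because decoding `%2F` would change which path segments the URL names.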
| |
| |
| Modularization |
| ============== |
| |
| Strict separation must be applied to a number of modules which can be |
| recognised. This will help prevent spaghetti code as in wc-1.0 where |
| one piece of code manipulates paths to a working copy file, its URL |
| *and* the path to the base file. |
| |
| For now, these APIs can be separated: |
| |
| - the public API (presumably not to be used by any internal |
| processing, but presents functionality to working copy users) |
| #####XBC This is really required of all our module public APIs. |
| - tree administration API (required for BASE, TARGET and WORKING) |
| Administers which files are part of the tree, which ones map to |
| which repositories and which textbase / propbase files belong |
| to which local files. [should provide checkpointing functionality |
| for use with transactional tree modifications API] |
| - tree access API (required for BASE, WORKING, TARGET and ACTUAL) |
| Gives access to the content of the nodes in a tree |
| - props |
| - text bases (for files) |
| - child nodes (for directories) |
| - transactional tree modifications API (applicable to all trees, |
| ###EHU do we provide the same interface to BASE/WORKING as for ACTUAL?) |
| - tree transformation (required for update/switch/merge updating |
| BASE, WORKING and ACTUAL), meaning all of tree changes, file |
| changes and metadata changes |
| - Working-copy changedness detection API |
| - Metadata access API (used by tree administration module(s)) |
| - Event hooks (in order to be able to implement different |
| timestamp-setting strategies and possibly more) |
| |
| These APIs will be implemented by these (currently known) modules: |
| |
| - tree administration |
| * wc_adm |
| - tree access |
| * wc_acc |
| - transactional tree modifications |
| * wc_log |
| - tree transformation |
| * wc_trans |
| - working copy changedness detection |
| wc_detect vtable-based API implemented by these modules: |
| * tree crawler ('inspired' by wc-1.0) |
| * tree marker (inspired by 'p4 edit') |
| - metadata access API |
| wc_macc vtable-based API implemented by these modules: |
| * tree spread ('inspired' by wc-1.0) |
| * tree root (storing all metadata in the tree root (think darcs)) |
| * central depot (storing 'somewhere' locally, possibly $HOME) |
| this central store would open up the possibility to share |
| text bases/prop bases across checkouts |
| * non-local (retrieving all text and prop-bases from the server, |
| except for a number of cached ones) ###EHU: maybe this is |
| orthogonal to the question where metadata is stored: in all |
| situations, you *could* choose not to keep local copies |
| - Event hooks for the union of all paths in (BASE, WORKING) |
| wc_hook event based single-callback API |
| for e.g. these events: |
| + props updated |
| + base text updated |
| + wc file updated |
| + update completed |
| + lock acquired |
| + lock released |
| (+ lock can't be acquired [in order to 'unprotect' |
| svn:needs-lock protected files which have been removed |
| from the repository?]) |
| to be implemented by these modules: |
| * use-commit-times |
| * versioned-mtimes |
| * versioned-execute-perm |
| * versioned-other-unix-perms |
| (* versioned-windows-perms?) |
| * needs-lock-updater |
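A minimal C sketch of the single-callback hook API. The event enum mirrors the events listed above; `wc_hook_register()`, `wc_hook_notify()`, the fixed-size registry and the `count_wc_file_updates()` example module are hypothetical.

```c
#include <stddef.h>

typedef enum {
  WC_EVENT_PROPS_UPDATED,
  WC_EVENT_BASE_TEXT_UPDATED,
  WC_EVENT_WC_FILE_UPDATED,
  WC_EVENT_UPDATE_COMPLETED,
  WC_EVENT_LOCK_ACQUIRED,
  WC_EVENT_LOCK_RELEASED
} wc_event;

/* The single callback every hook module implements. */
typedef void (*wc_hook_fn)(wc_event event, const char *path, void *baton);

#define WC_MAX_HOOKS 8

typedef struct {
  struct { wc_hook_fn fn; void *baton; } hooks[WC_MAX_HOOKS];
  size_t count;
} wc_hook_registry;

/* Returns 0 on success, -1 if the registry is full. */
int wc_hook_register(wc_hook_registry *reg, wc_hook_fn fn, void *baton)
{
  if (reg->count == WC_MAX_HOOKS)
    return -1;
  reg->hooks[reg->count].fn = fn;
  reg->hooks[reg->count].baton = baton;
  reg->count++;
  return 0;
}

/* Called by the WC library; every module sees every event and
   ignores the ones it does not care about. */
void wc_hook_notify(const wc_hook_registry *reg, wc_event event,
                    const char *path)
{
  for (size_t i = 0; i < reg->count; i++)
    reg->hooks[i].fn(event, path, reg->hooks[i].baton);
}

/* Example module: a use-commit-times style hook would react to working
   files being rewritten; this one only counts such events. */
void count_wc_file_updates(wc_event event, const char *path, void *baton)
{
  (void)path;
  if (event == WC_EVENT_WC_FILE_UPDATED)
    ++*(int *)baton;
}
```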
| |
| The justification for the large number of modules, with a modest number |
| of different APIs, is that the problem is really quite complex, as shown |
| earlier in this document. |
| |
| Over the years, a large number of use cases have developed around |
| Subversion where different user groups have shown very valid use cases |
| for conflicting behaviours. Presumably, most of these we want to |
| retain. Some of the unimplemented ones have open issues indicating |
| there's at least an active interest. To avoid locking out some of |
| the current use cases when adding support for the open issues, we |
| need a flexible, modular model. This model will also keep us from |
| duplicating lots of code to support the different use cases. |
| #####XBC Such flexibility will bring the WC to the kind of |
| purgatory the RA layers are in. We promise feature and semantics |
| parity between them, and the result is that even a small change |
| in that layer requires knowledge of three different protocols |
| and four different implementations. |
| |
| Given the assumption of 'little code duplication', the choice for |
| having several modules which implement the same API (vtable) is |
| justifiable. |
| |
| ###GJS: disagree. I plan to have just one library (libsvn_wc2?) and |
| will probably have no vtables. there is very little need for |
| distinct implementations, as far as I can tell. |
| |
| |
| Implementation proposals |
| ======================== |
| |
| Classification of svn_wc_entry_t fields to BASE/WORKING |
| ------------------------------------------------------- |
| |
| [Note: This section is mainly to clarify the difference between the BASE |
| and WORKING trees, it's not here to mean that we actually need all these |
| fields in wc-ng!] |
| |
| Here are the mappings of all fields from svn_wc_entry_t to the BASE and |
| WORKING trees: |
| |
| +-------------------------------+------+---------+ |
| | svn_wc_entry_t | BASE | WORKING | |
| +-------------------------------+------+---------+ |
| | name | x | x (1)| |
| | revision | x | x (2)| |
| | url | x | x (2)| |
| | repos | x | x (3)| |
| | uuid | x | x (3)| |
| | kind | x | x | |
| | absent | x | | |
| | copyfrom_url | | x | |
| | copyfrom_rev | | x | |
| | conflict_old | | x | |
| | conflict_new | | x | |
| | conflict_wrk | | x | |
| | prejfile | | x | |
| | text_time | | = | |
| | prop_time | | = | |
| | checksum | x | x (2)| |
| | cmt_rev | x | x (2)| |
| | cmt_date | x | x (2)| |
| | cmt_author | x | x (2)| |
| | lock_token | x(6)| | |
| | lock_owner | x | | |
| | lock_comment | x | | |
| | lock_creation_date | x | | |
| | has_props | x | x (4)| |
| | has_prop_mods | | = | |
| | cachable_props | x(5)| x (4)| |
| | present_props | x | x (4)| |
| | changelist | | x | |
| | working_size | | = | |
| | keep_local | | = | |
| | depth | x | x | |
| | schedule | | | |
| | copied | | | |
| | deleted | | | |
| | incomplete | | | |
| +-------------------------------+------+---------+ |
| |
| (1) if this one differs from BASE, it must point to the source of a rename |
| (2) for an add-with-history |
| (3) or can we assume single-repository working copies? |
| (4) can differ from BASE for add-with-history |
| (5) why is this a field at all; can't the WC code know? |
| (6) locks apply to in-repository paths, hence BASE |
| |
| The fields marked with '=' are implementation details of internal detection |
| mechanisms, which means they don't belong in the public interface. |
| |
| Fields with no mark are to become obsolete. 'schedule', 'copied' and |
| 'deleted' can be deduced from the difference between the BASE and WORKING |
| or WORKING and ACTUAL trees. 'incomplete' should become obsolete when the |
| goal of 'atomic updates' can be realised, in which case the tree can't be |
| in an incomplete yet locked state. This would also invalidate issue #1879. |
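Deducing 'schedule' and 'copied' can be sketched in C as below, assuming (hypothetically) that a WORKING row exists only when a node's structure has changed locally, and that it carries the 'deleted' and 'copyfrom_url' fields of the WORKING schema proposed further down. The type and function names are illustrative.

```c
#include <stddef.h>

typedef enum { SCHED_NORMAL, SCHED_ADD, SCHED_DELETE, SCHED_REPLACE } wc_schedule;

/* What (if anything) WORKING records for a node. */
typedef struct {
  int exists;                /* a WORKING row is present for this node */
  int deleted;               /* the row records a local deletion */
  const char *copyfrom_url;  /* non-NULL for add-with-history */
} working_row;

/* BASE_EXISTS says whether BASE has a row for the node. */
wc_schedule deduce_schedule(int base_exists, const working_row *w)
{
  if (!w->exists)
    return SCHED_NORMAL;   /* WORKING simply mirrors BASE */
  if (w->deleted)
    return SCHED_DELETE;
  return base_exists ? SCHED_REPLACE : SCHED_ADD;
}

/* 'copied' is no longer stored: it follows from copyfrom_url. */
int deduce_copied(const working_row *w)
{
  return w->exists && !w->deleted && w->copyfrom_url != NULL;
}
```

This also removes the duplication noted under "Problems in wc-1.0" between the 'copied' boolean and the copy source URL.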
| |
| |
| Basic Storage Mechanics |
| ----------------------- |
| |
| All metadata will be stored into a single SQLite database. This |
| includes all of the "entry" fields *and* all of the properties |
| attached to the files/directories. SQLite transactions will be used |
| rather than the "loggy" mechanics of wc-1.0. |
| |
| ###GJS: note that atomicity across the sqlite database and the content |
| of the ACTUAL tree is freakin' difficult. idea to test: metadata |
| says "not sure of ACTUAL", and when ops complete successfully, then |
| we clear the flag. during any future operation, if the flag is |
| present, then we approach the ACTUAL with extreme prejudice. also |
| note that we can batch clearing of the flags as an optimistic |
| efficiency approach (since if we batch 100 and the last fails, then |
| the other 99 will be slower until the wc-ng determines the ACTUAL |
| is in fine shape and clears the flag for future operations). |
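The 'not sure of ACTUAL' flag with batched clearing could behave like this in-memory C sketch. `actual_tracker` and its functions are hypothetical, nodes are assumed to be identified by small integer ids, and in wc-ng the flags would presumably live in the SQLite database rather than in an array.

```c
#include <stddef.h>

#define MAX_NODES 128

typedef struct {
  int actual_suspect[MAX_NODES]; /* 1 = "not sure of ACTUAL" for this node */
  size_t pending[MAX_NODES];     /* finished ops whose flag is not cleared yet */
  size_t n_pending;
  size_t batch_size;             /* clear flags once this many ops finished */
} actual_tracker;

/* Before touching the disk: mark the node as suspect. */
void op_begin(actual_tracker *t, size_t node)
{
  t->actual_suspect[node] = 1;
}

static void flush_clears(actual_tracker *t)
{
  for (size_t i = 0; i < t->n_pending; i++)
    t->actual_suspect[t->pending[i]] = 0;
  t->n_pending = 0;
}

/* After the operation succeeded: queue the clear, flush in batches. */
void op_end(actual_tracker *t, size_t node)
{
  t->pending[t->n_pending++] = node;
  if (t->n_pending >= t->batch_size || t->n_pending == MAX_NODES)
    flush_clears(t);
}

/* Later operations approach suspect nodes "with extreme prejudice". */
int must_verify_actual(const actual_tracker *t, size_t node)
{
  return t->actual_suspect[node];
}
```

If the process dies before a batch is flushed, the already-finished nodes stay flagged and merely get the slower, careful treatment next time, which is the trade-off the note describes.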
| |
| ###GJS: be wary of sqlite commit performance (based on some of my |
| prior experience with it). must have timing/debugging around the |
| commit operations. may need to use various transaction isolations |
| and/or batching of commits to get proper performance. thus, profile |
| output capability is mandatory to determine if we have issues, and |
| where they occur. |
| |
| Base text data will be stored in a multi-level directory structure, |
| keyed/named by the MD5 of the file. The database will record |
| appropriate mappings, content/compression types, and refcounts for the |
| base text files (for the shared case). |
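A C sketch of the MD5-keyed layout and refcounting, assuming two directory levels of two hex characters each (the number of levels is not specified above) and a hypothetical `basetext_row` mirroring the refcount of the BASETEXT schema below. The `/wc/.svn/pristine` root used in the example is likewise made up.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<root>/<h0h1>/<h2h3>/<digest>" for a 32-char lowercase hex MD5.
   Returns 0 on success, -1 for a malformed digest or a too-small buffer. */
int basetext_path(char *out, size_t out_len, const char *root,
                  const char *md5_hex)
{
  if (strlen(md5_hex) != 32 || strspn(md5_hex, "0123456789abcdef") != 32)
    return -1;
  int n = snprintf(out, out_len, "%s/%.2s/%.2s/%s",
                   root, md5_hex, md5_hex + 2, md5_hex);
  return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Hypothetical in-memory mirror of one BASETEXT row. */
typedef struct {
  char md5_hex[33];
  unsigned refcount;  /* number of nodes (possibly across WCs) using it */
} basetext_row;

void basetext_addref(basetext_row *row)
{
  row->refcount++;
}

/* Drops one reference; returns 1 when the file may now be deleted. */
int basetext_release(basetext_row *row)
{
  if (row->refcount > 0)
    row->refcount--;
  return row->refcount == 0;
}
```

For example, the empty file (MD5 d41d8cd98f00b204e9800998ecf8427e) would map to `/wc/.svn/pristine/d4/1d/d41d8cd98f00b204e9800998ecf8427e`.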
| |
| |
| Metadata Schemas |
| ---------------- |
| |
| BASE schema |
| name string |
| kind enum |
| checksum string ### string should indicate what kind of checksum? |
| props blob ### serialized props. maybe keep unserialized? |
| ### maybe some props should be broken out? |
| children [string] ### something more than array-of-string? |
| cmt_rev integer64 |
| cmt_date integer64 ### is there a time type? |
| cmt_author string ### normalize? (besides utf8, what does "normalize" mean?) |
| |
| absent boolean ### really? maybe state instead, such as "omitted"? |
| revision integer64 |
| url string ### should be computed value? |
| repos_url string ### shared, per-dir? |
| repos_uuid string ### shared, per-dir? |
| ### lock information? in this table, or a separate locks table? |
| ### KFF: in this table, I think. It's rare to tell Subversion "Show |
| ### me everything that's locked", but it's common to say "Tell me if |
| ### this particular path [or set of paths] is locked". |
| ### |
| ### depth? |
| |
| |
| WORKING schema |
| name string ### |
| kind enum |
| checksum string |
| props blob ### serialized props. maybe keep unserialized? |
| ### maybe some props should be broken out? |
| children [string] ### something more than array-of-string? |
| ### following are set when a copyfrom occurs. do we really need these? |
| cmt_rev integer64 |
| cmt_date integer64 ### is there a time type? |
| cmt_author string ### normalize? |
| |
| deleted boolean |
| copyfrom_url string |
| copyfrom_rev integer64 |
| conflict_old string ### conflict data goes on WORKING, right? |
| conflict_new string |
| conflict_wrk string |
| prejfile string |
| changelist string |
| ### KFF: With changelists, it's the opposite of locks: one |
| ### frequently says "show me all the files in changelist X"; I'm |
| ### not sure how frequently we need to say "show me all the |
| ### changelists this file is part of". But anyway, the need to |
| ### discover all the files in changelist X implies that it might be |
| ### better to make a changelist table? |
| |
| ### how to represent renames? |
| ### repos_url, repos_uuid: combined working copies, and/or a foreign |
| ### repos brought in and about to be linked via svn:externals. ?? |
| ### depth? |
| |
| |
| ACTUAL schema (what we know about the ACTUAL tree) |
| name string |
| ### cache the checksum? probably unreliable, but could be a marker |
| ### for "we've already noted the file is different from WORKING" |
| ### cache the kind? assume that kind, then on exception do extended |
| ### processing to figure out the "right" thing. |
| |
| lastmod integer64 ### is there a time type? |
| |
| ### note that we can't really hold any/much data here since ACTUAL |
| could change at any point in time. anything stored here should |
| simply be optimization indicators on how to approach the ACTUAL |
| tree (i.e. what assumptions to make). |
| |
| |
| BASETEXT schema |
| checksum string ### KFF: how is this different from the |
| ### checksum in the "BASE" schema above? |
| present enum ### something like: yes, no, cached, notyet |
| refcount integer |
| compressed boolean ### enum for various storage styles? |
| |
| ### learn from GIT storage? can we somehow compress across this |
| entire storage hierarchy? (e.g. use history and deltas to |
| reassemble fulltexts from line-of-descent) |
| |
| |
| Note that 'children' applies only to kind==DIRECTORY |
| |
| ### break down all modification operations to things that operate on a |
| small/fixed set of rows. if a large sequence of operations fails, |
| then it can leave the system in reparable state, since most were |
| performed. note that ACTUAL can change at any time, thus all mods |
| should be able to compensate for ACTUAL being something |
| unexpected. thus, the transformative operations should be able to |
| fail in such a way as to leave ACTUAL pretty bunged up. |
| |
| ### maybe use handles to refer to files/dirs? take input pathname, |
| convert to native charset, and return a handle for that. same |
| handle across various schemas. |
| |
| note: could be very handy, as I'm thinking abspath for all names, |
| which gets to be pretty wordy for large working copies. |
| |
| note: with handles, a file entry could have "parent dir" cheaply, |
| and from that, we could derive repos_url/uuid from some dir table. |
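A C sketch of the handle idea: each node stores one path component plus the handle of its parent directory, so full paths are never stored and "parent dir" is a single field lookup. `handle_table` and its functions are hypothetical, and the linear search stands in for what would presumably be an indexed lookup in the database.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HANDLES 256

typedef struct {
  char *name;   /* one path component, not an absolute path */
  int parent;   /* handle of the parent directory; -1 for the WC root */
} node_rec;

typedef struct {
  node_rec nodes[MAX_HANDLES];
  int count;
} handle_table;

/* Returns the handle for NAME under PARENT, creating it if needed;
   -1 if the table is full or memory runs out. */
int handle_intern(handle_table *t, int parent, const char *name)
{
  for (int i = 0; i < t->count; i++)
    if (t->nodes[i].parent == parent && strcmp(t->nodes[i].name, name) == 0)
      return i;
  if (t->count == MAX_HANDLES)
    return -1;
  char *copy = malloc(strlen(name) + 1);
  if (!copy)
    return -1;
  strcpy(copy, name);
  t->nodes[t->count].name = copy;
  t->nodes[t->count].parent = parent;
  return t->count++;
}

int handle_parent(const handle_table *t, int h)
{
  return t->nodes[h].parent;
}

/* Rebuilds the WC-relative path of H into OUT; returns its length or -1. */
int handle_relpath(const handle_table *t, int h, char *out, size_t out_len)
{
  if (h < 0) {
    if (out_len == 0)
      return -1;
    out[0] = '\0';
    return 0;
  }
  int n = handle_relpath(t, t->nodes[h].parent, out, out_len);
  if (n < 0)
    return -1;
  int m = snprintf(out + n, out_len - (size_t)n, "%s%s",
                   n > 0 ? "/" : "", t->nodes[h].name);
  if (m < 0 || (size_t)(n + m) >= out_len)
    return -1;
  return n + m;
}

void handle_table_clear(handle_table *t)
{
  for (int i = 0; i < t->count; i++)
    free(t->nodes[i].name);
  t->count = 0;
}
```

For example, interning `trunk`, then `src` under it, then `main.c` under that yields a handle whose path rebuilds to `trunk/src/main.c`, while per-directory data such as repos_url/uuid could hang off the parent handle.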
| |
| ### probably want to special-case the checksum and BASETEXT entry for |
| the "empty file" |
| |
| |
| Code Organization |
| ----------------- |
| |
| libsvn_wc/wc_db.h (symbols: svn_wc__db_*) |
| Storage subsystem for the WC metadata/base-text information. |
| This is a private API, and the rest of the WC will be rebuilt |
| on top of this. |
| |
| This code deals with storage, and transactional modifications |
| of the data. |
| |
| |
| Other sections |
| ============== |
| remain to be done |