| -*- Text -*- |
| |
| Content |
| ======= |
| |
| * Context |
| * Requirements |
| * Nice-to-have's |
| * Non-goals |
| * Open items / discussion points |
| * Problems in wc-1.0 |
| * Possible solutions |
| * Prerequisites for a good wc implementation |
| * Modularization |
| * Implementation proposals for |
| - metadata storage/access abstraction |
| - BASE tree storage/access abstraction |
| - WORKING tree storage/access abstraction |
| - TARGET & MERGE-END tree storage/access abstraction |
| - transactional manipulation API proposal |
| - delta-application algorithm |
| (in light of metadata, tree and textual conflicts) |
| - |
| * Implementation plan |
| |
| |
| Context |
| ======= |
| |
| The working copy library has traditionally been a complex piece of |
| machinery and libsvn_wc-1.0 (wc-1.0 hereafter) was more a result of |
| evolution than it was a result of design. This can't be said to be |
| anybody's fault as much as it was unawareness of the developers at |
| the time of the problem(s) inherent to versioning trees instead of |
| files (as was the usual context within CVS). As a result, the WC |
| has been one of the most fragile areas of the Subversion versioning |
| model. |
| |
| The wc is where a large number of issues come together which can |
| be considered separate issues in the remainder of the system, or |
| don't have any effect on the rest of the system at all. The following |
| things come to mind: |
| |
| * Different behaviours required by different use-cases (users) |
| For example: some users want mtime's at checkout time |
| to be the checkout time, some want it to be the historical |
| value at check-in time (and others want different variants). |
| * Different filesystems behave differently, yet Subversion |
| is a cross platform tool and tries to behave the same on all |
| filesystems (timestamp resolution may be an example of this). |
| |
| When considering the wc-1.0 design, one finds that there are a lot |
| of situations where the exact state of the versioned tree isn't |
| defined. When explicitly considering which trees relate to the |
| working copy at one time or another, the following trees can be |
| found: |
| |
| * BASE: The tree of nodes from the repository, against which local changes |
| are made. Also known as "pristine". Each node is as it was in the |
| repository at a particular revision and URL, as recorded per node in |
| the WC metadata. A directory node in the BASE tree knows something |
| about the children it had in the repository (### details?), but its set |
| of children in the WC is independent of that. In a node or tree |
| scheduled for replacement the BASE is the pristine version of the |
| to-be-added node or tree, not of the deleted one. For a node that is |
| scheduled for add without history, there is no BASE node. |
| |
| * WORKING: The tree that represents the user's view of the WC with their |
| local modifications (assuming the user told Subversion about these |
| modifications with "svn add" etc. as required). In implementation, the |
| WORKING tree has the structure and properties recorded in the WC, and |
| the file content present on the local disk. (If a file cannot be |
| accessed because the tree structure on the local disk is incompatible, |
| this is an error, known as an "obstruction".) |
| |
| * ACTUAL: The tree on the local disk, ignoring Subversion |
| administrative directories and other nodes that Subversion has |
| knowingly put there such as conflict reject files, and regarding |
| every node as having no Subversion properties. |
| |
| (Variations to consider: Construct properties such as |
| svn:executable, svn:special, and any svn: time-stamp properties |
| from the operating system meta-data. Construct properties from |
| auto-props. Exclude nodes that the operating system says are |
| hidden.) |
| |
| In the context of the 'svn update' command: |
| |
| * BASE-TARGET: The tree to which BASE is being updated and for |
| which the changes w.r.t. BASE are integrated into |
| WORKING and ACTUAL |
| |
| * WORKING-TARGET, ACTUAL-TARGET: Trees in which the above mentioned |
| changes have been integrated, but which haven't "gone live" yet; |
| these trees generally represent "in transition" or "intermediary" |
| state with the intent to become the final tree. |
| |
| Additionally, two more trees may be related to the working copy |
| when considering the 'svn merge' command: |
| |
| * START: The tree used as the base state for the 'merge' command |
| |
| * END: The tree used as the ending state for the 'merge' command |
| The difference between these trees will be merged into the |
| WORKING and ACTUAL trees. |
| |
| In the following example 10 == START and 15 == END: |
| $ svn merge -r10:15 http://svn.example.com/svn/ . |
| |
| Please note that the WORKING-TARGET and ACTUAL-TARGET trees also |
| apply to 'svn merge' as they can result in 'add with history' schedules, |
| which will place text bases in the WORKING-TARGET tree. Also note |
| that -since merge is by definition an 'edit' operation- the BASE and |
| BASE-TARGET trees are not concerned with a merge. |
| |
| ###EHU: To which trees do BASE and TARGET refer when we're in a subdir |
| of a replaced tree? And which trees do they refer to in a subdir of |
| a replaced tree which itself is replaced? (Preliminary answer: the |
| base in a replaced subdir should probably be the base as defined by |
| the parent which got copied in, not the base as was deleted, because |
| otherwise it won't be possible to delete files from the replaced subdir: |
| there would be no way to express a deletion against the new dir.) |
| |
| A tree can be said to have its files in repository-normal format or |
| working-copy format; the difference relates to line endings and keyword |
| expansion, as defined elsewhere. A BASE tree presents itself in |
| repository-normal format by default and can be converted to working-copy |
| format. A WORKING or ACTUAL tree presents itself in working-copy format by |
| default and can be converted to repository-normal format. |
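
The two presentation formats can be illustrated with a minimal sketch.
It covers line endings only and leaves keyword expansion out entirely.
The function names and the default native_eol value are assumptions
made for this example, not part of any Subversion API.

```python
# Hypothetical sketch: repository-normal format is assumed to use LF line
# endings; working-copy format uses whatever EOL style the node is set to.
REPOSITORY_EOL = b"\n"


def to_working_copy(data: bytes, native_eol: bytes = b"\r\n") -> bytes:
    """Convert repository-normal content to working-copy format."""
    return data.replace(REPOSITORY_EOL, native_eol)


def to_repository_normal(data: bytes) -> bytes:
    """Convert working-copy content back to repository-normal format."""
    # Handle CRLF first so a lone CR left over is also normalised to LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

A BASE tree would hand out the output of to_repository_normal() by
default, while WORKING and ACTUAL would hand out the output of
to_working_copy().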
| |
| |
| Requirements |
| ============ |
| |
| * Developer sanity |
| From this requirement, a number of additional ones follow: |
| - Very explicit tree state management; clear difference between |
| each of the 5 states we may be looking at |
| - It must be "fun" to code wc-ng enhancements |
| * Speed |
| (Note: a trade off may be required for 'checkout' vs 'status' speed) |
| * Cross-node-type working copy changes |
| * Flexibility |
| The model should make it easy to support |
| - central vs local metadata storage |
| - Last modified timestamp behaviours |
| - .svn-less working copy subtrees |
| - different file-changed detection schemes |
| (e.g. full tree scan as in wc-1.0 as well as 'p4 edit') |
| * Graceful (defined) fallback for non-supported operations |
| When a checkout tries to create a symlink on an OS which supports |
| them, on a filesystem which doesn't, we should cope without |
| canceling the complete checkout. Same for marking metadata read-only. |
| * Gracefully handle symlinks in relation to any special-handling of |
| files (don't special-handle symlinks!) |
| * Clear/reparable tree state |
| Other than our current loggy system, I mean here: "there is a command |
| by which the user can restart the command he/she last issued and |
| Subversion will help complete that command", which differs from our |
| loggy system in the way that it will return the working copy to a |
| defined (but to the user unknown) state. |
| * Transactional/ repairable tree state (with which I mean something |
| which achieves the same as our loggy system, but better). |
| * Case sensitive filesystem aware / resilient |
| * Working copy stability; a number of scenarios with switch and |
| update obstructions used to leave the working copy unrecoverable |
| * Client side 'true renames' support where one side can't be committed |
| without the other (relates to issue #876) |
| |
| ###JSS: Perhaps this is obvious... I think that requirement is fine for the |
| user doing the commit. We still need to remember that another user doing |
| the update may not have authz permission to the directory it was renamed |
| into or may have a checkout of a sub-tree and that target directory may |
| not exist. Likewise, the original location might be unavailable too. |
| |
| * Change detection should become entirely internal to libsvn_wc (referring |
| to the fact that libsvn_client currently calls svn_wait_for_timestamps()), |
| even though under 'use-commit-times=yes', this waiting is |
| completely useless. |
| * Last-modified recording as a preparation for solving issue #1256 and |
| as defined in this mail, also linked from the issue: |
| http://svn.haxx.se/dev/archive-2006-10/0193.shtml |
| * Representing "this node is part of a replaced-with-history tree and |
| I'm *not* in the replacement tree" as well as "... and I'm deleted |
| from the replacement tree" [issues #1962 and #2690] |
| |
| |
| Would-be-very-nice-to-have's |
| ============================ |
| |
| * Multiple users with a single working copy (aka shared working copy) |
| * Ending up with an implementation which can use current WCs |
| (without conversion) |
| * Working copies/ metadata storages without local storage of text-bases |
| (other than a few cached ones) |
| |
| |
| Non-goals |
| ========= |
| |
| * Off-line commits |
| * Distributed VC |
| |
| Open items / discussion points |
| ============================== |
| |
| * Files changed during the window "sent as part of commit" to |
| "post commit wc processing"; these are currently explicitly |
| supported. Do we want to keep this support (at the cost of speed)? |
| * Single working copy lock. Should we have one lock which locks the |
| entire working copy, disabling any parallel actions on disjoint |
| parts of the working copy? |
| * Meta data physical read-only marking (as in wc-1.0). Is it still |
| required, or should it become advisory (ie ignore errors on failure)? |
| * Is issue #1599 a real use-case we need to address? |
| (Losing and regaining authz access with updates in between) |
| |
| |
| Problems in wc-1.0 |
| ================== |
| |
| * There's no way to clear unused parts of the entries cache |
| * The code is littered with path calculations in order |
| to access different parts of the working copy (incl. admin areas) |
| * The code is littered with direct accesses to both wc files and |
| admin area files |
| * It's not always clear at which time log files are being processed |
| (ie transactions are being committed), meaning it's not always |
| clear which version of a tree one is looking at: the pre- or |
| post-transformation version... |
| * There's no support for nested transactions (even though some |
| functions want to start a new transaction, regardless whether one |
| was already started) |
| * It's very hard to determine when an action needs to be written |
| to a transaction or needs to be executed directly |
| * All code assumes local access to admin (meta)data |
| * The transaction system contains non-runnable commands |
| * It's possible to generate combinations of commands, each of which |
| is runnable, but the series isn't |
| * Long if() blocks to sort through all possible states of |
| WORKING, ACTUAL and BASE, without calling it that. |
| * Large if() blocks dealing with the difference between file and |
| directory nodes |
| * Many special-handling if()s for svn:special files |
| * Manipulation of paths, URLs and base-text paths in 1 function |
| * 'Switchedness' of subdirectories has to be derived from the |
| URLs of the parent and the child, but copied nodes also have |
| non-parent-child source URLs... (confusing) |
| * Duplication of data: a 'copied' boolean and a 'copy_source' URL field |
| * Checkouts fail when checking out files of different casing to a case |
| insensitive filesystem |
| * Checkouts fail when marking working copy admin data as read-only |
| is a non-supported FS operation (VFAT or Samba mounts on Linux have |
| this behaviour) |
| * Obstructed updates leave operations half done; in case of a switch, |
| it's not always possible to switch back (because the switch itself |
| may have left now-unversioned items behind) |
| * Directories which have their own children merged into them (which happens |
| when merging a directory-add) won't correctly fold the children into |
| schedule==normal, but instead leave them as schedule==add, resulting in |
| a double commit (through HTTP, other RA layers fold the double add, but |
| that's not the point) [see issue #1962] |
| * transaction files (ie log files) are XML files, requiring correct |
| encoding of characters and other values; given the short expected |
| life-time of a log file and the fact that we're almost completely sure |
| the log file is going to be read by the WC library anyway (no interchange |
| problems), this is a waste of processing time |
| * No strict separation between public and internal APIs: many public |
| APIs also used internally, growing arguments which *should* only |
| matter for internal use |
| |
| |
| Possible solutions |
| ================== |
| |
| Developer sanity |
| ---------------- |
| Strict separation between modules should help keep code focused at one |
| task. Probably some of the required user-specific behaviours can (and |
| should) be hidden behind vtables; for example: setting the file stamp |
| to the commit time, last recorded time or leaving it at the current time |
| should be abstracted from. |
| |
| Access to 'text bases' is another one of these areas: most routines in |
| wc-1.0 don't actually need access to a file (a stream would be fine as |
| well), but since the files are there, availability is assumed. |
| When abstracting all access into streams, the actual administration of |
| the BASE tree can be abstracted from: for all we know the 'tree storage |
| module' may be reading the stream directly off the repository server. |
| [The only module in wc-1.0 which *requires* access to the files is |
| the diff/merge library, because it rewinds to the start of the file |
| during its processing; an operation not supported by streams... and even |
| then, if these routines are passed file handles, they'll be quite |
| happy, meaning they still don't need to know where the text base / |
| source file is...] |
| |
| ###GJS: the APIs should use streams so that we can decompress as the |
| stream is being read. the diff library will need a callback of some |
| kind to perform the rewind, which will effectively just close and |
| reopen the stream. if it rewinds *multiple* times, then we may want |
| to cache the decompressed version of the file. I'll |
| investigate. Given our metadata/base-text storage system, I suspect |
| it will be very easy to cache decompressed copies for a while. |
| |
| ###GJS: a very reasonable strategy is: non-binary files are compressed |
| by default. binaries are stored uncompressed. |
| future improvement: extension-based choices, or some other control |
| |
| In order to keep developers sane, it should be extremely clear at any |
| one time - when operating on a tree - which tree is being operated upon. |
| |
| One way to prevent the lengthy 'if()' blocks currently in wc-1.0, would be |
| to design a dispatch mechanism based on the path-state in WORKING/BASE and the |
| required transformation, dispatching to (small) functions which perform |
| solely that specific task. |
| #####XBC Do please note that this suggests yet another instance of |
| pure polymorphism coded in C. This runs contrary to the |
| developer sanity requirement. |
| ###GJS: agreed with XBC. |
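
For discussion, the dispatch idea can be sketched as a plain lookup
table rather than a class hierarchy. Everything here is hypothetical:
the state and change names, the handler functions, and the outcomes
they return are invented for illustration and do not describe what
Subversion does in each case.

```python
from enum import Enum


class State(Enum):
    """Hypothetical local state of a path, derived from WORKING/BASE."""
    ABSENT = "absent"
    UNMODIFIED = "unmodified"
    MODIFIED = "modified"


class Change(Enum):
    """Hypothetical incoming transformation for that path."""
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"


# Each handler performs exactly one task; here they only describe it.
def add_new(path):
    return f"create {path}"


def obstructed_add(path):
    return f"conflict: {path} already exists"


def replace_text(path):
    return f"replace text of {path}"


def merge_text(path):
    return f"merge into {path}"


def remove(path):
    return f"remove {path}"


def flag_tree_conflict(path):
    return f"tree conflict on {path}"


HANDLERS = {
    (State.ABSENT, Change.ADD): add_new,
    (State.UNMODIFIED, Change.ADD): obstructed_add,
    (State.MODIFIED, Change.ADD): obstructed_add,
    (State.UNMODIFIED, Change.UPDATE): replace_text,
    (State.MODIFIED, Change.UPDATE): merge_text,
    (State.UNMODIFIED, Change.DELETE): remove,
    (State.MODIFIED, Change.DELETE): flag_tree_conflict,
}


def dispatch(state, change, path):
    """Look up the single-purpose handler; unknown pairs become conflicts."""
    handler = HANDLERS.get((state, change), flag_tree_conflict)
    return handler(path)
```

Whether such a table is easier to follow than the if() blocks it
replaces is exactly the point under dispute in the comments above.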
| |
| |
| Speed |
| ----- |
| wc-1.0 assumes the WORKING tree and the ACTUAL tree match, but then |
| goes out of its way to ensure they actually do when deemed important. |
| The result is a library which calls stat() a lot more often than need be. |
| |
| One of the possible improvements would be to make wc-ng read all of |
| the ACTUAL state (concentrated in one place, using apr_stat()), keeping |
| it around as long as required, matching it with the WORKING state before |
| operating on either (not only when deemed important!). |
| |
| ###GJS: working copy file counts are unbounded, so we need to be |
| careful about keeping "all" stat results in memory. I'll certainly |
| keep this in mind, however. |
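
A minimal sketch of reading the ACTUAL state in one pass follows. It
uses Python's os.scandir() as a stand-in for apr_stat(); the function
name, the shape of the result, and the ".svn" default are assumptions
for this example. It keeps every result in memory, so the concern
about unbounded working copies applies to it as written.

```python
import os
import stat


def snapshot_actual(root, admin_dir=".svn"):
    """Walk `root` once and return {relative path: (kind, size, mtime_ns)}."""
    result = {}
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.name == admin_dir:
                    continue  # administrative areas are not part of ACTUAL
                # One stat per node; symlinks are not followed.
                info = entry.stat(follow_symlinks=False)
                rel = os.path.relpath(entry.path, root)
                if stat.S_ISDIR(info.st_mode):
                    result[rel] = ("dir", 0, info.st_mtime_ns)
                    stack.append(entry.path)
                elif stat.S_ISLNK(info.st_mode):
                    result[rel] = ("symlink", info.st_size, info.st_mtime_ns)
                else:
                    result[rel] = ("file", info.st_size, info.st_mtime_ns)
    return result
```

The snapshot can then be matched against the WORKING state before any
operation touches either tree.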
| |
| Working from the ACTUAL tree will also prove to be a step toward clarity |
| regarding the exact tree which is being operated upon. |
| |
| [This suggestion from wc-improvements also applies to wc-ng:] |
| Most operations are I/O bound and have CPU to spare. Consider the virtue |
| of compressed text bases in order to reduce the amount of I/O required. |
| |
| Another idea to reduce I/O is to eliminate atomic-rename-into-place for |
| the metadata part of the working copy: if a file is completely written, |
| store the name of the base-text/prop-text in the entries file, which gets |
| rewritten on most wc-transformations anyway. |
| |
| |
| Cross node type change representation |
| ------------------------------------- |
| ####EHU To be done |
| |
| |
| Flexibility of metadata storage |
| ------------------------------- |
| There are 3 known models for storing metadata as requested by different |
| groups of users: |
| |
| - in-subtree metadata storage (.svn subdir model, as in wc-1.0) |
| ###GJS: euh... aren't we axing this? who has *requested* this? |
| - in-'tree root' metadata storage (working copy central) |
| - detached metadata storage (user-central) |
| - in $HOME/.subversion/ |
| - in arbitrary location (e.g. $HOME is a (slow) NFS mount, and we |
| want the metadata on a local drive, such as /var/...) |
| |
| A solution to implementing each of these behaviours in order to satisfy |
| the wide range of use-cases they solve, would be to define a module |
| interface and implement this interface three times (possibly using vtables). |
| |
| Note that using within-module vtables should be less problematic than our |
| post-1.0 experiences with public vtables (such as the ra-layer vtable): |
| implementation details are allowed to differ between releases (even patch |
| releases). |
| |
| ###GJS: note that we are talking about both metadata AND base-text |
| content. (and yeah, optional and compressed base-texts can be done |
| during this rewrite) Also note that we might be able to share |
| base-text content across working copies if they are all keyed by |
| the MD5 hash into storage directories (under the user-central model) |
| |
| ###GJS: I don't think vtables are needed here. This is simply altering |
| the base location, not a whole new implementation. My plan is to |
| default to the "tree root" model with a .svn subdirectory. If a |
| .svn subdir is not found, then we fall back to looking in the |
| $HOME/.subversion/ directory (some subdir under there). If we |
| *still* don't find it, then some config options will point us to |
| the metadata/base-text location. |
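
The lookup order described in this comment can be sketched as follows.
The function name and parameters are hypothetical, and the "wc"
subdirectory under .subversion is borrowed from the "home" option
described later in this section; how an individual working copy is
found inside a central store is not specified here and is not modelled.

```python
import os


def locate_metadata(wcroot, home, configured_path=None):
    """Return the first existing metadata location, or None."""
    candidates = [
        os.path.join(wcroot, ".svn"),            # "tree root" model
        os.path.join(home, ".subversion", "wc"),  # user-central fallback
    ]
    if configured_path:
        candidates.append(configured_path)        # location from config
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None
```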
| |
| ###GJS: my plan is to upgrade the working copy if we find a pre-1.6 |
| working copy. all the data will be lifted from the multiple .svn |
| subdirectories, and relocated to the "proper" storage location. |
| This will be a non-reversible upgrade, and will preclude pre-1.6 |
| clients from using that working copy again. |
| Note: because of the "destructive" nature of this upgrade, and the |
| expected duration, we may want to require the user to perform an |
| explicit action in order to complete the upgrade. However, 1.6 will |
| not be able to *modify* wc-1.0 metadata -- just read it in order to |
| upgrade it to the new storage system. |
| |
| When svn detects an old working copy, then it will error out and |
| request that the user run "svn cleanup" to upgrade their working copy |
| to the new format. |
| |
| The metadata location is determined at one of two points: |
| |
| * checkout time |
| * upgrade time |
| |
| According to the user's config, the metadata will be placed in one of |
| three areas: |
| |
| wcroot: at the root of the working copy in a .svn subdirectory |
| home: in the .subversion/wc/ subdirectory |
| /some/path: stored in the given path |
| |
| All wcroot directories will have a .svn subdirectory. In that |
| directory will be the datastore, or there will be a file that provides |
| two pieces of information: |
| |
| * absolute path to the (centralized) metadata |
| * absolute path of where this wcroot was created |
| |
| With this information, we can link a wcroot to its metadata in the |
| centralized store. If the user has moved the wcroot (the stored path |
| is different from the current/actual path), then Subversion will exit |
| with an error. The user must then ###somehow tell svn that the wc has |
| been copied (duplicate the metadata for the wcroot) or moved (tweak |
| the path stored in the metadata and in the linkage file). Subversion |
| is unable to programmatically determine which operation was used. |
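
The moved-wcroot check can be sketched as below. The on-disk format of
the linkage file is not defined in this document; the two-line layout
(metadata path, then creation path), the function names, and the
exception type are all assumptions for the example.

```python
import os


class WcrootMoved(Exception):
    """Raised when a wcroot is not where its linkage file says it was created."""


def read_linkage(text):
    """Parse the assumed format: line 1 metadata path, line 2 creation path."""
    metadata_path, created_at = text.splitlines()[:2]
    return metadata_path, created_at


def check_wcroot(linkage_text, current_path):
    """Return the metadata path, or raise if the wcroot was moved or copied."""
    metadata_path, created_at = read_linkage(linkage_text)
    if os.path.normpath(created_at) != os.path.normpath(current_path):
        # Moved and copied wcroots look identical from here, so the sketch
        # can only report the mismatch and leave the decision to the user.
        raise WcrootMoved(f"{current_path} was created as {created_at}")
    return metadata_path
```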
| |
| Note that we use "svn cleanup" as the trigger to *perform* the |
| upgrade. The amount of file opens, parsing, moving, deleting, etc is |
| expected to consume significant amounts of I/O and (thus) cannot |
| simply be done on-the-fly without the user's knowledge and consent. |
| |
| |
| Transaction duration / memory management |
| ---------------------------------------- |
| The current pool-based memory management system is very good at managing |
| memory in a transaction-based processing model. In the wc library, a |
| 'transaction' often spans more than one call into the library. We either |
| need a sane way to handle this kind of situation using pools, or we may |
| need a different memory management strategy in wc-ng. |
| |
| Working copy stability |
| ---------------------- |
| In light of obstructed updates it may not always be desirable to be able |
| to resume the current operation (as currently is the case): in some cases |
| the user may want to abort the operation, in other cases the user may |
| want to resolve the obstruction before re-executing the operation. |
| |
| The solution to this problem could be 'atomic updates': receiving the |
| full working copy transformation, verifying prerequisites, creating |
| replacement files and directories and when all that succeeds, update |
| the working copy. |
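
The three phases of such an 'atomic update' can be sketched for plain
files in a single directory. The function name and the shape of
`changes` are hypothetical. Note that the final phase is a series of
individually atomic renames, not one atomic step; a crash in the
middle would still need a journal of some kind to recover from.

```python
import os
import tempfile


def apply_atomically(root, changes):
    """Apply {relative path: new bytes, or None to delete} in three phases."""
    # Phase 1: verify prerequisites before touching anything.
    for rel, content in changes.items():
        target = os.path.join(root, rel)
        if content is None and not os.path.isfile(target):
            raise RuntimeError(f"cannot delete missing file: {rel}")
        if content is not None and os.path.isdir(target):
            raise RuntimeError(f"obstructed by a directory: {rel}")

    # Phase 2: create all replacement files next to their destinations.
    staged = {}
    try:
        for rel, content in changes.items():
            if content is None:
                continue
            fd, tmp = tempfile.mkstemp(dir=root, prefix=".staged-")
            staged[rel] = tmp
            with os.fdopen(fd, "wb") as handle:
                handle.write(content)
    except OSError:
        for tmp in staged.values():
            os.remove(tmp)  # nothing has gone live yet, so just discard
        raise

    # Phase 3: only now update the working copy.
    for rel, content in changes.items():
        target = os.path.join(root, rel)
        if content is None:
            os.remove(target)
        else:
            os.replace(staged[rel], target)
```

An obstruction found in phase 1 leaves the working copy untouched, so
the user can either abort or resolve it and run the operation again.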
| |
| Full working copy unit tests: |
| Exactly because the working copy is such an important part of the |
| Subversion experience *and* because of the 'reputation' of wc-1.0, |
| we need a way to ensure wc-ng completely performs according to our |
| expectations. *The* way to ensure we're able to test the most contrived |
| edge-cases is to develop a full unit testing test-suite while developing |
| wc-ng. This will both be a measure to ensure working copy stability |
| as well as developer sanity: in the early stages of the wc-ng |
| development process, we'll be able to assess how well the design |
| holds up under more difficult 'weather'. |
| |
| ###GJS: agreed. as much as possible, when I (re)implement the old APIs |
| in terms of the new APIs, then I'll write a whitebox test. we'll |
| see how long I keep that up :-P |
| |
| Transactional updates |
| --------------------- |
| |
| .. where 'update' is meant as 'user command', not 'svn update' per se. |
| |
| When applied to files, this can be summarized as: |
| |
| * Receive transformations (update, delete, add) from |
| the server, |
| |
| |
| Prerequisites for a good wc implementation |
| ========================================== |
| |
| These prerequisites are to be addressed, either as definitions |
| in this document, or elsewhere in the subversion (source) tree: |
| * Well defined behaviour for cross-node type updates/merges/.. |
| (tree conflicts in particular) |
| * Well defined behaviour for special file handling |
| * Well defined behaviour for operations on locally missing items |
| (see issue #1082) |
| * Well defined change detection scheme for each of the different |
| last-modified handling strategies |
| * No special handling of symlinks: they are first class versioned objects |
| * Well defined behaviour for property changes on updates/merges/... |
| (this is a problem which may resemble tree conflicts!), |
| including 'svn:' special properties |
| * File name manipulation routines (availability) |
| * File name comparison routines (!) (availability; which compensate |
| for the different ways Unicode characters can be represented |
| [re: NFC/NFD Unicode issue]) |
| |
| ###JSS: Talking with ehu on IRC when I asked him about how to handle this |
| issue: "if we accept that some repositories will be unusable with wc-ng, |
| then we can standardize anything that comes in from the server as well as |
| the directory side into the same encoding. we'd be writing files with the |
| standardized encoding." The rest of this conversation centered around the |
| fact that either APR or the OS will convert the filename to the correct |
| form for the filesystem when doing the stat() call. Note, ehu says: "(we'll |
| need to retain the filename we got from the server though: we'll need it to |
| describe the file through the editor interface: the server still allows all |
| encodings.)" |
| |
| * URL manipulation routines (availability) |
| * URL comparison routines (availability; which compensate for |
| different ways the same URL can be encoded; see issue #2490) |
| * Modularization |
| * Agree on a UI to pull in other parts of the same repository |
| (NOT svn:externals) [relates to issue #1167] |
| #####XBC I submit this is a server-side feature that the client |
| (i.e. the WC library) should not know about. |
| * Agree on behaviour for update on moved items (relates to issue #1736) |
| * Case-sensitivity detection code to probe working copy filesystem |
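
The file name comparison prerequisite (the NFC/NFD issue) can be
illustrated with a small sketch. Normalising both names to NFC is one
possible choice, not a decision recorded in this document, and the
function name and the case_sensitive parameter are assumptions. The
case-insensitive branch is only an approximation of what a real
case-insensitive filesystem does.

```python
import unicodedata


def same_filename(a, b, case_sensitive=True):
    """Compare two file names after normalising both to NFC."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    if not case_sensitive:
        # Approximation only: filesystems apply their own folding rules.
        na, nb = na.casefold(), nb.casefold()
    return na == nb
```

The original name received from the server would still have to be
kept alongside the normalised one, as noted in the IRC discussion
above.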
| |
| |
| Modularization |
| ============== |
| |
| Strict separation must be applied to a number of modules which can be |
| recognised. This will help prevent spaghetti code as in wc-1.0 where |
| one piece of code manipulates paths to a working copy file, its URL |
| *and* the path to the base file. |
| |
| For now, these APIs can be separated: |
| |
| - the public API (presumably not to be used by any internal |
| processing, but presents functionality to working copy users) |
| #####XBC This is really required of all our module public APIs. |
| - tree administration API (required for BASE, TARGET and WORKING) |
| Admins which files are part of the tree, which ones map to |
| which repositories and which textbase / propbase files belong |
| to which local files. [should provide checkpointing functionality |
| for use with transactional tree modifications API] |
| - tree access API (required for BASE, WORKING, TARGET and ACTUAL) |
| Gives access to the content of the nodes in a tree |
| - props |
| - text bases (for files) |
| - child nodes (for directories) |
| - transactional tree modifications API (applicable to all trees, |
| ###EHU do we provide the same interface to BASE/WORKING as for ACTUAL?) |
| - tree transformation (required for update/switch/merge updating |
| BASE, WORKING and ACTUAL), meaning all of tree changes, file |
| changes and metadata changes |
| - Working-copy changedness detection API |
| - Metadata access API (used by tree administration module(s)) |
| - Event hooks (in order to be able to implement different |
| timestamp-setting strategies and possibly more) |
| |
| These APIs will be implemented by these (currently known) modules: |
| |
| - tree administration |
| * wc_adm |
| - tree access |
| * wc_acc |
| - transactional tree modifications |
| * wc_log |
| - tree transformation |
| * wc_trans |
| - working copy changedness detection |
| wc_detect vtable-based API implemented by these modules: |
| * tree crawler ('inspired' by wc-1.0) |
| * tree marker (inspired by 'p4 edit') |
| - metadata access API |
| wc_macc vtable-based API implemented by these modules: |
| * tree spread ('inspired' by wc-1.0) |
| * tree root (storing all metadata in the tree root (think darcs)) |
| * central depot (storing 'somewhere' locally, possibly $HOME) |
| this central store would open up the possibility to share |
| text bases/prop bases across checkouts |
| * non-local (retrieving all text and prop-bases from the server, |
| except for a number of cached ones) ###EHU: maybe this is |
| orthogonal to the question where metadata is stored: in all |
| situations, you *could* choose not to keep local copies |
| - Event hooks for the union of all paths in (BASE, WORKING) |
| wc_hook event based single-callback API |
| for e.g. these events: |
| + props updated |
| + base text updated |
| + wc file updated |
| + update completed |
| + lock acquired |
| + lock released |
| (+ lock can't be acquired [in order to 'unprotect' |
| svn:needs-lock protected files which have been removed |
| from the repository?]) |
| to be implemented by these modules: |
| * use-commit-times |
| * versioned-mtimes |
| * versioned-execute-perm |
| * versioned-other-unix-perms |
| (* versioned-windows-perms?) |
| * needs-lock-updater |
| |
| Justification for the large number of modules, with a modest number |
| of different APIs is that the problem is really quite complex as shown |
| earlier in this document. |
| |
| Over the years, a large number of use cases have developed around |
| Subversion where different user groups have shown very valid use cases |
| for conflicting behaviours. Presumably, most of these we want to |
| retain. Some of the unimplemented ones have open issues indicating |
| there's at least an active interest. In order to prevent locking out |
| some of the current use cases when adding support for the open issues, |
| we need a flexible modularized model. This model will also keep us |
| from duplicating lots of code to support the different use cases. |
| #####XBC Such flexibility will bring the WC to the kind of |
| purgatory the RA layers are in. We promise feature and semantics |
| parity between them, and the result is that even a small change |
| in that layer requires knowledge of three different protocols |
| and four different implementations. |
| |
| Given the assumption of 'little code duplication', the choice for |
| having several modules which implement the same API (vtable) is |
| justifiable. |
| |
| ###GJS: disagree. I plan to have just one library and no plans for |
| vtables. there is very little need for distinct implementations, as |
| far as I can tell. |
| |
| |
| Implementation proposals |
| ======================== |
| |
| Classification of svn_wc_entry_t fields to BASE/WORKING |
| ------------------------------------------------------- |
| |
| [Note: This section is mainly to clarify the difference between the BASE |
| and WORKING trees, it's not here to mean that we actually need all these |
| fields in wc-ng!] |
| |
| Here are the mappings of all fields from svn_wc_entry_t to the BASE and |
| WORKING trees: |
| |
| +-------------------------------+------+---------+ |
| | svn_wc_entry_t | BASE | WORKING | |
| +-------------------------------+------+---------+ |
| | name | x | x (1)| |
| | revision | x | x (2)| |
| | url | x | x (2)| |
| | repos | x | x (3)| |
| | uuid | x | x (3)| |
| | kind | x | x | |
| | absent | x | | |
| | copyfrom_url | | x | |
| | copyfrom_rev | | x | |
| | conflict_old | | x | |
| | conflict_new | | x | |
| | conflict_wrk | | x | |
| | prejfile | | x | |
| | text_time | | = | |
| | prop_time | | = | |
| | checksum | x | x (2)| |
| | cmt_rev | x | x (2)| |
| | cmt_date | x | x (2)| |
| | cmt_author | x | x (2)| |
| | lock_token | x(6)| | |
| | lock_owner | x | | |
| | lock_comment | x | | |
| | lock_creation_date | x | | |
| | has_props | x | x (4)| |
| | has_prop_mods | | = | |
| | cachable_props | x(5)| x (4)| |
| | present_props | x | x (4)| |
| | changelist | | x | |
| | working_size | | = | |
| | keep_local | | = | |
| | depth | x | x | |
| | schedule | | | |
| | copied | | | |
| | deleted | | | |
| | incomplete | | | |
| +-------------------------------+------+---------+ |
| |
| (1) if this one differs from BASE, it must point to the source of a rename |
| (2) for an add-with-history |
| (3) or can we assume single-repository working copies? |
| (4) can differ from BASE for add-with-history |
| (5) why is this a field at all; can't the WC code know? |
| (6) locks apply to in-repository paths, hence BASE |
| |
| The fields marked with '=' are implementation details of internal detection |
| mechanisms, which means they don't belong in the public interface. |
| |
| Fields with no check are to become obsolete. 'schedule', 'copied' and |
| 'deleted' can be deduced from the difference between the BASE and WORKING |
| or WORKING and ACTUAL trees. 'incomplete' should become obsolete when the |
| goal of 'atomic updates' can be realised, in which case the tree can't be |
| in an incomplete yet locked state. This would also invalidate issue #1879. |
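| |
| As a rough illustration of how 'schedule' and 'copied' could be deduced |
| rather than stored, here is a minimal sketch. It assumes a hypothetical |
| model in which WORKING is an overlay on BASE: a path has an overlay |
| record only if it was structurally changed locally. The names Overlay, |
| presence and derive_state are invented for this example and are not |
| part of any existing API. |

```python
from typing import NamedTuple, Optional, Tuple


class Overlay(NamedTuple):
    """Hypothetical WORKING record for one path (illustration only)."""
    presence: str                       # "normal" or "deleted"
    copyfrom_url: Optional[str] = None  # set only for add-with-history


def derive_state(has_base: bool, working: Optional[Overlay]) -> Tuple[str, bool]:
    """Deduce (schedule, copied) from the BASE and WORKING trees.

    has_base: whether the path exists in BASE.
    working:  the WORKING overlay record, or None if there is no local
              structural change for the path.
    """
    if working is None:
        if not has_base:
            raise ValueError("path is in neither BASE nor WORKING")
        return ("normal", False)
    if working.presence == "deleted":
        return ("delete", False)
    copied = working.copyfrom_url is not None
    # A live WORKING node on top of a BASE node replaces it; without a
    # BASE node it is a plain add.
    return ("replace" if has_base else "add", copied)
```

| With this model, a path that exists only as a WORKING record carrying |
| a copyfrom_url yields ("add", True), which is what wc-1.0 records |
| explicitly in the 'schedule' and 'copied' fields. |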
| |
| |
| Basic Storage Mechanics |
| ----------------------- |
| |
| All metadata will be stored into a single SQLite database. This |
| includes all of the "entry" fields *and* all of the properties |
| attached to the files/directories. SQLite transactions will be used |
| rather than the "loggy" mechanics of wc-1.0. |
| |
| ###GJS: note that atomicity across the sqlite database and the content |
| of the ACTUAL tree is freakin' difficult. idea to test: metadata |
| says "not sure of ACTUAL", and when ops complete successfully, then |
| we clear the flag. during any future operation, if the flag is |
| present, then we approach the ACTUAL with extreme prejudice. also |
| note that we can batch clearing of the flags as an optimistic |
| efficiency approach (since if we batch 100 and the last fails, then |
| the other 99 will be slower until the wc-ng determines the ACTUAL |
| is in fine shape and clears the flag for future operations). |
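| |
| A minimal sketch of the "not sure of ACTUAL" flag idea above, using |
| Python's sqlite3 module. The table and column names (node, |
| actual_unsure) and all function names are assumptions made for this |
| illustration only. The flag is committed before ACTUAL is touched, and |
| flags of successful operations are cleared later in one batch. |

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical per-node flag."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node ("
        " path TEXT PRIMARY KEY,"
        " actual_unsure INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def run_operation(db, path, touch_actual):
    """Mark PATH as unsure, then let TOUCH_ACTUAL modify the disk.

    If touch_actual raises, the flag simply stays set.
    """
    with db:  # commit the marker before touching ACTUAL
        db.execute(
            "INSERT OR REPLACE INTO node (path, actual_unsure) VALUES (?, 1)",
            (path,),
        )
    touch_actual(path)


def clear_flags(db, paths):
    """Clear the flags of successfully completed operations in one batch."""
    with db:
        db.executemany(
            "UPDATE node SET actual_unsure = 0 WHERE path = ?",
            [(p,) for p in paths],
        )


def needs_careful_check(db, path):
    """True if ACTUAL for PATH must be re-examined before being trusted."""
    row = db.execute(
        "SELECT actual_unsure FROM node WHERE path = ?", (path,)
    ).fetchone()
    return row is None or bool(row[0])
```

| A caller would collect the paths of operations that completed and |
| pass them to clear_flags() together; any path whose operation failed |
| keeps its flag and gets the slower, careful treatment next time. |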
| |
| ###GJS: be wary of sqlite commit performance (based on some of my |
| prior experience with it). must have timing/debugging around the |
| commit operations. may need to use various transaction isolations |
| and/or batching of commits to get proper performance. thus, profile |
| output capability is mandatory to determine if we have issues, and |
| where they occur. |
| |
| ###JSS: I don't see how transactions by themselves can replace loggy. |
| Right now, if you abort something like 'svn update' or 'svn checkout', |
| loggy has recorded all the files to be downloaded, and will pick up |
| where it left off. We did this as an optimization to prevent |
| re-downloading a potentially large amount of data again. Seems like |
| we still need to provide that capability. |
| |
| ###GJS: sqlite transactions replace the atomicity that loggy was |
| originally designed for. it sounds like loggy is also being |
| used as a work queue, and that is easily handled in sqlite. |
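| |
| To make the work-queue point concrete, here is a small sketch of a |
| resumable queue kept in SQLite. The work_queue table, its columns and |
| the function names are hypothetical. An item is deleted only after its |
| handler has returned, so an interrupted run resumes at the item that |
| failed. This assumes handlers are safe to run more than once. |

```python
import sqlite3


def open_queue():
    """Create an in-memory database holding a hypothetical work queue."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE work_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " item TEXT NOT NULL)"
    )
    return db


def enqueue(db, items):
    """Record all pending work items in a single transaction."""
    with db:
        db.executemany(
            "INSERT INTO work_queue (item) VALUES (?)",
            [(i,) for i in items],
        )


def run_queue(db, handler):
    """Process items oldest-first; an item is removed only on success."""
    while True:
        row = db.execute(
            "SELECT id, item FROM work_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return
        handler(row[1])  # may raise; the item then stays queued
        with db:
            db.execute("DELETE FROM work_queue WHERE id = ?", (row[0],))
```

| For an interrupted 'svn update', the queue would still list the files |
| that remain to be fetched, so a later run can continue from there |
| instead of starting over. |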
| |
| Base text data will be stored in a sharded directory structure, |
| keyed/named by the checksum (MD5 or SHA1) of the file. The database |
| will record appropriate mappings, content/compression types, and |
| refcounts for the base text files (for the shared case). We will use a |
| single level of subdirectories, named by the first two hex digits of |
| the checksum: |
| |
| TEXT_BASE/7c/7ca344... |
| |
| With 100k files spread across all of a user's working copies, that |
| will put 390 files into each subdirectory, which is quite fine. If the |
| user grows to a million files, then 3900 per subdir is still |
| reasonable. Two levels would effectively mean one file per subdir in |
| typical situations, which is a lot of disk overhead. |
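| |
| The layout and the arithmetic above can be sketched as follows. The |
| function names are invented for this example, and SHA1 is picked |
| arbitrarily from the two checksum candidates. One level of two hex |
| digits gives 256 subdirectories; two levels would give 65536. |

```python
import hashlib
import posixpath


def pristine_path(root, contents):
    """Return ROOT/<first two hex digits>/<full checksum> for CONTENTS."""
    digest = hashlib.sha1(contents).hexdigest()
    return posixpath.join(root, digest[:2], digest)


def files_per_subdir(total_files, levels=1):
    """Average number of files per leaf directory, rounded down.

    Each level is named by two hex digits, so it fans out 256 ways.
    """
    return total_files // (256 ** levels)


if __name__ == "__main__":
    print(pristine_path("TEXT_BASE", b"hello\n"))
    print(files_per_subdir(100_000))      # one level, 100k files
    print(files_per_subdir(1_000_000))    # one level, a million files
    print(files_per_subdir(100_000, 2))   # two levels, 100k files
```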
| |
| When the metadata is recorded in a central area (rather than the WC |
| root), then it is possible for the metadata and the base files to |
| become out of date with respect to all the working copies on the |
| system. We will revamp "svn cleanup" to re-tally the base text |
| reference counts, eliminate unreferenced bases, verify that the |
| working copies are still present, ensure the metadata <-> WC |
| integrity, deal with moves of metadata from central -> wc-root (can |
| happen if somebody rm -rf's the wc, then does a checkout and wants the |
| metadata at the wc-root (this time)), and other consistency checks. |
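| |
| Part of the revamped cleanup could look like the sketch below, which |
| covers only the reference-count re-tally and the removal of |
| unreferenced bases. The schema (tables pristine and node) and the |
| function names are assumptions for illustration; a real |
| implementation would check the disk for the working copies instead |
| of taking a set of known roots. |

```python
import sqlite3


def make_db():
    """Create a hypothetical central metadata store (in memory)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE pristine (
            checksum TEXT PRIMARY KEY,
            refcount INTEGER NOT NULL
        );
        CREATE TABLE node (
            wc_root  TEXT NOT NULL,
            path     TEXT NOT NULL,
            checksum TEXT NOT NULL
        );
        """
    )
    return db


def retally(db, existing_wc_roots):
    """Recount base-text references and drop unreferenced bases.

    Returns the checksums whose base text files can be deleted from disk.
    """
    with db:
        roots = [r[0] for r in db.execute("SELECT DISTINCT wc_root FROM node")]
        for root in roots:
            if root not in existing_wc_roots:
                # The working copy is gone; forget its nodes.
                db.execute("DELETE FROM node WHERE wc_root = ?", (root,))
        db.execute(
            "UPDATE pristine SET refcount = ("
            " SELECT COUNT(*) FROM node"
            " WHERE node.checksum = pristine.checksum)"
        )
        dead = [
            r[0]
            for r in db.execute(
                "SELECT checksum FROM pristine"
                " WHERE refcount = 0 ORDER BY checksum"
            )
        ]
        db.execute("DELETE FROM pristine WHERE refcount = 0")
    return dead
```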
| |
| |
| Metadata Schemas |
| ---------------- |
| |
| see libsvn_wc/wc-metadata.sql3 |
| |
| |
| |
| Random Notes |
| ------------ |
| |
| ### break down all modification operations to things that operate on a |
| small/fixed set of rows. if a large sequence of operations fails, |
| then it can leave the system in a repairable state, since most were |
| performed. note that ACTUAL can change at any time, thus all mods |
| should be able to compensate for ACTUAL being something |
| unexpected. thus, the transformative operations should be able to |
| fail in such a way as to leave ACTUAL pretty bunged up. |
| |
| ### maybe use handles to refer to files/dirs? take input pathname, |
| convert to native charset, and return a handle for that. same |
| handle across various schemas. |
| |
| note: could be very handy, as I'm thinking abspath for all names, |
| which gets to be pretty wordy for large working copies. |
| |
| note: with handles, a file entry could have "parent dir" cheaply, |
| and from that, we could derive repos_url/uuid from some dir table. |
| |
| ### probably want to special-case the checksum and BASETEXT entry for |
| the "empty file" |
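| |
| The handle idea from the notes above could be sketched like this. |
| PathHandles and its methods are hypothetical names; charset |
| conversion is left out, and paths are treated as POSIX-style |
| absolute paths to keep the example small. Each path is interned |
| once, and the parent directory's handle is recorded alongside it so |
| that "parent dir" is a cheap lookup. |

```python
import posixpath


class PathHandles:
    """Intern absolute paths as small integer handles (illustration only)."""

    def __init__(self):
        self._handle_of = {}  # normalized path -> handle
        self._path_of = []    # handle -> normalized path
        self._parent_of = []  # handle -> parent handle, None for the root

    def intern(self, abspath):
        """Return the handle for ABSPATH, creating it (and parents) if needed."""
        if not posixpath.isabs(abspath):
            raise ValueError("expected an absolute path: %r" % (abspath,))
        path = posixpath.normpath(abspath)
        if path in self._handle_of:
            return self._handle_of[path]
        parent = posixpath.dirname(path)
        # The root is its own dirname; it has no parent handle.
        parent_handle = None if parent == path else self.intern(parent)
        handle = len(self._path_of)
        self._handle_of[path] = handle
        self._path_of.append(path)
        self._parent_of.append(parent_handle)
        return handle

    def path(self, handle):
        """Return the normalized path a handle stands for."""
        return self._path_of[handle]

    def parent(self, handle):
        """Return the handle of the parent directory, or None for the root."""
        return self._parent_of[handle]
```

| Tables in different schemas could then store the integer handle |
| instead of repeating the full absolute path in every row. |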
| |
| |
| Code Organization |
| ----------------- |
| |
| libsvn_wc/wc_db.h (symbols: svn_wc__db_*) |
| Storage subsystem for the WC metadata/base-text information. |
| This is a private API, and the rest of the WC will be rebuilt |
| on top of this. |
| |
| This code deals with storage, and transactional modifications |
| of the data. |
| |
| Note: this is a random-access, low-level API. Editors will be |
| built on top of this layer. |
| |
| |
| svn_wc.h API |
| ------------ |
| |
| Note that we also have an opportunity to revamp the WC API. Things |
| like access batons will definitely disappear, but there will most |
| likely be great opportunities for other design changes. |
| |
| Note that removing access batons (and other API changes) will ripple |
| up to libsvn_client, and may even have an effect on *its* API. |
| |
| ### the form of a new API is unknown/TBD. |
| |
| |
| Implementation Plan |
| =================== |
| The following are tasks which need to be accomplished for WC-NG. There |
| isn't a strict ordering here, but rather a possible plan. There may be |
| dependencies between some items, but that is left as an exercise for the |
| reader. |
| |
| * Pristine file management |
| * Properties management |
| * Tree management (BASE v. WORKING v. ACTUAL for APIs and storage) |
| * Journaled actions |
| * Finding/using the correct admin area |
| * Upgrading |
| - Including multiple heterogeneous admin areas |
| * Move entries into SQLite |
| * Relocating datastore in useful ways |
| |
| Afterwards, we'll need: |
| * A second pass at the WC code to find/fix patterns and solutions. |
| * Revamp of WC API, to propagate up into libsvn_client. |
| * Reexamine any client/wc interactions, and look for final cleanups. |
| |
| Near-Term Plan |
| -------------- |
| |
| 1. convert entries.c to use sqlite directly. migrate 'entries' file |
| during this step. the sqlite file will be in-memory if we are not |
| allowed to auto-upgrade the WC; otherwise, we'll write the sqlite |
| database into .svn/ |
| note: the presence of 'wc.db' (or whatever its name) will indicate |
| a minimum format level. the user_version field in the database |
| contains the schema version which is our further format-level |
| descriptor value. |
| |
| 2. convert entries.c to use wc_db. shift the sqlite code into wc_db. |
| note: this is a separate step from 1. there is a paradigm shift |
| between how entries.c works and wc_db works. we want to |
| ignore that in Step 1, and then handle it in this Step. |
| note: put wc_db handle into lock->shared and share the handle |
| across all directories/batons. |
| |
| 3. convert props.c to use wc_db. migrate props to db simultaneously. |
| |
| 4. incremental shift of pristines from N files into pristine db. |
| note: we could continue to leave .revert-base while we migrate the |
| primary base into the pristine dataset. |
| |
| 5. shift libsvn_wc from using entries.h to using wc_db.h. |
| note: since entries.h is "merely" a wrapper for wc_db.h, this will |
| allow the libsvn_wc to start using the new wc_db APIs |
| wherever it is easy/possible. |
| goal: all libsvn_wc code uses wc_db.h, and entries.h exists solely |
| to support old backwards-compat code. |
| |
| 6. centralize the metadata and pristines |
| note: this will also involve merging datastores |
| |
| |
| Other sections |
| ============== |
| remain to be done |