|  |  | 
|  | Auto-versioning Research Notes | 
|  | ============================== | 
|  |  | 
|  | [Note from sussman:  if you don't understand rfc 2518 (webdav) and rfc | 
|  | 3253 (deltav) intimately, you'll probably not understand these notes. | 
|  | Read the rfcs, and also read the 'webdav-general-summary' notes in | 
|  | this directory as a quick review.] | 
|  |  | 
|  |  | 
|  | Phase 1: a lone PUT results in an immediate commit.  This can be done | 
|  | purely via libsvn_fs, using an auto-generated log message. | 
|  | This covers the "drag-n-drop" use-case -- when a user simply | 
|  | drops a file into a mounted repository. | 
|  |  | 
|  | Phase 2: come up with a system for dealing with the more common | 
|  | class-2 DAV sequence:  LOCK, GET, PUT, PUT, PUT, UNLOCK. | 
|  | This covers most DAV clients, such as MSOffice and OpenOffice. | 
|  |  | 
|  | At first glance, it seems that Phase 1 should be doable by simply | 
|  | noticing a PUT on a public URI, and triggering a commit.  But | 
|  | apparently this completely circumvents the fact that mod_dav *already* | 
|  | has a notion of auto-versioning, and we want to mesh with that.  This | 
|  | feature was added by the Rational guys, but isn't well-reviewed by | 
|  | gstein.  Apparently mod_dav defines a concept of whether resources are | 
|  | auto-versionable, and then deals with the checkout/modify/checkin of | 
|  | those resources.  So *first* we need to understand the existing | 
|  | system before we can do anything else, and figure out how mod_dav_svn | 
|  | can act as a "provider" to that framework. | 
|  |  | 
|  | (Greg also warns:  this autoversioning feature added by Rational was | 
|  | done based on an OLD version of the deltaV RFC, so watch out for | 
|  | mismatches with the final RFC 3253.) | 
|  |  | 
|  | [gstein sez: Note: the reason for the auto-versioning framework is to | 
|  | take the load off of the provider for modeling WebDAV's auto-vsn | 
|  | concepts to clients. mod_dav itself can deal with the property | 
|  | management, sequence of operations, error responses, whatnot. That | 
|  | said, it is also open to change and refinement -- there is no way that | 
|  | it is set in stone. That only happens once an Open Source | 
|  | implementation has used it.] | 
|  |  | 
|  |  | 
|  | Phase 2 is more complicated: | 
|  |  | 
|  | * Greg proposed a system whereby the LOCK creates a txn, the PUTs | 
|  | only write to the txn (the txn name is the lock "token"), and the | 
|  | UNLOCK commits the txn.  The problem with this is that DAV clients | 
|  | expect real locking here, and this is just a "fake out": | 
|  |  | 
|  | - If client #1 LOCKS a file, then when client #2 does a GET, | 
|  | they should see the latest version that client #1 has PUT, not | 
|  | some older version. | 
|  |  | 
|  | [gstein sez he doesn't believe that the GET sans locktoken has | 
|  | to reflect the latest PUT-with-locktoken.  I disagree. See | 
|  | below for a response from the DeltaV IETF Working Group] | 
|  |  | 
|  | - Also, if client #2 tries to work on the file, its LOCK request | 
|  | should be denied if it's already locked.  Users will be mighty | 
|  | pissed if they get a LOCK on the file, but when they finally | 
|  | close MSWord, they get an out-of-date error! | 
|  |  | 
|  | [gstein sez this is only if we take an exclusive lock.  shared | 
|  | locks are more interesting.  I say, yah, but so what.  We only | 
|  | care about write-locks anyway, which according to RFC 2518, are | 
|  | always exclusive, I think.  shared-locks are just read-locks, | 
|  | and can be done with unversioned props.] | 
|  |  | 
|  | * It seems that the Right Way to do this is to actually design and | 
|  | implement some kind of locking system.  We've had a huuuuge | 
|  | discussion on the dev list about this, and folks like jimb and | 
|  | kfogel want the system to be more of a "communication" system, | 
|  | rather than a system for unconditionally handcuffing naughty | 
|  | users.  This goal doesn't necessarily contradict the needs of DAV | 
|  | clients, however.  Smart svn clients should be able to easily | 
|  | override a LOCK failure, perhaps by using some special 'Force: | 
|  | true' request header.  Dumb DAV clients won't know about this | 
|  | technique, so they effectively end up with the 'handcuff' locking | 
|  | system they expect. | 
|  |  | 
|  | [brane sez: Exclusive and shared locks can both be used for | 
|  | communication, and which one you use depends on context -- | 
|  | see below.] | 
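
To make the first objection concrete, here is a toy Python model of
the "LOCK creates a txn" idea.  It is only a sketch: ToyRepo and its
method names are invented for illustration and are not libsvn_fs or
mod_dav_svn APIs.  It shows a GET without the lock token still
returning the committed text while PUTs accumulate in the txn.

```python
import uuid


class ToyRepo:
    """Hypothetical in-memory stand-in for a repository (not libsvn_fs)."""

    def __init__(self):
        self.head = {}      # path -> committed contents
        self.txns = {}      # txn name (doubles as lock token) -> {path: data}
        self.revision = 0

    def lock(self, path):
        # LOCK creates a txn; the txn name is handed back as the lock token.
        token = "txn-" + uuid.uuid4().hex
        self.txns[token] = {}
        return token

    def put(self, path, data, token):
        # A PUT carrying the lock token only writes into the txn.
        self.txns[token][path] = data

    def get(self, path):
        # A GET without the token reads HEAD, so it misses uncommitted PUTs.
        return self.head.get(path)

    def unlock(self, token):
        # UNLOCK commits the txn as one new revision.
        self.head.update(self.txns.pop(token))
        self.revision += 1
        return self.revision


repo = ToyRepo()
repo.head["/doc.txt"] = "v1"
token = repo.lock("/doc.txt")
repo.put("/doc.txt", "v2", token)
repo.put("/doc.txt", "v3", token)
stale = repo.get("/doc.txt")    # other clients still see the old text
new_rev = repo.unlock(token)
fresh = repo.get("/doc.txt")    # visible only after the commit
```

Note that lock() in this model never refuses a second caller, which
is the other half of the "fake out" complaint.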
|  |  | 
|  | ---------------------------------------------------------------- | 
|  |  | 
|  | I sent a mail off to the deltaV working group, asking about the | 
|  | locking issue. | 
|  |  | 
|  | Geoff Clemm came back and said, "yah, if a lock-holder does a PUT to a | 
|  | locked resource, then the changes should be immediately visible to | 
|  | *all* users who do a GET, whether they hold the lock token or not." | 
|  |  | 
|  | This is my (sussman's) intuition too, but it throws a big wrench into | 
|  | gstein's proposal about how to do Phase 2. | 
|  |  | 
|  | [brane sez: Not really. All you have to do is maintain a list of the | 
|  | public URLs of objects that were actually modified through a "locked" | 
|  | PUT -- *not* the bubble-up dirs -- and you have to maintain that | 
|  | anyway, if you want to implement exclusive locks. A GET will just | 
|  | check that list first, and if it finds the URL, look into the | 
|  | associated txn instead of HEAD.] | 
|  |  | 
|  | [ gstein: note that list is cross-txn; we probably want a new dbm in | 
|  | the REPOS/dav/ subdir. map the repos path (derived from the URL) to | 
|  | the txn-name containing the most recent copy. | 
|  |  | 
|  | my hope was to avoid additional state like this, and encode that | 
|  | state in something like the locktoken. ] | 
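
A minimal sketch of brane's suggestion, assuming a plain dictionary
in place of the dbm gstein mentions.  All names here (ToyRepo,
put_locked, locked_puts) are hypothetical; the point is only that a
GET consults the path-to-txn map before falling back to HEAD.

```python
class ToyRepo:
    """Hypothetical in-memory repository; names are illustrative only."""

    def __init__(self):
        self.head = {}          # path -> committed contents
        self.txns = {}          # txn name -> {path: contents}
        self.locked_puts = {}   # path -> txn name holding the newest copy

    def put_locked(self, path, data, txn_name):
        # Write into the txn and remember which txn has the latest copy.
        self.txns.setdefault(txn_name, {})[path] = data
        self.locked_puts[path] = txn_name

    def get(self, path):
        # Check the cross-txn list first; fall back to HEAD.
        txn_name = self.locked_puts.get(path)
        if txn_name is not None:
            return self.txns[txn_name][path]
        return self.head.get(path)

    def commit(self, txn_name):
        # Committing moves the data to HEAD and drops the map entries.
        changes = self.txns.pop(txn_name)
        self.head.update(changes)
        for path in changes:
            if self.locked_puts.get(path) == txn_name:
                del self.locked_puts[path]


repo = ToyRepo()
repo.head["/a"] = "v1"
repo.put_locked("/a", "v2", "txn-7")
seen_by_others = repo.get("/a")   # reads from the txn, not HEAD
repo.commit("txn-7")
after_commit = repo.get("/a")     # reads from HEAD again
```

Only the modified paths are recorded, not the bubble-up directories,
so the map stays small; the cost is exactly the extra state gstein
hoped to avoid.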
|  |  | 
|  | ---------------------------------------------------------------- | 
|  |  | 
|  | Here are some thoughts Bill Tutt and I shared on IRC some time | 
|  | ago. They're more about locking than auto-versioning, but the two | 
|  | concepts are related, so this brain dump might as well go in here. | 
|  |  | 
|  | <<<It's pretty late/early right now, so I'll just dump Bill's mail in | 
|  | here for reference, and edit it later.>>> | 
|  | ----- | 
|  | From: "Bill Tutt" <billtut@microsoft.com> | 
|  | To: "Branko Cibej" <brane@xbc.nu> | 
|  | Subject: Locks Discussion | 
|  | Date: Wed, 4 Sep 2002 15:49:54 -0700 | 
|  |  | 
|  | Edited from IRC: | 
|  | <brane> "svn edit" has other uses, too | 
|  | <brane> e.g., you could check out a wc that has only checksums, not text | 
|  | bases, and makes wc files read-only. "svn edit" would make them | 
|  | writable, and temporarily store the text base. it doesn't have to create | 
|  | a lock. | 
|  | <brane> "svn edit" can be completely client-side. | 
|  |  | 
|  | It could, but ideally it would just work as if it were connected. i.e. | 
|  | executing "svn note" if connected, and not if not. i.e. laptop on bus | 
|  | mode. | 
|  |  | 
|  | <brane> basically, your non-exclusive lock would add an unversioned | 
|  | annotation to an object. | 
|  | <brane> ok. so we have "svn lock", which is an exclusive lock | 
|  | <brane> and "svn edit", which may or may not create locks | 
|  |  | 
|  | At a minimum annotates the file in the WC, for the "svn commit" default | 
|  | log message case below. At the far out end, it would create an exclusive | 
|  | lock if the file (via the pluggable diff protocol) was determined to be | 
|  | non-mergeable. | 
|  |  | 
|  | <brane> and "svn note", which just adds a note to the object | 
|  | <brane> and "svn lock" can also add a note to the object | 
|  | <brane> and "svn unlock" takes the note away | 
|  | <brane> and "svn rmnote" takes the note away, too | 
|  | <brane> and "svn commit" clears locks and removes notes | 
|  | <brane> and "svn commit" uses the note (if any, keyed off the username) | 
|  | as the default log message | 
|  | <brane> "svn note" and "svn rmnote", always contacts the server | 
|  |  | 
|  | "svn revert" now becomes "svn revert" + "svn rmnote" all rolled into | 
|  | one. | 
|  | "svn rmnote" undoes (as appropriate) any annotation on a WC entry. If | 
|  | created via "svn note" functionality, then the server is contacted. If | 
|  | via "svn edit" disconnected client functionality, then the server is NOT | 
|  | contacted. | 
|  |  | 
|  | I've edited out my original comments, and inserted my own post log | 
|  | comments. | 
|  |  | 
|  | Bill | 
|  | ---- | 
|  | Do you want a dangerous fugitive staying in your flat? | 
|  | No. | 
|  | Well, don't upset him and he'll be a nice fugitive staying in your flat. | 
|  | ----- | 
|  |  | 
|  |  | 
|  | ----------------------------------------------- | 
|  |  | 
|  | PHASE 1 STRATEGY: | 
|  |  | 
|  | * ? OPTIONS response includes autoversioning feature... required? | 
|  |  | 
|  | * all resources gain new live property:  'DAV:auto-version'.  This | 
|  | property will always be set to 'DAV:checkout-checkin'.  (There are | 
|  | four possible values, and this is the one that has nothing | 
|  | whatsoever to do with locking.) | 
|  |  | 
|  | * use-case 1:  PUT or PROPPATCH against an existing VCR | 
|  | (version-controlled resource), or a PUT of a new VCR. | 
|  |  | 
|  | * use-case 2: DELETE of VCR | 
|  |  | 
|  | * use-case 3: MKCOL (totally new, by definition) | 
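
For reference, the four DAV:auto-version values from RFC 3253 can be
written down as a small enumeration.  The Python names and the
auto_version_for() helper are hypothetical; only the property values
themselves come from the RFC.

```python
from enum import Enum


class AutoVersion(Enum):
    """The four DAV:auto-version values defined by RFC 3253."""
    CHECKOUT_CHECKIN = "DAV:checkout-checkin"
    CHECKOUT_UNLOCKED_CHECKIN = "DAV:checkout-unlocked-checkin"
    CHECKOUT = "DAV:checkout"
    LOCKED_CHECKOUT = "DAV:locked-checkout"


def auto_version_for(path):
    # Phase 1 strategy: every resource reports the same value,
    # the one these notes pick because it does not involve locks.
    return AutoVersion.CHECKOUT_CHECKIN


value = auto_version_for("/trunk/README").value
```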
|  |  | 
|  |  | 
|  | ----------------------------------------------------------- | 
|  |  | 
|  | Analysis of dav_svn_put() | 
|  | ========================= | 
|  |  | 
|  | At the moment, ra_dav is only attempting to PUT WR's. | 
|  |  | 
|  | mod_dav, however, already has an autoversioning infrastructure, and it | 
|  | currently attempts to bookend the stream-writing with an auto-checkout | 
|  | and auto-checkin.  But mod_dav_svn doesn't support those operations | 
|  | yet, so they're just no-ops. | 
|  |  | 
|  | By supporting auto_checkout and auto_checkin, we're adding the magic | 
|  | ability for a PUT on a VCR to happen: the VCR is magically transformed | 
|  | 'in place' into a WR, and then back again. | 
|  |  | 
|  | auto_checkout: | 
|  |  | 
|  | * tries to checkout parent resource if deemed necessary, i.e. the | 
|  | resource doesn't exist, or if explicit parent checkout was | 
|  | requested by caller: | 
|  |  | 
|  | - vsn_hooks->auto_versionable() | 
|  |  | 
|  | We should *always* return DAV_AUTO_VERSION_ALWAYS for now. | 
|  | The other values require that locks exist or not, and we're | 
|  | not supporting any kind of locks yet. | 
|  |  | 
|  | - vsn_hooks->checkout(parent, 1 /*auto-checkout*/...) | 
|  |  | 
|  | So we need to allow an auto-checkout of a parent VCR. | 
|  | See checkout() discussion below. | 
|  |  | 
|  | * if the resource doesn't exist, then create the resource: | 
|  |  | 
|  | - vsn_hooks->vsn_control(resource, NULL). | 
|  |  | 
|  | We need to implement this from scratch.  For now, we only | 
|  | allow a NULL target, which means, 'create an empty file'.  The | 
|  | resource itself must be tweaked in-place into a true VCR. | 
|  |  | 
|  | * if the resource exists but isn't a WR, check it out: | 
|  |  | 
|  | - vsn_hooks->checkout(resource, 1 /*auto-checkout*/...) | 
|  |  | 
|  | This routine currently takes a VR and an activity, and returns | 
|  | a totally new WR. | 
|  |  | 
|  | Here's what we need to make happen if we get 'auto-checkout' | 
|  | flag passed in: | 
|  |  | 
|  | - verify we have a VCR, and get the VCR's VR. | 
|  | - create a new activity (txn) | 
|  | - checkout the VR into the activity, creating a WR. | 
|  | - don't return the WR via pointer, but instead tweak the | 
|  | VCR to look like the WR (think about how to do this.) | 
|  | [ gstein: the docco for checkout() states you're allowed | 
|  | to tweak the passed-in resource; that is why it is | 
|  | non-const ] | 
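
The auto_checkout steps above can be sketched as a toy model.  This
is Python rather than mod_dav's C, and Resource, ToyProvider, and the
returned flags are assumptions made for illustration, not the real
dav_resource or vsn_hooks definitions.

```python
from dataclasses import dataclass
from typing import Optional

VCR, WR = "VCR", "WR"   # version-controlled resource, working resource


@dataclass
class Resource:
    """Hypothetical stand-in for a dav_resource; fields are illustrative."""
    path: str
    exists: bool = True
    kind: str = VCR
    txn: Optional[str] = None   # activity (txn) backing a WR


class ToyProvider:
    """Plays the role of the vsn_hooks callbacks described above."""

    def __init__(self):
        self.next_txn = 0

    def checkout(self, resource, auto_checkout):
        # With the auto-checkout flag: make a new activity and tweak the
        # passed-in VCR in place so that it looks like a WR.
        assert auto_checkout and resource.kind == VCR
        self.next_txn += 1
        resource.kind = WR
        resource.txn = "txn-%d" % self.next_txn

    def vsn_control(self, resource, target=None):
        # Only a NULL target is allowed for now: create an empty file.
        assert target is None
        resource.exists = True
        resource.kind = VCR


def auto_checkout(provider, resource, parent, parent_requested=False):
    """Return (resource_created, resource_checked_out, parent_checked_out)."""
    created = res_out = parent_out = False
    if (not resource.exists or parent_requested) and parent.kind != WR:
        provider.checkout(parent, True)
        parent_out = True
    if not resource.exists:
        provider.vsn_control(resource, None)
        created = True
    if resource.kind != WR:
        provider.checkout(resource, True)
        res_out = True
    return created, res_out, parent_out


provider = ToyProvider()
new_file = Resource("/dir/new.txt", exists=False)
flags_new = auto_checkout(provider, new_file, Resource("/dir"))
old_file = Resource("/dir/old.txt")
flags_old = auto_checkout(provider, old_file, Resource("/dir"))
```

The three flags are what a later auto_checkin would need in order to
know what to undo or check in.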
|  |  | 
|  |  | 
|  | dav_svn_put() then attempts to push data into the WR's stream, no prob. | 
|  |  | 
|  |  | 
|  | auto_checkin: | 
|  |  | 
|  | * if something went wrong when PUTting data into the resource's | 
|  | stream, then this function attempts to either | 
|  |  | 
|  | - vsn_hooks->uncheckout()  [if a resource or parent was checked out] | 
|  |  | 
|  | I guess we would abort the svn txn and magically change the WR back | 
|  | into the VCR?  (think about how to do this.) | 
|  |  | 
|  | [ gstein: the dav_resource is non-const; just change it. we | 
|  | aren't talking a stateful change, just altering a runtime | 
|  | structure. ] | 
|  |  | 
|  | - vsn_hooks->remove_resource()  [if a new resource was created] | 
|  |  | 
|  | No prob.  This just calls svn_fs_delete_tree() on the newly | 
|  | created object. | 
|  |  | 
|  | * otherwise, in normal case, if resource was checked out: | 
|  |  | 
|  | - vsn_hooks->checkin(resource) | 
|  |  | 
|  | Need to write this routine!  It would commit the txn hidden | 
|  | within the WR, using an auto-generated log message. | 
|  | Furthermore, it needs to possibly return the new VR that was | 
|  | created, and convert the WR resource back into a VCR that | 
|  | points to the new VR. | 
|  |  | 
|  | (Do our VCR's point to VR's right now?  Just implicitly through | 
|  | the checked-in property, right?) | 
|  |  | 
|  | [ gstein: VCRs never "point"; semantically, they just get | 
|  | updated with properties and content to match a VR. ] | 
|  |  | 
|  | * then, if parent was checked out too, | 
|  |  | 
|  | - vsn_hooks->checkin(parent) | 
|  |  | 
|  | Oops, this is a problem.  it's very likely that we just | 
|  | committed the txn in the previous call to checkin().  the best | 
|  | strategy here, I suppose, is to not throw an error... i.e. if | 
|  | the txn no longer exists, just do nothing.  (cmpilato isn't | 
|  | sure what happens if you try to open_txn() on a txn that is | 
|  | already committed.) | 
|  |  | 
|  | [ gstein: mod_dav should auto-checkin a set of resources rather | 
|  | than one at a time. the provider can then do it atomically, | 
|  | or one at a time, as they see fit ] | 
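
Here is a rough Python model of the auto_checkin flow, including the
"txn already committed" case for the parent.  ToyFS, the dictionary
resources, and the log message text are all invented for this sketch,
and it assumes the resource and its parent share one txn.  The
remove_resource() branch is left out.

```python
class ToyFS:
    """Hypothetical txn store; not the libsvn_fs API."""

    def __init__(self):
        self.open_txns = set()
        self.revision = 0
        self.log = []

    def begin(self, name):
        self.open_txns.add(name)

    def commit(self, name, message):
        self.open_txns.remove(name)
        self.revision += 1
        self.log.append((self.revision, message))
        return self.revision

    def abort(self, name):
        self.open_txns.discard(name)


def checkin(fs, resource):
    """Commit the txn hidden in a WR and turn it back into a VCR."""
    txn = resource["txn"]
    new_rev = None
    if txn in fs.open_txns:
        new_rev = fs.commit(txn, "Autoversioning commit of " + resource["path"])
    # If the txn is already gone (committed by an earlier checkin),
    # do nothing rather than raising an error.
    resource["kind"], resource["txn"] = "VCR", None
    return new_rev


def uncheckout(fs, resource):
    """Abort the txn and flip the runtime structure back to a VCR."""
    fs.abort(resource["txn"])
    resource["kind"], resource["txn"] = "VCR", None


def auto_checkin(fs, resource, parent, put_failed):
    """Return the revisions created by checking in resource, then parent."""
    revs = []
    for r in (resource, parent):
        if r["kind"] != "WR":
            continue
        if put_failed:
            uncheckout(fs, r)
        else:
            rev = checkin(fs, r)
            if rev is not None:
                revs.append(rev)
    return revs


fs = ToyFS()
fs.begin("txn-1")
child = {"path": "/dir/f", "kind": "WR", "txn": "txn-1"}
parent = {"path": "/dir", "kind": "WR", "txn": "txn-1"}
revs = auto_checkin(fs, child, parent, put_failed=False)

fs.begin("txn-2")
broken = {"path": "/dir/g", "kind": "WR", "txn": "txn-2"}
idle_parent = {"path": "/dir", "kind": "VCR", "txn": None}
revs_failed = auto_checkin(fs, broken, idle_parent, put_failed=True)
```

In the first call the parent's checkin finds its txn already
committed and quietly does nothing, so only one revision is created.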
|  |  | 
|  |  | 
|  | [ gstein: note that we're more than likely going to need to update the | 
|  | mod_dav provider APIs. I think the answer is to add a binary API | 
|  | version to the new ap_provider() interface, to publish a mod_dav | 
|  | provider (binary) API version, and to state that the old provider | 
|  | registration function now throws an error (by definition, modules | 
|  | using it would be obsolete). as we rev the API, we just bump the | 
|  | published mod_dav API version. | 
|  |  | 
|  | one problem here is that the current httpd release strategy might | 
|  | get in our way; I need to review some of the recent decisions to see | 
|  | how that affects us from an ongoing "httpd needs some fixes for svn" | 
|  | standpoint. | 
|  | ] | 
|  |  | 
|  |  | 
|  | ----------------------------------------------------------- | 
|  |  | 
|  | Late 2004 Notes: | 
|  |  | 
|  | We're working on a real locking system now.  Eventually, we'll be | 
|  | able to use this feature to complete autoversioning ("phase 2" | 
|  | above.) | 
|  |  | 
|  | - remember that we'll need to be able to look up a lock in the | 
|  | lock-table by UUID.  Generic DAV clients use UUID URIs to talk | 
|  | about locks. | 
|  |  | 
|  | - MSWord locks a document with a timeout of 180 seconds, then | 
|  | continuously re-LOCKs every so often, passing the existing | 
|  | lock-token back in an If: header.   mod_dav_fs returns the same | 
|  | lock-token UUID (presumably with a newer expiration time).  Our | 
|  | current implementation doesn't allow for mutable lock tokens.  We | 
|  | need to make sure that this doesn't mess up MSWord... that it's | 
|  | using the *last* token to renew locks, not the first one. | 
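
As a sketch of the two requirements above, here is a toy lock table
that can be looked up by token URI and that hands back the same token
on a refresh, which is the mod_dav_fs behaviour described.  LockTable
and its methods are hypothetical and do not describe the real
Subversion lock table; the opaquelocktoken: URI scheme is the one
RFC 2518 defines.

```python
import time
import uuid


class LockTable:
    """Hypothetical lock table keyed by lock-token URI."""

    def __init__(self):
        self.by_token = {}   # token URI -> {"path": ..., "expires": ...}
        self.by_path = {}    # path -> token URI

    def lock(self, path, timeout, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        if path in self.by_path:
            raise PermissionError("already locked: " + path)
        token = "opaquelocktoken:" + str(uuid.uuid4())
        self.by_token[token] = {"path": path, "expires": now + timeout}
        self.by_path[path] = token
        return token

    def refresh(self, token, timeout, now=None):
        # Re-LOCK with the token from the If: header: the same token
        # comes back with a later expiration time.
        now = time.time() if now is None else now
        self._expire(now)
        entry = self.by_token[token]   # KeyError if expired or unknown
        entry["expires"] = now + timeout
        return token

    def _expire(self, now):
        # Drop every lock whose timeout has passed.
        for token, entry in list(self.by_token.items()):
            if entry["expires"] <= now:
                del self.by_token[token]
                del self.by_path[entry["path"]]


table = LockTable()
first = table.lock("/report.doc", 180, now=0)
renewed = table.refresh(first, 180, now=120)
expires = table.by_token[first]["expires"]
later = table.lock("/report.doc", 180, now=1000)   # old lock has expired
```

If refresh() instead minted a new token each time, the client would
have to send the most recent token on its next re-LOCK, which is the
MSWord behaviour the last note says still needs checking.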
|  |  | 
|  |  |