| A Streamlined HTTP Protocol for Subversion |
| |
| GOAL |
| ==== |
| |
| Write a new HTTP protocol for svn -- one which is entirely proprietary |
| and designed for speed and comprehensibility. |
| |
| |
| PURPOSE / HISTORY |
| ================= |
| |
| Subversion standardized on Apache and the WebDAV/DeltaV protocol back |
| in the earliest days of development, based on some very strong value |
| propositions: |
| |
| A. Able to go through corporate firewalls |
| B. Zillions of authn/authz options via Apache |
| C. Standardized encryption (SSL) |
| D. Excellent logging |
| E. Built-in repository browsing |
| F. Caching within intermediate proxies |
| G. Interoperability with other WebDAV clients |
| |
| Unfortunately, DeltaV is an insanely complex and inefficient protocol, |
| and doesn't fit Subversion's model well at all. The result is that |
| Subversion speaks a "limited portion" of DeltaV, and pays a huge |
| performance price for this complexity. |
| |
| A typical network trace involves dozens of unnecessary turnarounds |
| where the client keeps asking for the same information over and over |
| again, all for the sake of following DeltaV. And then once the client |
| has "discovered" the information it needs, it often ends up making a |
| custom REPORT request anyway. Most svn operations are at least twice |
| as slow over HTTP as over the custom svnserve protocol. |
| |
| The existing HTTP protocol is also devilishly hard to comprehend or |
| extend, since it requires understanding the DeltaV spec and knowing |
| exactly which parts of that standard we support. |
| |
| |
| REQUIREMENTS |
| ============ |
| |
| Write a new HTTP protocol for svn ("HTTP v2"). Map RA requests |
| directly to HTTP requests. |
| |
| * svn over HTTP should be much faster (eliminate extra turnarounds) |
| |
| * svn over HTTP should be almost as easy to extend as svnserve. |
| |
| * svn over HTTP should be comprehensible to devs and users both |
| (require no knowledge of DeltaV concepts). |
| |
| * svn over HTTP should be designed for optimum cacheability by web |
| proxies. |
| |
| * svn over HTTP should make use of pipelined requests when possible. |
| |
| |
| MILE-HIGH DESIGN |
| ================ |
| |
| * Write new mod_svn module. Design it to (optionally) run |
| side-by-side with mod_dav_svn on the same public URI. |
| |
| * Extend libsvn_ra_serf to detect the new Apache protocol and if |
| present, use it. |
| |
| * Client/server compatibility: |
| |
| - newer clients can still operate against old servers: they look |
| for new protocol in OPTIONS response; if not available, fall |
| back to making DeltaV requests. |
| |
| - older clients can still operate against new servers: mod_svn |
| DECLINEs any old-style DeltaV request, allowing mod_dav_svn to |
| handle it instead. |
| |
| * To upgrade a service, admins simply install mod_svn next to |
| mod_dav_svn. They then ask their users to "upgrade the client to |
| get better HTTP speed". |
| |
| In theory, mod_svn should operate completely standalone (and should be |
| tested this way.) If an admin wants to support older clients or add |
| webdav functionality (such as autoversioning), then mod_dav_svn can be |
| installed "behind" mod_svn at the same URI. |
| |
| |
| DESIGN |
| ====== |
| |
| |
| 1. Client-Server Negotiation |
| ---------------------------- |
| |
| The administrator makes an svn repository available via mod_svn at a |
| specific URI, which we'll refer to as the "repository root URI". |
| (This same URI might also be serviced by mod_dav_svn.) |
| |
| mod_svn then advertises the new protocol in an OPTIONS response |
| against the repository root URI. It specifically includes the minimum |
| and maximum version numbers of the protocol it understands. |
| |
| ra_serf always starts an RA session with an OPTIONS request against |
| the repository root URI. If the new protocol isn't present (or is |
| offered only in an unsuitable version), it falls back to the DeltaV |
| protocol. |
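| |
| As a rough sketch of the "unsuitable version" check: assuming each |
| side knows an inclusive range of protocol versions, the client can |
| pick the highest version both ranges contain. The type and function |
| names below are hypothetical, and how the OPTIONS response actually |
| encodes the numbers is not decided here. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inclusive range of protocol versions one side speaks. */
typedef struct {
    int min_version;
    int max_version;
} proto_range_t;

/* Pick the highest version that both ranges contain.  Returns true and
   stores it in *chosen, or returns false when the ranges do not overlap,
   in which case the client would fall back to DeltaV. */
bool
choose_protocol_version(proto_range_t client, proto_range_t server,
                        int *chosen)
{
    int lo = client.min_version > server.min_version
               ? client.min_version : server.min_version;
    int hi = client.max_version < server.max_version
               ? client.max_version : server.max_version;

    assert(chosen != NULL);
    if (lo > hi)
        return false;
    *chosen = hi;
    return true;
}
```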
| |
| TODO: like svnserve, mod_svn may also want to advertise specific |
| features in its OPTIONS response. |
| |
| |
| 2. General Command Mechanism |
| ---------------------------- |
| |
| From here, the client initiates HTTP requests that match up with the |
| svn_ra.h interfaces. Each RA 'command' takes a set of parameters |
| and represents a single network turnaround. |
| |
| The standard pattern is to follow the lead of the mercurial network |
| protocol and embed these commands in either HTTP/1.1 GET or POST |
| methods against the repository root URI. The command and parameters |
| are embedded into the request URI itself as standard query syntax. |
| |
| For example, if the repository is available at the root URI |
| '/repos', then a client might send requests like these: |
| |
| GET /repos?cmd=get-latest-rev |
| |
| GET /repos?cmd=rev-proplist&rev=23 |
| |
| GET /repos/trunk/foo.c?cmd=get-file&rev=23 |
| |
| In general, we try to make these requests line up with the |
| corresponding RA APIs. One exception, however, is that we don't |
| split the RA_session URI and the 'path' parameter into two pieces. |
| Using 'path' as a query parameter is weird. So for an RA call like |
| this: |
| |
| svn_ra_open(&ra_session, "/repos/trunk/src"); |
| svn_ra_blort(ra_session, "foo.c", 23); |
| |
| ...we'd issue a command like this: |
| |
| GET /repos/trunk/src/foo.c?cmd=blort&rev=23 |
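| |
| A minimal sketch of that mapping, assuming the session URL and the |
| relative path are already URI-encoded. build_command_uri() is a |
| hypothetical helper, not an existing Subversion function: |

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join the RA session URL and the per-call relative path, then append
   the command and revision as query arguments.  Returns 0 on success
   or -1 if buf is too small.  Inputs are assumed to be URI-encoded. */
int
build_command_uri(char *buf, size_t buflen, const char *session_url,
                  const char *relpath, const char *cmd, long rev)
{
    int n;

    assert(buf != NULL && session_url != NULL && cmd != NULL);
    if (relpath != NULL && relpath[0] != '\0')
        n = snprintf(buf, buflen, "%s/%s?cmd=%s&rev=%ld",
                     session_url, relpath, cmd, rev);
    else
        n = snprintf(buf, buflen, "%s?cmd=%s&rev=%ld",
                     session_url, cmd, rev);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

| With session URL "/repos/trunk/src", path "foo.c", command "blort" |
| and revision 23, this yields the request URI shown above. |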
| |
| For requests which require real input data in their bodies (such as |
| 'update' or 'commit'), we use a POST request. For example: |
| |
| POST /repos?cmd=update&targetrev=100 |
| |
| [body contains complete 'update report' describing working copy's |
| revisions; response is a complete editor-drive.] |
| |
| POST /repos?cmd=commit&keeplocks=true |
| |
| [body contains complete editor-drive from client, including |
| possible revision-props that need changing (like svn:log), as |
| well as any necessary lock-tokens. response is the newly |
| committed revision number.] |
| |
| |
| 3. Representation of structured data in request/response bodies |
| --------------------------------------------------------------- |
| |
| XML is out: there's a huge performance penalty for producing and |
| consuming it, which is why companies like Facebook and Google have |
| released 'fast wire serialization' libraries like Thrift and |
| Protocol Buffers. Unfortunately, these libraries require entire |
| structures to be held in memory in order to serialize/deserialize |
| them, and this isn't an option when dealing with something |
| (potentially) infinitely large like an editor-drive. |
| |
| Luckily, svnserve already has a nice lisp-like representation of the |
| editor drive, and we can share its parsing/unparsing code. |
| |
| We can also use this same representation for things like property |
| lists. |
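| |
| To illustrate the property-list case, here is a sketch that writes |
| a proplist in svnserve-style syntax, where a string is written as |
| "<length>:<bytes> " and a list as "( ... ) ". It assumes HTTP v2 |
| would reuse that syntax unchanged; prop_t and write_proplist() are |
| hypothetical, and values are NUL-terminated only to keep the example |
| short (real property values are binary). |

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value pair for the example. */
typedef struct {
    const char *name;
    const char *value;
} prop_t;

/* Write "( ( len:name len:value ) ... ) " into buf.
   Returns 0 on success or -1 if buf is too small. */
int
write_proplist(char *buf, size_t buflen, const prop_t *props, size_t nprops)
{
    size_t pos;
    size_t i;
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "( ");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    pos = (size_t)n;

    for (i = 0; i < nprops; i++) {
        /* Each property is a two-item list of length-prefixed strings. */
        n = snprintf(buf + pos, buflen - pos, "( %zu:%s %zu:%s ) ",
                     strlen(props[i].name), props[i].name,
                     strlen(props[i].value), props[i].value);
        if (n < 0 || (size_t)n >= buflen - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(buf + pos, buflen - pos, ") ");
    if (n < 0 || (size_t)n >= buflen - pos)
        return -1;
    return 0;
}
```

| Because every string carries its own length, a reader can stream |
| the data without buffering the whole structure. |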
| |
| ## TODO: flesh out examples here |
| |
| |
| 4. Requests |
| ----------- |
| |
| The following request types generally correspond to the routines in |
| the svn_ra.h API; where they diverge, they do so in order to improve |
| performance, cacheability, or pipelining potential. |
| |
| According to proper HTTP specs, GET requests do not contain request |
| bodies; if a request needs a body, we use POST instead. |
| |
| |
| * get-latest-rev |
| |
| GET /repos[/path]/!get-latest-rev |
| |
| response: revnum |
| cacheable: NO, this value changes all the time. |
| |
| Note: The [/path] portion is tolerated but ignored by the server, |
| since the HEAD revision is a repository-wide attribute. |
| |
| |
| * get-dated-rev |
| |
| GET /repos[/path]/!get-dated-rev/DATESTRING |
| |
| DATESTRING must be in subversion standard format |
| (e.g. 2008-10-15T12:29:52.526295Z). See svn_time.h. |
| |
| response: revnum |
| cacheable: SOMETIMES: assuming it's a non-HEAD rev |
| |
| Note: The [/path] portion is tolerated but ignored by the server, |
| since the revision being fetched is part of repository's general |
| history. |
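| |
| A sketch of producing DATESTRING in the format of the example |
| above, using only standard C. format_datestring() is a hypothetical |
| helper; a real client would presumably use the routines in |
| svn_time.h instead. |

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a UTC timestamp (whole seconds plus microseconds) as
   YYYY-MM-DDTHH:MM:SS.uuuuuuZ.  Returns 0 on success, -1 on failure.
   Note: gmtime() uses static storage and is not thread-safe. */
int
format_datestring(char *buf, size_t buflen, time_t secs, long usecs)
{
    struct tm *tm;
    char date[32];
    int n;

    assert(buf != NULL && usecs >= 0 && usecs < 1000000);
    tm = gmtime(&secs);
    if (tm == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y-%m-%dT%H:%M:%S", tm) == 0)
        return -1;
    n = snprintf(buf, buflen, "%s.%06ldZ", date, usecs);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```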
| |
| |
| * change-rev-prop |
| |
| POST /repos[/path]/!change-rev-prop/REV/PROPNAME |
| |
| REV is a revision number. |
| PROPNAME is the name of a revision property, properly URI-encoded. |
| The body of the request contains either a binary value, or is empty |
| (in which case the revprop is deleted.) |
| |
| response: no body response. (response code 200 implies success.) |
| cacheable: NO, this is a write request. |
| |
| Note: The [/path] portion is tolerated but ignored by the server, |
| since the revprop being changed is a repository-wide feature. |
| |
| |
| * rev-proplist |
| |
| GET /repos[/path]/!rev-proplist/REV |
| |
| REV is a revision number. |
| |
| response: a list of property/value pairs (format TBD) |
| cacheable: NO, because revprops are mutable. |
| |
| Note: The [/path] portion is tolerated but ignored by the server, |
| since the revprops are a repository-wide feature. |
| |
| |
| * rev-prop |
| |
| GET /repos[/path]/!rev-prop/REV/PROPNAME |
| |
| REV is a revision number. |
| PROPNAME is the name of a revision property, properly URI-encoded. |
| |
| response: a binary property value (200), or 404 if not found. |
| cacheable: NO, because revprops are mutable. |
| |
| Note: The [/path] portion is tolerated but ignored by the server, |
| since the revprop is a repository-wide feature. |
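| |
| Both change-rev-prop and rev-prop put PROPNAME in the URI path, so |
| a name like "svn:log" needs encoding first. The sketch below |
| percent-encodes everything outside the RFC 3986 unreserved set. |
| uri_encode_component() is a hypothetical helper, and encoding ':' |
| is a conservative choice made here, not a requirement of this |
| design. |

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Percent-encode src into dst, leaving only A-Z a-z 0-9 - . _ ~ as-is.
   Returns 0 on success or -1 if dst is too small. */
int
uri_encode_component(char *dst, size_t dstlen, const char *src)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t pos = 0;

    assert(dst != NULL && src != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                         || (c >= '0' && c <= '9')
                         || c == '-' || c == '.' || c == '_' || c == '~';

        if (unreserved) {
            if (pos + 1 >= dstlen)      /* need room for c and the NUL */
                return -1;
            dst[pos++] = (char)c;
        } else {
            if (pos + 3 >= dstlen)      /* need room for %XX and the NUL */
                return -1;
            dst[pos++] = '%';
            dst[pos++] = hex[c >> 4];
            dst[pos++] = hex[c & 0x0f];
        }
    }
    if (pos >= dstlen)
        return -1;
    dst[pos] = '\0';
    return 0;
}
```

| Under these assumptions, fetching svn:log of revision 23 would be |
| requested as GET /repos/!rev-prop/23/svn%3Alog. |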
| |
| |
| * get-file |
| |
| GET /repos/path/!get-file/REV/[tp] |
| |
| REV is either a revision number or the string 'HEAD'. |
| Final path component is either 't', 'p', 'tp', or non-existent: |
| 't' means the file's text is wanted. |
| 'p' means the file's properties are wanted. |
| 'tp' (or non-existence) means both text and props are wanted. |
| |
| response: (structural encoding TBD:) |
| revnum |
| checksum of text (if text requested) |
| proplist (if props requested) |
| text (if text requested) |
| cacheable: YES, but only if a specific revnum was requested |
| (i.e. not HEAD) |
| |
| Note: two simpler, alternate URI forms work for fetching *just* |
| the file's contents (no revnum, checksum, or props): |
| |
| GET /repos/path |
| GET /repos/path?rev=number |
| |
| cacheable: YES, but only if a specific revnum is given in the |
| query arg. |
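| |
| The get-file URI rules and the cacheability rule can be sketched as |
| follows. REV_HEAD, build_get_file_uri() and get_file_is_cacheable() |
| are hypothetical names; the sketch always emits an explicit final |
| component and treats "neither text nor props" the same as "both". |

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define REV_HEAD (-1L)   /* hypothetical sentinel meaning 'HEAD' */

/* Build "<file_url>/!get-file/<REV>/<t|p|tp>" into buf.
   Returns 0 on success or -1 if buf is too small. */
int
build_get_file_uri(char *buf, size_t buflen, const char *file_url,
                   long rev, bool want_text, bool want_props)
{
    char revbuf[32];
    const char *flags;
    int n;

    assert(buf != NULL && file_url != NULL);
    if (rev == REV_HEAD)
        strcpy(revbuf, "HEAD");
    else
        snprintf(revbuf, sizeof revbuf, "%ld", rev);

    if (want_text && !want_props)
        flags = "t";
    else if (want_props && !want_text)
        flags = "p";
    else
        flags = "tp";

    n = snprintf(buf, buflen, "%s/!get-file/%s/%s", file_url, revbuf, flags);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* A response may be cached only when a specific revision was named. */
bool
get_file_is_cacheable(long rev)
{
    return rev != REV_HEAD;
}
```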
| |
| |
| * get-dir |
| |
| GET /repos/path/!get-dir/REV/[kshctla]/[tp] |
| |
| REV is either a revision number or the string 'HEAD'. |
| Penultimate path component is a set of characters indicating which |
| dirent fields to return: |
| k kind |
| s size |
| h has-props? |
| c created-rev |
| t time (created-date) |
| l last-author |
| a all fields |
| Final path component is either 't', 'p', 'tp', or non-existent: |
| 't' means the directory's dirents are wanted. |
| 'p' means the directory's properties are wanted. |
| 'tp' (or non-existence) means both dirents and props are wanted. |
| |
| response: (structural encoding TBD): |
| revnum |
| proplist (if props requested) |
| list of dirents: (name, kind, size, has-props, |
| created-rev, time, last-author) |
| |
| cacheable: YES, but only if specific revnum is requested. |
| |
| examples: |
| |
| Get all dirents of path@38: |
| GET /repos/path/!get-dir/38/a |
| |
| Get only dirent names and kinds for path@HEAD, no properties: |
| GET /repos/path/!get-dir/HEAD/k/t |
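| |
| On the server side, the dirent-field component could be decoded |
| into a bitmask along these lines. The DIRENT_* constants and |
| parse_dirent_fields() are hypothetical names for this sketch, and |
| rejecting unknown letters is an assumption, not something this |
| design specifies. |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit flags, one per dirent field letter. */
enum {
    DIRENT_KIND        = 1 << 0,   /* k */
    DIRENT_SIZE        = 1 << 1,   /* s */
    DIRENT_HAS_PROPS   = 1 << 2,   /* h */
    DIRENT_CREATED_REV = 1 << 3,   /* c */
    DIRENT_TIME        = 1 << 4,   /* t */
    DIRENT_LAST_AUTHOR = 1 << 5,   /* l */
    DIRENT_ALL         = (1 << 6) - 1   /* a */
};

/* Convert a field string such as "ks" or "a" into a bitmask.
   Returns -1 for an empty string or an unknown letter. */
int
parse_dirent_fields(const char *s)
{
    int mask = 0;

    assert(s != NULL);
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'k': mask |= DIRENT_KIND;        break;
        case 's': mask |= DIRENT_SIZE;        break;
        case 'h': mask |= DIRENT_HAS_PROPS;   break;
        case 'c': mask |= DIRENT_CREATED_REV; break;
        case 't': mask |= DIRENT_TIME;        break;
        case 'l': mask |= DIRENT_LAST_AUTHOR; break;
        case 'a': mask |= DIRENT_ALL;         break;
        default:  return -1;
        }
    }
    return mask;
}
```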
| |
| |
| |
| |
| .... NOT YET FINISHED .... |
| |
| |
| |
| 5. Implementation details |
| ------------------------- |
| |
| |
| A. (new) mod_svn module |
| |
| mod_svn requirements: |
| |
| * operates completely standalone |
| * provides reasonable opportunity for proxy caching |
| * provides reasonable opportunity for pipelining clients |
| * must DECLINE DeltaV requests, so mod_dav_svn can be |
| installed 'behind it' on the same <Location>, for |
| compatibility with old clients. |
| |
| |
| B. (new) libsvn_ra_http.so library |
| |
| Uses the serf library like libsvn_ra_serf, but speaks the new HTTP |
| v2 protocol. |
| |
| When libsvn_ra (the switching library) decides that some http RA |
| module is necessary, have it first call a utility function to do |
| an OPTIONS probe, then decide to use either ra_http or ra_serf. |
| |
| This strategy aligns with the way in which libsvn_ra currently |
| "chooses" either ra_neon or ra_serf based on a runtime config |
| file. |
| |
| |
| |
| 6. Optimization Possibilities |
| ----------------------------- |
| |
| |
| * Have the svn client stash some metadata which records whether a |
| working copy comes from a 'v2' HTTP server or not. This would |
| save us from doing an extra OPTIONS probe at the start of each RA |
| session. |
| |