A Streamlined HTTP Protocol for Subversion
GOAL
====
Write a new HTTP protocol for svn, one which is entirely proprietary
and designed for speed.
PURPOSE / HISTORY
=================
Subversion standardized on Apache and the WebDAV/DeltaV protocol
back in the earliest days of development, based on some very strong
value propositions:
A. Able to go through corporate firewalls
B. Zillions of authn/authz options via Apache
C. Standardized encryption (SSL)
D. Excellent logging
E. Built-in repository browsing
F. Interoperability with other WebDAV clients
G. Caching within intermediate proxies
Unfortunately, DeltaV is an insanely complex and inefficient protocol,
and doesn't fit Subversion's model well at all. The result is that
Subversion speaks a "limited portion" of DeltaV, and pays a huge price
for this complexity: speed.
A typical network trace involves dozens of unnecessary turnarounds
where the client keeps asking for the same information over and over
again, all for the sake of following DeltaV. And then once the client
has "discovered" the information it needs, it often ends up making a
custom REPORT request anyway. Most svn operations are at least twice
as slow over HTTP as over the custom svnserve protocol.
PROPOSAL
========
Write a new HTTP protocol for svn ("HTTP v2"). Map RA requests
directly to HTTP requests.
* svn over HTTP should be much faster (eliminate turnarounds)
* svn over HTTP should be almost as easy to extend as svnserve.
* svn over HTTP should be comprehensible to devs and users both
(require no knowledge of DeltaV concepts).
* svn over HTTP should be designed for optimum cacheability by
proxy-servers.
MILE-HIGH DESIGN
================
* Write new mod_svn module. Design it to run side-by-side with
mod_dav_svn on the same public URI.
* Extend libsvn_ra_serf to detect the Apache feature and if present,
speak the new protocol.
* Client/server compatibility:
- newer clients can still operate against old servers: they look
for new protocol in OPTIONS response; if not available, fall
back to making DeltaV requests.
- older clients can still operate against new servers: mod_svn
DECLINEs any old-style DeltaV request, allowing mod_dav_svn to
handle it instead.
* To upgrade a service, admins simply install mod_svn next to
mod_dav_svn. They then ask their users to "upgrade the client to
get better HTTP speed".
In theory, mod_svn should operate completely standalone (and should be
tested this way.) If an admin wants to support older clients, or add
webdav functionality (such as autoversioning), then mod_dav_svn can be
installed "behind" mod_svn.
DESIGN
======
1. Client-Server Negotiation
----------------------------
The administrator makes an svn repository available via mod_svn at a
specific URI, which we'll refer to as the "repository root URI".
(This same URI might also be serviced by mod_dav_svn too.)
mod_svn then advertises the new protocol in an OPTIONS response
against the repository root URI. It specifically includes a minimum
and maximum version number of the protocol it understands.
ra_serf always starts an RA session with an OPTIONS request against
the repository root URI. If the new protocol isn't present (or only an
unsuitable version is offered), it falls back to the DeltaV protocol.
TODO: like svnserve, mod_svn may also want to advertise specific
features in its OPTIONS response.
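A minimal sketch of the client-side decision described above, in Python.
The header names and the client's supported version range are
hypothetical: this document does not define how the minimum and maximum
version numbers are carried in the OPTIONS response.

```python
# Hypothetical header names; the real OPTIONS advertisement is not
# specified in this document.
MIN_HEADER = "SVN-Protocol-Min-Version"
MAX_HEADER = "SVN-Protocol-Max-Version"

# Assumed range of HTTP v2 protocol versions this client can speak.
CLIENT_MIN, CLIENT_MAX = 2, 2


def choose_protocol(options_headers):
    """Return ("v2", version) if a common version exists, else ("deltav", None)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {k.lower(): v for k, v in options_headers.items()}
    try:
        server_min = int(headers[MIN_HEADER.lower()])
        server_max = int(headers[MAX_HEADER.lower()])
    except (KeyError, ValueError):
        # New protocol not advertised (or advertised badly): fall back.
        return ("deltav", None)

    # Highest version that lies inside both the client and server ranges.
    best = min(CLIENT_MAX, server_max)
    if best < max(CLIENT_MIN, server_min):
        return ("deltav", None)  # ranges do not overlap: unsuitable version
    return ("v2", best)
```

Picking the highest common version is one possible policy; the
fallback path is the same whether the headers are missing or the ranges
simply don't overlap.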
2. General Command Mechanism
----------------------------
From here, the client initiates HTTP requests that match up with the
svn_ra.h interfaces. Each RA 'command' takes a set of parameters
and represents a single network turnaround.
The standard pattern is to follow the lead of the Mercurial network
protocol and embed these commands in either HTTP/1.1 GET or POST
methods against the repository root URI. The command and parameters
are embedded into the request URI itself as standard query syntax.
For example, if the repository is available at the root URI
'/repos', then a client might send requests like these:
GET /repos?cmd=get-latest-rev
GET /repos?cmd=rev-proplist&rev=23
GET /repos/trunk/foo.c?cmd=get-file&rev=23
In general, we try to make these requests line up with the
corresponding RA APIs. One exception, however, is that we don't
split the RA_session URI and the 'path' parameter into two pieces.
Using 'path' as a query parameter is weird. So for an RA call like
this:
svn_ra_open(&ra_session, "/repos/trunk/src");
svn_ra_blort(ra_session, "foo.c", 23);
...we'd issue a command like this:
GET /repos/trunk/src/foo.c?cmd=blort&rev=23
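The mapping from an RA call to a request URI could be sketched as
follows. The function name `build_command_uri` is hypothetical, and
'blort' is the same placeholder command used in the example above.

```python
from urllib.parse import quote, urlencode


def build_command_uri(session_uri, rel_path, cmd, **params):
    """Join the RA session URI and relative path, then append cmd and params."""
    # Fold the RA 'path' argument into the URI path instead of sending it
    # as a query parameter.
    path = session_uri.rstrip("/")
    if rel_path:
        path += "/" + rel_path.lstrip("/")
    # 'cmd' always comes first; remaining parameters keep the caller's order.
    query = urlencode([("cmd", cmd)] + list(params.items()))
    return quote(path) + "?" + query


print(build_command_uri("/repos/trunk/src", "foo.c", "blort", rev=23))
# prints: /repos/trunk/src/foo.c?cmd=blort&rev=23
```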
For requests which require real input data in their bodies (such as
'update' or 'commit'), we use a POST request. For example:
POST /repos?cmd=update&targetrev=100
[body contains complete 'update report' describing working copy's
revisions; response is a complete editor-drive.]
POST /repos?cmd=commit&keep-locks=true
[body contains complete editor-drive from client, including
possible revision-props that need changing (like svn:log), as
well as any necessary lock-tokens. response is the newly
committed revision number.]
3. Representation of structured data in request/response bodies
---------------------------------------------------------------
XML is out: there's a huge performance penalty for producing and
consuming it, which is why companies like Facebook and Google have
released 'fast wire serialization' libraries like Thrift and
Protocol Buffers. Unfortunately, these libraries require entire
structures to be held in memory in order to serialize/deserialize
them, and this isn't an option when dealing with something
(potentially) infinitely large like an editor-drive.
Luckily, svnserve already has a nice lisp-like representation of the
editor drive, and we can share its parsing/unparsing code.
We can also use this same representation for things like property
lists.
## TODO: flesh out examples here
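As one possible illustration, here is a sketch of writing a property
list incrementally in the svnserve wire syntax (length-prefixed strings
and parenthesized lists, each item followed by a space). That HTTP v2
would use exactly this layout for proplists is an assumption, and the
function names are hypothetical.

```python
import io


def write_string(out, data):
    """Write one svnserve-style string item: '<length>:<bytes> '."""
    out.write(b"%d:" % len(data) + data + b" ")


def write_proplist(out, props):
    """Stream a proplist as '( ( name value ) ... ) ' one property at a time.

    `props` may be any iterable of (name, value) byte pairs, so the whole
    list never has to be held in memory at once.
    """
    out.write(b"( ")
    for name, value in props:
        out.write(b"( ")
        write_string(out, name)
        write_string(out, value)
        out.write(b") ")
    out.write(b") ")


buf = io.BytesIO()
write_proplist(buf, [(b"svn:log", b"Fix typo"), (b"svn:author", b"jrandom")])
print(buf.getvalue())
# prints: b'( ( 7:svn:log 8:Fix typo ) ( 10:svn:author 7:jrandom ) ) '
```

Because each string carries its length up front, values may contain
arbitrary binary data, and the writer can emit items as they are
produced rather than building the whole structure first.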
4. Commands
-----------
In the list of commands, all commands are assumed to be attached as
?cmd=command to the request URI. Command parameters are all
query-encoded (&parm=val), and optional parameters are listed in
square brackets. Server response values are assumed to be in response
bodies.
get-latest-rev
GET /repos[/path]?cmd=get-latest-rev
response: revnum
get-dated-rev
GET /repos[/path]?cmd=get-dated-rev&date=string
response: revnum
change-rev-prop
POST /repos[/path]?cmd=change-rev-prop&rev=num&name=string
[body contains binary value. If body is empty rev-prop is deleted.]
rev-proplist
GET /repos[/path]?cmd=rev-proplist&rev=num
response: proplist
rev-prop
GET /repos[/path]?cmd=rev-prop&rev=num&name=string
response: propval (may be binary data)
commit
POST /repos[/path]?cmd=commit[&keep-locks=bool]
[body contains:
optional list of revprops (including svn:log)
optional list of lockpath:locktoken pairs
editor-drive relative to /repos[/path] ]
response: new revnum OR commit-error-message
get-file
GET /repos/path?cmd=get-file[&rev=num][&want-props=bool][&want-contents=bool]
If optional params aren't specified, rev defaults to HEAD, and
want-props and want-contents default to 'true'.
response: an s-expression containing:
revnum
checksum
props [if requested]
contents [if requested]
*** Note that two simpler URI forms work for fetching *raw* file
contents as well (no checksum, props, rev):
GET /repos/path
GET /repos/path?rev=number
## TODO: need to design cacheability of both the long-form and
short-form of these requests.
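A sketch of how a server might decode a long-form get-file request and
apply the defaults listed above. The function names, the use of None to
stand for HEAD, and the literal 'true'/'false' spellings for bool
parameters are assumptions, not part of this design.

```python
from urllib.parse import parse_qs, urlsplit


def parse_bool(text):
    """Accept only the literal spellings 'true' and 'false' (an assumption)."""
    if text not in ("true", "false"):
        raise ValueError("expected 'true' or 'false', got %r" % text)
    return text == "true"


def parse_get_file(request_uri):
    """Split a get-file request URI into its path and parameters, with defaults."""
    parts = urlsplit(request_uri)
    query = parse_qs(parts.query)
    if query.get("cmd") != ["get-file"]:
        raise ValueError("not a get-file request")
    return {
        "path": parts.path,
        # None stands for HEAD when no rev is given.
        "rev": int(query["rev"][0]) if "rev" in query else None,
        "want_props": parse_bool(query.get("want-props", ["true"])[0]),
        "want_contents": parse_bool(query.get("want-contents", ["true"])[0]),
    }
```

For example, parsing '/repos/trunk/foo.c?cmd=get-file&rev=23' yields
rev 23 with both want_props and want_contents set to True.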
.... NOT YET FINISHED ....
5. Implementation details
-------------------------
A. (new) mod_svn module
mod_svn requirements:
* operates completely standalone
* provides reasonable opportunity for proxy caching
* provides reasonable opportunity for pipelining clients
* must DECLINE DeltaV requests, so mod_dav_svn can be
installed 'behind it' on the same <Location>, for
compatibility with old clients.
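The DECLINE requirement could be sketched as a simple method filter.
This is Python pseudologic for what would really be an Apache handler
written in C; the HANDLED/DECLINED values stand in for Apache's handler
return codes, and the rule that HTTP v2 traffic only ever uses OPTIONS,
GET and POST is an assumption drawn from the command descriptions in
this document.

```python
from urllib.parse import parse_qs, urlsplit

# Stand-ins for Apache's OK / DECLINED handler return codes.
HANDLED, DECLINED = "handled", "declined"

# Assumption: HTTP v2 traffic only ever uses these methods.
V2_METHODS = {"OPTIONS", "GET", "POST"}


def dispatch(method, request_uri):
    """Decide whether mod_svn handles a request or passes it along."""
    if method not in V2_METHODS:
        # PROPFIND, REPORT, MKACTIVITY, CHECKOUT, MERGE, ... are DeltaV.
        return DECLINED
    if method == "POST":
        # Every v2 POST carries a command; anything else is not ours.
        query = parse_qs(urlsplit(request_uri).query)
        return HANDLED if "cmd" in query else DECLINED
    return HANDLED
```

A declined request falls through to whatever handler is installed
behind mod_svn, which is how old clients would keep reaching
mod_dav_svn on the same <Location>.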
B. (new) libsvn_ra_http.so library
Uses the serf library like libsvn_ra_serf, but speaks the new HTTP v2
protocol.
When libsvn_ra (the switching library) decides that some http RA
module is necessary, have it first call a utility function to do
an OPTIONS probe, then decide to use either ra_http or ra_serf.
This strategy aligns with the way in which libsvn_ra currently
"chooses" either ra_neon or ra_serf based on a runtime config
file.
6. Optimization Possibilities
-----------------------------
* Have the svn client stash some metadata which records whether a
working copy comes from a 'v2' HTTP server or not. This would
save us from doing an extra OPTIONS probe at the start of each RA
session.