                A Streamlined HTTP Protocol for Subversion

 GOAL
 ====

 Write a new HTTP protocol for svn -- one which is entirely proprietary
 and designed for speed and comprehensibility.


 PURPOSE / HISTORY
 =================

 Subversion standardized on Apache and the WebDAV/DeltaV protocol back
 in the earliest days of development, based on some very strong
 value propositions:

   A. Able to go through corporate firewalls
   B. Zillions of authn/authz options via Apache
   C. Standardized encryption (SSL)
   D. Excellent logging
   E. Built-in repository browsing
   F. Caching within intermediate proxies
   G. Interoperability with other WebDAV clients

 Unfortunately, DeltaV is an insanely complex and inefficient protocol,
 and doesn't fit Subversion's model well at all.  The result is that
 Subversion speaks a "limited portion" of DeltaV, and pays a huge
 performance price for this complexity.


 REQUIREMENTS
 ============

 Write a new HTTP protocol for svn ("HTTP v2").  Map RA requests
 directly to HTTP requests.

   * svn over HTTP should be much faster (eliminate extra turnarounds)

   * svn over HTTP should be almost as easy to extend as svnserve.

   * svn over HTTP should be comprehensible to devs and users both
     (require no knowledge of DeltaV concepts).

   * svn over HTTP should be designed for optimum cacheability by web
     proxies.

   * svn over HTTP should make use of pipelined and parallel requests
     when possible.


 Our Plans, in a Nutshell
 ========================

 * Phase 1:  Remove all DeltaV mechanics & formalities

   - get rid of all the PROPFIND 'discovery' turnarounds.
   - stop doing CHECKOUT requests before each PUT
   - publish a public URI syntax for browsing historical objects

 * Phase 2:  Speed up commits

   - Make PUT requests pipelined, the way ra_svn does.

 * Phase 3:  (maybe) get rid of XML in request/response bodies

   - if there's a worthwhile speed gain, use serialized Thrift objects.


 Phase 1 in Detail
 =================

 At the moment, ra_serf has to 'discover' and manipulate the following
 DeltaV objects:

    - Version Controlled Configuration (VCC): !svn/vcc
    - Baseline resource:                   !svn/bln
    - Working baseline resource:           !svn/wbl
    - Baseline collection resource:        !svn/bc/REV/
    - Activity collection:                 !svn/act/activityUUID/
    - Versioned resource:                  !svn/ver/REV/path
    - Working resource:                    !svn/wrk/activityUUID/path

 All of these objects will be deprecated and no longer used.
 mod_dav_svn will still support older clients, of course, but new
 clients will be able to automatically construct all of the URIs they
 need.


  * Opening an RA session:

    ra_serf will send an OPTIONS request when creating a new
    ra_session.  mod_dav_svn will send back what it already sends now,
    but will also return new information:

       youngest revision:  number
             "root stub":  !svn/me
           "pegrev stub":  !svn/bc
         "revision stub":  !svn/rev

    The presence of these new stubs tells ra_serf that this is a new
    server, and that the new streamlined HTTP protocol can be used.
    ra_serf then caches them in the ra_session object.  If these new
    OPTIONS responses are not returned, ra_serf falls back to 'classic'
    DeltaV protocol.
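
    The detection and fallback logic above can be sketched roughly as
    follows.  This is only an illustration: the notes do not say how the
    OPTIONS response carries these values, so the mapping keys and the
    SessionInfo type are hypothetical, as is the choice to treat a
    response missing any of the four values as an old server.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical field names; a plain mapping stands in for the parsed
# OPTIONS response, whose real encoding is not described in these notes.
REQUIRED_FIELDS = ("youngest_rev", "root_stub", "pegrev_stub", "revision_stub")


@dataclass(frozen=True)
class SessionInfo:
    """Values cached in the ra_session after a successful OPTIONS."""
    youngest_rev: int
    root_stub: str
    pegrev_stub: str
    revision_stub: str


def negotiate(options_info: Mapping[str, str]) -> Optional[SessionInfo]:
    """Return session info for a new server, or None to signal fallback."""
    if any(name not in options_info for name in REQUIRED_FIELDS):
        return None  # old server: speak the 'classic' DeltaV protocol
    return SessionInfo(
        youngest_rev=int(options_info["youngest_rev"]),
        root_stub=options_info["root_stub"],
        pegrev_stub=options_info["pegrev_stub"],
        revision_stub=options_info["revision_stub"],
    )
```

    A None result is the cue to take the old discovery path; anything
    else means URIs can be constructed locally from the cached stubs.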


  * What the new stubs are used for:

    - root stub:  represents the "repository itself".  This is the URI
      that custom REPORTS are sent against.

      Note:  this eliminates our need for the VCC resource.

    - pegrev stub: an opaque string to append to, whenever the client
      wants to refer to a (pegrev, path) in the repository.
      Specifically, /REV/PATH are appended, e.g.

           GET !svn/bc/2398/trunk/foo.c

      Note:  this syntax is already the one mod_dav_svn understands;
      what's changing here is that we no longer need to do a bunch of
      PROPFINDs to discover it -- we get the stub right up front when
      the session is opened.

    - revision stub: represents an opaque string to append to, whenever
      the client wants to access a revision's revprops (either reading
      or writing).  Specifically, /REV is appended, e.g.

           PROPFIND !svn/rev/2398

      Standard PROPFIND and PROPPATCH requests can be used against the
      constructed URI, with the understanding that the name/value pairs
      being accessed are unversioned revision props, rather than file
      or directory props.

      Note:  this eliminates our need for baseline (bln) or working
      baseline (wbl) resources.
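
    As a small illustration of the appending rules above, the two
    constructions might look like this.  The function names are made up
    for the example, and percent-encoding the path is an assumption; the
    notes only say that /REV/PATH or /REV is appended to the stub.

```python
from urllib.parse import quote


def pegrev_uri(pegrev_stub: str, rev: int, path: str) -> str:
    """Append /REV/PATH to the pegrev stub to name a (pegrev, path)."""
    # quote() leaves '/' alone by default, so path separators survive.
    return f"{pegrev_stub}/{rev}/{quote(path.strip('/'))}"


def revision_uri(revision_stub: str, rev: int) -> str:
    """Append /REV to the revision stub to name a revision's revprops."""
    return f"{revision_stub}/{rev}"
```

    With the stubs shown earlier, pegrev_uri("!svn/bc", 2398,
    "trunk/foo.c") gives !svn/bc/2398/trunk/foo.c, with no PROPFIND
    needed to discover it.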


  * Simple read requests

    These RA functions each send a single request/response, either GET or
    PROPFIND.

    The only change here is that we no longer need to "discover"
    pegrev or revision URIs with extra turnarounds; instead we construct
    them directly.

     get-latest-rev    -> already present in ra_session (via OPTIONS)

     get-file          -> GET (against a pegrev URI)

     get-dir           -> PROPFIND depth 1 (against a pegrev URI)

     rev-prop          -> PROPFIND (against a revision URI)

     rev-proplist      -> PROPFIND (against a revision URI, but recursive)

     check-path        -> PROPFIND (against a pegrev URI)

     stat              -> PROPFIND (against a pegrev URI)

     get-lock          -> PROPFIND (against a public HEAD URI)
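
    The mapping above amounts to a lookup table.  A rough sketch, with
    hypothetical names throughout: the "Depth: 1" header is the usual
    WebDAV way to ask for depth 1, and leaving the other requests without
    headers is an assumption.  rev-proplist and get-lock are left out
    because the notes do not spell out their exact request shape.

```python
from typing import Dict, Optional, Tuple

Request = Tuple[str, str, Dict[str, str]]  # (method, URI, headers)


def plan_simple_read(op: str, stubs: Dict[str, str], rev: int,
                     path: str = "") -> Optional[Request]:
    """Return the single request for an RA read, or None if none is needed."""
    pegrev = f"{stubs['pegrev']}/{rev}/{path}"
    revision = f"{stubs['revision']}/{rev}"
    table: Dict[str, Optional[Request]] = {
        "get-latest-rev": None,  # already cached from OPTIONS
        "get-file": ("GET", pegrev, {}),
        "get-dir": ("PROPFIND", pegrev, {"Depth": "1"}),
        "rev-prop": ("PROPFIND", revision, {}),
        "check-path": ("PROPFIND", pegrev, {}),
        "stat": ("PROPFIND", pegrev, {}),
    }
    return table[op]
```

    Every entry is at most one request, which is the point: the URI is
    built from the cached stubs instead of being discovered first.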


  * Complex read requests

    These RA functions are each accomplished in a single REPORT
    request/response.

    These REPORTs are not changing, except that they'll be sent against
    the root stub URI (!svn/me) rather than a VCC URI.  Again, we're
    eliminating all "discovery" turnarounds which used to precede these
    requests.

    log                      -> REPORT (against root stub)

    get-dated-rev            -> REPORT (against root stub)

    get-locations            -> REPORT (against root stub)

    get-location-segments    -> REPORT (against root stub)

    get-file-revs            -> REPORT (against root stub)

    get-locks                -> REPORT (against root stub)

    get-mergeinfo            -> REPORT (against root stub)

    replay                   -> REPORT (against root stub)

    replay-range             -> pipelined REPORT requests (against root stub)
                                on each revision in the range
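
    For replay-range, the "one REPORT per revision" rule could be
    sketched as below.  The function name and tuple layout are invented
    for the example; the REPORT body that would carry the revision is
    not described in these notes, so it is represented only by the
    revision number.

```python
from typing import List, Tuple


def replay_range_requests(root_stub: str, start_rev: int,
                          end_rev: int) -> List[Tuple[str, str, int]]:
    """Build one REPORT per revision in [start_rev, end_rev].

    Each tuple is (method, URI, revision).  All requests target the same
    root stub, so they can be queued back to back on one connection.
    """
    return [("REPORT", root_stub, rev)
            for rev in range(start_rev, end_rev + 1)]
```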


  * The "update" family of requests

    update
    switch
    status
    diff

    For these RA functions, the existing ra_serf strategy stays the same:

     1. Client sends custom REPORT describing state of working copy;
        it does *not* request text-deltas in response (the way ra_neon does).

     2. Server responds with a 'skeletal' editor-drive.

     3. Client pipelines bunches of GET and PROPFIND requests.


    The only changes we plan to make:

     - the REPORT happens against the new 'root stub', rather than a
       discovered VCC URI.

     - no need to cache the !svn/ver "wcprops" in the working copy
       anymore, since our commit process has changed (see below).

     - no need to do any PROPFIND discovery of pegrev objects to fetch;
       client can construct them at will using the 'pegrev stub' it
       received when the ra_session began.
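
    Step 3 of the strategy, combined with the last point above, might be
    sketched like this.  It assumes, purely for illustration, that the
    skeletal editor-drive has been reduced to a list of changed file
    paths at one revision and that each file needs both its contents and
    its properties; the real drive carries more detail than that.

```python
from typing import List, Tuple


def fetch_requests(pegrev_stub: str, rev: int,
                   changed_files: List[str]) -> List[Tuple[str, str]]:
    """Queue a GET and a PROPFIND per changed file, built from the stub."""
    requests: List[Tuple[str, str]] = []
    for path in changed_files:
        uri = f"{pegrev_stub}/{rev}/{path}"  # constructed, not discovered
        requests.append(("GET", uri))        # file contents
        requests.append(("PROPFIND", uri))   # file properties
    return requests
```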


  * Simple write requests

    change-rev-prop          -> PROPPATCH (against a revision URI)

    lock                     -> LOCK (against a public HEAD URI)

    unlock                   -> UNLOCK (against a public HEAD URI)


  * Commit process

   This will change significantly.  The current methodology looks like:

       OPTIONS to start ra_session
       PROPFINDs to discover various opaque URIs
       MKACTIVITY to create a transaction
       for each changed object:
          CHECKOUT object to get working resource
          {PUT, PROPPATCH, DELETE, COPY} working resource
          MKCOL to create new directories
       MERGE to commit the transaction

   The new sequence looks like:

       OPTIONS to start ra_session
       POST against root stub, to create a transaction
       for each changed object:
          {PUT, PROPPATCH, DELETE, COPY, MKCOL} against transaction resources
       MERGE to commit the transaction
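
   To make the difference concrete, the two sequences can be written out
   as request lists.  This sketch follows the listings above literally:
   the discovery PROPFINDs are collapsed into a single placeholder entry
   because their number is not given, MKCOL is shown without a preceding
   CHECKOUT as in the listing, and the function names are hypothetical.

```python
from typing import List, Tuple

Change = Tuple[str, str]  # (method, path), e.g. ("PUT", "trunk/foo.c")


def classic_commit(changes: List[Change]) -> List[str]:
    """Request sequence under the current DeltaV-style methodology."""
    steps = ["OPTIONS", "PROPFIND (discovery; count varies)", "MKACTIVITY"]
    for method, path in changes:
        if method != "MKCOL":
            steps.append(f"CHECKOUT {path}")  # obtain a working resource
        steps.append(f"{method} {path}")
    steps.append("MERGE")
    return steps


def v2_commit(changes: List[Change],
              txn_stub: str = "!svn/txn/TXN_UUID") -> List[str]:
    """Request sequence under the new scheme: no discovery, no CHECKOUT."""
    steps = ["OPTIONS", "POST (root stub)"]
    steps += [f"{method} {txn_stub}/{path}" for method, path in changes]
    steps.append("MERGE")
    return steps
```

   For a commit touching N objects, the new sequence drops the discovery
   requests and up to N CHECKOUTs.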

   Specific new changes:

     - The new POST request replaces the MKACTIVITY request.

        - no more need to "discover" the activity URI;  !svn/act/ is gone.
        - client no longer creates an activity UUID itself.
        - instead, POST returns two new stubs:

                "transaction stub":  !svn/txn/TXN_UUID
           "transaction prop stub":  !svn/txp/TXN_UUID

        - transaction stub:  an opaque URI which contains the svn
          transaction's actual UUID (generated by libsvn_fs).  Client
          can then append paths to the stub to refer to any file or
          directory within the transaction, e.g.

                PUT !svn/txn/TXN_UUID/trunk/foo.c

        - transaction prop stub: an opaque URI representing unversioned
          props on a transaction.  Client can use this URI to read or
          modify unversioned transaction properties (such as
          'svn:log'), e.g.

                PROPPATCH !svn/txp/TXN_UUID

      - Once the commit transaction is created, the client is free to
        send write requests against transaction resources it constructs itself.

        Note:  this eliminates the CHECKOUT requests, and also removes
        our need to use versioned resources (!svn/ver) or working
        resources (!svn/wrk).

      - When modifying transaction resources, clients should send
        'If-Match:' headers to facilitate server-side out-of-dateness
        checks.  (TODO:  value of header is probably an etag?)
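
    A rough sketch of how a client might build write requests from the
    two stubs returned by POST.  The function names are invented, and
    since the value of the If-Match header is still a TODO above, the
    etag argument here is only a placeholder for whatever is chosen.

```python
from typing import Dict, Optional, Tuple


def txn_request(method: str, txn_stub: str, path: str,
                etag: Optional[str] = None) -> Tuple[str, str, Dict[str, str]]:
    """Build a write request against a file or directory in a transaction."""
    # Placeholder: the header value is undecided in these notes.
    headers = {"If-Match": etag} if etag is not None else {}
    return method, f"{txn_stub}/{path}", headers


def txn_prop_request(txp_stub: str) -> Tuple[str, str]:
    """Build the request that sets unversioned txn props such as svn:log."""
    return "PROPPATCH", txp_stub
```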