notes/svnpatch/svnpatch.txt - subversion - Git at Google

 This file documents the 'svnpatch' format that's used with both diff and patch
 subcommands.

 I HISTORY
   -------

 Subversion's diff facility by default generates an unidiff format output.  The
 unidiff format is famous with tools like diff(1) and has been used for decades
 to produce contextual differences between files.  We also often associate it
 with patch(1) to apply those contextual differences.  When it comes to
 non-contextual changes like moving a directory, adding a property to a file, or
 modifying an image, unidiff is helpless.  Enters the svnpatch format.  It
 enables capturing all non-contextual changes into a WC-portable output so that
 it is possible to create rich patches to apply across working copies.  Another
 way to look at it is as an "offline merge", in which one dumps the diffs,
 passes them as a patch on to her peer who then applies it, without ever
 interacting with the repository.

 The svnpatch format is in fact a simplified version of the Subversion protocol
 -- see subversion/libsvn_ra_svn/protocol -- that meets our needs.  The
 advantage here is that changes are serialized into a language that Subversion
 already speaks and only a few minor tweaks were needed to accommodate.  As an
 example, revisions have been stripped from the protocol to allow fuzzing.

 The implementation with the command line client uses `svn diff --svnpatch' to
 generate the rich diffs and `svn patch' to apply the diffs against a working
 copy.  Other frontends can also take advantage of svnpatch in the same way
 through the usual API's svn_client_diff5 and svn_client_patch that use files to
 communicate.


 II SVNPATCH FORMAT IN A NUTSHELL
    -----------------------------

 First off, let's define it.  svnpatch format is made of two ordered parts:
   * (a) human-readable: made of unidiff bytes
   * (b) computer-readable: made of svn protocol bytes (ra_svn), gzip'ed,
         base64-encoded

 But, as we're not in a client/server configuration:
   - (b) only uses the svn protocol's Editor Command Set, there's no need for
     the Main Command Set nor the Report Command Set
   - a client reads Editor Commands from the patch, i.e. the patch silently
     drives the client's editor
   - the only direction the information takes is from the patch to the client
   - svndiff1 is solely used instead of being able to choose between svndiff1
     and svndiff0 (e.g. binary-change needs svndiff)

 Such a format can be seen as a subset of the svn protocol which:
   - Capabilities and Edit Pipelining have nothing to do with as we can't adjust
     once the patch is rock-hard written in the file nor negotiate anything
   - commands are restricted to the Editor Command Set
   - lacks revision numbers and checksums except for binary files (see VI
     FUZZING)

 For more about Command Sets, consult libsvn_ra_svn/protocol.


 III BOUNDARIES BETWEEN THE TWO PARTS
     --------------------------------

 Now since the svn protocol would be happy to handle just any change that a
 working copy comes with, rules have to be set up so that we meet our goals (see
 I HISTORY).

 Concretely, what's in each part?

 In (a):
  - contextual differences
  - property-changes (in a similar way to 'svn diff')
  - new non-binary-file content

 In (b):
  - tree-changes ({add,del,move,copy}-directory, {add,del,move,copy}-file)
  - property-changes
  - binary-changes

 Consequences are we face cases where one change's representation lives in the
 two parts of the patch. e.g. a modified-file move: the move is represented
 within (b) while contextual differences within (a);  a file add: an add-file
 Editor Command in (b) plus its content in (a).

 Furthermore, we never end up with redundant information but with
 property-changes.  A file copy with modifications generates (a) contextual
 diff, (b) add-file w/ copy-path.

 The only thing that's left unreadable is tree-changes as defined above.
 However, a higher level layer (e.g. GUIs) would perfectly be able to
 base64-decode, uncompress and read operations to visually-render the changes.

 The (b) block starts with a header and its version.

 Here's what a directory add, a file add and a propset would look like:

 [[[
 Index: bar
 ===================================================================
 --- bar
 +++ bar
 @@ -0,0 +1,2 @@
 +This is bar content.
 +

 Property changes on: bar
 ___________________________________________________________________
 Name: newprop
    + propval

 ======================== SVNPATCH BLOCK 1 =========================
 H4sICOz0mEYAA291dABtjsEKwyAMhu97Co/tQejcoZC3cU26CWLElu31l0ZXulE8GL//8yed4UzJ
 FubVdHJ64wAHuXp5eEQ7h0gy3uDuS80cTFc1qzQ9fXqQejYXzoLUGCHRu4ERtuHl4/5rq8ZQtHlm
 /jajOzZHXqhZGp3i4Qe3df931IwwrBVePlyTX//3AAAA
 ]]]

 Let's uncompress and decode the above base64 block (lines are wrapped):

 ( open-root ( ( ) 2:d0 ) ) ( add-file ( 3:bar 2:d0 2:c1 ( ) ) ) (
 change-file-prop ( 2:c1 7:newprop ( 7:propval ) ) ) ( add-dir ( 3:foo 2:d0 2:d2
 ( ) ) ) ( close-dir ( 2:d2 ) ) ( close-dir ( 2:d0 ) ) ( close-file ( 2:c1 ( ) )
 ) ( close-edit ( ) )

 Further examples can be found in subversion/tests/cmdline/diff_tests.py
 test-suite.


 IV SVNPATCH EDIT-ABILITY
    ---------------------

 Because encoded and compressed, the computer-readable chunk (b) is not directly
 editable.  Should it be in cleartext, the user would still have to go through
 svn protocol writing manually -- calculate checksums and strings length, and
 place tokens, assumed to be not so friendly for the end-user.  However, there's
 a much easier workaround: apply the patch, and then start editing the working
 copy with regular svn subcommands.


 V PATCHING
   --------

 When it comes to applying an svnpatch patch (RAS syndrom), the 'svn patch'
 subcommand is a good friend.  We do support applying (a) Unidiffs
 internally, and (b) is handled with routines that read and drive
 editor functions out from the patch file much like what's being performed by
 libsvn_ra_svn with a network stream.

 Now some words about the order to process (a) and (b).  There might be cases
 when operations to a single file live in the two parts of the patch (see above).
 Since Unidiff indexes are made against the most up-to-date file name, it makes
 sense that 'svn patch' first deals with the svnpatch block and then the Unidiff
 block.  E.g. consider a WC with a file copy from foo to bar and then contextual
 modifications to bar.  The patch that represents this WC changes would show
 diffs against 'bar' file.  So 'svn patch' first has to schedule-add-with-history
 bar from foo and then apply contextual diffs, which would not work the other way
 around.

 When the Editor Command Set comes to be extended, 'svn patch' will face
 unexpected commands and/or syntax.  As in libsvn_ra_svn, we warn the user with
 'unsupported command' messages and ignore its application.


 VI FUZZING a.k.a. DYSTOPIA
    -----------------------

 The svn protocol is not very sensitive to fuzzing since most operations include
 a revision number.  However, to stick with this policy would widely lower the
 patch-application scope we're expecting.  For instance, 'svn patch' would fail
 at deleting dir@REV when REV is different from the one that comes with the
 delete-entry Editor Command.  Obviously we need loose here, and the solution is
 to free the svn protocol from revision numbers and checksums in our
 implementation for every change but binary-changes (for the checksums).  (It
 would be insane to associate binary stuff with fuzzing in this world.)  Now
 dealing with (b) patching is similar in many ways to GNU Patch's: we end up
 trying by all methods to drive the editor in the dark jungle, possibly failing
 in few cases shooting 'hunk failed' warnings.


 VII PATCH AND MERGE IN SUBVERSION
     -----------------------------

 'svn patch' is similar in many ways to 'svn merge'.  Basically, we have a
 tree-delta in hand that we want to apply to a working-tree.  Thus it's not
 surprising to see they have a lot in common when comparing both implementations.
 'patch' uses a mix of revamped merge_callbacks (see libsvn_client/merge.c) and
 repos-repos editor functions (see libsvn_client/repos_diff.c).  Why not merge
 those two together then, for code-share sake?  Well, although they share a close
 logic, to join the two implies having one single file (repos_diff.c) to handle
 at least three burdens:  repos-repos diff, merge, and patch.  Such a design
 can't be achieved without a myriad of tests/conditions and a large amount of
 blurry mess at mixing three different tools in one place.  In the end, what was
 supposed to enhance software maintainability turned out to cause a lot of damage
 at tightening different things together.
	This file documents the 'svnpatch' format that's used with both diff and patch
	subcommands.

	I HISTORY
	-------

	Subversion's diff facility by default generates an unidiff format output. The
	unidiff format is famous with tools like diff(1) and has been used for decades
	to produce contextual differences between files. We also often associate it
	with patch(1) to apply those contextual differences. When it comes to
	non-contextual changes like moving a directory, adding a property to a file, or
	modifying an image, unidiff is helpless. Enters the svnpatch format. It
	enables capturing all non-contextual changes into a WC-portable output so that
	it is possible to create rich patches to apply across working copies. Another
	way to look at it is as an "offline merge", in which one dumps the diffs,
	passes them as a patch on to her peer who then applies it, without ever
	interacting with the repository.

	The svnpatch format is in fact a simplified version of the Subversion protocol
	-- see subversion/libsvn_ra_svn/protocol -- that meets our needs. The
	advantage here is that changes are serialized into a language that Subversion
	already speaks and only a few minor tweaks were needed to accommodate. As an
	example, revisions have been stripped from the protocol to allow fuzzing.

	The implementation with the command line client uses `svn diff --svnpatch' to
	generate the rich diffs and `svn patch' to apply the diffs against a working
	copy. Other frontends can also take advantage of svnpatch in the same way
	through the usual API's svn_client_diff5 and svn_client_patch that use files to
	communicate.


	II SVNPATCH FORMAT IN A NUTSHELL
	-----------------------------

	First off, let's define it. svnpatch format is made of two ordered parts:
	* (a) human-readable: made of unidiff bytes
	* (b) computer-readable: made of svn protocol bytes (ra_svn), gzip'ed,
	base64-encoded

	But, as we're not in a client/server configuration:
	- (b) only uses the svn protocol's Editor Command Set, there's no need for
	the Main Command Set nor the Report Command Set
	- a client reads Editor Commands from the patch, i.e. the patch silently
	drives the client's editor
	- the only direction the information takes is from the patch to the client
	- svndiff1 is solely used instead of being able to choose between svndiff1
	and svndiff0 (e.g. binary-change needs svndiff)

	Such a format can be seen as a subset of the svn protocol which:
	- Capabilities and Edit Pipelining have nothing to do with as we can't adjust
	once the patch is rock-hard written in the file nor negotiate anything
	- commands are restricted to the Editor Command Set
	- lacks revision numbers and checksums except for binary files (see VI
	FUZZING)

	For more about Command Sets, consult libsvn_ra_svn/protocol.


	III BOUNDARIES BETWEEN THE TWO PARTS
	--------------------------------

	Now since the svn protocol would be happy to handle just any change that a
	working copy comes with, rules have to be set up so that we meet our goals (see
	I HISTORY).

	Concretely, what's in each part?

	In (a):
	- contextual differences
	- property-changes (in a similar way to 'svn diff')
	- new non-binary-file content

	In (b):
	- tree-changes ({add,del,move,copy}-directory, {add,del,move,copy}-file)
	- property-changes
	- binary-changes

	Consequences are we face cases where one change's representation lives in the
	two parts of the patch. e.g. a modified-file move: the move is represented
	within (b) while contextual differences within (a); a file add: an add-file
	Editor Command in (b) plus its content in (a).

	Furthermore, we never end up with redundant information but with
	property-changes. A file copy with modifications generates (a) contextual
	diff, (b) add-file w/ copy-path.

	The only thing that's left unreadable is tree-changes as defined above.
	However, a higher level layer (e.g. GUIs) would perfectly be able to
	base64-decode, uncompress and read operations to visually-render the changes.

	The (b) block starts with a header and its version.

	Here's what a directory add, a file add and a propset would look like:

	[[[
	Index: bar
	===================================================================
	--- bar
	+++ bar
	@@ -0,0 +1,2 @@
	+This is bar content.
	+

	Property changes on: bar
	___________________________________________________________________
	Name: newprop
	+ propval

	======================== SVNPATCH BLOCK 1 =========================
	H4sICOz0mEYAA291dABtjsEKwyAMhu97Co/tQejcoZC3cU26CWLElu31l0ZXulE8GL//8yed4UzJ
	FubVdHJ64wAHuXp5eEQ7h0gy3uDuS80cTFc1qzQ9fXqQejYXzoLUGCHRu4ERtuHl4/5rq8ZQtHlm
	/jajOzZHXqhZGp3i4Qe3df931IwwrBVePlyTX//3AAAA
	]]]

	Let's uncompress and decode the above base64 block (lines are wrapped):

	( open-root ( ( ) 2:d0 ) ) ( add-file ( 3:bar 2:d0 2:c1 ( ) ) ) (
	change-file-prop ( 2:c1 7:newprop ( 7:propval ) ) ) ( add-dir ( 3:foo 2:d0 2:d2
	( ) ) ) ( close-dir ( 2:d2 ) ) ( close-dir ( 2:d0 ) ) ( close-file ( 2:c1 ( ) )
	) ( close-edit ( ) )

	Further examples can be found in subversion/tests/cmdline/diff_tests.py
	test-suite.


	IV SVNPATCH EDIT-ABILITY
	---------------------

	Because encoded and compressed, the computer-readable chunk (b) is not directly
	editable. Should it be in cleartext, the user would still have to go through
	svn protocol writing manually -- calculate checksums and strings length, and
	place tokens, assumed to be not so friendly for the end-user. However, there's
	a much easier workaround: apply the patch, and then start editing the working
	copy with regular svn subcommands.


	V PATCHING
	--------

	When it comes to applying an svnpatch patch (RAS syndrom), the 'svn patch'
	subcommand is a good friend. We do support applying (a) Unidiffs
	internally, and (b) is handled with routines that read and drive
	editor functions out from the patch file much like what's being performed by
	libsvn_ra_svn with a network stream.

	Now some words about the order to process (a) and (b). There might be cases
	when operations to a single file live in the two parts of the patch (see above).
	Since Unidiff indexes are made against the most up-to-date file name, it makes
	sense that 'svn patch' first deals with the svnpatch block and then the Unidiff
	block. E.g. consider a WC with a file copy from foo to bar and then contextual
	modifications to bar. The patch that represents this WC changes would show
	diffs against 'bar' file. So 'svn patch' first has to schedule-add-with-history
	bar from foo and then apply contextual diffs, which would not work the other way
	around.

	When the Editor Command Set comes to be extended, 'svn patch' will face
	unexpected commands and/or syntax. As in libsvn_ra_svn, we warn the user with
	'unsupported command' messages and ignore its application.


	VI FUZZING a.k.a. DYSTOPIA
	-----------------------

	The svn protocol is not very sensitive to fuzzing since most operations include
	a revision number. However, to stick with this policy would widely lower the
	patch-application scope we're expecting. For instance, 'svn patch' would fail
	at deleting dir@REV when REV is different from the one that comes with the
	delete-entry Editor Command. Obviously we need loose here, and the solution is
	to free the svn protocol from revision numbers and checksums in our
	implementation for every change but binary-changes (for the checksums). (It
	would be insane to associate binary stuff with fuzzing in this world.) Now
	dealing with (b) patching is similar in many ways to GNU Patch's: we end up
	trying by all methods to drive the editor in the dark jungle, possibly failing
	in few cases shooting 'hunk failed' warnings.


	VII PATCH AND MERGE IN SUBVERSION
	-----------------------------

	'svn patch' is similar in many ways to 'svn merge'. Basically, we have a
	tree-delta in hand that we want to apply to a working-tree. Thus it's not
	surprising to see they have a lot in common when comparing both implementations.
	'patch' uses a mix of revamped merge_callbacks (see libsvn_client/merge.c) and
	repos-repos editor functions (see libsvn_client/repos_diff.c). Why not merge
	those two together then, for code-share sake? Well, although they share a close
	logic, to join the two implies having one single file (repos_diff.c) to handle
	at least three burdens: repos-repos diff, merge, and patch. Such a design
	can't be achieved without a myriad of tests/conditions and a large amount of
	blurry mess at mixing three different tools in one place. In the end, what was
	supposed to enhance software maintainability turned out to cause a lot of damage
	at tightening different things together.