notes/the_update_problem.txt - subversion - Git at Google


 (writ by Ben, 7/21/00)

 Problem
 -------

  *  A client checks out a working copy of a tree from a repository.
     We call this tree `B' (for `base' revision).  We call the client's
     tree `WC' (for `working copy').

  *  The client procedes to commit a few changes to the repository.
     The client also makes some local modifications.

  *  Meanwhile, other people continue to commit changes to the repository.

  *  Now the client wishes to update WC, to bring all of WC up-to-date
     with the latest tree in the repository (`L').

 The issue is that WC is an ugly hodgepodge of files; many files are
 still based on the original base tree B, but some files have been
 individually committed and match the contents of repository trees that
 are descendants of B.  And some files are just locally modified, not
 yet committed.

 In other words, given that WC's tree is a unique mix of files unknown
 to the server, what exchange of information must take place so that
 the server can send a tree-delta that converts WC into L?

 (Note: this problem also describes the situation where the client is
 attempting to commit WC, and the server first needs to check WC's
 `up-to-date-ness' and point out conflicts.)


 Triangulation
 -------------

 Here's the plan of attack.

             d2
    ---> B ------> L
          \       ^
           \      |
            \     | ?? d3
          d1 \    |
              \   |
               -> WC


  *  The client sends a delta `d1' which describes how to convert B
     into WC.  (It can do this because each change has been
     meticulously tracked in the working copy's administrative files.)

  *  The server generates a delta `d2' which describes how to convert B
     into L.  (It can do this because the repository has also been
     meticulously tracking each change from B to L.)

  *  In theory, therefore, knowledge of d1 and d2 should allow us to
     create d3, which describes how to change WC into L.


 Defining `Change'
 -----------------

 Deltas only talk about entities that have changed.

 We say that two deltas *conflict* iff they each change one or more of
 the same entity.

 Define a function C() which, given a delta, returns an unordered list
 of all node numbers that the delta changes.

 Here are the rules that C() follows:

 1.  If a file's contents change, then that file's node has changed.

 2.  If a file is added, renamed, or deleted, then both the file's node
     *and* it's parent node have changed.


     Notation --

        C(P->Q) = { 3, 7, 9, 21 }.

     This statement means that a certain delta converts tree P into
     tree Q, and the node numbers changed in the process are 3, 7, 9,
     and 21.


 The Process
 -----------

 1.  The client generates a `special' delta

 What do we mean by `special'?

 The client tracks local changes made to a working copy, queueing them
 up as the user works.  Normally, when it's time to commit, the client
 generates a delta describing all the queued local changes, and submits
 them to the server.  This makes sense, because the client only cares
 about discovering conflicts that are *immediately* relevant to the
 newest changes.

 But this list of local changes is *not* enough to create a delta
 B->WC!  To be perfectly clear, there are two types of information that
 must included in a delta B->WC:

          * local changes that are queued, but not yet committed
          * local changes that *have* been committed since WC == B

 Therefore, the client's administrative files must keep track of both
 types of changes.


 2.  Reconstruction of WC

 Next, the server receives B->WC and effectively "rebuilds" an internal
 representation of the oh-so-unique WC.  The server's internal
 reconstruction of WC needn't be some large, complex object on disk or
 in RAM; it's just a representation of the WC tree structure, complete
 with each node's number and internal version number.

 We call this reconstructed WC `RWC'.


 3.  The server generates a second delta.

 Examining node numbers (and their internal version numbers), the
 server creates a delta RWC->L.


 4.  The server runs C() on both deltas

 The server compares the output of C(B->WC) and C(RWC->L), looking for
 any intersection of the sets.  Any intersection of the sets indicates
 conflicting deltas.


 Examples
 --------

 Example 1:  No conflict


        C(B->WC)   = { 2, 5, 9, 13, 21, 87 }
        C(RWC->L)  = { 3, 4, 6, 22, 54, 99, 102 }

     Because the two deltas have changed *completely* different sets of
     nodes, there's no conflict.  If the client were attempting to
     commit, the delta would be approved.  If the client were asking
     for an update, the server would send back RWC->L.


 Example 2:  One conflict

        C(B->WC)   = { 2, 5, 9, 13, 21, 87 }
        C(RWC->L)  = { 3, 4, 6, 22, 54, 87, 102 }

     Both deltas have changed node 87.  If the client were attempting
     to commit, the entire transaction would be denied.  If the client
     were asking for an update, the server would point out the
     conflict.  In both cases, it's up to the client to describe the
     conflict either with conflict markers (in the case of a changed
     file) or with an informational message (in the case of a changed
     directory).


 Example 3:  Client adds a file

      Suppose that completely new file has been added to the working
      copy, but not yet committed.

        C(B->WC)   = { 2, 5, 9, 13, 21, 87, ?? }
        C(RWC->L)  = { 3, 4, 6, 22, 54, 99, 102 }

      The question marks indicate the fact that while C() was
      traversing the delta and looking up node numbers, it ran into an
      unrecognized pathname.  There's no node number for the file,
      because it doesn't yet exist in the repository!

      This case demonstrates that mere node numbers aren't quite
      enough;  pathnames must be also be examined when looking for
      conflicts between two deltas.  E.g., suppose that the new file
      has the exact same pathname as node 102 in L?


 Example 4:  Client deletes a file

      Deleting a file shows up in a tree-delta as a "change" to a file,
      and thus this file's node shows up in C(B->WC), so this is the
      same as example 2.


 Example 5:  Client moves a file or directory.

      This action would show up as *two* changed node numbers in
      C(B->WC), and the rest would play out like example 2.

	(writ by Ben, 7/21/00)

	Problem
	-------

	* A client checks out a working copy of a tree from a repository.
	We call this tree `B' (for `base' revision). We call the client's
	tree `WC' (for `working copy').

	* The client procedes to commit a few changes to the repository.
	The client also makes some local modifications.

	* Meanwhile, other people continue to commit changes to the repository.

	* Now the client wishes to update WC, to bring all of WC up-to-date
	with the latest tree in the repository (`L').

	The issue is that WC is an ugly hodgepodge of files; many files are
	still based on the original base tree B, but some files have been
	individually committed and match the contents of repository trees that
	are descendants of B. And some files are just locally modified, not
	yet committed.

	In other words, given that WC's tree is a unique mix of files unknown
	to the server, what exchange of information must take place so that
	the server can send a tree-delta that converts WC into L?

	(Note: this problem also describes the situation where the client is
	attempting to commit WC, and the server first needs to check WC's
	`up-to-date-ness' and point out conflicts.)



	Triangulation
	-------------

	Here's the plan of attack.

	d2
	---> B ------> L
	\ ^
	\ \|
	\ \| ?? d3
	d1 \ \|
	\ \|
	-> WC


	* The client sends a delta `d1' which describes how to convert B
	into WC. (It can do this because each change has been
	meticulously tracked in the working copy's administrative files.)

	* The server generates a delta `d2' which describes how to convert B
	into L. (It can do this because the repository has also been
	meticulously tracking each change from B to L.)

	* In theory, therefore, knowledge of d1 and d2 should allow us to
	create d3, which describes how to change WC into L.



	Defining `Change'
	-----------------

	Deltas only talk about entities that have changed.

	We say that two deltas conflict iff they each change one or more of
	the same entity.

	Define a function C() which, given a delta, returns an unordered list
	of all node numbers that the delta changes.

	Here are the rules that C() follows:

	1. If a file's contents change, then that file's node has changed.

	2. If a file is added, renamed, or deleted, then both the file's node
	and it's parent node have changed.


	Notation --

	C(P->Q) = { 3, 7, 9, 21 }.

	This statement means that a certain delta converts tree P into
	tree Q, and the node numbers changed in the process are 3, 7, 9,
	and 21.



	The Process
	-----------

	1. The client generates a `special' delta

	What do we mean by `special'?

	The client tracks local changes made to a working copy, queueing them
	up as the user works. Normally, when it's time to commit, the client
	generates a delta describing all the queued local changes, and submits
	them to the server. This makes sense, because the client only cares
	about discovering conflicts that are immediately relevant to the
	newest changes.

	But this list of local changes is not enough to create a delta
	B->WC! To be perfectly clear, there are two types of information that
	must included in a delta B->WC:

	* local changes that are queued, but not yet committed
	* local changes that have been committed since WC == B

	Therefore, the client's administrative files must keep track of both
	types of changes.


	2. Reconstruction of WC

	Next, the server receives B->WC and effectively "rebuilds" an internal
	representation of the oh-so-unique WC. The server's internal
	reconstruction of WC needn't be some large, complex object on disk or
	in RAM; it's just a representation of the WC tree structure, complete
	with each node's number and internal version number.

	We call this reconstructed WC `RWC'.


	3. The server generates a second delta.

	Examining node numbers (and their internal version numbers), the
	server creates a delta RWC->L.


	4. The server runs C() on both deltas

	The server compares the output of C(B->WC) and C(RWC->L), looking for
	any intersection of the sets. Any intersection of the sets indicates
	conflicting deltas.


	Examples
	--------

	Example 1: No conflict


	C(B->WC) = { 2, 5, 9, 13, 21, 87 }
	C(RWC->L) = { 3, 4, 6, 22, 54, 99, 102 }

	Because the two deltas have changed completely different sets of
	nodes, there's no conflict. If the client were attempting to
	commit, the delta would be approved. If the client were asking
	for an update, the server would send back RWC->L.


	Example 2: One conflict

	C(B->WC) = { 2, 5, 9, 13, 21, 87 }
	C(RWC->L) = { 3, 4, 6, 22, 54, 87, 102 }

	Both deltas have changed node 87. If the client were attempting
	to commit, the entire transaction would be denied. If the client
	were asking for an update, the server would point out the
	conflict. In both cases, it's up to the client to describe the
	conflict either with conflict markers (in the case of a changed
	file) or with an informational message (in the case of a changed
	directory).


	Example 3: Client adds a file

	Suppose that completely new file has been added to the working
	copy, but not yet committed.

	C(B->WC) = { 2, 5, 9, 13, 21, 87, ?? }
	C(RWC->L) = { 3, 4, 6, 22, 54, 99, 102 }

	The question marks indicate the fact that while C() was
	traversing the delta and looking up node numbers, it ran into an
	unrecognized pathname. There's no node number for the file,
	because it doesn't yet exist in the repository!

	This case demonstrates that mere node numbers aren't quite
	enough; pathnames must be also be examined when looking for
	conflicts between two deltas. E.g., suppose that the new file
	has the exact same pathname as node 102 in L?


	Example 4: Client deletes a file

	Deleting a file shows up in a tree-delta as a "change" to a file,
	and thus this file's node shows up in C(B->WC), so this is the
	same as example 2.


	Example 5: Client moves a file or directory.

	This action would show up as two changed node numbers in
	C(B->WC), and the rest would play out like example 2.