

                                -*- text -*-

                           TREE CONFLICT DETECTION


 Issue reference:  http://subversion.tigris.org/issues/show_bug.cgi?id=2282


 This file describes how tree conflicts described in use-cases.txt
 can be detected. It documents how detection currently works in the
 actual code, and also explains the limits of tree conflict detection
 imposed by Subversion's current design.

 Note that, at the time of writing, tree conflict detection has been
 implemented only for use cases 1 to 3. The current detection is
 imperfect, but it is still better than not handling tree conflicts at
 all. It provides a good safety net that helps users avoid running
 into tree conflict use cases 1 to 3. Once Subversion has been taught
 about true renames, tree conflict detection can be changed to make
 use of them and become extremely precise.  See below for further
 explanation.

 ==========
 USE CASE 1
 ==========

 If 'svn update' modifies a file that has been scheduled for deletion
 in the working copy, the file is a tree conflict victim.

 ==========
 USE CASE 2
 ==========

 If 'svn update' deletes a file that has local modifications, the file
 is a tree conflict victim.

 ==========
 USE CASE 3
 ==========

 If 'svn update' deletes a file that has been scheduled for deletion in
 the working copy, the file is a tree conflict victim.
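
 The three update cases above amount to a lookup on the incoming
 action and the local state of the file. The following is a minimal
 sketch in Python, not the actual working copy code; every name in it
 (classify_update_conflict and the string constants) is hypothetical
 and chosen only for illustration.

```python
# Hypothetical sketch: classify the update use cases 1 to 3.
# None of these names exist in Subversion; they only illustrate the rules.

# Incoming actions that 'svn update' may apply to a file.
MODIFY = "modify"
DELETE = "delete"

# Local states of the file in the working copy.
SCHEDULED_DELETE = "scheduled-delete"
LOCALLY_MODIFIED = "locally-modified"
UNCHANGED = "unchanged"

# (incoming action, local state) -> use case number
_UPDATE_USE_CASES = {
    (MODIFY, SCHEDULED_DELETE): 1,
    (DELETE, LOCALLY_MODIFIED): 2,
    (DELETE, SCHEDULED_DELETE): 3,
}


def classify_update_conflict(incoming_action, local_state):
    """Return the use case number, or None if the file is not a victim."""
    return _UPDATE_USE_CASES.get((incoming_action, local_state))
```

 Any combination that is not in the table, such as an incoming
 modification of an unchanged file, is not a tree conflict under
 these three rules.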

 ==========
 USE CASE 4
 ==========

 We skip tree conflict detection if the record_only field of the
 merge-command baton is TRUE. A record-only merge operation updates
 mergeinfo without touching files.

 If 'svn merge' tries to modify a file that does not exist in the
 target working copy, then the target file is a tree conflict victim.


 Notes on Resolution
 -------------------

 A likely cause of this case is that the source diff doesn't cover as
 many revisions as it should. The file should either be brought in by
 adding the revision that created the file to the list of revisions
 to be merged, or changes made to it on the source branch should be
 omitted from the merge range entirely.

 If the user does not wish to choose a source diff that avoids this
 conflict, then the user must resolve the conflict manually.

 If a modification to the nonexistent file is part of a larger diff
 with changes to other files that should be merged, the user will
 need to be able to manually resolve the tree-conflict while keeping
 the desired changes.

 Users must be able to run a second merge command to resolve the
 tree-conflict, or repeat a previous merge operation, but with
 additional revisions, without harm.

 However, the current plan is to disallow merges into tree-conflicted
 directories. This means that users will first have to mark the
 tree-conflict around the missing victim as resolved before attempting
 to merge the file again, this time including the revision that created
 the file. This may be a bit of an awkward workflow but is required to
 solve the problem this use case has in the current implementation,
 namely that missing files may accidentally be overlooked during merging.

 ==========
 USE CASE 5
 ==========

 We skip tree conflict detection if the record_only field of the
 merge-command baton is TRUE. A record-only merge operation updates
 mergeinfo without touching files.

 If 'svn merge' deletes an existing file, the file is a tree conflict
 victim if its text is different from the corresponding file on the left
 side of the merge source.

 To account for uncommitted text modifications in the working copy,
 we should do any text comparisons against the WORKING revision.

 Rationale:

 We don't want to flag every file deletion as a tree conflict.  We
 want to warn the user if the file to be deleted locally is different
 from the file deleted in the merge source.  The user then has a chance
 to merge these unique changes.

 Implementation:

 Call svn_client_diff_summarize2() to compare the target file to the
 file at the left side of the merge source.
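
 As a rough illustration of the rule, the decision comes down to a
 text comparison between the WORKING file and the file at the left
 side of the merge source. The Python sketch below works on in-memory
 byte strings and does not call svn_client_diff_summarize2(); the
 function name and its parameters are assumptions made for this
 example.

```python
def is_uc5_victim(working_text, merge_left_text, record_only=False):
    """Decide whether a file that 'svn merge' deletes is a tree conflict victim.

    working_text    -- bytes of the target file as it is in the working copy
                       (the WORKING revision, including uncommitted changes)
    merge_left_text -- bytes of the file at the left side of the merge source
    record_only     -- True for a record-only merge, which touches no files
    """
    if record_only:
        # Record-only merges only update mergeinfo, so detection is skipped.
        return False
    # Identical text means the deletion loses nothing unique to the target.
    return working_text != merge_left_text
```

 A file whose WORKING text matches the merge-left text is deleted
 without a conflict; any difference flags the file so the user can
 rescue the changes first.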

 ==========
 USE CASE 6
 ==========

 We skip tree conflict detection if the record_only field of the
 merge-command baton is TRUE. A record-only merge operation updates
 mergeinfo without touching files.

 If 'svn merge' tries to delete a file that does not exist in the
 target working copy, then the target file is a tree conflict victim.

 This is similar to UC4.
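
 Use cases 4 and 6 share the same trigger, a merge operation aimed at
 a file that is missing from the target working copy, and differ only
 in the incoming action. The following Python sketch shows that
 decision under simplifying assumptions; the function name, the
 action strings and the boolean parameters are hypothetical and are
 not taken from the merge code.

```python
def missing_target_use_case(incoming_action, target_exists, record_only=False):
    """Return 4 or 6 if the merge hits a missing target file, else None.

    incoming_action -- "modify" or "delete", the operation the merge applies
    target_exists   -- whether the file exists in the target working copy
    record_only     -- True for a record-only merge, which touches no files
    """
    if record_only:
        # Record-only merges only update mergeinfo, so detection is skipped.
        return None
    if target_exists:
        # The file is present; use cases 4 and 6 do not apply.
        return None
    if incoming_action == "modify":
        return 4
    if incoming_action == "delete":
        return 6
    return None
```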

 Rationale:

 Semantically, a tree conflict occurs if 'svn merge' either tries to apply
 the "delete" half of a "move" onto a file that was simply deleted in the
 target branch's history, or tries to apply a simple "delete" onto a file
 that has been moved in the target branch, or tries to move a file that
 has already been moved to a different name in the target branch.

 Notes on Resolution
 -------------------

 Some users may want to skip the tree conflict and have the result
 automatically resolved if two rename operations have the same
 destination, or if a file is simply deleted on both branches. But we
 have to mark these as tree conflicts due to the current lack of "true
 rename" support. It does not appear to be feasible to detect more than
 the double-delete aspect of the move operation.

 =========================
 OBSTRUCTIONS DURING MERGE
 =========================

 If 'svn merge' fails to apply an operation to a file because the
 file is obstructed (i.e. an unversioned item of the same name is
 in the file's place), the obstructed file is a tree conflict victim.

 Rationale:

 We want to make sure that a merge either completes successfully
 or any problems found during a merge are flagged as conflicts.
 Skipping obstructed items during merge is no longer acceptable
 behaviour, since users might not be aware of obstructions that were
 skipped when they commit the result of a merge.
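
 To make the rule concrete, the Python sketch below reports every
 merge target that is obstructed instead of silently skipping it. It
 models the working copy as two plain sets of names, which is a
 simplification; find_obstructions and its parameters are
 hypothetical names, not part of Subversion.

```python
def find_obstructions(merge_targets, on_disk, versioned):
    """Return the merge targets that are blocked by an unversioned item.

    merge_targets -- names of the files the merge wants to operate on
    on_disk       -- names of all items present in the directory on disk
    versioned     -- names of the items under version control there
    """
    obstructed = []
    for name in merge_targets:
        # Something occupies the name on disk, but it is not versioned.
        if name in on_disk and name not in versioned:
            obstructed.append(name)
    # Sorted so that the report is stable regardless of input order.
    return sorted(obstructed)
```

 Each name returned would be flagged as a tree conflict victim, so
 the obstruction is visible before the merge result is committed.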

 =========================================
 TREE CONFLICT DETECTION WITH TRUE RENAMES
 =========================================

 To properly detect the situations described in the "diagram of current
 behaviour" for use cases 2 and 3, we need to have access to a list of
 all files the update will add with history.

 For use cases 1 and 3, we need a list of all files added locally with
 history.

 We need access to this list during the whole update editor drive.
 Then we could do something like this in the editor callbacks:

       edit_file(file):

         if file is locally deleted:
           for each added_file in files_locally_added_with_history:
             if file has common ancestor with added_file:
               /* user ran "svn move file added_file" */
               use case 1 has happened!

       delete_file(file):

         if file is locally modified:
           for each added_file in files_added_with_history_by_update:
             if file has common ancestor with added_file:
               use case 2 has happened!

         else if file is locally deleted:
           for each added_file in files_added_with_history_by_update:
             if file has common ancestor with added_file:
               use case 3 has happened!
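
 The pseudocode above can be turned into a small runnable Python
 sketch. It makes one loud simplification: "has common ancestor" is
 reduced to "the added file was copied from this file", with each
 list represented as a dict that maps an added path to its copy
 source. The function signatures are hypothetical and are not those
 of the real editor callbacks.

```python
def edit_file(file, locally_deleted, locally_added_with_history):
    """Return 1 if an incoming edit hits a file that was moved away locally.

    locally_added_with_history maps each locally added path to the path
    it was copied from (an assumed stand-in for the ancestry check).
    """
    if file in locally_deleted:
        for added_file, source in locally_added_with_history.items():
            if source == file:
                # The user ran "svn move file added_file".
                return 1
    return None


def delete_file(file, locally_modified, locally_deleted,
                added_with_history_by_update):
    """Return 2 or 3 if an incoming delete is one half of an incoming move."""
    moved_by_update = any(source == file
                          for source in added_with_history_by_update.values())
    if not moved_by_update:
        # A plain delete, not part of a move: not handled by this sketch.
        return None
    if file in locally_modified:
        return 2
    if file in locally_deleted:
        return 3
    return None
```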

 Since the update editor drive crawls through the working copy and the
 callbacks consider only a single file, we need to generate the list
 before checking for tree conflicts.  Two ideas for this are:

         1) Wrap the update editor with another editor that passes
            all calls through but takes note of which files the
            update adds with history. Once the wrapped editor is
            done run a second pass over the working copy to populate
            it with tree conflict info.

         2) Wrap the update editor with another editor that does
            not actually execute any edits but remembers them all.
            It only applies the edits once the wrapped editor has
            been fully driven. Tree conflicts could now be detected
            precisely because the list of files we need would be
            present before the actual edit is carried out.
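
 Idea 1 can be sketched as a thin pass-through wrapper. The Python
 below is only an illustration: the real delta editor callbacks take
 batons and more arguments, and both classes here (RecordingEditor
 and the StubEditor stand-in for the update editor) are hypothetical.

```python
class StubEditor:
    """Stand-in for the real update editor; it only logs the calls it gets."""

    def __init__(self):
        self.calls = []

    def add_file(self, path, copyfrom_path=None):
        self.calls.append(("add_file", path))

    def delete_file(self, path):
        self.calls.append(("delete_file", path))


class RecordingEditor:
    """Pass-through wrapper that notes which files are added with history."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        # Maps each path added with history to its copy source.
        self.added_with_history = {}

    def add_file(self, path, copyfrom_path=None):
        if copyfrom_path is not None:
            self.added_with_history[path] = copyfrom_path
        # The edit itself is still carried out by the wrapped editor.
        return self._wrapped.add_file(path, copyfrom_path)

    def __getattr__(self, name):
        # Any callback not overridden here is forwarded unchanged.
        return getattr(self._wrapped, name)
```

 The recorded mapping lives only in memory in this sketch, so it is
 lost if the drive is interrupted.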

 Approach 1 has the problem that there is no reliable way of storing
 the file list in the face of an abort.

 Approach 2 is obviously insane. ;-)

 Keeping the list in RAM is dangerous, because the list would be lost
 if the user aborts, leaving behind an inconsistent working copy that
 potentially lacks tree conflict info for some conflicts.

 The usual place to store persistent information inside the working
 copy is the entries file in the administrative area. Loggy writes to
 this file ensure consistency even if the update is aborted.  But
 keeping the list in entries files also has problems: Which entries
 file do we keep it in? Scattering the list across lots of entries
 files isn't an option because the list needs to be global.  Crawling
 the whole working copy at the start of an update to gather lost file
 lists would be too much of a performance penalty.

 Storing it in the entries file of the anchor of the update operation
 (i.e. the current working directory of the "svn update" process) is a
 bad idea as well because when the interrupted update is continued the
 anchor might have changed. The user may change the working directory
 before running "svn update" again.

 Either way, interrupted updates would leave scattered partial lists of
 files in entries throughout the working copy. And interrupted updates
 may not correctly mark all tree conflicts.

 So how can, for example, use case 3 be detected properly?

 The answer could be "true renames". All the above is due to the fact
 that we have to try to catch use case 3 from a "delete this file"
 callback. We are in fact trying to reconstruct whether a deletion
 of a file was due to the file being moved with "svn move" or not.

 But if we had a callback in the update editor like:

         move_file(source, dest);

 detecting use case 3 would be simple: check whether the source of the
 move is locally deleted. If it is, use case 3 has happened, and the
 source of the move is a tree conflict victim.

 Use case 2 could be caught by checking whether the source of the move
 has local modifications.

 Use case 1 could be detected by checking whether the target for a file
 modification by update matches the source of a rename operation in the
 working copy. This would require storing rename information inside the
 administrative areas of both the source and target directories of file
 move operations to avoid having to maintain a global list of rename
 operations in the working copy for reference by the update editor.
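
 With true renames, the three checks described above become direct
 lookups. The following Python sketch assumes a move_file() callback
 like the one proposed here, which does not exist in the update
 editor, and models local state as plain sets and a dict; all names
 and signatures are hypothetical.

```python
def move_file(source, dest, locally_deleted, locally_modified):
    """Hypothetical callback for an incoming move of source to dest.

    Returns the use case number that makes source a tree conflict
    victim, or None if the move can be applied without a conflict.
    """
    if source in locally_deleted:
        return 3
    if source in locally_modified:
        return 2
    return None


def edit_file(path, local_renames):
    """Hypothetical callback for an incoming modification of path.

    local_renames maps the source of each local rename to its target,
    standing in for rename information kept in the administrative areas.
    """
    if path in local_renames:
        # The file being modified was renamed away in the working copy.
        return 1
    return None
```

 No global list of files added with history is needed here, because
 each callback already knows both ends of the move it is handling.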
