                                                                 -*- Text -*-

 Conflict meta data storage in wc-ng
 ===================================

 Conflict meta data is stored in the ACTUAL_NODE table, within the
 'conflict_data' column. The data in this column is a skel containing
 conflict information (meaning the node is in conflict, and the details
 are inside), or NULL (meaning no conflict is present).

 The conflict skel has the form:

   (OPERATION (KIND KIND-SPECIFIC) (KIND KIND-SPECIFIC) ...)

 OPERATION indicates the operation which caused the conflict(s) and is
 detailed below.

 KIND indicates the kind of conflict description that follows and is one
 of:

   "text" - meaning a text conflict of the whole text of the node (which
     must be a file), with left/right/mine full texts saved, and (unless
     it's "binary") conflict markers in the working text;
   "prop" - meaning a "normal" property conflict, with left/right/mine full
     values saved;
   "tree" - meaning a tree conflict;
   "reject" - meaning a text conflict for a single hunk of unidiff text,
     with the source being a patch file (rather than left/right full texts),
     and with a "reject" file being saved containing the unidiff text;
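   "obstructed" - meaning an existing unversioned node obstructs a node that
     an update is trying to add (see "(Unversioned) Obstructions" below).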

 KIND-SPECIFIC is specific to each KIND, and is detailed below.
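 As an illustration only, the sketch below models a skel in Python as
 nested lists of strings and writes it out with every atom in
 explicit-length form ("LENGTH SPACE BYTES"), one of the atom forms of
 Subversion's skel syntax. The name unparse_skel and the choice never to
 use the shorter implicit-length atom form are assumptions made for this
 sketch; Subversion's own C serializer may encode atoms differently. The
 checksum atoms in the example are placeholders, not real SHA1 sums.

```python
from typing import List, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]


def unparse_skel(skel: Skel) -> str:
    """Serialize SKEL, writing every atom as an explicit-length atom.

    Assumption for this sketch: atoms are UTF-8 text, and the length
    prefix counts the encoded bytes rather than characters.
    """
    if isinstance(skel, str):
        return f"{len(skel.encode('utf-8'))} {skel}"
    return "(" + " ".join(unparse_skel(item) for item in skel) + ")"


conflict = [
    ["update", "41", "42"],                # OPERATION
    ["text", "sha1a", "sha1b", "sha1c"],   # one (KIND KIND-SPECIFIC) entry, placeholder sums
]
print(unparse_skel(conflict))
# prints: ((6 update 2 41 2 42) (4 text 5 sha1a 5 sha1b 5 sha1c))
```

 Always using the explicit-length form keeps the writer trivial and lets
 atoms contain spaces or parentheses (e.g. property values) without any
 escaping.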

 There are restrictions on what mixture of conflicts can meaningfully be
 recorded - e.g. there must not be two "text" nor one "text" and one
 "reject".  These restrictions are implied by the nature of operations
 creating the conflicts but not spelled out here.

 If the 'conflict_data' column is not NULL, then at least one
 KIND of conflict skel must exist, describing the conflict(s).
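 A minimal sketch of how a writer might sanity-check a conflict skel
 before storing it, assuming an illustrative Python representation (atoms
 as str, lists as list). It checks only the rules stated in this note: at
 least one KIND after OPERATION, no two "text" conflicts, and no "text"
 together with "reject". The helper name check_conflict_skel is
 hypothetical, and the complete set of restrictions is not spelled out
 here, so this is not exhaustive.

```python
from typing import List, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]


def check_conflict_skel(conflict: List[Skel]) -> None:
    """Raise ValueError if CONFLICT violates the rules stated in this note.

    Hypothetical helper; only the example restrictions given in the text
    are checked, not a complete set.
    """
    if len(conflict) < 2:
        # A non-NULL conflict_data needs OPERATION plus at least one KIND.
        raise ValueError("conflict skel needs OPERATION and at least one KIND")
    kinds = []
    for entry in conflict[1:]:
        if not isinstance(entry, list) or not entry:
            raise ValueError("each conflict entry must be a (KIND ...) list")
        kinds.append(entry[0])
    if kinds.count("text") > 1:
        raise ValueError("at most one 'text' conflict per node")
    if "text" in kinds and "reject" in kinds:
        raise ValueError("'text' and 'reject' conflicts must not be combined")


check_conflict_skel([["update", "41", "42"], ["text", "a", "b", "c"]])  # passes
```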

 Unlike wc-1, wc-ng records enough information to help users
 understand, in hindsight, which operation led to the conflict (provided
 the UI exposes all of the conflict information).

 Some information that wc-1 stored in entries has no direct
 equivalent in wc-ng conflict storage (such as paths to temporary files),
 but it can be deduced from the information that is stored
 (e.g. conflict-old and friends: foo.r42 is now 'foo' + '.r' + left_rev).

 ### BH: We have to store the exact name of the conflict marker files. If we
 ### just 'guess' how conflict markers are named by using their revision
 ### numbers, we can't handle situations where there are existing files with
 ### these names. The WC-1.0 code uses a unique name function to generate a
 ### unique marker file name which happens to match this pattern if there are
 ### no conflicts, but sometimes explicitly preserves the existing file
 ### extension to help diff tools. (See 'preserved-conflict-file-exts' in our
 ### config). I think we can move the names of the markers into the skel and/or
 ### keep them in their own columns. (These names are needed on filling a
 ### svn_wc_entry_t)

 Operation skel
 --------------

 Meaning:  The Operation skel indicates what kind of operation was being
 performed that resulted in a conflict, including the format and content
 (or reference to content) of the conflicting change that was being
 applied to this node.

 The OPERATION skel has the following form:

   (NAME OPERATION-SPECIFIC)

 NAME is one of:

   "update" - meaning a 3-way merge as in "svn update";
   "switch" - meaning a 3-way merge as in "svn switch";
   "merge" - meaning a 3(4?)-way merge as in "svn merge";
   "patch" - meaning application of a unidiff patch, as in "svn patch".

 OPERATION-SPECIFIC is as follows:

 To record an "update" operation, the skel has the form:

   ("update" BASE_REV TARGET_REV)

   BASE_REV is the base revision prior to the update.
   TARGET_REV is the revision being updated to.

 ### sbutler: What about mixed-revision working copies?  Let's record
 ### the equivalent of svn_wc_revision_status_t, plus the target rev:
 ###
 ###   ("update" MIN_REV MAX_REV SWITCHED MODIFIED TARGET_REV)
 ###
 ### Otherwise, the user may get the mistaken impression that the local
 ### tree is entirely at the URL and revision of the victim dir.

 For "switch", the skel has the form:

   ("switch" BASE_REV TARGET_REV REPOS_RELPATH)

   BASE_REV and TARGET_REV are as for "update" above.
   REPOS_RELPATH is the path in the repository being switched to.
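 The "update" and "switch" operation skels above could be built as in this
 hypothetical sketch. Writing revisions as decimal atoms is an assumption
 of the sketch, as is the check that REPOS_RELPATH has no leading slash
 (following the usual relpath convention of being relative to the
 repository root).

```python
from typing import List


def update_operation(base_rev: int, target_rev: int) -> List[str]:
    """Build ("update" BASE_REV TARGET_REV); revisions become decimal atoms."""
    return ["update", str(base_rev), str(target_rev)]


def switch_operation(base_rev: int, target_rev: int,
                     repos_relpath: str) -> List[str]:
    """Build ("switch" BASE_REV TARGET_REV REPOS_RELPATH)."""
    # Assumption: relpaths are relative to the repository root, so no leading "/".
    if repos_relpath.startswith("/"):
        raise ValueError("REPOS_RELPATH must not start with '/'")
    return ["switch", str(base_rev), str(target_rev), repos_relpath]


print(update_operation(41, 42))                   # ['update', '41', '42']
print(switch_operation(41, 42, "branches/1.x"))   # ['switch', '41', '42', 'branches/1.x']
```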

 For "merge", the skel has the form:

   ("merge" LEFT_REV RIGHT_REV REPOS_UUID REPOS_ROOT_URL
    (LEFT_REPOS_RELPATH LEFT_PEG_REV)
    (RIGHT_REPOS_RELPATH RIGHT_PEG_REV) )

   LEFT_REV and RIGHT_REV are the merge-left and merge-right revisions of
     the contiguous revision range that was merged (merge tracking may split
     a single merge into several merges of contiguous revision ranges).

   REPOS_UUID is the UUID of the repository being merged from, in order to
     recognize merges from foreign repositories.

   REPOS_ROOT_URL is the root URL of the repository being merged from.

   {LEFT,RIGHT}_REPOS_RELPATH is the path in the repository of the {left,right}
     version of the item.

   {LEFT,RIGHT}_PEG_REV is the revision in which to find the
     {left,right} version of the item which caused the conflict.
     These are usually LEFT_REV and RIGHT_REV, but in some cases they
     may differ (for example, if a file was replaced in revision rX
     somewhere between LEFT_REV and RIGHT_REV, and the conflict is due to
     events which happened between LEFT_REV and rX).
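 A hypothetical builder for the "merge" operation skel, following the
 field order of the form above. Defaulting LEFT_PEG_REV and RIGHT_PEG_REV
 to LEFT_REV and RIGHT_REV reflects the "usually" case described in the
 text; the function name, the decimal encoding of revisions, and the
 example UUID and URL are illustrative assumptions.

```python
from typing import List, Optional, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]


def merge_operation(left_rev: int, right_rev: int,
                    repos_uuid: str, repos_root_url: str,
                    left_relpath: str, right_relpath: str,
                    left_peg_rev: Optional[int] = None,
                    right_peg_rev: Optional[int] = None) -> List[Skel]:
    """Build ("merge" LEFT_REV RIGHT_REV REPOS_UUID REPOS_ROOT_URL
    (LEFT_REPOS_RELPATH LEFT_PEG_REV) (RIGHT_REPOS_RELPATH RIGHT_PEG_REV))."""
    # The peg revisions usually equal LEFT_REV and RIGHT_REV, so default to
    # those; pass them explicitly when they differ (e.g. a replacement in rX).
    if left_peg_rev is None:
        left_peg_rev = left_rev
    if right_peg_rev is None:
        right_peg_rev = right_rev
    return ["merge", str(left_rev), str(right_rev), repos_uuid, repos_root_url,
            [left_relpath, str(left_peg_rev)],
            [right_relpath, str(right_peg_rev)]]


op = merge_operation(10, 20, "hypothetical-uuid", "https://svn.example.com/repos",
                     "trunk/foo", "trunk/foo")
```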

 For "patch", the skel has the form:

   ("patch" PATCH_SOURCE_LABEL)

   PATCH_SOURCE_LABEL is (typically) the absolute path of the patch
   file whose application led to the conflicts. In the future, it
   may also be something like "<stdin>".
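 Because every OPERATION skel starts with its NAME, a UI can dispatch on
 that first atom to tell the user which operation caused a conflict. The
 sketch below is a hypothetical formatter assuming the Python list
 representation used for illustration; the wording of the summaries and
 the "^/" repository-relative notation in the output are presentation
 choices, not something this note specifies.

```python
from typing import List, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]


def describe_operation(op: List[Skel]) -> str:
    """Render an OPERATION skel as a one-line, human-readable summary."""
    name = op[0]
    if name == "update":
        _, base_rev, target_rev = op
        return f"update from r{base_rev} to r{target_rev}"
    if name == "switch":
        _, base_rev, target_rev, relpath = op
        return f"switch to ^/{relpath}@{target_rev} (was at r{base_rev})"
    if name == "merge":
        _, left_rev, right_rev, _uuid, root_url, _left, right = op
        # Name the right-hand source, i.e. where the merged changes end up.
        return f"merge of r{left_rev}:{right_rev} from {root_url}/{right[0]}"
    if name == "patch":
        return f"patch from {op[1]}"
    raise ValueError(f"unknown operation {name!r}")


print(describe_operation(["patch", "/tmp/fix.diff"]))   # patch from /tmp/fix.diff
```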


 Text conflicts
 --------------

 Text conflicts only exist on files. The following skel represents the
 "text" KIND of conflict:

   ("text" ORIGINAL_SHA1 MINE_SHA1 INCOMING_SHA1)

 {ORIGINAL,MINE}_SHA1 are SHA1 checksums of the full texts of
 the {original (BASE), mine (WORKING)} version of the file.

 INCOMING_SHA1 is the SHA1 checksum of the incoming version of the file.
 ### Need INCOMING_{LEFT,RIGHT}_SHA1 for 4-way merge?

 The content of each file version can be obtained from the pristine store.
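 The pristine store is content-addressed, so a "text" conflict only needs
 to keep the three checksums. In this sketch the store is modelled as a
 plain dict from hex SHA1 to bytes; that model and the helper names
 install_pristine and text_conflict are assumptions for illustration, not
 the wc-ng pristine store API.

```python
import hashlib
from typing import Dict, List

# Assumption: the pristine store is modelled as a dict from hex SHA1 to bytes.
PristineStore = Dict[str, bytes]


def install_pristine(store: PristineStore, text: bytes) -> str:
    """Add TEXT to the store (if absent) and return its SHA1 key."""
    key = hashlib.sha1(text).hexdigest()
    store.setdefault(key, text)   # content-addressed: equal texts share one entry
    return key


def text_conflict(store: PristineStore, original: bytes, mine: bytes,
                  incoming: bytes) -> List[str]:
    """Build ("text" ORIGINAL_SHA1 MINE_SHA1 INCOMING_SHA1)."""
    return ["text"] + [install_pristine(store, t) for t in (original, mine, incoming)]


store: PristineStore = {}
skel = text_conflict(store, b"a\n", b"a\nmine\n", b"a\ntheirs\n")
assert store[skel[2]] == b"a\nmine\n"   # MINE text recovered by its SHA1
```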

 ### BH: We need some marker here, but these values must also be stored
 ###     in the older_checksum, left_checksum, right_checksum columns of ACTUAL
 ###     to allow pristine store cleanups.

 ### BH: What about symlinks?
 ### stsp: I guess we can say that all SHA1 sums refer to proper files,
 ###   and symlinks are resolved before the SHA1 is calculated and
 ###   stored in the db?


 Property conflicts
 ------------------

 Property conflicts can exist on files, directories and symlinks.
 There can be one or more property conflicts on the node, represented
 by one or more "prop" KIND conflicts. Each "prop" conflict has the
 following form:

   ("prop" PROPERTY_NAME
           ([ORIGINAL_VALUE])
           ([MINE_VALUE])
           ([INCOMING_VALUE])
           ([INCOMING_BASE_VALUE]))

 PROPERTY_NAME is the name of the property, such as "svn:eol-style".

 Each property value ({ORIGINAL,MINE,INCOMING,INCOMING_BASE}_VALUE) is
 represented as an empty list indicating the property did not exist in
 that version, or a 1-item list containing the particular value.

 ORIGINAL_VALUE is the property value that was checked out.
 MINE_VALUE is the current (ACTUAL) value in the working copy.
 INCOMING_VALUE is the new (target) value from an update, merge, etc.
 INCOMING_BASE_VALUE is used during merges, where an incoming property
   change is expressed as "change from INCOMING_BASE to INCOMING".
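 Wrapping each value in a 0- or 1-item list is what lets the skel tell a
 property that did not exist apart from one whose value is the empty
 string. A minimal sketch of that encoding, assuming an illustrative
 Python representation with None for "absent"; the helper names are
 hypothetical.

```python
from typing import List, Optional, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]


def encode_value(value: Optional[str]) -> List[str]:
    # () means "property did not exist in this version"; (VALUE) carries it.
    return [] if value is None else [value]


def decode_value(skel: List[str]) -> Optional[str]:
    if len(skel) > 1:
        raise ValueError("a property value list has at most one item")
    return skel[0] if skel else None


def prop_conflict(name: str, original: Optional[str], mine: Optional[str],
                  incoming: Optional[str],
                  incoming_base: Optional[str]) -> List[Skel]:
    """Build ("prop" NAME ([ORIGINAL]) ([MINE]) ([INCOMING]) ([INCOMING_BASE]))."""
    return ["prop", name,
            encode_value(original), encode_value(mine),
            encode_value(incoming), encode_value(incoming_base)]


# Locally changed to LF while the incoming change deletes the property.
skel = prop_conflict("svn:eol-style", "native", "LF", None, "native")
```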

 ### stsp: What's the size limit of a prop value?
 ### HKW: In theory, there isn't one.  In practice, we used to caution people
 ###   against having too large of props, but I don't know if that is
 ###   true anymore.


 Tree conflicts
 --------------

 Tree conflicts exist on files or directories.

 ### JAF: And symlinks, I presume - or, if not, why not?
 ### stsp: Symlinks are resolved before retrieving conflict information.

 The following information is stored if there is a tree conflict on the node:

   ("tree" LOCAL_STATE INCOMING_STATE)

   LOCAL_STATE := (LOCAL_CHANGE ORIGINAL_NODE_KIND MINE_NODE_KIND
                   [ORIGINAL_SHA1 MINE_SHA1])
   INCOMING_STATE := (INCOMING_CHANGE INCOMING_NODE_KIND [INCOMING_SHA1])

 LOCAL_CHANGE is the local change which conflicted with the
 incoming change during the operation. Possible values are "edit", "add",
 "delete", "rename", "replace", "obstructed", "missing", "unversioned",
 "moved-away", "moved-here", and "copied-here".

 ### possibly collapse "unversioned" with "obstructed"?

 ### what is "replace"? we should probably have "replace-add",
 ### "replace-moved-away", "replace-moved-here", and "replace-copied-here"
 ### hrm. this probably isn't the right representation. "replace-add"
 ### says how the new node arrived, but not how the original departed.
 ### was it a deleted or moved-away? for example, a local-deleted
 ### followed by an add, followed by an incoming-delete should probably
 ### be deemed "no conflict".

 ORIGINAL_NODE_KIND is the kind of the node in the BASE tree.
 MINE_NODE_KIND is the kind of the node from the WORKING tree at the
 time the conflict was flagged.

 INCOMING_CHANGE is the incoming change which conflicted with the
 local change during the operation. Possible values are "edit", "add",
 "delete", "rename", "replace", "moved-away", "moved-here", and
 "copied-here".
 ### see concerns above about LOCAL_CHANGE.

 Each *_SHA1 checksum field is present only if the corresponding
 {ORIGINAL,MINE,INCOMING}_NODE_KIND is "file".

 ORIGINAL_SHA1 is the SHA1 of the BASE version of the tree conflict victim
 file in the working copy. MINE_SHA1 is the SHA1 of the WORKING version
 of the tree conflict victim file as of the time the conflict was flagged.

 If INCOMING_NODE_KIND is "file", INCOMING_SHA1 is the SHA1 of the file
 which the operation was attempting to install in the working copy.

 The file version's content can be obtained from the pristine store.
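 The sketch below builds a "tree" conflict skel from LOCAL_STATE and
 INCOMING_STATE, appending a checksum only where the matching node kind is
 "file". It is hypothetical: the helper names are invented, and the note
 does not list node-kind values, so "none" in the example is an assumed
 spelling. When only one of ORIGINAL/MINE is a file, this sketch appends
 just that one checksum; a reader can still tell which it is from the two
 recorded kinds.

```python
from typing import List, Optional, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]

LOCAL_CHANGES = {"edit", "add", "delete", "rename", "replace", "obstructed",
                 "missing", "unversioned", "moved-away", "moved-here",
                 "copied-here"}
INCOMING_CHANGES = {"edit", "add", "delete", "rename", "replace",
                    "moved-away", "moved-here", "copied-here"}


def _append_sha1(fields: List[str], kind: str, sha1: Optional[str]) -> None:
    # A checksum is recorded exactly when the node kind is "file".
    if (kind == "file") != (sha1 is not None):
        raise ValueError(f"SHA1 must be given exactly when kind is 'file' (kind={kind!r})")
    if sha1 is not None:
        fields.append(sha1)


def tree_conflict(local_change: str, original_kind: str, mine_kind: str,
                  incoming_change: str, incoming_kind: str,
                  original_sha1: Optional[str] = None,
                  mine_sha1: Optional[str] = None,
                  incoming_sha1: Optional[str] = None) -> List[Skel]:
    """Build ("tree" LOCAL_STATE INCOMING_STATE)."""
    if local_change not in LOCAL_CHANGES:
        raise ValueError(f"unknown LOCAL_CHANGE {local_change!r}")
    if incoming_change not in INCOMING_CHANGES:
        raise ValueError(f"unknown INCOMING_CHANGE {incoming_change!r}")
    local = [local_change, original_kind, mine_kind]
    _append_sha1(local, original_kind, original_sha1)
    _append_sha1(local, mine_kind, mine_sha1)
    incoming = [incoming_change, incoming_kind]
    _append_sha1(incoming, incoming_kind, incoming_sha1)
    return ["tree", local, incoming]


# Local edit of a file versus an incoming delete (placeholder checksums).
skel = tree_conflict("edit", "file", "file", "delete", "none",
                     original_sha1="aa", mine_sha1="bb")
```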

 ### BH: We need to duplicate the sha1 values in the older_checksum,
 ###     left_checksum, right_checksum columns of ACTUAL
 ###     to allow pristine store cleanups.

 ### BH: Can we share some of the sha1 logic with the text conflicts to
 ###     allow resolving this in the same way?
 ###     (We should keep the history of the node valid via replace vs update)
 ### stsp: I don't really understand your question. Can you be more specific?


 (Unversioned) Obstructions
 --------------------------

 When an update introduces a new node where an unversioned node already
 exists locally, we need to record a marker so that the operation can
 still update the BASE_NODE table.

 There is no particular data which needs to be recorded for an
 obstruction. Thus, the "obstructed" conflict skel has the form:

   ("obstructed")


 Reject conflicts
 ----------------

 For patches, the content of the left and right versions is not fully known,
 so the conflict is not a diff3-style text conflict. Rather, the conflict
 is the failure to find a match for a hunk's context in the patch target.
 There can be one or more reject conflicts on a node. Each "reject" conflict
 has the following form:

   REJECTED_HUNK_LIST := (HUNK_ORIGINAL_OFFSET HUNK_ORIGINAL_LENGTH
                          HUNK_MODIFIED_OFFSET HUNK_MODIFIED_LENGTH)*

   ("reject" REJECT_FILE TARGET_PATCH_SHA1 REJECTED_HUNK_LIST)

 REJECT_FILE is ...

 TARGET_PATCH_SHA1 is the SHA1 of the unidiff content of the rejected
 hunk (the selection of the patch file that applies to this target) as
 written to the .svnpatch.rej file. The actual unidiff content (which can
 be large!) can be retrieved from the pristine store.

 HUNK_{ORIGINAL,MODIFIED}_OFFSET and HUNK_{ORIGINAL,MODIFIED}_LENGTH
 are the hunk header values as parsed from the patch file (i.e. the "ID"
 of the hunk within the patch file). These also occur in the reject
 diff text but are stored here for easy retrieval.
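 The four hunk values come straight from a unified-diff hunk header of the
 form "@@ -OFFSET,LENGTH +OFFSET,LENGTH @@", where an omitted LENGTH means
 1. A minimal sketch of parsing those headers and building a "reject"
 skel, assuming the hunk lists are spliced in after TARGET_PATCH_SHA1 as
 the form above suggests. REJECT_FILE is not yet specified in this note,
 so it is passed through opaquely; the helper names and example values
 are hypothetical.

```python
import re
from typing import List, Tuple, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Tuple[int, int, int, int]:
    """Return (original_offset, original_length, modified_offset, modified_length)."""
    m = HUNK_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a unidiff hunk header: {line!r}")
    orig_off, orig_len, mod_off, mod_len = m.groups()
    # In unified diff format an omitted length means a length of 1.
    return int(orig_off), int(orig_len or 1), int(mod_off), int(mod_len or 1)


def reject_conflict(reject_file: str, target_patch_sha1: str,
                    hunk_headers: List[str]) -> List[Skel]:
    """Build ("reject" REJECT_FILE TARGET_PATCH_SHA1 REJECTED_HUNK_LIST)."""
    hunks = [[str(n) for n in parse_hunk_header(h)] for h in hunk_headers]
    # REJECTED_HUNK_LIST is zero or more 4-item lists appended in order.
    return ["reject", reject_file, target_patch_sha1, *hunks]


print(parse_hunk_header("@@ -12,7 +12,8 @@ int main()"))   # (12, 7, 12, 8)
```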

 ### BH: Using a sha1 here makes it impossible to clean up the pristine store.
 ###     The pristine store needs all references to be stored in a DB column.
 ###     To support this we would need an extra table.
 ### stsp: I'm fine with not storing the reject diff text if we don't
 ###   have a good location for it. However, keeping it around in case
 ###   the user deletes the tempfile would be nice. And I don't see an issue
 ###   with also storing the SHA1 sum in the ACTUAL table. We do this for
 ###   text conflicts as well. Why would it need an extra table?

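 Property hunks rejected by a patch are recorded in a "prop-reject"
 conflict, with one entry per affected property:

   ("prop-reject" REJECT_FILE
    (PROPERTY_NAME TARGET_PATCH_SHA1 REJECTED_HUNK_LIST)* )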
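 A hypothetical builder for the "prop-reject" form, grouping rejected
 hunks by property. The input mapping, the helper name, and sorting the
 property names to get a stable encoding are assumptions of this sketch;
 REJECT_FILE is passed through opaquely because its meaning is not yet
 specified here.

```python
from typing import Dict, List, Tuple, Union

# Illustrative model: an atom is a str, a list is a Python list of skels.
Skel = Union[str, List["Skel"]]
Hunk = Tuple[int, int, int, int]  # (orig_offset, orig_length, mod_offset, mod_length)


def prop_reject_conflict(reject_file: str,
                         rejected: Dict[str, Tuple[str, List[Hunk]]]) -> List[Skel]:
    """Build ("prop-reject" REJECT_FILE (PROPERTY_NAME TARGET_PATCH_SHA1 HUNK...)*).

    REJECTED maps each property name to (target_patch_sha1, rejected hunks).
    """
    entries: List[Skel] = []
    for prop_name in sorted(rejected):          # sorted for a stable encoding
        sha1, hunks = rejected[prop_name]
        entry: List[Skel] = [prop_name, sha1]
        entry.extend([str(n) for n in hunk] for hunk in hunks)
        entries.append(entry)
    return ["prop-reject", reject_file, *entries]
```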