doc/programmer/design/goals.texi - subversion - Git at Google

 @node Goals
 @chapter Goals

 The goal of the Subversion project is to write a version control system
 that takes over CVS's current and future user base @footnote{If you're
 not familiar with CVS or its shortcomings, then skip to
 @ref{Model}}. The first release has all the major features of CVS, plus
 certain new features that CVS users often wish they had.  In general,
 Subversion works like CVS, except where there's a compelling reason to
 be different.

 So what does Subversion have that CVS doesn't?

 @itemize @bullet
 @item
 It versions directories, file-metadata, renames, copies and
 removals/resurrections.  In other words, Subversion records the changes
 users make to directory trees, not just changes to file contents.

 @item
 Tagging and branching are constant-time and constant-space.

 @item
 It is natively client-server, hence much more maintainable than CVS.
 (In CVS, the client-server protocol was added as an afterthought.
 This means that most new features have to be implemented twice, or at
 least more than once: code for the local case, and code for the
 client-server case.)

 @item
 The repository is organized efficiently and comprehensibly.  (Without
 going into too much detail, let's just say that CVS's repository
 structure is showing its age.)

 @item
 Commits are atomic.  Each commit results in a single revision number,
 which refers to the state of the entire tree.  Files no longer have
 their own revision numbers.

 @item
 The locking scheme is only as strict as absolutely necessary.
 Reads are never locked, and writes lock only the files being
 written, for only as long as needed.

 @item
 It has internationalization support.

 @item
 It handles binary files gracefully (experience has shown that CVS's
 binary file handling is prone to user error).

 @item
 It takes advantage of the Net's experience with CVS by choosing better
 default behaviors for certain situations.

 @end itemize

 Some of these advantages are clear and require no further discussion.
 Others are not so obvious, and are explained in greater detail below.

 @menu
 * Rename/removal/resurrection support::
 * Text vs binary issues::
 * I18N/Multilingual support::
 * Branching and tagging::
 * Miscellaneous new behaviors::
 @end menu

 @c -----------------------------------------------------------------------
 @node Rename/removal/resurrection support
 @section Rename/removal/resurrection support

 Full rename support means you can trace through ancestry by name
 @emph{or} by entity.  For example, if you say "Give me revision 12 of
 foo.c", do you mean revision 12 of the file whose name is @emph{now}
 foo.c (but perhaps it was named bar.c back at revision 12), or the file
 whose name was foo.c in revision 12 (perhaps that file no longer exists,
 or has a different name now)?  In Subversion, both interpretations are
 available to the user.

 (Note:  we've not yet implemented this, but it wouldn't be too hard.
 People are advocating switches to 'svn log' that cause history to be
 traced backwards either by entity or by path.)

 @c -----------------------------------------------------------------------
 @node Text vs binary issues
 @section Text vs binary issues

 Historically, binary files have been problematic in CVS for two
 unrelated reasons: keyword expansion, and line-end conversion.
 @*
 @itemize @bullet
 @item
 @dfn{Keyword expansion} is when CVS expands "$Revision: 1.1 $" into "$Revision
 1.1$", for example.  There are a number of keywords in CVS: "$Author: sussman $",
 "$Date: 2001/06/04 22:00:52 $", and so on.
 @*
 @item
 @dfn{Line-end conversion} is when CVS gives plaintext files the
 appropriate line-ending conventions for the working copy's platform.
 For example, Unix working copies use LF, but Windows working copies use
 CRLF.  (Like CVS, the Subversion repository stores text files in Unix LF
 format).
 @end itemize
 @*
 Both keyword substitution and line-end conversion are sensible only for
 plain text files.  CVS only recognizes two file types anyway: plaintext
 and binary.  And CVS assumes files are plain text unless you tell it
 otherwise.

 Subversion recognizes the same two types.  The question is, how does
 it determine a file's type?  Experience with CVS suggests that
 assuming text unless told otherwise is a losing strategy -- people
 frequently forget to mark images and other opaque formats as binary,
 then later they wonder why CVS mangled their data.  So Subversion will
 not mangle data: when moving over the network, or when being stored in
 the repository, it treats all files as binary.  In the working copy, a
 tweakable meta-data property indicates whether to treat the file as
 text or binary for purposes of whether or not to allow contextual
 merging during updates.

 Users can turn line-end conversion on or off per file by tweaking
 meta-data.  Files do @emph{not} undergo keyword substitution by
 default, on the theory that if someone wants substitution and isn't
 getting it, they'll look in the manual; but if they are getting it and
 didn't want it, they might just be confused and not know what to do.
 Users can turn substitution on or off per file.

 Both of these changes are done on the client side; the repository does
 not even know about them.

 @c -----------------------------------------------------------------------
 @node I18N/Multilingual support
 @section I18N/Multilingual support

 Subversion is internationalized -- commands, user messages, and errors
 can be customized to the appropriate human language at build-time (or
 run time, if that's not much harder).

 File names and contents may be multilingual; Subversion does not assume
 an ASCII-only universe.  For purposes of keyword expansion and line-end
 conversion, Subversion also understands the UTF-* encodings (but not
 necessarily all of them by the first release).

 @c -----------------------------------------------------------------------
 @node Branching and tagging
 @section Branching and tagging

 Subversion supports branching and tagging with one efficient operation:
 `clone'.  To clone a tree is to copy it, to create another tree exactly
 like it (except that the new tree knows its ancestry relationship to the
 old one).

 At the moment of creation, a clone requires only a small, constant
 amount of space in the repository -- most of its storage is shared with
 the original tree.  If you never commit anything on the clone, then it's
 just like a CVS tag.  If you start committing on it, then it's a branch.
 Voila!  This also implies CVS's "vendor branching" feature, since
 Subversion has real rename and directory support.

 @c -----------------------------------------------------------------------
 @node Miscellaneous new behaviors
 @section Miscellaneous new behaviors

 @menu
 * Log messages::
 * Client side diff plug-ins::
 * Better merging::
 * Conflicts resolution::
 @end menu

 @c -----------------------------------------------------------------------
 @node Log messages
 @subsection Log messages

 Subversion has a flexible log message policy (a small matter, but one
 dear to our hearts).

 Log messages should be a matter of project policy, not version control
 software policy.  If a user commits with no log message, then Subversion
 defaults to an empty message.  (CVS tries to require log messages, but
 fails: we've all seen empty log messages in CVS, where the user
 committed with deliberately empty quotes.  Let's stop the madness now.)

 @c -----------------------------------------------------------------------
 @node Client side diff plug-ins
 @subsection Client side diff plug-ins

 Subversion supports client-side plug-in diff programs.

 There is no need for Subversion to have every possible diff mechanism
 built in.  It can invoke a user-specified client-side diff program on
 the two revisions of the file(s) locally.

 (Note:  This feature does not exist yet, but is planned for post-1.0.)

 @c -----------------------------------------------------------------------
 @node Better merging
 @subsection Better merging

 Subversion remembers what has already been merged in and what hasn't,
 thereby avoiding the problem, familiar to CVS users, of spurious
 conflicts on repeated merges.

 (Note:  This feature does not exist yet, but is planned for post-1.0.)

 For details, @xref{Merging and Ancestry}.

 @c -----------------------------------------------------------------------
 @node Conflicts resolution
 @subsection Conflicts resolution

 For text files, Subversion resolves conflicts similarly to CVS, by
 folding repository changes into the working files with conflict
 markers.  But, for @emph{both} text and binary files, Subversion also
 always puts the old and new pristine repository revisions into
 temporary files, and the pristine working copy revision in another
 temporary file.

 Thus, for any conflict, the user has four files readily at hand:

 @enumerate
 @item the original working copy file with local mods
 @item the older repository file
 @item the newest repository file
 @item the merged file, with conflict markers
 @end enumerate

 and in a binary file conflict, the user has all but the last.

 When the conflict has been resolved and the working copy is committed,
 Subversion automatically removes the temporary pristine files.

 A more general solution would allow plug-in merge resolution tools on
 the client side; but this is not scheduled for the first release).
 Note that users can use their own merge tools anyway, since all the
 original files are available.
	@node Goals
	@chapter Goals

	The goal of the Subversion project is to write a version control system
	that takes over CVS's current and future user base @footnote{If you're
	not familiar with CVS or its shortcomings, then skip to
	@ref{Model}}. The first release has all the major features of CVS, plus
	certain new features that CVS users often wish they had. In general,
	Subversion works like CVS, except where there's a compelling reason to
	be different.

	So what does Subversion have that CVS doesn't?

	@itemize @bullet
	@item
	It versions directories, file-metadata, renames, copies and
	removals/resurrections. In other words, Subversion records the changes
	users make to directory trees, not just changes to file contents.

	@item
	Tagging and branching are constant-time and constant-space.

	@item
	It is natively client-server, hence much more maintainable than CVS.
	(In CVS, the client-server protocol was added as an afterthought.
	This means that most new features have to be implemented twice, or at
	least more than once: code for the local case, and code for the
	client-server case.)

	@item
	The repository is organized efficiently and comprehensibly. (Without
	going into too much detail, let's just say that CVS's repository
	structure is showing its age.)

	@item
	Commits are atomic. Each commit results in a single revision number,
	which refers to the state of the entire tree. Files no longer have
	their own revision numbers.

	@item
	The locking scheme is only as strict as absolutely necessary.
	Reads are never locked, and writes lock only the files being
	written, for only as long as needed.

	@item
	It has internationalization support.

	@item
	It handles binary files gracefully (experience has shown that CVS's
	binary file handling is prone to user error).

	@item
	It takes advantage of the Net's experience with CVS by choosing better
	default behaviors for certain situations.

	@end itemize

	Some of these advantages are clear and require no further discussion.
	Others are not so obvious, and are explained in greater detail below.

	@menu
	* Rename/removal/resurrection support::
	* Text vs binary issues::
	* I18N/Multilingual support::
	* Branching and tagging::
	* Miscellaneous new behaviors::
	@end menu

	@c -----------------------------------------------------------------------
	@node Rename/removal/resurrection support
	@section Rename/removal/resurrection support

	Full rename support means you can trace through ancestry by name
	@emph{or} by entity. For example, if you say "Give me revision 12 of
	foo.c", do you mean revision 12 of the file whose name is @emph{now}
	foo.c (but perhaps it was named bar.c back at revision 12), or the file
	whose name was foo.c in revision 12 (perhaps that file no longer exists,
	or has a different name now)? In Subversion, both interpretations are
	available to the user.

	(Note: we've not yet implemented this, but it wouldn't be too hard.
	People are advocating switches to 'svn log' that cause history to be
	traced backwards either by entity or by path.)

	@c -----------------------------------------------------------------------
	@node Text vs binary issues
	@section Text vs binary issues

	Historically, binary files have been problematic in CVS for two
	unrelated reasons: keyword expansion, and line-end conversion.
	@*
	@itemize @bullet
	@item
	@dfn{Keyword expansion} is when CVS expands "$Revision: 1.1 $" into "$Revision
	1.1$", for example. There are a number of keywords in CVS: "$Author: sussman $",
	"$Date: 2001/06/04 22:00:52 $", and so on.
	@*
	@item
	@dfn{Line-end conversion} is when CVS gives plaintext files the
	appropriate line-ending conventions for the working copy's platform.
	For example, Unix working copies use LF, but Windows working copies use
	CRLF. (Like CVS, the Subversion repository stores text files in Unix LF
	format).
	@end itemize
	@*
	Both keyword substitution and line-end conversion are sensible only for
	plain text files. CVS only recognizes two file types anyway: plaintext
	and binary. And CVS assumes files are plain text unless you tell it
	otherwise.

	Subversion recognizes the same two types. The question is, how does
	it determine a file's type? Experience with CVS suggests that
	assuming text unless told otherwise is a losing strategy -- people
	frequently forget to mark images and other opaque formats as binary,
	then later they wonder why CVS mangled their data. So Subversion will
	not mangle data: when moving over the network, or when being stored in
	the repository, it treats all files as binary. In the working copy, a
	tweakable meta-data property indicates whether to treat the file as
	text or binary for purposes of whether or not to allow contextual
	merging during updates.

	Users can turn line-end conversion on or off per file by tweaking
	meta-data. Files do @emph{not} undergo keyword substitution by
	default, on the theory that if someone wants substitution and isn't
	getting it, they'll look in the manual; but if they are getting it and
	didn't want it, they might just be confused and not know what to do.
	Users can turn substitution on or off per file.

	Both of these changes are done on the client side; the repository does
	not even know about them.

	@c -----------------------------------------------------------------------
	@node I18N/Multilingual support
	@section I18N/Multilingual support

	Subversion is internationalized -- commands, user messages, and errors
	can be customized to the appropriate human language at build-time (or
	run time, if that's not much harder).

	File names and contents may be multilingual; Subversion does not assume
	an ASCII-only universe. For purposes of keyword expansion and line-end
	conversion, Subversion also understands the UTF-* encodings (but not
	necessarily all of them by the first release).

	@c -----------------------------------------------------------------------
	@node Branching and tagging
	@section Branching and tagging

	Subversion supports branching and tagging with one efficient operation:
	`clone'. To clone a tree is to copy it, to create another tree exactly
	like it (except that the new tree knows its ancestry relationship to the
	old one).

	At the moment of creation, a clone requires only a small, constant
	amount of space in the repository -- most of its storage is shared with
	the original tree. If you never commit anything on the clone, then it's
	just like a CVS tag. If you start committing on it, then it's a branch.
	Voila! This also implies CVS's "vendor branching" feature, since
	Subversion has real rename and directory support.

	@c -----------------------------------------------------------------------
	@node Miscellaneous new behaviors
	@section Miscellaneous new behaviors

	@menu
	* Log messages::
	* Client side diff plug-ins::
	* Better merging::
	* Conflicts resolution::
	@end menu

	@c -----------------------------------------------------------------------
	@node Log messages
	@subsection Log messages

	Subversion has a flexible log message policy (a small matter, but one
	dear to our hearts).

	Log messages should be a matter of project policy, not version control
	software policy. If a user commits with no log message, then Subversion
	defaults to an empty message. (CVS tries to require log messages, but
	fails: we've all seen empty log messages in CVS, where the user
	committed with deliberately empty quotes. Let's stop the madness now.)

	@c -----------------------------------------------------------------------
	@node Client side diff plug-ins
	@subsection Client side diff plug-ins

	Subversion supports client-side plug-in diff programs.

	There is no need for Subversion to have every possible diff mechanism
	built in. It can invoke a user-specified client-side diff program on
	the two revisions of the file(s) locally.

	(Note: This feature does not exist yet, but is planned for post-1.0.)

	@c -----------------------------------------------------------------------
	@node Better merging
	@subsection Better merging

	Subversion remembers what has already been merged in and what hasn't,
	thereby avoiding the problem, familiar to CVS users, of spurious
	conflicts on repeated merges.

	(Note: This feature does not exist yet, but is planned for post-1.0.)

	For details, @xref{Merging and Ancestry}.

	@c -----------------------------------------------------------------------
	@node Conflicts resolution
	@subsection Conflicts resolution

	For text files, Subversion resolves conflicts similarly to CVS, by
	folding repository changes into the working files with conflict
	markers. But, for @emph{both} text and binary files, Subversion also
	always puts the old and new pristine repository revisions into
	temporary files, and the pristine working copy revision in another
	temporary file.

	Thus, for any conflict, the user has four files readily at hand:

	@enumerate
	@item the original working copy file with local mods
	@item the older repository file
	@item the newest repository file
	@item the merged file, with conflict markers
	@end enumerate

	and in a binary file conflict, the user has all but the last.

	When the conflict has been resolved and the working copy is committed,
	Subversion automatically removes the temporary pristine files.

	A more general solution would allow plug-in merge resolution tools on
	the client side; but this is not scheduled for the first release).
	Note that users can use their own merge tools anyway, since all the
	original files are available.