 @node Model
 @chapter Model

 This chapter explains the user's view of Subversion --- what ``objects''
 you interact with, how they behave, and how they relate to each other.

 @menu
 * Working Directories and Repositories::
 * Transactions and Version Numbers::
 * How Working Directories Track the Repository::
 * Subversion Does Not Lock Files::
 * Properties::
 * Merging and Ancestry::
 @end menu

 @c -----------------------------------------------------------------------
 @node Working Directories and Repositories
 @section Working Directories and Repositories

 Suppose you are using Subversion to manage a software project.  There
 are two things you will interact with: your working directory, and the
 repository.

 Your @dfn{working directory} is an ordinary directory tree, on your
 local system, containing your project's sources.  You can edit these
 files and compile your program from them in the usual way.  Your working
 directory is your own private work area: Subversion never changes the
 files in your working directory, or publishes the changes you make
 there, until you explicitly tell it to do so.

 After you've made some changes to the files in your working directory,
 and verified that they work properly, Subversion provides commands to
 publish your changes to the other people working with you on your
 project.  If they publish their own changes, Subversion provides
 commands to incorporate those changes into your working directory.

 A working directory contains some extra files, created and maintained by
 Subversion, to help it carry out these commands.  In particular, these
 files help Subversion recognize which files contain unpublished changes,
 and which files are out-of-date with respect to others' work.

 While your working directory is for your use alone, the @dfn{repository}
 is the common public record you share with everyone else working on the
 project.  To publish your changes, you use Subversion to put them in the
 repository.  (What this means, exactly, we explain below.)  Once your
 changes are in the repository, others can tell Subversion to incorporate
 your changes into their working directories.  In a collaborative
 environment like this, each user will typically have their own working
 directory (or perhaps more than one), and all the working directories
 will be backed by a single repository, shared amongst all the users.

 A Subversion repository holds a single directory tree, and records the
 history of changes to that tree.  The repository retains enough
 information to recreate any prior state of the tree, compute the
 differences between any two prior trees, and report the relations
 between files in the tree --- which files are derived from which other
 files.

 A Subversion repository can hold the source code for several projects;
 usually, each project is a subdirectory in the tree.  In this
 arrangement, a working directory will usually correspond to a particular
 subtree of the repository.

 For example, suppose you have a repository laid out like this:
 @example
 /trunk/paint/Makefile
              canvas.c
              brush.c
        write/Makefile
              document.c
              search.c
 @end example

 In other words, the repository's root directory has a single
 subdirectory named @file{trunk}, which itself contains two
 subdirectories: @file{paint} and @file{write}.

 To get a working directory, you must @dfn{check out} some subtree of the
 repository.  If you check out @file{/trunk/write}, you will get a working
 directory like this:
 @example
 write/Makefile
       document.c
       search.c
       SVN/
 @end example
 This working directory is a copy of the repository's @file{/trunk/write}
 directory, with one additional entry --- @file{SVN} --- which holds the
 extra information needed by Subversion, as mentioned above.

 Suppose you make changes to @file{search.c}.  Since the @file{SVN}
 directory remembers the file's modification date and original contents,
 Subversion can tell that you've changed the file.  However, Subversion
 does not make your changes public until you explicitly tell it to.

 To publish your changes, you can use Subversion's @samp{commit} command:
 @example
 $ pwd
 /home/jimb/write
 $ ls
 Makefile   SVN/    document.c    search.c
 $ svn commit search.c
 $
 @end example

 Now your changes to @file{search.c} have been committed to the
 repository; if another user checks out a working copy of
 @file{/trunk/write}, they will see your text.

 Suppose you have a collaborator, Felix, who checked out a working
 directory of @file{/trunk/write} at the same time you did.  When you
 commit your change to @file{search.c}, Felix's working copy is left
 unchanged; Subversion only modifies working directories at the user's
 request.

 To bring his working directory up to date, Felix can use the Subversion
 @samp{update} command.  This will incorporate your changes into his
 working directory, as well as any others that have been committed since
 he checked it out.
 @example
 $ pwd
 /home/felix/write
 $ ls
 Makefile    SVN/   document.c    search.c
 $ svn update
 U search.c
 $
 @end example

 The output from the @samp{svn update} command indicates that Subversion
 updated the contents of @file{search.c}.  Note that Felix didn't need to
 specify which files to update; Subversion uses the information in the
 @file{SVN} directory, and further information in the repository, to
 decide which files need to be brought up to date.

 We explain below what happens when both you and Felix make changes to
 the same file.


 @c -----------------------------------------------------------------------
 @node Transactions and Version Numbers
 @section Transactions and Version Numbers

 A Subversion @samp{commit} operation can publish changes to any number
 of files and directories as a single atomic transaction.  In your
 working directory, you can change files' contents, create, delete,
 rename and copy files and directories, and then commit the completed set
 of changes as a unit.

 In the repository, each commit is treated as an atomic transaction:
 either all the commit's changes take place, or none of them take place.
 Subversion tries to retain this atomicity in the face of program
 crashes, system crashes, network problems, and other users' actions.  We
 may call a commit a @dfn{transaction} when we want to emphasize its
 indivisible nature.

 Each time the repository accepts a transaction, this creates a new state
 of the tree, called a @dfn{version}.  Each version is assigned a unique
 natural number, one greater than the number of the previous version.
 The initial version of a freshly created repository is numbered zero,
 and consists of an empty root directory.

 Since each transaction creates a new version, with its own number, we
 can also use these numbers to refer to transactions; transaction @var{n}
 is the transaction which created version @var{n}.  There is no
 transaction numbered zero.

 Unlike those of many other systems, Subversion's version numbers apply
 to an entire tree, not individual files.  Each version number selects an
 entire tree.
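
The numbering rules above can be illustrated with a small Python sketch.
It is only a toy model under stated assumptions: the @code{Repository}
class and its @code{commit} method are hypothetical names invented for
this illustration, not part of Subversion, and a tree is reduced to a
plain mapping from paths to file contents.

```python
class Repository:
    """Toy model: one whole-tree snapshot per version number."""

    def __init__(self):
        # Version 0 of a fresh repository is an empty root directory.
        self.versions = [dict()]

    def latest(self):
        """Return the number of the most recent version."""
        return len(self.versions) - 1

    def commit(self, changes):
        """Apply every change as one transaction; return the new number.

        changes is a list of (path, contents) pairs; contents of None
        deletes the path.
        """
        # Work on a copy, so that an error part-way through leaves the
        # recorded versions untouched: all changes happen, or none do.
        tree = dict(self.versions[-1])
        for path, contents in changes:
            if contents is None:
                del tree[path]  # KeyError if the path does not exist
            else:
                tree[path] = contents
        self.versions.append(tree)
        return self.latest()


repo = Repository()
first = repo.commit([("/trunk/write/Makefile", "all:"),
                     ("/trunk/write/search.c", "draft")])
second = repo.commit([("/trunk/write/search.c", "final")])
print(first, second)  # the two transactions are numbered 1 and 2
```

Each entry of @code{repo.versions} is a complete tree, so a version
number selects the state of every file at once, including files that the
corresponding transaction did not touch.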

 It's important to note that working directories do not always correspond
 to any single version in the repository; they may contain files from
 several different versions.  For example, suppose you check out a
 working directory from a repository whose most recent version is 4:
 @example
 write/Makefile:4
       document.c:4
       search.c:4
 @end example

 At the moment, this working directory corresponds exactly to version 4
 in the repository.  However, suppose you make a change to
 @file{search.c}, and commit that change.  Assuming no other commits have
 taken place, your commit will create version 5 of the repository, and
 your working directory will look like this:
 @example
 write/Makefile:4
       document.c:4
       search.c:5
 @end example
 Suppose that, at this point, Felix commits a change to
 @file{document.c}, creating version 6.  If you use @samp{svn update} to
 bring your working directory up to date, then it will look like this:
 @example
 write/Makefile:6
       document.c:6
       search.c:6
 @end example
 Felix's changes to @file{document.c} will appear in your working copy of
 that file, and your change will still be present in @file{search.c}.  In
 this example, the text of @file{Makefile} is identical in versions 4, 5,
 and 6, but Subversion will mark your working copy with version 6 to
 indicate that it is still current.  So, after you do a clean update at
 the root of your working directory, your working directory will
 generally correspond exactly to some version in the repository.
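
The bookkeeping in this example can be sketched in a few lines of Python.
The function names @code{after_commit} and @code{after_update} are
hypothetical, and the sketch tracks only the version number attached to
each file, not the file contents.

```python
def after_commit(base_versions, committed, new_version):
    """Only the committed files move to the newly created version."""
    result = dict(base_versions)
    for name in committed:
        result[name] = new_version
    return result


def after_update(base_versions, latest_version):
    """A clean update at the root marks every file as the latest version."""
    return dict.fromkeys(base_versions, latest_version)


checked_out = dict.fromkeys(["Makefile", "document.c", "search.c"], 4)
mixed = after_commit(checked_out, ["search.c"], 5)  # your commit creates 5
current = after_update(mixed, 6)                    # Felix created 6
```

Here @code{mixed} holds files from versions 4 and 5 at once, while
@code{current} corresponds to version 6 alone.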


 @c -----------------------------------------------------------------------
 @node How Working Directories Track the Repository
 @section How Working Directories Track the Repository

 For each file in a working directory, Subversion records two essential
 pieces of information:
 @itemize @bullet
 @item
 what version of what repository file your working copy is based on (this is called the file's @dfn{base version}), and
 @item
 a timestamp recording when the local copy was last updated.
 @end itemize

 Given this information, by talking to the repository, Subversion can
 tell which of the following four states a file is in:
 @itemize
 @item
 @b{Unchanged, and current}.  The file is unchanged in the working
 directory, and no changes to that file have been committed to the
 repository since its base version.
 @item
 @b{Locally changed, and current}.  The file has been changed in the
 working directory, and no changes to that file have been committed to
 the repository since its base version.  There are local changes that
 have not been committed to the repository.
 @item
 @b{Unchanged, and out-of-date}.  The file has not been changed in the
 working directory, but it has been changed in the repository.  The file
 should eventually be updated, to make it current with the public
 version.
 @item
 @b{Locally changed, and out-of-date}.  The file has been changed both
 in the working directory, and in the repository.  The file should be
 updated; Subversion will attempt to merge the public changes with the
 local changes.  If it can't complete the merge in a plausible way
 automatically, Subversion leaves it to the user to resolve the conflict.
 @end itemize
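
The four states are the combinations of two independent yes-or-no
questions, which the following Python sketch makes explicit.  The
function and its parameters are hypothetical.  In particular, it assumes
that local changes are detected by comparing the working file with the
saved text of its base version, and that the repository can report the
newest version in which the file changed.

```python
def file_state(working_text, base_text, base_version, last_changed):
    """Classify a working file into one of the four states.

    base_text and base_version describe the base version recorded in
    the working directory.  last_changed is the number of the newest
    repository version in which the file changed, learned by asking
    the repository.
    """
    local = "Unchanged" if working_text == base_text else "Locally changed"
    remote = "current" if last_changed <= base_version else "out-of-date"
    return local + ", and " + remote


print(file_state("int x;", "int x;", 4, 4))
print(file_state("int y;", "int x;", 4, 6))
```

The first call describes a file that needs no action.  The second
describes a file that must be updated, and therefore merged, before it
can be committed.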


 @c -----------------------------------------------------------------------
 @node Subversion Does Not Lock Files
 @section Subversion Does Not Lock Files

 Subversion does not prevent two users from making changes to the same
 file at the same time.  For example, if both you and Felix have checked
 out working directories of @file{/trunk/write}, Subversion will allow
 both of you to change @file{write/search.c} in your working directories.
 Consider the following sequence of events:
 @itemize @bullet
 @item
 Suppose Felix tries to commit his changes to @file{search.c} first.  His
 commit will succeed, and his text will appear in the latest version in
 the repository.
 @item
 When you attempt to commit your changes to @file{search.c}, Subversion
 will reject your commit, and tell you that you must update
 @file{search.c} before you can commit it.
 @item
 When you update @file{search.c}, Subversion will try to merge Felix's
 changes from the repository with your local changes.  By default,
 Subversion merges as if it were applying a patch: if your local changes
 do not overlap textually with Felix's, then all is well; otherwise,
 Subversion leaves it to you to resolve the overlapping changes.  In
 either case, Subversion carefully preserves a copy of the original
 pre-merge text.
 @item
 Once you have verified that Felix's changes and your changes have been
 merged correctly, you can commit the new version of @file{search.c},
 which now contains everyone's changes.
 @end itemize
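
The sequence above can be sketched as a toy Python model.  Everything
here is an assumption made for illustration: @code{FileHistory} and
@code{OutOfDate} are hypothetical names, the history covers a single
file with its own simple numbering rather than Subversion's tree-wide
version numbers, and the merge itself is done by hand rather than by a
patch-style algorithm.

```python
class OutOfDate(Exception):
    """Raised when a commit is based on a stale text of the file."""


class FileHistory:
    """Toy history of one repository file."""

    def __init__(self, text):
        self.texts = [text]  # texts[i] is the i-th committed text

    def latest(self):
        return len(self.texts) - 1

    def commit(self, base, new_text):
        # Reject the commit unless it is based on the latest text.
        if base != self.latest():
            raise OutOfDate("update before committing")
        self.texts.append(new_text)
        return self.latest()


history = FileHistory("find()\n")
felix = history.commit(0, "find()\nreplace()\n")  # Felix commits first

try:
    history.commit(0, "find()\ncount()\n")        # your base is now stale
    rejected = False
except OutOfDate:
    rejected = True

# After updating, you hold both sets of changes (merged by hand here)
# and can commit against the latest text.
merged = "find()\nreplace()\ncount()\n"
yours = history.commit(history.latest(), merged)
```

No lock is ever taken: the only check is made at commit time, and the
only cost of the collision is one update and merge on your side.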

 Some version control systems provide ``locks'', which prevent others
 from changing a file once one person has begun working on it.  In our
 experience, merging is preferable to locks, because:
 @itemize @bullet
 @item
 changes usually do not conflict, so Subversion's behavior does the right
 thing by default, while locking can interfere with legitimate work;
 @item
 locking can prevent conflicts within a file, but not conflicts between
 files (say, between a C header file and another file that includes it),
 so it doesn't really solve the problem; and finally,
 @item
 people often forget that they are holding locks, resulting in
 unnecessary delays and friction.
 @end itemize

 Of course, the merge process needs to be under the users' control.
 Patch-style merging is not appropriate for files with rigid formats,
 like images or executables.  Subversion allows users to customize its
 merging behavior
 on a per-file basis.  You can direct Subversion to refuse to merge
 changes to certain files, and simply present you with the two original
 texts to choose from.  Or, you can direct Subversion to merge using a
 tool which respects the semantics of the file format.


 @c -----------------------------------------------------------------------
 @node Properties
 @section Properties

 Files generally have interesting attributes beyond their contents:
 owners and groups, access permissions, creation and modification times,
 and so on.  Subversion attempts to preserve these attributes, or at
 least record them, when doing so would be meaningful.  However,
 different operating systems support very different sets of file
 attributes: Windows NT supports access control lists, while Linux
 provides only the simpler traditional Unix permission bits.

 In order to interoperate well with clients on many different operating
 systems, Subversion supports @dfn{property lists}, a simple,
 general-purpose mechanism which clients can use to store arbitrary
 out-of-band information about files.

 A property list is a set of name / value pairs.  A property name is an
 arbitrary text string, expressed as a Unicode UTF-8 string, canonically
 decomposed and ordered.  A property value is an arbitrary string of
 bytes.  Property values may be of any size, but Subversion may not
 handle very large property values efficiently.  No two properties in a
 given property list may have the same name.  Although the word `list'
 usually denotes an ordered sequence, there is no fixed order to the
 properties in a property list; the term `property list' is historical.

 Each version number, file, directory, and directory entry in the
 Subversion repository has its own property list.  Subversion puts these
 property lists to several uses:
 @itemize @bullet
 @item
 Clients use properties to store file attributes, as described above.
 For example, the Unix Subversion client records the files' permission
 bits as the value of a property called
 @samp{svn-posix-access-permission}.  Operating systems which allow files
 to have more than one name, like Windows 95, can use directory entry
 property lists to record files' alternative names.
 @item
 The Subversion server uses properties to hold attributes of its own, and
 allow clients to read and modify them.  For example, the @samp{svn-acl}
 property holds an access control list which the Subversion server uses
 to regulate access to repository files.
 @item
 Users can invent properties of their own, to store arbitrary information
 for use by scripts, build environments, and so on.  Names of user
 properties should be URIs, to avoid conflicts between organizations.
 @end itemize

 Property lists are versioned, just like file contents.  You can change
 properties in your working directory, but those changes are not visible
 in the repository until you commit your local changes.  If you do commit
 a change to a property value, other users will see your change when they
 update their working directories.


 @c -----------------------------------------------------------------------
 @node Merging and Ancestry
 @section Merging and Ancestry

 Subversion defines merges the same way CVS does: to merge means to take
 a set of previously committed changes and apply them, as a patch, to a
 working copy.  This change can then be committed, like any other change.
 (In Subversion's case, the patch may include changes to directory trees,
 not just file contents.)

 As defined thus far, merging is equivalent to hand-editing the working
 copy into the same state as would result from the patch application.  In
 fact, in CVS there @emph{is} no difference -- it is equivalent to just
 editing the files, and there is no record of which ancestors these
 particular changes came from.  Unfortunately, this leads to conflicts
 when users unintentionally merge the same changes again.  (Experienced
 CVS users avoid this problem by using branch- and merge-point tags, but
 that involves a lot of unwieldy bookkeeping.)

 In Subversion, merges are remembered by recording @dfn{ancestry sets}.
 A version's ancestry set is the set of all changes ``accounted for'' in
 that version.  By maintaining ancestry sets, and consulting them when
 doing merges, Subversion can detect when it would apply the same patch
 twice, and spare users much bookkeeping.  Ancestry sets are stored as
 properties.

 In the examples below, bear in mind that version numbers usually refer
 to changes, rather than the full contents of that version.  For example,
 ``the change A:4'' means ``the delta that resulted in A:4'', not ``the full
 contents of A:4''.

 The simplest ancestry sets are associated with linear histories.  For
 example, here's the history of a file A:

 @example
 @group

  _____        _____        _____        _____        _____
 |     |      |     |      |     |      |     |      |     |
 | A:1 |----->| A:2 |----->| A:3 |----->| A:4 |----->| A:5 |
 |_____|      |_____|      |_____|      |_____|      |_____|

 @end group
 @end example

 The ancestry set of A:5 is:

 @example
 @group

   @{ A:1, A:2, A:3, A:4, A:5 @}

 @end group
 @end example

 That is, it includes the change that brought A from nothing to A:1, the
 change from A:1 to A:2, and so on to A:5.  From now on, ranges like this
 will be represented with a more compact notation:

 @example
 @group

   @{ A:1-5 @}

 @end group
 @end example

 Now assume there's a branch B based, or ``rooted'', at A:2.  (This
 postulates an entirely different version history, of course, and the
 global version numbers in the diagrams will change to reflect it.)
 Here's what the project looks like with the branch:

 @example
 @group

  _____        _____        _____        _____        _____        _____
 |     |      |     |      |     |      |     |      |     |      |     |
 | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |----->| A:9 |
 |_____|      |_____|      |_____|      |_____|      |_____|      |_____|
                 \
                  \
                   \  _____        _____        _____
                    \|     |      |     |      |     |
                     | B:3 |----->| B:5 |----->| B:7 |
                     |_____|      |_____|      |_____|

 @end group
 @end example

 If we produce A:9 by merging the B branch back into the trunk

 @example
 @group

  _____        _____        _____        _____        _____        _____
 |     |      |     |      |     |      |     |      |     |      |     |
 | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |---.->| A:9 |
 |_____|      |_____|      |_____|      |_____|      |_____|  /   |_____|
                 \                                            |
                  \                                           |
                   \  _____        _____        _____        /
                    \|     |      |     |      |     |      /
                     | B:3 |----->| B:5 |----->| B:7 |--->-'
                     |_____|      |_____|      |_____|

 @end group
 @end example

 then what will A:9's ancestry set be?

 @example
 @group

   @{ A:1, A:2, A:4, A:6, A:8, A:9, B:3, B:5, B:7 @}

 @end group
 @end example

 or more compactly:

 @example
 @group

   @{ A:1-9, B:3-7 @}

 @end group
 @end example

 (It's all right that each file's ranges seem to include non-changes;
 this is just a notational convenience, and you can think of the
 non-changes as either not being included, or being included but being
 null deltas as far as that file is concerned.)
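
One way to read the compact notation is that a range may skip version
numbers at which the line did not change, but must stop at any real
change of that line which the set leaves out.  The Python sketch below
follows that reading.  The function @code{compact} and its parameters are
hypothetical, and this document does not specify an algorithm.

```python
def compact(line, included, line_changes):
    """Format one line's part of an ancestry set in range notation.

    included: version numbers of this line's changes that are in the set.
    line_changes: every version number at which this line changed.
    """
    runs = []
    current = []
    for version in sorted(line_changes):
        if version in included:
            current.append(version)
        elif current:
            # A real change is missing, so the range must end here.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    parts = []
    for run in runs:
        if len(run) == 1:
            parts.append("%s:%d" % (line, run[0]))
        else:
            parts.append("%s:%d-%d" % (line, run[0], run[-1]))
    return ", ".join(parts)


print(compact("A", [1, 2, 4, 6, 8, 9], [1, 2, 4, 6, 8, 9]))  # A:1-9
print(compact("B", [3, 5, 7], [3, 5, 7]))                    # B:3-7
```

Versions 3, 5 and 7 never changed the A line, so they do not interrupt
the range A:1-9.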

 All changes along the B line are accounted for (changes B:3-7), and so
 are all changes along the A line, including both the merge and any
 non-merge-related edits made before the commit.

 Although this merge happened to include all the branch changes, that
 needn't be the case.  For example, the next time we merge the B line

 @example
 @group

  _____     _____     _____     _____     _____      _____      _____
 |     |   |     |   |     |   |     |   |     |    |     |    |     |
 | A:1 |-->| A:2 |-->| A:4 |-->| A:6 |-->| A:8 |-.->| A:9 |-.->|A:11 |
 |_____|   |_____|   |_____|   |_____|   |_____| |  |_____| |  |_____|
              \                                  /          |
               \                                /           |
                \  _____     _____     _____   / _____      |
                 \|     |   |     |   |     | / |     |    /
                  | B:3 |-->| B:5 |-->| B:7 |-->|B:10 |->-'
                  |_____|   |_____|   |_____|   |_____|

 @end group
 @end example

 Subversion will know that A's ancestry set already contains B:3-7, so
 only the difference between B:7 and B:10 will be applied.  A's new
 ancestry will be

 @example
 @group

   @{ A:1-11, B:3-10 @}

 @end group
 @end example
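
The check that Subversion performs here amounts to a set difference
followed by a set union.  The Python sketch below illustrates this with
hypothetical function names, representing each change as a pair of line
name and version number.  It says nothing about how ancestry sets are
actually stored or consulted.

```python
def changes_to_apply(ancestry, requested):
    """Return the requested changes that are not yet accounted for."""
    return sorted(set(requested) - set(ancestry))


def ancestry_after_merge(ancestry, applied, new_change):
    """Ancestry set of the version committed after applying the changes."""
    result = set(ancestry) | set(applied)
    result.add(new_change)
    return result


# Each change is a (line, version) pair; this is A:9's set, A:1-9 and B:3-7.
a9 = set(("A", n) for n in (1, 2, 4, 6, 8, 9))
a9 |= set(("B", n) for n in (3, 5, 7))

branch_b = [("B", n) for n in (3, 5, 7, 10)]
todo = changes_to_apply(a9, branch_b)            # only B:10 is new
a11 = ancestry_after_merge(a9, todo, ("A", 11))
```

Because the difference is taken against the recorded set, asking for the
whole branch a second time applies only the one change that is missing.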

 But why limit ourselves to contiguous ranges?  An ancestry set is truly
 a set -- it can be any subset of the changes available:

 @example
 @group

  _____        _____        _____        _____        _____        _____
 |     |      |     |      |     |      |     |      |     |      |     |
 | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |--.-->|A:10 |
 |_____|      |_____|      |_____|      |_____|      |_____| /    |_____|
                 |                                          /
                 |                ______________________.__/
                 |               /                      |
                 |              /                       |
                 \           __/_                      _|__
                  \         @{    @}                    @{    @}
                   \  _____        _____        _____        _____
                    \|     |      |     |      |     |      |     |
                     | B:3 |----->| B:5 |----->| B:7 |----->| B:9 |----->
                     |_____|      |_____|      |_____|      |_____|

 @end group
 @end example

 In this diagram, the change from B:3-5 and the change from B:7-9 are
 merged into a working copy whose ancestry set (so far) is @w{@{ A:1-8
 @}} plus any local changes.  After committing, A:10's ancestry set is

 @example
 @group

   @{ A:1-10, B:5, B:9 @}

 @end group
 @end example

 Clearly, saying ``Let's merge branch B into A'' is a little ambiguous.  It
 usually means ``Merge all the changes accounted for in B's tip into A'',
 but it @emph{might} mean ``Merge the single change that resulted in B's
 tip into A''.

 Any merge, when viewed in detail, is an application of a particular set
 of changes -- not necessarily adjacent ones -- to a working copy.  The
 user-level interface may allow some of these changes to be specified
 implicitly.  For example, many merges involve a single, contiguous range
 of changes, with one or both ends of the range easily deducible from
 context (e.g., branch root to branch tip).  These inference rules are
 not specified here, but it should be clear in most contexts how they
 work.

 Because each node knows its ancestors, Subversion never merges the same
 change twice (unless you force it to).  For example, if after the above
 merge, you tell Subversion to merge all B changes into A, Subversion
 will notice that two of them have already been merged, and so merge only
 the other two changes, resulting in a final ancestry set of:

 @example
 @group

   @{ A:1-10, B:3-9 @}

 @end group
 @end example

 @c Heh, what about this:
 @c
 @c    B:3 adds line 3, with the text "foo".
 @c    B:5 deletes line 3.
 @c    B:7 adds line 3, with the text "foo".
 @c    B:9 deletes line 3.
 @c
 @c The user first merges B:5 and B:9 into A.  If A had that line, it
 @c goes away now, nothing more.
 @c
 @c Next, user merges B:3 and B:7 into A.  The second merge must
 @c conflict.
 @c
 @c I'm not sure we need to care about this, I just thought I'd note how
 @c even merges that seem like they ought to be easily composable can
 @c still suck. :-)

 This description of merging and ancestry applies to both intra- and
 inter-repository merges.  However, inter-repository merging will
 probably not be implemented until a future release of Subversion
 (@pxref{Future}).