notes/obliterate/obliterate-functional-spec.txt - subversion - Git at Google

 *******************************************************
 * FUNCTIONAL SPECIFICATION FOR ISSUE #516: OBLITERATE *
 *******************************************************

 0. TABLE OF CONTENTS

      0. TABLE OF CONTENTS

      1. INTRODUCTION
         1.1. Use Case Overview
         1.2. Current Solution

      2. DEFINITION OF THE OBLITERATION OPERATION
         2.1. Supporting Definitions
         2.2. Main Definition
         2.3. Notes

      3. DIFFERENT TYPES OF (CORE) OBLITERATION
         3.1. ABSOLUTE vs. VIRTUAL obliteration
         3.2. ONLINE vs. OFFLINE vs. REPO-INVALIDATING

      4. LIMITATIONS ON THE OBLITERATION SET
         4.1. Obliteration of Deletions (is forbidden)
         4.2. Obliteration of the HEAD revision (is forbidden)

      5. USE-CASES IN DETAIL

 1. INTRODUCTION

   This document serves as the functional specification for what's
   commonly called the 'svn obliterate' feature.

   1.1. Use Case Overview

     A. Disable all remote access to confidential information in
        a repository. [client security]

        a. Description

           A user has added information to the repository that should
           not have been made public. The distribution of this
           information must be halted, and where it has been
           distributed, it must be removed.

        b. Examples

           User adding documents with confidential information to the
           repository. Distribution to working copies and mirrors needs
           to be stopped ASAP.

        c. Primary actor triggering this use case

           A key user of the repository that knows what confidential
           information should be removed, and who can estimate the
           impact of obliteration (which paths, which revision range(s)
           etc.

           Normal users should not be able to obliterate. For those
           users we already have 'svn rm'.

     B. Information must be completely removed, both from client access
        and from access on the server side. [server security]

        a. Description

           A user has added information to the repository that should
           not have been made public. The distribution of this
           information must be halted. Additionally, the information
           must be purged from the repository data files themselves,
           either because of the threat of a security breach or because
           the repository itself (or dumps) is publicly distributed.

        b. Examples

           User adding source code to the repository, finds out later
           that it's infringing certain intellectual rights. All traces
           of the infringing source code, including all derivatives,
           need to be removed from both distribution and from the
           repository itself.

        c. Primary actor triggering this use case

           A key user of the repository that knows what confidential
           information should be removed, and who can estimate the
           impact of obliteration (which paths, which revision
           range(s) etc.

           Normal users should not be able to obliterate.

     C. Remove obsolete information from a repository and free the
        associated disc space. [disc space]

        a. Description

           This is the case where unneeded or obsolete information is
           stored in the repository, taking up lots of disc space. In
           order to free up disc space, this information may be
           obliterated.

           This use case typically requires removal of certain subsets
           of the revision history of the repository, while leaving
           later revisions intact. Furthermore, any copies made from an
           obliterated part of the repository to an unobliterated part
           should still be intact.

           This use case is often combined with archiving of the
           obsolete information: archive first, then obliterate.

        b. Examples

           User adding large development tools, binaries or
           external libraries to the product by mistake.

           Users managing huge files (MB/GB's) as part of their
           normal workflow. These files can be removed when work on
           newer versions has started.

           Users adding source code, assets and build deliverables
           in the same repository. Certain assets or build
           deliverables can be removed

           When a project is moved to its own repository, the
           project's files may be obliterated from the original
           repository. This includes moving old projects to an
           archive repository.

           Repositories setup to store product deliverables. Those
           deliverables for old unmaintained versions, like
           everything older than a revision or date, may be
           obliterated from the repository.

           Removal of dead branches which changes have and will not
           be included in the main development line.

        C. Primary actor triggering this use case

           A repository administrator that's concerned about disc space
           usage. However, only a key user can decide which information
           may be obliterated.

           Normal users should not be able to obliterate.

   1.2. Current Solution

     A. Dump -> Filter -> Load

        Subversion already has a solution in place to completely remove
        information from a repository. It's a combination of dumping a
        repository to text format (svnadmin dump), using filters to
        remove some nodes or revisions from the text (svndumpfilter)
        and then loading it back into a new repository (svnadmin load).

        Where svndumpfilter is used to remove information from a
        repository, obliterate should cover at least all of its
        features.

     B. Advantages of current solution

        + svndumpfilter exists today.
        + It has the most basic include and exclude filters built-in.
        + Its functionality is reasonably well understood.

     C. Disadvantages of current solution

        + svndumpfilter has a number of issues (see issue tracker).
        + Filtering options are limited to include or exclude paths
        + No wildcard support...
        + Filtering is based on pathnames, not node based
        + Due to its streamy way of working it has no random access
          to the source nor target repository, hence it can't rewrite
          copies or later modifications on filtered files.
        + Uses an intermediate text format and requires filtering the
          whole repository, not only the relevant revisions -> Slow.
        + Requires the extra disc space for the output repository.
        + The svndumpfiler code is not actively maintained.
        + Slow.
        + Requires shell access on repository server or at least access
          to dump files.

 2. DEFINITION OF THE CORE OBLITERATION OPERATION

   This section defines the "core" functionality of the obliteration
   operation in a minimal way. The core functionality defined here
   can then be used to implement more complicated operations.

   2.1. Supporting Definitions

     An OBLITERATION SET is defined by a list of PATH@REVISON
     elements (that is, each element is a pair, consisting of a PATH
     and REVISION). The same PATH can be paired with multiple
     REVISIONS to form multiple elements and vice versa.
     (Some restrictions are needed for this set, see notes below)

     An ORIGINAL repository is a repository to which an OBLITERATION
     operation could be applied, but has not (this includes any
     subversion repository without obliterations).

     A MODIFIED repository is a repository which is identical to the
     ORIGINAL but for which an OBLITERATION SET has been defined and
     an OBLITERATION operation has been applied.

     A REFERENCE repository is a repository for which any checkout
     operation yields the same data as for the MODIFIED repository, but
     no OBLITERATION operation has been applied to it (it has been
     constructed using regular commits).

   2.2. Main Definition

     The OBLITERATION operation is defined by the following
     three properties:

     1. If a PATH@REVISION is checked out of the MODIFIED repository,
        and the PATH@REVISION is NOT in the OBLITERATION SET, the
        checkout data is identical to what would have been returned
        if PATH@REVISION had been checked out of the ORIGINAL.

     2. If a PATH@REVISION is checked out of the MODIFIED repository,
        and the PATH@REVISION IS in the OBLITERATION SET, the
        checkout data is identical to what would have been returned
        if PATH@REVPRIOR had been checked out of the ORIGINAL, where
        REVPRIOR is the last revision prior to REVISION for which
        PATH@REVPRIOR is not in the OBLITERATION SET.

     3. Any other mechanism through which a user can interact with
        the repository (diff/merge/copy/commit/svnsync/etc) should work
        consistently. That is, every remote interaction with MODIFIED
        must yield a result indistinguishable from what would happen if
        the same operation were applied to the REFERENCE repository.

   2.3. Notes

     a. In the text above, "data" refers to the reported existence of
        the path, the versioned properties that apply to the path, and
        for files, the actual contents of the file.

     b. The definition does not state what happens to revision
        properties (several options are available), and it does not
        state what happens to the reported history of the path (again,
        several options are available).

     c. This definition does not state how an obliteration set is
        constructed to map out each use case. This is intentional, and
        intended to minimize the complexity of the core OBLITERATION
        operation, and break down the assignment into manageable tasks.
        Mappings between use cases and obliteration sets are discussed
        in individual sections below.

     d. Implicit in the above is the fact that the core OBLITERATION
        functionality would not drop empty revisions. This is
        intentional, and intended to minimize the complexity of the
        core OBLITERATION operation, and break down the assignment
        into manageable tasks. The removal of empty revisions can
        be implemented through a separate pass (and is actually
        implemented already through svndumpfilter).

     e. The hierarchical nature of the path structure of a project
        unavoidably leads to certain restrictions on the OBLITERATION
        SET. A repository in consistent state has the property that
        whenever a path of the form [PATH/RELATIVEPATH] exists in the
        repository, the parent directory  must also exist.

        Thus, a minimal restriction on the OBLITERATION SET is that
          1) A change that adds [PATH]@REVISION must never be
             obliterated unless every sub-path change, adding
             [PATH/RELATIVEPATH]@REVISION,
             is obliterated at the same time.
          2) A change that deletes [PATH/RELATIVEPATH]@REVISION
             must never be obliterated unless every parent-path
             change, deleting [PATH]@REVISION,
             is obliterated at the same time.

        The restrictions implied by note (e) are discussed in more
        detail in section 4.1 of this document.

 3. DIFFERENT TYPES OF (CORE) OBLITERATION

     At a high level, obliteration can be classified along two
     dimensions. The ABSOLUTE vs. VIRTUAL distinction refers to whether
     obliterated data is removed from disk or merely hidden. The
     OFFLINE vs. ONLINE distinction refers to whether working copies
     can continue to be used after an obliteration operation. The
     dimensions are independent, so "ABSOLUTE ONLINE", "ABSOLUTE
     OFFLINE", "VIRTUAL ONLINE" and "VIRTUAL OFFLINE" are all possible
     (although "VIRTUAL OFFLINE" has little use).

   3.1. ABSOLUTE vs. VIRTUAL obliteration

     a. For an ABSOLUTE obliteration, the repository must be traversed,
        and all obliterated data expunged totally, recapturing disk
        space and preventing a person with access to the repository
        from retrieving the information.

        ABSOLUTE obliteration is very time-consuming and require a
        complete copy of the repository (in all currently proposed
        implementation methods). However, after a (modified) copy is
        created, it can replace the original relatively quickly.

     b. For a VIRTUAL obliteration, it is enough to mark the
        obliteratied revisions. The server layer will then dynamically
        determine what to return over the RA layer, presenting only the
        modified version to the client.

        Given a repository with VIRTUAL obliterations, ABSOLUTE
        obliteration can be obtained through a svnsync of the
        repository (since svnsync sees the repository only through the
        RA layer). Thus, the obliteration logic would only need to be
        implemented once.

   3.2. ONLINE vs. OFFLINE vs. REPO-INVALIDATING

     a. An ONLINE implementation of obliteration operates in-place on
        the repository (as seen from clients' point of wiew), and
        allows clients to continue working on the repository while the
        operation takes place. An ONLINE implementation must be save to
        use without disabling repository access (by shutting down
        servers), and will lock the repository for writing only for
        periods comparable to a commit operation.

     b. An OFFLINE implementation of obliteration leaves a valid
        repository intact at the same location as the original one,
        and attempts to allow most clients to continue working with
        the obliterated repository after obliteration. However, the
        repository is taken offline for some period of time, and all
        unrelated repository access must be blocked by shutting down
        server processes.

     c. A REPO-INVALIDATING implementation performs obliteration only
        as it copies the repository to a completely new location. No
        special effort is made to ensure that clients can continue to
        work without a fresh checkout, and any attempts to do so are
        at the users' risk.

 4. LIMITATIONS ON THE OBLITERATION SET

   4.1. Obliteration of Deletions (is forbidden)

     As section 2.3, note e, mentions, consistency requirements for
     the repository tree place limitations on the OBLITERATION SET
     is to lead to a valid tree layout. These restrictions stem from
     the requirement that any path in the repository be contained
     within a parent directory structure (i.e. ^/path/subpath cannot
     exist in the repo unless ^/path exists as well).

     A MINIMAL restriction on the OBLITERATION SET is that

     1) A change that adds [PATH]@REVISION must never be obliterated
        unless every sub-path change, adding
        [PATH/RELATIVEPATH]@REVISION, is obliterated at the same time.

     2) A change that deletes [PATH/RELATIVEPATH]@REVISION must never
        be obliterated unless every parent-path change, deleting
        [PATH]@REVISION, is obliterated at the same time.

     However, the minimal restriction is not very intuitive, and can
     lead to unexpected results. Thus a wider restriction should be
     placed on obliteration set so that:

     1) Deletion operations can NEVER enter an obliteration set.

     2) Apart from the restriction described in 1, the inclusion of
        [PATH]@REVISION in an obliteration set ALWAYS implies the
         inclusion of all [PATH/RELATIVEPATH]@REVISION as well.

   4.2. Obliteration of the HEAD revision  (is forbidden)

     The VIRTUAL approach to obliteration is performed by marking
     all revisions in the obliteration set. The server layer then
     filters the repository to reflect the marks when nodes from
     within the obliteration set are requested, but returning original
     data when other notes are requested.

     This works well when obliterating older revisions, but not for
     obliterating from HEAD. This can be seen through an example:

     svn add foo.txt
     svn commit              [at r1]
     edit foo.txt
     svn ci                  [at r2]
     svn obl foo.txt@0:HEAD  [OBL-SET: foo.txt@1:2]
     svn up                  [at r2, repo empty (foo.txt obliterated)]
     svn add bar.txt
     svn ci                  [at r3]
     svn up                  [at r3, foo.txt and bar.txt both exist]

     At this point, foo.txt magically reappears in the repository,
     since all the data still exists, foo.txt has never been deleted,
     and foo.txt@3 is not in the obliteration set. Hence, by the
     specification, it should not be affected by the obliteration.

     Note that extending the obliteration set does not solve the issue:

     svn obl foo.txt@0:HEAD [OBL-SET: foo.txt@1:99999999]

     This would imply that foo.txt can never be added again. This is
     not acceptable either.

     This issue does not arise with absolute obliteration, since the
     data is really erased, and a fresh addition can be performed.
     However, as this example shows, obliterating from HEAD can
     lead to confusion, and increases the number of clients that
     find themselves with working copies that seem up-to-date, but
     have been rendered (partially) invalidated.

     Therefore, use-cases requiring obliteration from HEAD should
     be converted to a commit transaction, which increments the
     revision number of the HEAD revision, and brings the repository
     into the desired state, coupled with an obliterate action
     working on the set up to OLD-HEAD.

     This conversion could either be implemented by the software, or by
     the user.


 5. USE-CASES IN DETAIL

   The workflow of the obliterate solution is defined as follows:

     1. SELECT the lines of history to obliterate.
     2. DEFINE the obliteration set consistently with the goals.
     3. OBLITERATE all node@rev elements in the obliteration set.


        + In the security use case, hiding confidential information is much more
          time-critical than the final obliteration.
        + Hiding information can be done by a key user, whereas obliteration
          should be done by an administrator with direct repository access.
          Note: while there's certainly a need to have repository administration
          control without requiring shell access to a server, this need is not
          obliterate specific and as such doesn't have to be solved in the scope
          of this solution.
        + Hiding information can be seen as a dry run for final obliteration. It
          allows the key user to analyse the impact of the selected filters,
          hide extra information or recover where needed before committing to
          removing it from the repository.

        Each of these steps are detailed in the following list of functional
        requirements. We'll probably find that the differences in requirements
        needed for each use cases are mainly in step 3 and 4.

        Priorities are one of:      ( MoSCoW )
          + M - MUST have this.
          + S - SHOULD have this if at all possible.
          + C - COULD have this if it does not affect anything else.
          + W - WON'T have this time but WOULD like in the future.

     1. SELECT a modification to obliterate.

        A. Description

           Allow the user to obliterate a single modification from the
           repository. The lowest level of modification we should
           consider is the change to a file or directory committed in a
           specific revision. (Read: no need to support obliterating a
           single line in a document)

           A modification can be selected by:

           + A path name
           + A PEG revision, default is HEAD.
           + A revision (FROM revision equals TO revision)

           This requirement can be seen as a combination of:
           - SELECT a file or directory.
           - LIMIT the range to the selected revision.

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     2. SELECT a file to obliterate.

        A. Description
           Allow the user to obliterate a file from the repository. The file
           can be selected by:

           + A path name
           + A PEG revision, default is HEAD.

           If the file was copied from another file, we should have the option
           to select either:
           + the copy
           + the file's ancestor

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     3. SELECT a directory to obliterate

        A. Description
           Allow the user to obliterate a directory, including all its children,
           the whole tree. The directory can be selected by:

           + A path name
           + A PEG revision, default is HEAD.

           If the directory was copied from another directory, we should have
           the option to select either:
           + the copy
           + the directory's ancestor

           Some of the children of the directory might be 'older' than the
           directory itself. This normally happens when the directory was copied
           from another directory (branched, tagged).

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     4. SELECT all modifications in a revision to obliterate

        A. Description
           Allows the user to obliterate all modifications made in:

           + A revision (FROM revision equals TO revision)

           This is equal to:
           - SELECT the root of the repository.
           - LIMIT the range to the selected revision.

           It should be possible to choose whether or not to obliterate:
           + the log message, author and date properties
           + all other revision properties.

           Obliterating the HEAD revision can be seen as a special case of this
           requirement.

           Note: the revision number itself does not need to removed.

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           SHOULD have this if at all possible.

     5. SELECT multiple modifications, files or directories to obliterate

        A. Description
           Allows the user to obliterate multiple modifications, files or
           directories.

           Modifications can be selected by:
           + A list of PATH@PEGREV's + revisions

           Paths can be selected by:
           + A list of path@PEGREV's.
           + Wildcards: '*.jpg', 'build_*'

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           SHOULD have this if at all possible.

     6. LIMIT the range between FROM and TO revisions or dates.

        A. Description
           Both FROM and TO may be specified in the form of revisions,
           dates or keywords like HEAD.

           This is the most general case, where both FROM revision and TO
           revision can be specified.

           Depending on which SELECT option was chosen, the default LIMITs will
           be different, as detailed in this table, which defines an
           OBLITERATION SET:

           +--------------+---------------------------+---------------+
           | SELECT       | LIMIT FROM rev            | LIMIT TO rev  |
           +--------------+---------------------------+---------------+
           | modification | HEAD                      | HEAD          |
           | file         | creation rev              | HEAD          |
           | directory    | creation rev              | HEAD          |
           | \ children   | creation rev of directory | HEAD          |
           +--------------+---------------------------+---------------+

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     7. LIMIT the range between PATH CREATION and TO revisions or dates.

        A. Description
           This LIMIT can only be used when SELECTing files or directories, not
           with modifications.

           This is a special case of requirement IV.6., where the FROM revision
           is defined as the revision in which the selected file or directory
           was either:
           - created
           - copied from another file or directory

           Implementation Note: Can this be implemented through a new keyword
           PATH-CREATION and PATH-LAST-COPY revision? This doesn't need to be
           obliterate specific.

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

        E. Workaround
           As it's difficult right now to make the distinction between a copy
           of a directory and a rename, and a directory might be renamed a few
           times after it was copied, we might need to use a PEG revision to
           indicate where the real directory copy revision can be found.

     9. DEFINE: Include all descendants in the obliteration of a file

        A. Description
           This is basically a greedy obliteration, where all places in the
           repository where a file or a modification to a file has propagated
           through copies or later modifications is also obliterated.

           When obliterating a file, the impact of this obliteration should be
           checked in the selected revision range in the repository. Depending
           on the type of modification, actions should be taken. When the file
           is:

           + Added: This is the creation point of the file. Remove the Add
             operation and the content and properties delta.
           + Deleted: Remove the Delete operation.
           + Replaced by TARGET: see Deleted. Will become Copy operation of the
             TARGET.
           + Copied to TARGET (or resurected): delete the Copied operation and
             drop copy-from path and rev.
             Add the TARGET file  in the selection of to be obliterated files,
             using the same limit (revision range) and impact-on-descendants
             option.
           + Moved to TARGET: delete the Copy+Delete operations and drop
             copy-from path and rev.
             Add the TARGET file in the selection of to be obliterated files,
             using the same limit (revision range) and impact-on-descendants
             option.
           + Modified: delete the Modified operation and the delta.
             Add the modification (file-revision) in the selection of to be
             obliterated modifications, using the same limit (revision range)
             and impact-on-descendants option.

           #TODO: what to do when the TO revision is older than HEAD.

        B. Main use case
           security

        C. Primary actor
           key user

        D. Priority
           M - MUST have this

     9. DEFINE: Exclude all descendants from the obliteration of a file

        A. Description
           If the obliterated information is still needed in a later revision in
           the repository, the information will be restored in that later
           revision.

           When obliterating a file, the impact of this obliteration should be
           checked in the selected revision range in the repository. Depending
           on the type of modification, actions should be taken. When the file
           is:

           + Added: This is the creation point of the file. Remove the Add
             operation and the content and properties delta.
           + Deleted: when the file is obliterated earlier, there's nothing to
             Delete anymore. Remove the Delete operation.
           + Replaced by TARGET: see Deleted. Will become Copy operation of the
             replacing file.
           + Copied to TARGET (or resurected): replace the Copy operation with
             Add (drop copy-from path and rev), find the original contents and
             properties of the file at the copy-from revision and use these for
             the new TARGET.
             #TODO: what to do when the Copy was modified in the working copy
                    before committing. #END-TODO
           + Moved to TARGET: is combination of Deleted and Copied. Will become
             Add of the TARGET with the original content and properties.
           + Modified: replace the Modified operation with Add, find the
             original content and properties of the ancestor, apply the delta to
             that content and properties and use the result to recreate the
             file.

           Note: only the first change after the obliterated revision of the
           file should be handled, except for copies of the now obliterated
           revision.

           Note: When a file change is obliterated, any file modification are
           pushed to the next unobliterated revision, to ensure that
           obliteration has no effect on elements outside the
           OBLITERATION SET

           Note: When handling a copy from an ancestor (path@PEGREV) which has
           been obliterated, the revision will be changed so that it becomes
           a copy from the most recent unobliterated ancestor.

           Example:
             r1: A  iota   "original content\n"
             r2: M  iota   "original content\nextra line\n"
             r3: M  iota   "original content\nextra line\nthird\n"
             r4: D  iota
             r5: A  cp-iota (copy from iota@2)  "original content\nextra line\n"

             Here we obliterate iota, range -r 2:2, exclude descendants.

             Result:
             r1: A  iota   "original content\n"
             r2: [obliterated]
             r3: M  iota   "original content\nextra line\nthird\n"
             r4: D  iota
             r5: A  cp-iota (copy from iota@1)  "original content\nextra line\n"

           Note: If no unobliterated ancestors exist, the revision will be
           changed so that it becomes a new addition.

           Example:
             r1: A  iota   "original content\n"
             r2: M  iota   "original content\nextra line\n"
             r3: D  iota
             r4: A  cp-iota (copy from iota@1)    "original content\n"

             Here we obliterate iota, range -r 1:1, exclude descendants.

             Result:
             So, r1 will be obliterated, r2 will be rewritten, r3 should be
             ignored. Since r4 is based on the now obliterated r1, it should be
             rewritten as 'A  cp-iota' with the content and properties of iota@1.

             r1: [obliterated]
             r2: A  iota   "original content\nextra line\n"
             r3: D  iota
             r4: A  cp-iota    "original content\n"

           Note for implementation: if at all possible, this should be
           implemented so that we don't need more copies of the information than
           before the obliteration, to avoid increasing the repository size.
           If not possible, this requirement will only make sense for files that
           have never changed or copied.

        B. Main use case
           disc space

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     10. DEFINE: Include all descendants in the obliteration of a directory

        A. Description

        B. Main use case
           security

        C. Primary actor
           key user

        D. Priority
           M - MUST have this

     11. DEFINE: Exclude all descendants from the obliteration of a directory

        A. Description
           When obliterating a directory, the impact of this obliteration should
           be checked in the selected revision range in the repository.
           Depending on the type of modification, actions should be taken. When
           the file is:
           #TODO: add effects of directory operations

        B. Main use case
           disc space

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     12. DEFINE: Include all descendants in the obliteration of a modification

        A. Description
           Now that Subversion 1.5 includes merge tracking we have the option to
           find out how modifications cascade through the repository with
           merge operations.

 #IMPL-DETAIL
           A merge operation is identified by a change of type Modification
           that includes a change to the svn:mergeinfo property.
 #END-OF-IMPL-DETAIL

           When obliterating a modification, the impact of this obliteration
           should be checked in the selected revision range in the repository.
           Depending on the type of modification, actions should be taken.

           When the modification is:
           + Deleted:
           + Replaced by TARGET:
           + Copied to TARGET:
           + Moved to TARGET:
           + Merged to TARGET:
           + Modified:
           + Merged to TARGET:
        #TODO: define how to select descendants.

        B. Main use case
           security

        C. Primary actor
           key user

        D. Priority
           C - COULD have this if it does not affect anything else.

     13. DEFINE: Exclude all descendants from the obliteration of a modification

        A. Description
           If the obliterated modification is still needed in a later revision
           in the repository, that modification will be made available in that
           later revision.

           When obliterating a modification, the impact of this obliteration
           should be checked in the selected revision range in the repository.
           Depending on the type of modification, actions should be taken.

           + Deleted: The inclusion of deletions in an OBLITERATION SET results
             in a number of special issues that need to be handled to ensure
             that the internal consistency of the reposotory is maintained.
             These issues are covered in detail in the section on
             "Obliterating Deletions"
           + Replaced by TARGET: Can be ignored.
           + Copied to TARGET:
           + Moved to TARGET:
           + Merged to TARGET: the modification is merged to another file. This
             action can be ignored because when merging a delta to another path,
             that delta is copied and reapplied to the new path, not relying on
             the content of the original delta.
             #TODO: is that true for both FSFS and BDB? Are we not relying on an
             implementation detail that can change in the future?
           + Modified: If the modification contains lines that were modified or
             added in the now obliterated delta, find the original content and
             properties of the ancestor, apply the delta to that content and
             properties and use the result to recreate the file.
           #TODO: define how to select descendants.

        B. Main use case
           disc space

        C. Primary actor
           key user

        D. Priority
           M - MUST have this.

     14. LOG selected modifications, files, directories and revisions

        A. Description
           This is essentially a dry run of the obliterate action. In order to
           assess the impact of the selected filters, the user wants to see the
           list of to-be obliterated paths first.

           The result should be printed to the console and contain the info as
           shown in this example:

           +-----------------------------------------------+-------------------+
           | Revision | Current action | Path              | New action        |
           |          |                |                   | after obliteration|
           +----------+----------------+-------------------+-------------------+
           | r100     | A              | /trunk/SECRET     | [obliterated]     |
           | r101     | M              | /trunk/SECRET     | [obliterated]     |
           | r105     | A+             | /branch/1.0/SECRET| A                 |
           +----------+----------------+-------------------+-------------------+

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           S - SHOULD have this if at all possible.


     15. HIDE selected files, directories and revisions

        A. Description
           #TODO

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           S - SHOULD have this if at all possible.

     16. UNHIDE selected files, directories and revisions

        A. Description
           #TODO

        B. Main use case
           all

        C. Primary actor
           key user

        D. Priority
           S - SHOULD have this if at all possible.

     17. Keep audit trail of obliterated information

        A. Description
           #TODO

        B. Main use case
           security

        C. Primary actor
           administrator

        D. Priority
           C - COULD have this if it does not affect anything else.

     18. Propagating obliteration info to working copies

        A. Description
           #TODO

        B. Main use case
           all

        C. Primary actor
           administrator

        D. Priority
           C - COULD have this if it does not affect anything else.

        E. Workaround

     19. Propagating obliteration info to mirrors

        A. Description
           #TODO

        B. Main use case
           security

        C. Primary actor
           administrator

        D. Priority
           C - COULD have this if it does not affect anything else.

        E. Workaround

 [..]

 V. Detailed non-functional requirements

     1. Authorization for hiding the information
     2. Authorization for restoring the information
     3. Authorization for obliterating information from the repository
     4. Limit repository downtime
     5. Maintain integrity of the repository
     6. Limit temporary disc space
     7. Compatibility with older Subversion clients


 [..]

 VI. Requirements vs Use Cases

     This table matches the requirements with the use cases. It tries to answer
     two specific questions:

     1. What's the value of a requirement in terms of the use cases?
     2. Which requirements do we need to implement to really solve a specific
        use case.

     +-------------------------------------------------------------------------+
     | Disable all access to confidential information in a repository  ---     |
     | Remove obsolete information from a repository            |      |   \   |
     +----------------------------------------------------------+------+-------+
     +    [ #TODO: fill in when all reqs are defined ]          |   x  |   x   |
     +                                                          |      |   x   |
     +-------------------------------------------------------------------------+


 VII. Appendix

     1. Link to external documentation

     [1] Issue 516: https://issues.apache.org/jira/browse/SVN-516
     [2] Karl Fogel's proposal to use the replay API and filters:
         http://svn.haxx.se/dev/archive-2008-04/0687.shtml
     [3] Bob Jenkins's thread about "Auditability": keep log of what has been
         obliterated:
         http://svn.haxx.se/dev/archive-2008-04/0816.shtml
     [4] Users discussing some examples of the need for obliterate:
         http://svn.haxx.se/users/archive-2005-04/0715.shtml


 [The corresponding technical specification will be put in another document]