******************************************************************************
REQUIREMENTS SPECIFICATION
FOR
ISSUE #516: OBLITERATE
******************************************************************************
TABLE OF CONTENTS
OPEN ISSUES
1. INTRODUCTION
1.1 Sources of Requirements
2. USER STORIES
2.1 Added secrets in a new file
2.2 Added secrets into an existing file
2.3 Added a single huge file by accident
2.4 Repeated modification of a huge file
3. REQUIREMENTS
3.1 Levels of Obliteration
3.2 Content of the Modified Repository
3.3 Working Copies
3.4 Access to the Modified Repository
3.5 Audit Trail
3.6 Svnsync Mirrors
3.7 Permissions
3.8 Time Taken
OPEN ISSUES
(none)
1. INTRODUCTION
This document captures the requirements for the Subversion feature commonly
known as "Obliterate". It is intended to include all of the requirements
that could be deemed to fall within the scope of an Obliterate feature. The
set of requirements to be satisfied by a proposed development of such a
feature may be a specified sub-set of those listed here.
The purpose of this document is to enable a design to be evaluated and an
implementation to be tested against specific criteria that are all written
down in one place.
Section 2 lists requirements from a user's point of view.
Section 3 lists requirements from a software design point of view.
1.1 Sources of Requirements
The requirements are sourced from:
* Comments in issue #516.
* Comments on the Subversion developers' mailing list.
* Personal experience of the authors.
2. USER STORIES
The "user stories" are examples, described from a user's point of view, of
scenarios in which the Obliterate feature should or might be used. Their
purpose is to indicate the range and diversity of requirements, without
being an exhaustive list of combinations. They loosely define the high-level
requirements which the specific requirements in section 3 must satisfy.
The following user stories are gathered from the sources in section 1 and
include both typical and unusual use cases.
2.1 Added secrets in a new file
User U1 has just accidentally committed the addition of a new file F1 that
contains confidential data (let's say people's addresses). F1 is visible
to other users of the repository. The probability of anyone committing
another change before the administrator can intervene is low. The
probability of anyone updating their WC to this revision is low.
U1 wants to restrict the visibility and propagation of the confidential
data as soon as possible.
Possible solutions:
* hide the existence of F1
* replace the content of F1 with empty content
* replace the content of F1 with its "previous" content (definition
required)
* replace the content of F1 with arbitrary other content
* roll back the entire head revision (definition required)
* something else.
2.2 Added secrets into an existing file
User U1 has just accidentally committed a change that adds confidential
data (let's say people's addresses) into an existing file F1. F1 is
visible to other users of the repository. The existence and other content
of F1 are important to other users.
U1 wants to restrict the visibility and propagation of the confidential
data as soon as possible.
2.3 Added a single huge file by accident
User U1 has just accidentally committed the addition of a new file F1 that
is huge and unwanted, with no other changes included in the commit.
U1 wants to get rid of the file in order to save space and time on
colleagues' WC updates.
2.4 Repeated modification of a huge file
User U1 keeps checking in the latest version of a huge file F1, in order
to have recent versions handy for testing. Nobody needs versions of F1
older than 2 weeks; they can be re-generated from source if required. F1
is usually checked in alongside some modifications to source files.
U1 wants to prune old versions of F1 regularly in order to limit server
disk space usage.
This use case is not directly what most people consider to be
"obliterate". It is really a separate feature that could use the
functionality of "obliterate" in its implementation, but could also be
implemented in other ways.
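As an illustration only, the selection step of such a pruning feature could
be sketched as below. The function name, the (revision, commit time) history
format and the rule of always keeping the newest version are assumptions made
for this sketch; none of them comes from Subversion or from the sources in
section 1.

```python
from datetime import datetime, timedelta


def prunable_revisions(history, now, max_age=timedelta(weeks=2)):
    """Return the revisions of one file that are candidates for pruning.

    history: list of (revision_number, commit_time) pairs for file F1
             (a hypothetical format chosen for this sketch).
    now:     the time at which the pruning job runs.
    max_age: versions older than this are no longer needed.

    Assumption: the newest version is always kept, even if it is old.
    """
    if not history:
        return []
    newest = max(rev for rev, _ in history)
    cutoff = now - max_age
    return sorted(
        rev for rev, when in history if when < cutoff and rev != newest
    )
```

A regular job would pass the resulting revision list to whatever mechanism
removes the old content, which might or might not be "obliterate".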
3. REQUIREMENTS
The requirements listed here are a set of design requirements that together
would satisfy all of the user-level requirements. A successful design will
satisfy most of these requirements to a large extent, but need not satisfy
all of them completely. A functional design document should specify which
of these requirements it satisfies, and to what extent.
Each requirement can be designated for convenience as "functional" or
"non-functional". A functional requirement specifies what output is produced
from what input, where input and output include such things as repositories,
working copies and audit trails. A non-functional requirement is a
constraint on how the functional operation is performed, such as speed of
operation or memory usage.
3.1 Levels of Obliteration
The requirements involve the following "levels" of obliteration:
L1: hiding data from clients
(a) avoiding sending the data in any new communications
(b) removing data from repository mirrors that already have it
(c) removing data from clients that already have it
L2: hiding data from people with direct access to the server disk
L3: recovering space on the server disk
NOTES:
L1 and L3 are directly relevant to the common use cases. Requirements
for L2 are conceivable but appear not to be common.
3.2 Content of the Modified Repository
* At revisions older than the obliteration, the repository should yield
exactly the same data that it used to.
RATIONALE: A Subversion repository has no forward-looking metadata, so
there is no reason to change old revisions and they should not be
changed.
EXCEPTIONS: Any manual adjustments to revision properties, such as to
forward-looking comments in log messages or to third-party data in
revision-0 properties.
* At the revision of the obliterated data, the stored tree should be
modified in a way to be specified in a Functional Spec. Briefly, two
likely schemes are:
(scheme "dd") each node to be obliterated is deleted; or
(scheme "cc") each node to be obliterated becomes exactly like it was
in the previous revision.
* At each revision younger than the obliteration, the repository file
system tree structure and content should look exactly as it used to.
However, any node with a "copied from" pointer that pointed to a node
which has been removed by obliteration should have this pointer adjusted
or removed, as defined by the Functional Spec.
NOTES:
This description assumes per-revision granularity of obliteration.
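The difference between the two schemes can be illustrated with a toy model.
This is a sketch under stated assumptions, not a design: the repository is
modelled as a plain list of trees (one dict of path to content per revision),
the function name is hypothetical, and "copied from" pointers are not
modelled at all.

```python
from copy import deepcopy


def obliterate(revisions, rev, paths, scheme):
    """Return a modified copy of a toy repository.

    revisions: list indexed by revision number; each entry is a dict
               mapping path -> content (a hypothetical model).
    rev:       the revision whose changes to `paths` are obliterated.
    scheme:    "dd" deletes each node at `rev`;
               "cc" makes each node exactly as it was at `rev - 1`.
    """
    if scheme not in ("dd", "cc"):
        raise ValueError("unknown scheme: %r" % (scheme,))
    if not 1 <= rev < len(revisions):
        raise ValueError("revision out of range: %r" % (rev,))

    result = deepcopy(revisions)
    tree = result[rev]
    previous = revisions[rev - 1]
    for path in paths:
        if scheme == "cc" and path in previous:
            # Node existed before: restore its previous state.
            tree[path] = deepcopy(previous[path])
        else:
            # Scheme "dd", or the node did not exist in rev - 1.
            tree.pop(path, None)
    # Older and younger revisions are left exactly as they were.
    return result
```

In this model only the tree at the obliterated revision changes; a node
first added at that revision disappears under both schemes, while a node
modified at that revision is deleted under "dd" and reverted under "cc".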
3.3 Working Copies
* A WC managed by an obliterate-aware Subversion client and logically
unaffected should show no sign that anything has happened.
* A WC managed by an obliterate-aware Subversion client and logically
affected by the change should behave in a friendly manner ...
* A WC managed by an old (pre-obliterate) Subversion client and logically
unaffected should show little or no sign that anything has happened, and
should require no user intervention to continue working.
* A WC managed by an old (pre-obliterate) Subversion client and logically
affected by the change should ...
3.4 Access to the Modified Repository
* The modified repository should keep the same URL and UUID, and client
access should continue without manual intervention, after any required
down-time, for all working copies that are not logically affected by the
obliteration.
RATIONALE: Obliteration is often required in large repositories having
large numbers of users, most of whom are not working near the
obliterated data. If all users were impacted each time, then
obliteration could become impractical.
3.5 Audit Trail
* On the client side, no trace of the obliteration need be visible other
than the intended changes to versioned data and to revision properties.
* On the server side, the administrator should be able to choose whether a
record of obliterations is stored. The form and storage location of this
record are not specified here.
NOTES:
Some customers are concerned about auditability and may want an audit
trail to be stored with the repository so that it is included in backups
and perpetually available for later examination.
3.6 Svnsync Mirrors
* A read-only mirror of the repository maintained by an old
(pre-obliterate) version of "svnsync" should either keep all of its
already-copied revisions exactly as they were, and continue to copy new
revisions from the modified repository without any hiccup, or it should
stop working so that its administrator has to intervene.
RATIONALE: An old svnsync has no way to re-synchronize old revisions. If
it behaves just like a regular client that had been taking snapshots of
the master repository, that would be logical and self-consistent but
would not propagate the obliteration, which is a problem for the secrecy
use cases. If it requires human intervention, that would disrupt its users
but would force a human to consider whether the mirrored data should be
kept or modified. Ideally the administrator of the master repository
would control which of these scenarios occurs.
* A read-only mirror of the repository maintained by an obliterate-aware
version of "svnsync" should re-synchronize its old revisions to match
the modified master repository.
3.7 Permissions
* The data-hiding part of an obliterate should be available to a user
with suitable permissions, from the client side, using a standard
Subversion client installation.
* The space-saving part of an obliterate should be available to an
administrator, from the server side, using a standard Subversion server
installation. This may also be available in the same way as the
data-hiding part.
3.8 Time Taken
* The time from when an administrator discovers an accidental secrecy
problem to when the data in question is unavailable to ordinary clients
(that don't already have it) should be within minutes, or at most hours,
on a large repository.
* The time from when an administrator discovers an accidental large
check-in until the data can be removed from the repository should be at
most hours, on a large repository. (The intent here is that an
administrator should be able to avoid the data getting into a nightly
back-up, if desired.)