IDEAS - subversion - Git at Google

    Oh Most High and Puissant Emacs, please be in -*- outline -*- mode!

 Some ideas for the future.
 (Please use outline-mode format for this file.)

 * svn Client Command Enhancements

 ** 'svn sync' - add and remove many files at once

   Imagine an intelligent update that knows to add new files and remove
   files that you've removed, and all recursively (without having to svn
   add, svn rm, ad nauseam).

   So, for example, you've got:

   $ ls -1
     foo.c
     bar.c

   $ touch qux
   $ touch qux/fizzle
   $ touch qux/smootch

   $svn up
     ? qux
     M foo.c

   $ rm bar.c

   *

   $ svn sync
     A qux
     A qux/fizzle
     A qux/smootch
     M foo.c
     R bar.c

   $ svn ci -m "Boy have I been busy."
     ...

   * Note, that if you were to do an update at this point, bar.c would be
     pulled from the repository

     -Fitz

 ** 'svn convert' - add versioning metadata to a local tree (JimB's idea)

   provide a svn command to add versioning metadata (SVN directories
   and whatnot) to a local directory tree that is (presumably) the
   equivalent of a working copy without the SVN directories.

   svn convert local-dir repository

   Did I get that right? -Fitz

 ** 'svn repo' - do repository operations from the client

   the ability to do repository operations from the client via the 'svn
   repo' subcommand.

 ** 'svn status="format"' - use a format string to construct 'svn status' output

   Because `svn status' seems like a very convenient vehicle for
   supplying arbitrary information that a script may wish to extract.
   What if `svn status' had a standard, default display that was
   configured with a format string.  That format string could be
   over-ridden with an option (``--format=....'') and we made an
   option alias or two for formats that were likely to be commonly
   used?  Here is an approximation that would likely be abbreviated:

      --format="%{obj-name}:  %{ver=TAG} vs. %{local-ver} is %{mod-state}\n"

   So, if TAG were associated with the 1234 version of object mumble,
   but the local copy were version 1222 but unchanged, you get:

     mumble:  1234 vs. 1222 is unchanged
   - Bruce

 ** Reserved and unreserved checkouts

   From Karl Fogel <kfogel@collab.net>

   Brian Behlendorf <brian@collab.net> writes:
   > I would also ask that reserved checkouts not be made impossible to
   > do, for the same reasons.

    Oh, not to worry -- nothing in the current design makes them
   impossible to implement later.  It's a matter of the repository
   keeping records about who has checked out what, that's all.

   To prove that it's not a problem, here's an implementation:

      1. Client A requests a reserved checkout of tree T

      2. Server sets a non-historical property on T noting that client
         has reserved it and sends T to client.

      3. Client B tries to commit to T, the commit fails because the
         repository notices the property.

      4. Client A releases the reservation.  Server removes the
         non-historical property.

      5. Client B commits and succeeds.

   With a flexible server-side hook interface, such as Subversion will
   have, you don't even need much (any?) extra code in Subversion itself
   to support this.


   From Branko Cibej <brane@xbc.nu>

   Jim Blandy wrote:
   > I think the most interesting issues here are in the human interface
   > side of things.  I designed `cvs watch' and `cvs edit', and they suck
   > rocks.  Nobody can remember how you're supposed to do things.  Perhaps
   > someone on the list could describe a reserved checkout interface they
   > enjoyed using...

   [snip]
   As for user interface thingies ... Off the top of my head I'd propose
   something like this, given the following commands: get, update, commit
   checkout, checkin:

     get:
         create working copy, making files writable or not depending
         on each file's reserved-checkout policy;

     update:
         update the WC, with same behaviour WRT file flags;

     commit:
         commit changes in the usual way, /without/ changing
         file flags or lock status;

     checkout:
         update, and for files with unreserved checkout policy,
         lock it and make it writable; With appropriate permissions,
         can override the file's checkout policy (with a flag) until
         the next checkin;

     checkin:
         commit, and if it was locked, unlock it and set the file
         flag according to the file's checkout policy (meaning that
         it might become readonly, or it might not).

   Note that when I say "file", I should in fact say "object version".
   You might want to allow unreserved checkouts on files, for instance,
   but require reserved checkouts for directories when adding new
   objects.


   Notice that we get paired commands: `update' vs. `checkout', `commit'
   vs. `checkin'. This means that `checkout' and `checkin' should behave
   like `update' and `commit', respectively, on files with unreserved
   checkout policy. Maybe we don't even need pairs of commands, just a
   flag for `update' to change the checkout policy for the current update?

   Maybe `status' should show locks on files? Perhaps another command
   would be better for that, so as not to slow down `status'. Should
   show a) locks on working-copy branch; b) locks on all branches.

   Users with appropriate privileges would be allowed to break locks.

   -- Brane Cibej <brane@xbc.nu>

 ** Handling conflicts in the client and working copy

   From Branko Cibej <brane@xbc.nu>

   Ben Collins-Sussman wrote:
   >
   > > Branko =?iso-8859-2?Q?=C8ibej?= <brane@xbc.nu> writes:
   > >
   > > I thought it should be possible for status to report that a file is
   > > currently in conflict? After an update, of course. Oh, that would be
   > > nice! Let's not have any ">>>>>>"'s in working copy files, let's not
   > > have to update to see conflicts ...
   >
   >
   > Sounds like some bells and whistles -- a good idea, but of course this
   > goes beyond what CVS does, which is all we're trying to do at this
   > moment.
   >
   > Right now, the `status' command just grabs a version number
   > from the repository.  Perhaps we can eventually pass an extra flag
   > which makes it fetch the *entire* repos file and attempt to merge it
   > in a scratch area.
   >
   > Of course, this starts to blur the line between `status' and `update'.

   Notice what I said above: "After an update, of course."

   I see it working like this: when update detects a conflict, it stores
   the current repository version of the file somewhere in the admin
   directory, probably near the stored original version. Status would
   just look if that locally stored version exists, and subsequent
   updates would pull in a new version, or remove it if the conflict goes
   away. Extra bonus: you can do a three-way merge locally.

   >   Suppose the status command tells you that your locally modified file
   > is currently in conflict.  What then?  What are you going to do about
   > it?  Do you expect to see the conflict in a scratch area?  Are you
   > going to update that one file to see the conflict?  I'm not sure I
   > understand what itch is being scratched.

   a) You don't have to do an update to check if you currently have any
   conflics; b) you can still compile in the local copy without tripping
   over those wakas; c) works well for binary files that can have special
   merge tools; d) etcetera ad nauseam.


 Here's how `update', `status' and `commit' could cooperate:

     update:
         If there is a conflict, put the text of the current repository
         version into SVN/conflicts. Could also store a diff against
         the pristine copy in SVN/text-base, instead.

         If a conflict is resolved by a later update, remove the entry
         in SVN/conflicts.

     status:
         Report `C' instead of `M' if the entry in SVN/conflicts
         exists. This operation is completely local.

     commit:
         Refuse commit if there is a conflict. Should not do only a
         local check, as the user may have resolved the conflict in
         the meantime. If successful, remove the entry in
         SVN/conflicts.

         Could have a --force flag to force the commit anyway; could
         allow or forbid --force based on user privileges or repository
         policy.


 Declaring a "pure" merge (see list archives, can't find that thread
 right now) could remove the entry in SVN/conflicts, too.

 ** Smart conflict resolution

   Certain kinds of conflicts can be resolved without human intervention. For
   example, files like /etc/passwd just need to keep lines unique by username
   and user ID.

   Right now, merging new repository data into a modified working copy of
   a passwd file can result in a textual conflict even when there's no
   "semantic conflict".  But if the Subversion client knew something
   about the format of passwd files, then it could merge without flagging
   a conflict.

   A similar rule could be used for ChangeLogs, based on the dates in the
   header lines.  And so on.

   Since all merging takes place on the client, these ``smart merges''
   should be implemented as a client-side plug-in.

 * diff Handling

 ** Visual diff similar to xxdiff

   Write platform neutral parsing of diff output to fill in enough
   data structures in order to drive UI dependant rendering of a visual
   diff very similar to xxdiff.

 ** svndiff encoders/decoders

   A shell script or elisp to encode/decode/view base64-svndiff data?
   For example: "make-svndiff.pl SRCFILE TARGETFILE > blah.svndiff"
                "read-svndiff.pl SRCFILE blah.svndiff > TARGETFILE"
                "M-x svn-decode-region" (displays the data and opcodes)
                etc
   Not crucial, just development and debugging help.

 ** Random access to delta-encoded files

 Several features collude to produce a challenge:
 - HTTP provides the ability to fetch substrings of files.
 - Subversion provides a uniform interface for reading both old and new
   versions of nodes.
 - Subversion wants to use deltas to store multiple versions of a file
   in a compact way.

 So, suppose an HTTP request comes in for bytes 1,000,000,000 through
 1,000,004,000 of a two-gigabyte file, which happens to be stored in
 the repository as the result of applying ten deltas to another
 two-gigabyte file which is stored in fulltext.  It would be nice if it
 were possible to extract the requested bytes without reconstructing
 the full text of the file.

 In other words, we'd like random access to the result of applying a
 text delta to some source string.

 I was thinking of extending the svndelta format with a hierarchical
 index, a B+tree of delta windows: after every tenth window, store a
 little table of pointers to the windows.  After every tenth little
 table, store a little table of pointers to little tables of pointers
 to windows.  After every tenth... you see what I mean.  Then you can
 start at the end of the file and find any window in logarithmic time.
 That procedure always emits balanced trees.  Use 100 instead of ten,
 and your tree has 1/100th the number of nodes as your file has
 windows, and is very shallow.

 * CVS Interaction

 ** Repository conversion

   We need to write a tool which will convert a CVS repository into an SVN
   repository, preserving all history. This is absolutely critical in
   persuading people to switch to SVN.

   This tool exists, but doesn't yet recognize CVS branches and tags.  It can
   be found in tools/cvs2svn/

 ** possible cvs-compatibility wrapper for svn

   Daniel Stenberg <daniel@haxx.se> wrote:

   ... no matter how the command line interface for SVN ends up like, it
   would still be cool to offer a 'stealth-mode' SVN client that would
   look and feel exactly (well, as similar as it can) like a cvs
   client...


   "Jonathan S. Shapiro" <shap@eros-os.org> replied:

   Wouldn't Perl be good for this?
   Why crud up the new system with compatibility code?


   Dave Glowacki <dglo@ssec.wisc.edu> further opined:

   I agree, and I also support Jonathan Shapiro's idea that a
   cvs-compatibility wrapper could be written in Perl and would simply
   map the CVS options to the Subversion options (and maybe echo the
   proper 'svn' options back so that it'd serve as a learning tool as
   well.)

   That way, Subversion can have its own options (-r instead of -R)
   and not have to worry about the CVS cruft like '-d' meaning one
   thing as a global option and another thing as a command option.

 ** Interoperate with CVS servers

 Create libsvn_ra_cvs (client mode only, maybe just :pserver: and :ext:,
 and possibly :fork: for accessing local repositories).

 This will let a Subversion client pull modules directly out of CVS
 repositories. Could be very interesting in combination with
 configuration nodes in the SVN filesystem.

 Rationale: Don't expect all the world to switch to SVN after 1.0. And
 we'll still want to pull APR into the SVN working copy, so this would
 be a direct benefit to the project (once /we/ switch).

 Question; Would it be better to pull the code out of the CVS sources,
 or write it to the CVSclient spec?

   Brane Cibej <brane@xbc.nu>

 * Server Side Plugins

 ** Annotating individual files

   We need to provide annotation of individual files (i.e. who wrote which
   line in which revision), because CVS does this reasonably well.

   The information needed to annotate files should be stored by the
   filesystem as properties.

   The ``annotate'' method itself should be implemented as a server-side
   plug-in.  This makes the output much more hackable.

 ** Server grep

   Write a server-side plug-in which can quickly search a filesystem's text
   and properties.

 ** Scripting language

   Write a server-side plug-in which provides glue between svn_main
   and libguile.so, libperl.so, or libpython.so.  The server becomes
   *very* extensible if it has an interpreted language built into it; it's
   also very nice for writing test suites!

 * Server side project policy enforcement

   From: Branko Cibej <brane@xbc.nu>
   Date: Mon, 23 Oct 2000 18:37:49 +0200

   I'd be quite happy if we could "promise" that someday it'll be possible
   to write a script on the SVN server side to, e.g., reject an export
   if an explicit tag wasn't supplied; or reject a commit if there were
   merge conflicts; etc., etc. Then all of this boils down to putting
   examples into contrib/ (once we have it).

 * Removing obsolete versions to archive

 A project repository can grow huge over several years. Would be nice
 to have a way to move obsolete data to archive or offline
 storage. Some possibilities:

   - Move just the file text offline (e.g., support "offline" file
     nodes in addition to full-text and delta storage), but keep
     history and filesystem metadata intact. We can do this with the
     current filesystem design.

   - Remove whole branches or parts of branches (this could mean whole
     nodes) from the filesystem. The current filesystem design does not
     support this; right now we could do that by copying part of the
     repository (via xml export?) to a new database and removing the
     old one, but that would change node numbers and filesystem
     revision numbers, which we probably don't want.

   Brane Cibej <brane@xbc.nu>

 * Revise the editor interface to use paths rather than components?

 Either revise the interface itself to pass around complete paths [from
 the root], or possibly have a "wrapping editor" that assembles paths
 and calls a secondary editor interface which uses the path-style
 API. The wrapping editor would have something like:

     struct dir_baton {
       svn_string_t *path;
       void *user_dir_baton;
       struct edit_baton *edit_baton;
     }

     change_dir_prop(void *baton, ...)
     {
       struct dir_baton *my_baton = baton;
       delta_fns_t editor = my_baton->edit_baton->editor;

       (*editor->change_dir_prop)(my_baton->user_dir_baton, ...);
     }

 This idea is premised on the number of editors that compose the
 components into larger paths before working on them. Depending on
 whether that is "typical", switching the interface might be
 interesting.

 * Mirroring servers

   This is like the ClearCase multisite feature.  Essentially, it is a
   redundant distributed repository. The repository exists on two or more
   cooperatively mirroring servers (each one presumably being close,
   network-wise, to its intended users). A commit from any user is visible on
   all the servers.

   The best way to implement this is by creating a ``hierarchy'' of
   Subversion servers, much like the DNS or NIS system. We can define a
   server master to contain the ``authoritative'' repository. We can then set
   up any number of slave servers to mirror the master. The slave servers
   exist primarily as local caches; it makes reads and updates faster for
   geographically disperse users. When a user wishes to commit, however, her
   delta is always sent to the master server. After the master accepts the
   change, the delta is automatically ``pushed'' out to the caching slave
   servers.

 * Inter-repository Communication

   This is one that people request a lot: the ability to commit changes
   first to a local "working repository" (not visible to the rest of the
   world), and then commit what's in the working repository to the real
   repository (with the several commits maybe being folded into one
   commit).

   Why do people want this?  It may be the psychological comfort of making
   a snapshot whenever one reaches a good stopping point, but not
   necessarily wanting all those ``comfort points'' to become
   publically-visible commits.

   The best way to implement this is to allow ``clones'' to cross between
   repositories. (See the Bubble-Up Method.)

   In other words, Joe Hacker sets up a personal Subversion repository on
   his desktop machine; he creates a local ``clone'' of a subtree from a
   public repository.  He commits to his clone (which turns it into a
   branch), and when he's done, he performs a branch-merge back into the
   public repository.

 * SQL Back-End

   The initial release of the Subversion filesystem will use Berkeley DB to
   store data on disk. However, in the future, a SQL database may be used to
   add sophisticated query support across the filesystem (e.g. "what
   revisions of what files has property @code{xyz}?").

   To support this feature, Someone could rewrite the filesystem back-end to
   speak SQL, in addition to the default Berkeley DB support.

 * Digital Signatures

   A few people have mentioned cryptographic signing of deltas.  It's a
   cool idea, and we should leave the door open for it.

 * SMTP Access

   Write a totally independent Network Layer which is an SMTP server on one
   end, and speaks to the Subversion server library on the back end. It would
   be neat to be able to "mail in" commits, or receive working-copy updates
   through e-mail.

 * split libsvn_WC into two components

   split into: one for managing the metadata (the stuff in ./SVN/), and one
   for the rest of the WC functionality (crawling and updating). This would
   provide a way to use multiple metadata storage and management options,
   possibly optimized for the platform at hand.

   Alternate storage options: separate file streams (NTFS streams, MacOS
   resource forks), a single database of data (rather than per-dir), or
   server-based storage (via DeltaV)

   Benefit of a single database: apps/systems that nuke a directory and
   rewrite it will no longer "unhook" the working copy from SVN (since
   the data is stored elsewhere). Fred Sanchez has pointed out this
   could be important for MacOS Bundles (the OS nukes and rebuilds
   subdirs, taking the CVS/ or SVN/ subdir with it)

   Benefit of a "true" database: there is a lot of effort expended in
   the current libsvn_wc to transact the changes, record logs of
   changes, and to perform them in an atomic manner. A transacted
   database on the client side could potentially simplify much of that
   work (and the corresponding I/O to track it).

 * shell-like interactive mode for client (like ftp)

   From: John P Cavanaugh <cavanaug@soco.agilent.com>
   Subject: A feature request for command line client
   To: dev@subversion.tigris.org
   Date: Tue, 17 Oct 2000 09:46:58 -0700

   I know people are beginning to work on the command line client, so I
   hope Im not too late with this request.

   Basically I would like svn to behave like ftp (actually more like lftp
   or ncftp).

   What this means is that if you type just "svn<return>" you get dropped
   into an interactive svn shell where you can type a litany of commands.
   Preferably this happens without relogging into the server etc.

   Example:

   cavanaug@lajolla ~ 1%  svn

   svn> status foo.c

     output of command

   svn> commit bar.h

     output of command

   svn> cd subdir
   svn> status README

     output of command

   svn> quit

   cavanaug@lajolla ~ 2%  svn

   Personally I think this is a really convenient way to operate
   interactively on a directory without having to continually re-run the
   svn client.

   Plus this is method is a godsend if you want to write scripts that
   automate svn. (ie. Perhaps like the cvs2svn or xxx2svn)


       John Cavanaugh                          Agilent Technologies
       R&D Program Manager                     1400 Fountaingrove Pkwy
       CAD Data Store                          Santa Rosa, CA 95403-1799

       Email: cavanaug@soco.agilent.com    Phone:  707-577-4780
                                                   707-577-3948 (Fax)

 ** Command-line editing libraries with more liberal licenses

    From: Dave Glowacki <dglo@ssec.wisc.edu>

    <snip>

    There's 'getline', which doesn't have all of readline's features,
    but which does basic editing.  ncftp and tin can be built with
    getline instead of readline, so it does a decent enough job.

    You can find it at ftp://ftp.sco.com/skunkware/src/lib/getline-39-src.tar.gz

    getline.c says:

    /*
     * Copyright (C) 1991, 1992, 1993 by Chris Thewalt (thewalt@ce.berkeley.edu)
     *
     * Permission to use, copy, modify, and distribute this software
     * for any purpose and without fee is hereby granted, provided
     * that the above copyright notices appear in all copies and that both the
     * copyright notice and this permission notice appear in supporting
     * documentation.  This software is provided "as is" without express or
     * implied warranty.
     *
    <names snipped>


 ** Useful for integration with Emacs, etc.

    When integrating Subversion support into Emacs vc-mode, using the
    interactive mode wold avoid having to start a new process for each
    operation. This wold be useful even without readline or getline.

    -- Brane Cibej <brane@xbc.nu>