| |
| The Subversion Project: Building a Better CVS |
| ============================================== |
| |
| Ben Collins-Sussman <sussman@collab.net> |
| |
| Written in August 2001 |
| Published in Linux Journal, January 2002 |
| |
| Abstract |
| -------- |
| |
| This article discusses the history, goals, features and design of |
| Subversion (http://subversion.tigris.org), an open-source project that |
| aims to produce a compelling replacement for CVS. |
| |
| |
| Introduction |
| ------------ |
| |
| If you work on any kind of open-source project, you've probably worked |
| with CVS. You probably remember the first time you learned to do an |
| anonymous checkout of a source tree over the net -- or your first |
| commit, or learning how to look at CVS diffs. And then the fateful |
| day came: you asked your friend how to rename a file. |
| |
| "You can't", was the reply. |
| |
| What? What do you mean? |
| |
| "Well, you can delete the file from the repository and then re-add it |
| under a new name." |
| |
| Yes, but then nobody would know it had been renamed... |
| |
| "Let's call the CVS administrator. She can hand-edit the repository's |
| RCS files for us and possibly make things work." |
| |
| What? |
| |
| "And by the way, don't try to delete a directory either." |
| |
| You rolled your eyes and groaned. How could such simple tasks be |
| difficult? |
| |
| |
| The Legacy of CVS |
| ----------------- |
| |
| No doubt about it, CVS has evolved into the standard Software |
| Configuration Management (SCM) system of the open source community. |
| And rightly so! CVS itself is Free software, and its wonderful |
| "non-locking" development model -- whereby dozens of far-flung |
| programmers collaborate -- fits the open-source world very well. In |
| fact, without CVS it's doubtful whether sites like Freshmeat or |
| SourceForge would ever have flourished as they do now. |
| CVS and its semi-chaotic development model have become an essential |
| part of open source culture. |
| |
| So what's wrong with CVS? |
| |
| Because it uses the RCS storage system under the hood, CVS can only |
| track file contents, not tree structures. As a result, the user has |
| no way to copy, move, or rename items without losing history. Tree |
| rearrangements are always ugly server-side tweaks. |
| |
| The RCS back-end cannot store binary files efficiently, and branching |
| and tagging operations can grow to be very slow. CVS also uses the |
| network inefficiently; many users are annoyed by long waits, because |
| file differences are sent in only one direction (from server to client, |
| but not from client to server), and binary files are always |
| transmitted in their entirety. |
| |
| From a developer's standpoint, the CVS codebase is the result of |
| layers upon layers of historical "hacks". (Remember that CVS began |
| life as a collection of shell scripts to drive RCS.) This makes the |
| code difficult to understand, maintain, or extend. For example, CVS's |
| networking ability was essentially "stapled on". It was never |
| designed to be a native client-server system. |
| |
| Rectifying CVS's problems is a huge task -- and we've listed only a |
| few of the many common complaints here. |
| |
| |
| Enter Subversion |
| ---------------- |
| |
| In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company |
| for commercially supporting and improving CVS. Cyclic made the first |
| public release of a network-enabled CVS (contributed by Cygnus |
| Software). In 1999, Karl Fogel published a book about CVS and the |
| open-source development model it enables (cvsbook.red-bean.com). Karl |
| and Jim had long talked about writing a replacement for CVS; Jim had |
| even drafted a new, theoretical repository design. Finally, in |
| February of 2000, Brian Behlendorf of CollabNet (www.collab.net) |
| offered Karl a full-time job to write a CVS replacement. Karl |
| gathered a team together and work began in May. |
| |
| The team settled on a few simple goals. Subversion would be designed |
| as a functional replacement for CVS: it would do everything that CVS |
| does, preserving the same development model while fixing the flaws in |
| CVS's (lack of) design. Existing CVS users would be the target |
| audience: any CVS user should be able to start using Subversion with |
| little effort. Any other SCM "bonus features" were deemed of |
| secondary importance (at least before a 1.0 release). |
| |
| At the time of writing, the original team has been coding for a little |
| over a year, and we have a number of excellent volunteer contributors. |
| (Subversion, like CVS, is an open-source project!) |
| |
| |
| Subversion's Features |
| ---------------------- |
| |
| Here's a quick run-down of some of the reasons you should be excited |
| about Subversion: |
| |
| * Real copies and renames. The Subversion repository doesn't use |
| RCS files at all; instead, it implements a 'virtual' versioned |
| filesystem that tracks tree-structures over time (described |
| below). Files *and* directories are versioned. At last, there |
| are real client-side `mv' and `cp' commands that behave just as |
| you would expect. |
| |
| * Atomic commits. A commit either goes into the repository |
| completely, or not at all. |
| |
| * Advanced network layer. The Subversion network server is Apache, |
| and client and server speak WebDAV(2) to one another. (See the |
| 'design' section below.) |
| |
| * Faster network access. A binary diffing algorithm is used to |
| store and transmit deltas in *both* directions, regardless of |
| whether a file is of text or binary type. |
| |
| * Filesystem "properties". Each file or directory has an invisible |
| hashtable attached. You can invent and store any arbitrary |
| key/value pairs you wish: owner, perms, icons, app-creator, |
| mime-type, personal notes, etc. This is a general-purpose feature |
| for users. Properties are versioned, just like file contents. |
| And some properties are auto-detected, like the mime-type of a |
| file (no more remembering to use the '-kb' switch!) |
| |
| * Extensible and hackable. Subversion has no historical baggage; it |
| was designed and then implemented as a collection of shared C |
| libraries with well-defined APIs. This makes Subversion extremely |
| maintainable and usable by other applications and languages. |
| |
| * Easy migration. The Subversion command-line client is very |
| similar to CVS; the development model is the same, so CVS users |
| should have little trouble making the switch. Development of a |
| 'cvs2svn' repository converter is in progress. |
| |
| * It's Free. Subversion is released under an Apache/BSD-style |
| open-source license. |
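| |
| To make the "binary diffing" item more concrete, here is a minimal |
| sketch of a copy/insert delta in Python. It is an illustrative |
| simplification, not Subversion's actual delta algorithm or wire |
| format: the names `make_delta' and `apply_delta' and the 4-byte block |
| size are assumptions. Target bytes that match a block of the source |
| become "copy" instructions; everything else is sent as literal |
| "insert" data, so text and binary files are handled identically. |

```python
BLOCK = 4  # assumed block size; real delta tools use much larger windows


def make_delta(source: bytes, target: bytes, block: int = BLOCK) -> list:
    """Encode `target` as ("copy", offset, length) and ("insert", data) ops."""
    # Index each aligned block of the source by its contents.
    index = {}
    for off in range(0, len(source) - block + 1, block):
        index.setdefault(source[off:off + block], off)

    ops, literal, i = [], bytearray(), 0
    while i < len(target):
        off = index.get(target[i:i + block])
        if off is None:
            literal.append(target[i])  # no match here: send this byte verbatim
            i += 1
            continue
        # Extend the match beyond the first block while the bytes agree.
        length = block
        while (off + length < len(source) and i + length < len(target)
               and source[off + length] == target[i + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops: list) -> bytes:
    """Rebuild the target from the source plus the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

| Because the same two functions work whichever side holds the older |
| copy, a client can send deltas to the server just as easily as the |
| server sends them to the client. |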
| |
| |
| Subversion's Design |
| ------------------- |
| |
| Subversion has a modular design; it's implemented as a collection of C |
| libraries. Each layer has a well-defined purpose and interface. In |
| general, control flow begins at the top of the diagram and moves |
| "downward" -- each layer provides an interface to the layer above it. |
| |
| [Diagram: Subversion's layered library architecture (svn.tiff)] |
| |
| |
| Let's take a short tour of these layers, starting at the bottom. |
| |
| |
| --> The Subversion filesystem. |
| |
| The Subversion Filesystem is *not* a kernel-level filesystem that one |
| would install in an operating system (like the Linux ext2 fs). |
| Instead, it refers to the design of Subversion's repository. The |
| repository is built on top of a database -- currently Berkeley DB -- |
| and thus is a collection of .db files. However, a library accesses |
| these files and exports a C API that simulates a filesystem -- |
| specifically, a "versioned" filesystem. |
| |
| This means that writing a program to access the repository is like |
| writing against other filesystem APIs: you can open files and |
| directories for reading and writing as usual. The main difference is |
| that this particular filesystem never loses data when written to; old |
| versions of files and directories are always saved as historical |
| artifacts. |
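| |
| The "open, write, and never lose data" behavior, together with the |
| atomic commits listed among the features, can be illustrated with a |
| toy in-memory model. This Python sketch is hypothetical (`VersionedFS', |
| `Txn', and their methods are not Subversion's C API); it naively |
| copies the whole tree for each revision and simply refuses a commit |
| whose base is out of date. Changes accumulate in a private |
| transaction and become visible all at once on commit; an aborted |
| transaction leaves no trace, and every committed revision stays |
| readable. |

```python
class Txn:
    """A set of uncommitted changes (hypothetical API, for illustration)."""

    def __init__(self, fs):
        self.fs = fs
        self.base_rev = fs.youngest()
        # Private copy of the base tree; nobody else can see these edits.
        self.files = dict(fs.revisions[self.base_rev])
        self.open = True

    def _check_open(self):
        if not self.open:
            raise RuntimeError("transaction already committed or aborted")

    def write(self, path, data: bytes):
        self._check_open()
        self.files[path] = data

    def delete(self, path):
        self._check_open()
        del self.files[path]

    def commit(self) -> int:
        self._check_open()
        # Simplification: refuse, rather than merge, if another commit landed first.
        if self.base_rev != self.fs.youngest():
            raise RuntimeError("out of date: another commit landed first")
        self.open = False
        # Publishing is a single append: readers see all of the changes or none.
        self.fs.revisions.append(self.files)
        return self.fs.youngest()

    def abort(self):
        # Nothing was published, so there is nothing to undo.
        self._check_open()
        self.open = False


class VersionedFS:
    """Toy versioned filesystem: old revisions are never overwritten."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def youngest(self) -> int:
        return len(self.revisions) - 1

    def begin_txn(self) -> Txn:
        return Txn(self)

    def read(self, path, rev=None) -> bytes:
        rev = self.youngest() if rev is None else rev
        return self.revisions[rev][path]
```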
| |
| Whereas CVS's backend (RCS) stores revision numbers on a per-file |
| basis, Subversion numbers entire trees. Each atomic 'commit' to the |
| repository creates a completely new filesystem tree, and is |
| individually labeled with a single, global revision number. Files and |
| directories which have changed are rewritten (and older versions are |
| backed up and stored as differences against the latest version), while |
| unchanged entries are pointed to via a shared-storage mechanism. This |
| is how the repository is able to version tree structures, not just |
| file contents. |
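| |
| The shared-storage idea can be sketched with immutable tree nodes. In |
| this hypothetical Python model (`File', `Dir', `with_file', and |
| `Repository' are illustrative names, and storing old versions as |
| differences is left out), a commit rebuilds only the directories on |
| the path to a changed file and points every other entry at the |
| existing node. Each commit produces one new root, labeled with one |
| global revision number. |

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    contents: bytes


@dataclass(frozen=True)
class Dir:
    entries: dict  # name -> File or Dir; never mutated after construction


def with_file(root: Dir, path: str, contents: bytes) -> Dir:
    """Return a new tree with `path` set, reusing every untouched subtree."""
    head, _, rest = path.strip("/").partition("/")
    entries = dict(root.entries)  # copy only this directory's entry table
    if rest:
        child = root.entries.get(head, Dir({}))
        if not isinstance(child, Dir):
            raise ValueError(f"{head!r} is a file, not a directory")
        entries[head] = with_file(child, rest, contents)
    else:
        entries[head] = File(contents)
    return Dir(entries)


class Repository:
    def __init__(self):
        self.roots = [Dir({})]  # revision 0: the empty tree

    def commit(self, changes: dict) -> int:
        """Apply {path: contents} and record the result as one new revision."""
        root = self.roots[-1]
        for path, contents in changes.items():
            root = with_file(root, path, contents)
        self.roots.append(root)
        return len(self.roots) - 1

    def lookup(self, rev: int, path: str):
        node = self.roots[rev]
        for part in path.strip("/").split("/"):
            node = node.entries[part]
        return node
```

| In this model a copy is cheap for the same reason: a new directory |
| entry can point at an existing node instead of duplicating it. |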
| |
| Finally, it should be mentioned that using a database like Berkeley DB |
| immediately provides other nice features that Subversion needs: data |
| integrity, atomic writes, recoverability, and hot backups. (See |
| www.sleepycat.com for more information.) |
| |
| |
| --> The network layer. |
| |
| Subversion has the mark of Apache all over it. At its very core, the |
| client uses the Apache Portable Runtime (APR) library. (In fact, this |
| means that the Subversion client should compile and run anywhere |
| Apache httpd does -- right now, this list includes all flavors of |
| Unix, Win32, BeOS, OS/2, Mac OS X, and possibly NetWare.) |
| |
| However, Subversion depends on more than just APR -- the Subversion |
| "server" is Apache httpd itself. |
| |
| Why was Apache chosen? Ultimately, the decision was about not |
| reinventing the wheel. Apache is a time-tested, open-source server |
| process that is ready for serious use, yet is still extensible. It can |
| sustain a high network load. It runs on many platforms and can |
| operate through firewalls. It's able to use a number of different |
| authentication protocols. It can do network pipelining and caching. |
| By using Apache as a server, Subversion gets all these features for |
| free. Why start from scratch? |
| |
| Subversion uses WebDAV as its network protocol. DAV (Distributed |
| Authoring and Versioning) is a whole discussion in itself (see |
| www.webdav.org) -- but in short, it's an extension to HTTP that allows |
| reads/writes and "versioning" of files over the web. The Subversion |
| project is hoping to ride a slowly rising tide of support for this |
| protocol: all of the latest file-browsers for Win32, MacOS, and GNOME |
| speak this protocol already. Interoperability will (hopefully) become |
| more and more of a bonus over time. |
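| |
| Because WebDAV is layered on HTTP, a request is just an HTTP method, |
| some headers, and an XML body. The Python sketch below builds a |
| PROPFIND request (a method defined by the WebDAV specification, RFC |
| 2518) that asks for two standard DAV: properties of a resource. The |
| host name, path, and `build_propfind' function are hypothetical, and |
| this is not claimed to be the exact request the Subversion client |
| sends. |

```python
import xml.etree.ElementTree as ET

ET.register_namespace("D", "DAV:")  # serialize the DAV: namespace as "D:"


def build_propfind(host: str, path: str, props, depth: str = "1") -> bytes:
    """Serialize a raw WebDAV PROPFIND request for the named DAV: properties."""
    root = ET.Element("{DAV:}propfind")
    prop = ET.SubElement(root, "{DAV:}prop")
    for name in props:
        ET.SubElement(prop, "{DAV:}" + name)
    body = b'<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(root)
    headers = (
        f"PROPFIND {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Depth: {depth}\r\n"  # "0" = this resource only, "1" = plus its children
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body
```

| A WebDAV server answers such a request with a "207 Multi-Status" |
| response whose XML body lists the requested properties. |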
| |
| For users who simply wish to access Subversion repositories on local |
| disk, the client can do this too; no network is required. The |
| "Repository Access" layer (RA) is an abstract API implemented by both |
| the DAV and local-access RA libraries. This is a specific benefit of |
| writing a "librarized" version control system; it's a big win over |
| CVS, which has two very different, difficult-to-maintain codepaths for |
| local vs. network repository-access. Feel like writing a new network |
| protocol for Subversion? Just write a new library that implements the |
| RA API! |
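| |
| The payoff of an abstract RA interface is that client code is written |
| once against the interface and never needs to know which library is |
| underneath. A hypothetical Python sketch: `RASession', `LocalRA', |
| `FakeDavRA', and the scheme-keyed `open_session' registry are |
| illustrative names, and the "DAV" backend just forwards calls to a |
| function instead of speaking HTTP. |

```python
from abc import ABC, abstractmethod


class RASession(ABC):
    """Hypothetical repository-access interface shared by every backend."""

    @abstractmethod
    def latest_revnum(self) -> int: ...

    @abstractmethod
    def get_file(self, path: str, rev: int) -> bytes: ...


class LocalRA(RASession):
    """Reads a repository directly (here: a list of {path: bytes} revisions)."""

    def __init__(self, revisions):
        self.revisions = revisions

    def latest_revnum(self) -> int:
        return len(self.revisions) - 1

    def get_file(self, path, rev):
        return self.revisions[rev][path]


class FakeDavRA(RASession):
    """Stand-in for a network backend: `send(op, arg)` plays the server."""

    def __init__(self, send):
        self.send = send

    def latest_revnum(self) -> int:
        return int(self.send("latest-revnum", None))

    def get_file(self, path, rev):
        return self.send("get-file", (path, rev))


# Registry keyed by URL scheme; `backend_arg` is whatever that backend needs.
RA_LIBRARIES = {"file": LocalRA, "http": FakeDavRA}


def open_session(url: str, backend_arg) -> RASession:
    scheme = url.split(":", 1)[0]
    return RA_LIBRARIES[scheme](backend_arg)


def cat(session: RASession, path: str) -> bytes:
    """Client-side code: identical for local and network access."""
    return session.get_file(path, session.latest_revnum())
```

| Supporting a new protocol in this model means adding one subclass and |
| one registry entry; `cat' does not change. |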
| |
| |
| --> The client libraries. |
| |
| On the client side, the Subversion "working copy" library maintains |
| administrative information within special SVN/ subdirectories, similar |
| in purpose to the CVS/ administrative directories found in CVS working |
| copies. |
| |
| A glance inside the typical SVN/ directory turns up a bit more than |
| usual, however. The `entries' file contains XML which describes the |
| current state of the working copy directory (and which basically |
| serves the purposes of CVS's Entries, Root, and Repository files |
| combined). But other items present (and not found in CVS/) include |
| storage locations for the versioned "properties" (the metadata |
| mentioned in 'Subversion Features' above) and private caches of |
| pristine versions of each file. This latter feature provides the |
| ability to report local modifications -- and do reversions -- |
| *without* network access. Authentication data is also stored within |
| SVN/, rather than in a single .cvspass-like file. |
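| |
| Offline "status" and "revert" operations follow directly from keeping |
| a pristine copy. This Python sketch assumes a hypothetical layout, |
| SVN/text-base/<name>, for the cached copy (the article does not give |
| the real one); `add_file', `is_modified', `local_diff', and `revert' |
| are illustrative names. None of them touches the network. |

```python
import difflib
import os
import shutil

ADM = "SVN"  # administrative directory, as named in the article


def _pristine(wc_dir: str, name: str) -> str:
    # Hypothetical location of the cached, unmodified copy.
    return os.path.join(wc_dir, ADM, "text-base", name)


def add_file(wc_dir: str, name: str, text: str) -> None:
    """Simulate checkout of one file: write the working file and its pristine copy."""
    os.makedirs(os.path.join(wc_dir, ADM, "text-base"), exist_ok=True)
    for path in (os.path.join(wc_dir, name), _pristine(wc_dir, name)):
        with open(path, "w") as f:
            f.write(text)


def is_modified(wc_dir: str, name: str) -> bool:
    """Byte-for-byte comparison against the pristine copy."""
    with open(os.path.join(wc_dir, name), "rb") as work, \
         open(_pristine(wc_dir, name), "rb") as base:
        return work.read() != base.read()


def local_diff(wc_dir: str, name: str) -> str:
    """Unified diff of local edits, computed entirely on the client."""
    with open(_pristine(wc_dir, name)) as f:
        old = f.readlines()
    with open(os.path.join(wc_dir, name)) as f:
        new = f.readlines()
    return "".join(difflib.unified_diff(old, new, name + " (pristine)", name + " (working)"))


def revert(wc_dir: str, name: str) -> None:
    """Discard local edits by restoring the pristine copy."""
    shutil.copyfile(_pristine(wc_dir, name), os.path.join(wc_dir, name))
```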
| |
| The Subversion "client" library has the broadest responsibility; its |
| job is to mingle the functionality of the working-copy library with |
| that of the repository-access library, and then to provide a |
| the highest-level API to any application that wishes to perform general |
| version control actions. |
| |
| For example, the C routine `svn_client_checkout()' takes a URL as an |
| argument. It passes this URL to the repository-access library and |
| opens an authenticated session with a particular repository. It then |
| asks the repository for a certain tree, and sends this tree into the |
| working-copy library, which then writes a full working copy to disk |
| (SVN/ directories and all). |
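| |
| The division of labor in that example can be sketched in a few lines |
| of Python. Everything here is hypothetical: `client_checkout' only |
| mirrors the shape of `svn_client_checkout()', the "repository" is an |
| in-memory dictionary, authentication is omitted, and the pristine |
| layout and XML `entries' format are invented for illustration. |

```python
import os
import xml.etree.ElementTree as ET


class MemoryRA:
    """Repository-access stand-in: serves one tree held in memory."""

    def __init__(self, revnum: int, files: dict):
        self.revnum, self.files = revnum, files

    def get_tree(self):
        return self.revnum, dict(self.files)


def ra_open(repos: dict, url: str) -> MemoryRA:
    if url not in repos:
        raise KeyError(f"no repository at {url}")
    return MemoryRA(*repos[url])


def wc_write(path: str, url: str, rev: int, files: dict) -> None:
    """Working-copy layer: write files, pristine copies, and an entries file."""
    for rel, data in files.items():
        dest = os.path.join(path, rel)
        adm = os.path.join(os.path.dirname(dest), "SVN", "text-base")
        os.makedirs(adm, exist_ok=True)  # one SVN/ area per directory
        for target in (dest, os.path.join(adm, os.path.basename(dest))):
            with open(target, "wb") as f:
                f.write(data)
    os.makedirs(os.path.join(path, "SVN"), exist_ok=True)
    entries = ET.Element("entries", url=url, revision=str(rev))
    ET.ElementTree(entries).write(os.path.join(path, "SVN", "entries"))


def client_checkout(repos: dict, url: str, path: str) -> int:
    """Glue code in the client layer: RA fetches, the working-copy layer writes."""
    session = ra_open(repos, url)    # 1. open a session for the URL
    rev, files = session.get_tree()  # 2. ask the repository for a tree
    wc_write(path, url, rev, files)  # 3. hand the tree to the working-copy layer
    return rev
```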
| |
| The client library is designed to be used by any application. While |
| the Subversion source code includes a standard command-line client, it |
| should be very easy to write any number of GUI clients on top of the |
| client library. Hopefully, these GUIs should someday prove to be much |
| better than the current crop of CVS GUI applications (the majority of |
| which are no more than fragile "wrappers" around the CVS command-line |
| client). |
| |
| In addition, proper SWIG bindings (www.swig.org) should make |
| the Subversion API available to any number of languages: Java, Perl, |
| Python, Guile, and so on. In order to Subvert CVS, it helps to be |
| ubiquitous! |
| |
| |
| Subversion's Future |
| ------------------- |
| |
| The release of Subversion 1.0 is currently planned for early 2002. |
| After the release of 1.0, Subversion is slated for additions such as |
| i18n support, "intelligent" merging, better "changeset" manipulation, |
| client-side plugins, and improved features for server administration. |
| (Also on the wishlist is an eclectic collection of ideas, such as |
| distributed, replicating repositories.) |
| |
| A final thought from Subversion's FAQ: |
| |
| "We aren't (yet) attempting to break new ground in SCM systems, nor |
| are we attempting to imitate all the best features of every SCM |
| system out there. We're trying to replace CVS." |
| |
| If, in three years, Subversion is widely presumed to be the "standard" |
| SCM system in the open-source community, then the project will have |
| succeeded. But the future is still hazy: ultimately, Subversion |
| will have to win this position on its own technical merits. |
| |
| Patches are welcome. |
| |
| |
| For More Information |
| -------------------- |
| |
| Please visit the Subversion project website at |
| http://subversion.tigris.org. There are discussion lists to join, and |
| the source code is available via anonymous CVS -- and soon through |
| Subversion itself. |
| |