blob: 76679dc61bd8ada24dbc1f85671812c2007c4a6d [file] [log] [blame]
-*-text-*-
If you are contributing code to the Subversion project, please read
this first.
============================
HACKER'S GUIDE TO SUBVERSION
============================
$LastChangedDate$
TABLE OF CONTENTS
* Participating in the community
* Theory and documentation
* What to read
* Directory layout
* Coding style
* Secure coding guidelines
* Document everything
* Using page breaks
* Other conventions
* APR pool usage conventions
* APR status codes
* Automated tests
* Writing test cases before code
* Debugging the server
* Tracking down memory leaks
* Writing log messages
* Patch submission guidelines
* Filing bugs / issues
* Commit access
* The configuration/build system under unix
* How to release a distribution tarball
* Release numbering
* Compatibility
Participating in the community
==============================
Although Subversion is originally sponsored and hosted by CollabNet
(http://www.collab.net), it's a true open-source project under a
BSD-style license. A number of developers work for CollabNet, some
work for other large companies (such as RedHat), and many others are
simply excellent volunteers who are interested in building a better
version control system.
The community exists mainly through mailing lists and a Subversion
repository. To participate:
Go to http://subversion.tigris.org/ and
* Join the "dev", "svn", and "announce" mailing lists.
The dev list, dev@subversion.tigris.org, is where almost all
discussion takes place. All questions should go there, though
you might want to check the list archives first. The "svn"
list receives automated commit emails.
* Print out and digest the Subversion design specification.
This document can be found in the "Documents & files" area of the
website in the Documentation/Design folder. A PDF and an HTML
version are available on the web. The design spec will give you
a theoretical overview of Subversion's design.
There are many ways to join the project, either by writing code, or by
testing and/or helping to manage the bug database. If you'd like to
contribute, then look at:
* The bugs/issues database
http://subversion.tigris.org/project_issues.html
* The bite-sized tasks page
http://subversion.tigris.org/project_tasks.html
To submit code, simply send your patches to dev@subversion.tigris.org.
No, wait, first read the rest of this file, _then_ start sending
patches to dev@subversion.tigris.org. :-)
To help manage the issues database, read over the issue summaries,
looking and testing for issues that are either invalid, or are
duplicates of other issues. Both kinds are very common, the first
because bugs often get unknowingly fixed as side effects of other
changes in the code, and the second because people sometimes file an
issue without noticing that it has already been reported. If you are
not sure about an issue, post a question to dev@subversion.tigris.org.
("Subversion: We're here to help you help us!")
Another way to help is to set up automated builds and test suite runs
of Subversion on some platform, and have the output sent to the
svn-breakage@subversion.tigris.org mailing list. See more details at
http://subversion.tigris.org/servlets/ProjectMailingListList, in the
description for the svn-breakage list.
Theory and documentation
========================
1. DESIGN
A design spec was written in June 2000, and is a bit out of
date. But it still gives a good theoretical introduction to the
inner workings of the repository, and to Subversion's various
layers. The Texinfo source is in 'doc/programmer/design/', or
look at one of the versions posted on the Subversion website.
2. WEBDAV
Greg Stein has written an introduction to Subversion's network
protocol, which is an extended dialect of HTTP. The document is
'www/webdav-usage.html', and is also posted on the website in
the "File sharing" section.
3. USER MANUAL
Subversion: The Definitive Guide is an upcoming title to be
published by O'Reilly that shows in detail how to effectively
use Subversion. This book is still being written, so the
contents are still in flux. You may find it in the
Documentation/Book folder on the "File sharing" section of the
Subversion website. The XML source is in 'doc/book/book'.
4. SYSTEM NOTES
A lot of the design ideas for particular aspects of the system
have been documented in individual files in the 'notes'
directory.
What to read
============
Before you can contribute code, you'll need to familiarize yourself
with the existing code base and interfaces.
Check out a copy of Subversion (anonymously, if you don't yet have an
account with commit-access) -- so you can look at the code base.
Within 'subversion/include/' are a bunch of header files with huge
doc comments. If you read through these, you'll have a pretty good
understanding of the implementation details. Here's a suggested
perusal order:
* the basic building blocks: svn_string.h, svn_error.h, svn_types.h
* useful utilities: svn_io.h, svn_path.h, svn_hash.h, svn_xml.h
* the critical interface: svn_delta.h
* client-side interfaces: svn_ra.h, svn_wc.h, svn_client.h
* the repository and versioned filesystem: svn_repos.h, svn_fs.h
Subversion tries to stay portable by using only ANSI/ISO C and by
using the Apache Portable Runtime (APR) library. APR is the
portability layer used by the Apache httpd server, and more
information can be found at http://apr.apache.org.
Because Subversion depends so heavily on APR, it may be hard to
understand Subversion without first glancing over certain header files
in APR (look in 'apr/include/'):
* memory pools: apr_pools.h
* filesystem access: apr_file_io.h
* hashes and arrays: apr_hash.h, apr_tables.h
Subversion also tries to deliver reliable and secure software. This can
only be achieved by educating the developers on secure programming in
the C programming language. Please see 'notes/assurance.txt' for the
full rationale behind this. Specifically, you should make it a point to
carefully read David Wheeler's Secure Programming (as mentioned in
'notes/assurance.txt'). If at any point you have questions about the
security implications of a change, you are urged to ask for review on
the developer mailing list.
Directory layout
================
A rough guide to the source tree:
doc/
User and Developer documentation.
www/
Subversion web pages (live content at http://subversion.tigris.org/).
tools/
Stuff that works with Subversion, but that Subversion doesn't depend on.
packages/
Stuff to help packaging systems, like rpm and dpkg.
subversion/
Source code to subversion itself (as opposed to external libraries).
subversion/include/
Public header files for users of subversion libraries.
subversion/libsvn_fs/
The versioning "filesystem" API.
subversion/libsvn_repos/
Repository functionality built around the `libsvn_fs' core.
subversion/libsvn_delta/
Common code for tree deltas, text deltas, and property deltas.
subversion/libsvn_wc/
Common code for working copies.
subversion/libsvn_ra/
Common code for repository access.
subversion/libsvn_client/
Common code for client operations.
subversion/clients/cmdline/
The command line client.
subversion/tests/
Automated test suite.
apr/
Apache Portable Runtime library.
*Note*: This is not in the same repository as Subversion. Read
INSTALL for instructions on how to get it if you don't
already have it.
neon/
Neon library from Joe Orton.
*Note*: This is not in the same repository as Subversion. Read
INSTALL for instructions on how to get it if you don't
already have it.
Coding style
============
To understand how things work, read doc/svn-design.{texi,info,ps,pdf},
and read the header files, which tend to have thoroughly-commented
data structures.
We're using ANSI C, and following the GNU coding standards. Emacs
users can just load svn-dev.el to get the right indentation behavior
(most source files here will load it automatically, if
`enable-local-eval' is set appropriately).
Read http://www.gnu.org/prep/standards.html for a full description of
the GNU coding standards; but here is a short example demonstrating
the most important formatting guidelines:
char * /* func type on own line */
argblarg (char *arg1, int arg2) /* func name on own line */
{ /* first brace on own line */
if ((some_very_long_condition && arg2) /* indent 2 cols */
|| remaining_condition) /* new line before operator */
{ /* brace on own line, indent 2 */
arg1 = some_func (arg1, arg2); /* space before opening paren */
} /* close brace on own line */
else
{
do /* format do-while like this */
{
arg1 = another_func (arg1);
}
while (*arg1);
}
}
In general, be generous with parentheses even when you're sure about
the operator precedence, and be willing to add spaces and newlines to
avoid "code crunch". Don't worry too much about vertical density;
it's more important to make code readable than to fit that extra line
on the screen.
The controversial GNU convention of putting a space between a function
name and its opening parenthesis is optional in Subversion. If you're
editing an area of code that already seems to have a consistent
preference about this, then just stick with that; otherwise, pick
whichever way you like.
Secure coding guidelines
========================
Just like almost any other programming language, C has undesirable
features which enables an attacker to make your program fail in
predictable ways, often to the attacker's benefit. The goal of these
guidelines is to make you aware of the pitfalls of C as they apply to
the Subversion project. You are encouraged to keep these pitfalls in
mind when reviewing code of your peers, as even the most skilled and
paranoid programmers make occasional mistakes.
Input validation is the act of defining legal input and rejecting
everything else. The code must perform input validation on all untrusted
input.
Security boundaries
A security boundary in the Subversion server code must be identified as
such as this enables auditors to quickly determine the quality of the
boundary. Security boundaries exist where the running code has access
to information the user does not or where the code runs with privileges
above those of the user making the request. Typical examples of such is
code that does access control or an application with the SUID bit set.
Functions which make calls to a security boundary must include
validation checks of the arguments passed. Functions which themselves
are security boundaries should audit the input received and alarm when
invoked with improper values.
[### todo: need some examples from Subversion here...]
String operations
Use only the string functions provided in apr_strings.h. These are
replacements for standard C library string functions, except safer
because they do bounds-checking and dest allocation automatically.
Although there may be circumstances where it's theoretically safe to
use plain C string functions (such as when you already know the
lengths of the source and dest), please use the APR functions anyway,
so the code is less brittle and more reviewable.
Password storage
Help users keep their passwords secret: When the client reads or writes
password locally, it should ensure that the file is mode 0600. If the
file is readable by other users, the client should exit with a message
that tells the user to change the filemode due to the risk of exposure.
Document everything
===================
Every function, whether public or internal, must start out with a
documentation comment that describes what the function does. The
documentation should mention every parameter received by the function,
every possible return value, and (if not obvious) the conditions under
which the function could return an error. Put the parameter names in
upper case in the doc string, even when they are not upper case in the
actual declaration, so that they stand out to human readers.
For public or semi-public API functions, the doc string should go
above the function in the .h file where it is declared; otherwise, it
goes above the function definition in the .c file.
For structure types, document each individual member of the structure
as well as the structure itself.
Use the doxygen (www.doxygen.org) format for interface documentation.
There is still some legacy documentation in Subversion that uses a
different style, from before we adopted doxygen, but it will
eventually be converted. New documentation should start out in
doxygen format.
Read over the Subversion code to get an overview of how documentation
looks in practice; in particular, see subversion/include/*.h for
doxygen examples.
Using page breaks
=================
We're using page breaks (the Ctrl-L character, ASCII 12) for section
boundaries in both code and plaintext prose files. This file is a
good example of how it's done: each section starts with a page break,
and immediately after the page break comes the title of the section.
This helps out people who use the Emacs page commands, such as
`pages-directory' and `narrow-to-page'. Such people are not as scarce
as you might think, and if you'd like to become one of them, then type
C-x C-p C-h in Emacs sometime.
Other conventions
=================
In addition to the GNU standards, Subversion uses these conventions:
* Use only spaces for indenting code, never tabs. Tab display
width is not standardized enough, and anyway it's easier to
manually adjust indentation that uses spaces.
* Stay within 80 columns, the width of a minimal standard display
window.
* Signify internal variables by two underscores after the prefix.
That is, when a symbol must (for technical reasons) reside in the
global namespace despite not being part of a published interface,
then use two underscores following the module prefix. For
example:
svn_fs_get_rev_prop () /* Part of published API. */
svn_fs__parse_props () /* For internal use only. */
* In text strings that might be printed out (or otherwise made
available) to users, use only forward quotes around paths and
other quotable things. For example:
$ svn revert foo
svn: warning: svn_wc_is_wc_root: 'foo' is not a versioned resource
$
There used to be a lot of strings that used a backtick for the
first quote (`foo' instead of 'foo'), but that looked bad in
some fonts, and also messed up some people's auto-highlighting,
so we settled on the convention of always using forward quotes.
* If you use Emacs, put something like this in your .emacs file,
so you get svn-dev.el and svnbook.el when needed:
;;; Begin Subversion development section
(defun my-find-file-hook ()
(if (string-match "projects/svn/" buffer-file-name)
(load "~/projects/svn/tools/dev/svn-dev"))
(if (string-match "projects/svn/doc/book" buffer-file-name)
(load "~/projects/svn/doc/book/tools/svnbook")))
(add-hook 'find-file-hooks 'my-find-file-hook)
;;; End Subversion development section
You'll need to customize the paths for your setup, of course.
You can also make the regexp to string-match more selective; for
example, one developer says:
> Here's the regexp I'm using:
>
> "src/svn/[^/]*/\\(subversion\\|tools\\|build\\)/"
>
> Two things to notice there: (1) I sometimes have several
> working copies checked out under ...src/svn, and I want the
> regexp to match all of them; (2) I want the hook to catch only
> in "our" directories within the working copy, so I match
> "subversion", "tools" and "build" explicitly; I don't want to
> use GNU style in the APR that's checked out into my repo. :-)
* We have a tradition of not marking files with the names of
individual authors (i.e., we don't put lines like "Author: foo"
or "@author foo" in a special position at the top of a source
file). This is to discourage territoriality -- even when a file
has only one author, we want to make sure others feel free to
make changes. People might be unnecessarily hesitant if someone
appears to have staked ownership on the file.
* There are many other unspoken conventions maintained throughout
the code, that are only noticed when someone unintentionally
fails to follow them. Just try to have a sensitive eye for the
way things are done, and when in doubt, ask.
APR pool usage conventions
==========================
(This assumes you already basically understand how APR pools work; see
apr_pools.h for details.)
Applications using the Subversion libraries must call apr_initialize()
before calling any Subversion functions.
Subversion's general pool usage strategy can be summed up in two
principles:
1. The call level that created a pool is the only place to clear or
destroy that pool.
2. When iterating an unbounded number of times, create a subpool
before entering the iteration, use it inside the loop and clear
it at the start of each iteration, then destroy it after the loop
is done, like so:
subpool = svn_pool_create(pool);
for (i = 0; i < n; ++i)
{
svn_pool_clear(subpool);
do_operation(..., subpool);
}
svn_pool_destroy(subpool);
Except for some legacy code, which was written before these principles
were fully understood, virtually all pool usage in Subversion follows
the above guidelines.
One such legacy pattern is a tendency to allocate an object inside a
pool, store the pool in the object, and then free that pool (either
directly or through a close_foo() function) to destroy the object.
For example:
/*** Example of how NOT to use pools. Don't be like this. ***/
static foo_t *
make_foo_object (arg1, arg2, apr_pool_t *pool)
{
apr_pool_t *subpool = svn_pool_create (pool);
foo_t *foo == apr_palloc (subpool, sizeof (*foo));
foo->field1 = arg1;
foo->field2 = arg2;
foo->pool = subpool;
}
[...]
[Now some function calls make_foo_object() and returns, passing
back a new foo object.]
[...]
[Now someone, at some random call level, decides that the foo's
lifetime is over, and calls svn_pool_destroy(foo->pool).]
This is tempting, but it defeats the point of using pools, which is to
not worry so much about individual allocations, but rather about
overall performance and lifetime groups. Instead, foo_t generally
should not have a `pool' field. Just allocate as many foo objects as
you need in the current pool -- when that pool gets cleared or
destroyed, they will all go away simultaneously.
In summary:
- Objects should not have their own pools. An object is allocated
into a pool defined by the constructor's caller. The caller
knows the lifetime of the object and will manage it via the pool.
- Functions should not create/destroy pools for their operation;
they should use a pool provided by the caller. Again, the
caller knows more about how the function will be used, how often,
how many times, etc. thus, it should be in charge of the
function's memory usage.
For example, the caller might know that the app will exit upon
the function's return. Thus, the function would create extra
work if it built/destroyed a pool. Instead, it should use the
passed-in pool, which the caller is going to be tossing as part
of app-exit anyway.
- Whenever an unbounded iteration occurs, an iteration subpool
should be used.
- Given all of the above, it is pretty well mandatory to pass a
pool to every function. Since objects are not recording pools
for themselves, and the caller is always supposed to be managing
memory, then each function needs a pool, rather than relying on
some hidden magic pool. In limited cases, objects may record the
pool used for their construction so that they can construct
sub-parts, but these cases should be examined carefully.
See also the section "Tracking down memory leaks" for tips on
diagnosing pool usage problems.
APR status codes
=================
Always check for APR status codes (except APR_SUCCESS) with the
APR_STATUS_IS_...() macros, not by direct comparison. This is
required for portability to non-Unix platforms.
Automated tests
===============
For a description of how to use and add tests to Subversion's
automated test framework, please read 'subversion/tests/README'.
Writing test cases before code
==============================
From: Karl Fogel <kfogel@collab.net>
Subject: writing test cases
To: dev@subversion.tigris.org
Date: Mon, 5 Mar 2001 15:58:46 -0600
Many of us implementing the filesystem interface have now gotten into
the habit of writing the test cases (see fs-test.c) *before* writing
the actual code. It's really helping us out a lot -- for one thing,
it forces one to define the task precisely in advance, and also it
speedily reveals the bugs in one's first try (and second, and
third...).
I'd like to recommend this practice to everyone. If you're
implementing an interface, or adding an entirely new feature, or even
just fixing a bug, a test for it is a good idea. And if you're going
to write the test anyway, you might as well write it first. :-)
Yoshiki Hayashi's been sending test cases with all his patches lately,
which is what inspired me to write this mail to encourage everyone to
do the same. Having those test cases makes patches easier to examine,
because they show the patch's purpose very clearly. It's like having
a second log message, one whose accuracy is verified at run-time.
That said, I don't think we want a rigid policy about this, at least
not yet. If you encounter a bug somewhere in the code, but you only
have time to write a patch with no test case, that's okay -- having
the patch is still useful; someone else can write the test case.
As Subversion gets more complex, though, the automated test suite gets
more crucial, so let's all get in the habit of using it early.
-K
Debugging the ra_dav server
===========================
'mod_dav_svn.so' contains the main Subversion server logic; it runs as
a module within mod_dav, which runs as a module within httpd. Since
httpd is probably using dynamic shared modules, you normally won't be
able to set breakpoints in advance when you start Apache in a debugger
such as GDB. Instead, you'll need to start up, then interrupt httpd,
set your breakpoint, and continue:
% gdb httpd
(gdb) run -X
^C
(gdb) break some_func_in_mod_dav_svn
(gdb) continue
The -X switch is equivalent to -DONE_PROCESS and -DNO_DETACH, which
ensure that httpd runs as a single thread and remains attached to the
tty, respectively. As soon as it starts, it sits and waits for
requests; that's when you hit control-C and set your breakpoint.
You'll probably want to watch Apache's run-time logs
/usr/local/apache2/logs/error_log
/usr/local/apache2/logs/access_log
to help determine what might be going wrong and where to set
breakpoints.
Debugging the ra_svn client and server, on Unix
===============================================
Bugs in ra_svn usually manifest themselves with one of the following
cryptic error messages:
svn: Malformed network data
svn: Connection closed unexpectedly
(The first message can also mean the data stream was corrupted in
tunnel mode by user dotfiles or hook scripts; see issue 1145.) The
first message generally means you to have to debug the client; the
second message generally means you have to debug the server.
It is easiest to debug ra_svn using a build with --disable-shared
--enable-maintainer-mode. With the latter option, the error message
will tell you exactly what line to set a breakpoint at; otherwise,
look up the line number at the end of marshal.c:vparse_tuple() where
it returns the "Malformed network data" error.
To debug the client, simply pull it up in gdb, set a breakpoint, and
run the offending command:
% gdb svn
(gdb) break marshal.c:NNN
(gdb) run ARGS
Breakpoint 1, vparse_tuple (list=___, pool=___, fmt=___,
ap=___) at subversion/libsvn_ra_svn/marshal.c:NNN
NNN "Malformed network data");
There are several bits of useful information:
* A backtrace will tell you exactly what protocol exchange is
failing.
* "print *conn" will show you the connection buffer. read_buf,
read_ptr, and read_end represent the read buffer, which can show
you the data the marshaller is looking at. (Since read_buf isn't
generally 0-terminated at read_end, be careful of falsely assuming
that there's garbage data in the buffer.)
* The format string determines what the marshaller was expecting to
see.
To debug the server in daemon mode, pull it up in gdb, set a
breakpoint (usually a "Connection closed unexpectedly" error on the
client indicates a "Malformed network data" error on the server,
although it can also indicate a core dump), and run it with the "-X"
option to serve a single connection:
% gdb svnserve
(gdb) break marshal.c:NNN
(gdb) run -X
Then run the offending client command. From there, it's just like
debugging the client.
Debugging the server in tunnel mode is more of a pain. You'll need to
stick something like "{ int x = 1; while (x); }" near the top of
svnserve's main() and put the resulting svnserve in the user path on
the server. Then start an operation, gdb attach the process on the
server, "set x = 0", and step through the code as desired.
Tracking down memory leaks
==========================
Our use of APR pools makes it unusual for us to have memory leaks in
the strictest sense; all the memory we allocate will be cleaned up
eventually. But sometimes an operation takes more memory than it
should; for instance, a checkout of a large source tree should not use
much more memory than a checkout of a small source tree. When that
happens, it generally means we're allocating memory from a pool whose
lifetime is too long.
If you have a favorite memory leak tracking tool, you can configure
with --enable-pool-debug (which will make every pool allocation use
its own malloc()), arrange to exit in the middle of the operation, and
go to it. If not, here's another way:
* Configure with --enable-pool-debug=verbose-alloc. Make sure to
rebuild all of APR and Subversion so that every allocation gets
file-and-line information.
* Run the operation, piping stderr to a file. Hopefully you have
lots of disk space.
* In the file, you'll see lots of lines which look like:
POOL DEBUG: [5383/1024] PCALLOC ( 2763/ 2763/ 5419) 0x08102D48 "subversion/clients/cmdline/main.c:612" <subversion/libsvn_subr/auth.c:122> (118/118/0)
What you care about most is the tenth field (the one in quotes),
which gives you the file and line number where the pool for this
allocation was created. Go to that file and line and determine
the lifetime of the pool. In the example above, main.c:612
indicates that this allocation was made in the top-level pool of
the svn client. If this were an allocation which was repeated
many times during the course of an operation, that would indicate
a source of a memory leak. The eleventh field (the one in
brackets) gives the file and line number of the allocation itself.
Writing log messages
====================
Certain guidelines should be adhered to when writing log messages:
Make a log message for every change. The value of the log becomes
much less if developers cannot rely on its completeness. Even if
you've only changed comments, write a log that says "Doc fix." or
something.
Use full sentences, not sentence fragments. Fragments are more often
ambiguous, and it takes only a few more seconds to write out what you
mean. Fragments like "Doc fix", "New file", or "New function" are
acceptable because they are standard idioms, and all further details
should appear in the source code.
The log message should name every affected function, variable, macro,
makefile target, grammar rule, etc, including the names of symbols
that are being removed in this commit. This helps people searching
through the logs later. Don't hide names in wildcards, because the
globbed portion may be what someone searches for later. For example,
this is bad:
* twirl.c
(twirling_baton_*): Removed these obsolete structures.
(handle_parser_warning): Pass data directly to callees, instead
of storing in twirling_baton_*.
* twirl.h: Fix indentation.
Later on, when someone is trying to figure out what happened to
`twirling_baton_fast', they may not find it if they just search for
"_fast". A better entry would be:
* twirl.c
(twirling_baton_fast, twirling_baton_slow): Removed these
obsolete structures.
(handle_parser_warning): Pass data directly to callees, instead
of storing in twirling_baton_*.
* twirl.h: Fix indentation.
The wildcard is okay in the description for `handle_parser_warning',
but only because the two structures were mentioned by full name
elsewhere in the log entry.
Note how each file gets its own entry prefixed with an "*", and the
changes within a file are grouped by symbol, with the symbols listed
in parentheses followed by a colon, followed by text describing the
change. Please adhere to this format -- not only does consistency aid
readability, it also allows software to colorize log entries
automatically.
If your change is related to a specific issue in the issue tracker,
then include a string like "issue #N" in the log message. For
example, if a patch resolves issue 1729, then the log message might
be:
Fix issue #1729: Don't crash because of a missing file.
* get_editor.c
(frobnicate_file): Check that file exists before frobnicating.
For large changes or change groups, group the log entry into
paragraphs separated by blank lines. Each paragraph should be a set
of changes that accomplishes a single goal, and each group should
start with a sentence or two summarizing the change. Truly
independent changes should be made in separate commits, of course.
One should never need the log entries to understand the current code.
If you find yourself writing a significant explanation in the log, you
should consider carefully whether your text doesn't actually belong in
a comment, alongside the code it explains. Here's an example of doing
it right:
(consume_count): If `count' is unreasonable, return 0 and don't
advance input pointer.
And then, in `consume_count' in `cplus-dem.c':
while (isdigit ((unsigned char)**type))
{
count *= 10;
count += **type - '0';
/* A sanity check. Otherwise a symbol like
`_Utf390_1__1_9223372036854775807__9223372036854775'
can cause this function to return a negative value.
In this case we just consume until the end of the string. */
if (count > strlen (*type))
{
*type = save;
return 0;
}
This is why a new function, for example, needs only a log entry saying
"New Function" --- all the details should be in the source.
There are some common-sense exceptions to the need to name everything
that was changed:
* If you have made a change which requires trivial changes
throughout the rest of the program (e.g., renaming a variable),
you needn't name all the functions affected, you can just say
"All callers changed".
* If you have rewritten a file completely, the reader understands
that everything in it has changed, so your log entry may simply
give the file name, and say "Rewritten".
* If your change was only to one file, or was the same change to
multiple files, then there's no need to list their paths in the
log message (because "svn log" can show the changed paths for
that revision anyway). Only when you need to describe how the
change affected different areas in different ways is it
necessary to organize the log message by paths and symbols, as
in the examples above.
* If your change was to multiple files, provide a brief summary of
the change at the top of the log message (before the paths and
symbols list).
In general, there is a tension between making entries easy to find by
searching for identifiers, and wasting time or producing unreadable
entries by being exhaustive. Use your best judgment --- and be
considerate of your fellow developers. (Also, run "svn log" to see
how others have been writing their log entries.)
Patch submission guidelines
===========================
Mail patches to `dev@subversion.tigris.org', with a subject line
that contains the word "PATCH" in all uppercase, for example
Subject: [PATCH] fix for rev printing bug in svn status
A patch submission should contain one logical change; please don't mix
N unrelated changes in one submission -- send N separate emails
instead.
The email message should start off with a log message, as described in
"Writing log messages" above. The patch itself should be in unified
diff format (e.g., with "svn diff"), preferably inserted directly into
the body of your message (rather than MIME-attached, uuencoded, or
otherwise opaqified). If your mailer wraps long lines, then you will
need to attach your patch. Please ensure the MIME type of the attachment
is text/plain (some mailers allow you to set the MIME type; for some
others, you might have to use a .txt extension on your patch file). Do
not compress or otherwise encode the attached patch.
If the patch implements a new feature, make sure to describe the
feature completely in your mail; if the patch fixes a bug, describe
the bug in detail and give a reproduction recipe. An exception to
these guidelines is when the patch addresses a specific issue in the
issues database -- in that case, just make sure to refer to the issue
number in your log message, as described in "Writing log messages".
It is normal for patches to undergo several rounds of feedback and
change before being applied. Don't be discouraged if your patch is
not accepted immediately -- it doesn't mean you goofed, it just means
that there are a *lot* of eyes looking at every code submission, and
it's a rare patch that doesn't have at least a little room for
improvement. After reading people's responses to your patch, make the
appropriate changes and resubmit, wait for the next round of feedback,
and lather, rinse, repeat, until some committer applies it.
If you don't get a response for a while, and don't see the patch
applied, it may just mean that people are really busy. Go ahead and
repost, and don't hesitate to point out that you're still waiting for
a response. One way to think of it is that patch management is highly
parallizable, and we need you to shoulder your share of the management
as well as the coding. Every patch needs someone to shepherd it
through the process, and the person best qualified to do that is the
original submitter.
Filing bugs / issues
====================
This pretty much says it all:
From: Karl Fogel <kfogel@collab.net>
Subject: Please ask on the list before filing a new issue.
To: dev@subversion.tigris.org
Date: Tue, 30 Jul 2002 10:51:24 (CDT)
Folks, we're getting tons of new issues, which is a Good Thing in
general, but some of them don't really belong in the issue tracker.
They're things that would be better solved by a quick conversation
here on the dev list. Compilation problems, behavior questions,
feature ideas that have been discussed before, that sort of thing.
*Please* be more conservative about filing issues. The issues
database is physically much more cumbersome than email. It wastes
people's time to have conversations in the issues database that should
be had in email. (This is not a libel against the issue tracker, it's
just a result of the fact that the issues database is for permanent
storage and flow annotation, not for real-time conversation.)
If you encounter a situation where Subversion is clearly behaving
wrongly, or behaving opposite to what the documentation says, then
it's okay to file the issue right away (after searching to make sure
it isn't already filed, of course!). But if you're
a) Requesting a new feature, or
b) Having build problems, or
c) Not sure what the behavior should be, or
d) Disagreeing with current intended behavior, or
e) Not TOTALLY sure that others would agree this is a bug, or
f) For any reason at all not sure this should be filed,
...then please post to the dev list first. You'll get a faster
response, and others won't be forced to use the issues database to
have the initial real-time conversations.
Nothing is lost this way. If we eventually conclude that it should be
in the issue tracker, then we can still file it later, after the
description and reproduction recipe have been honed on the dev list.
Thank you,
-Karl
Commit access
=============
There are two types of commit access: full and partial. Full means
anywhere in the tree, partial means only in that committer's specific
area(s) of expertise. The COMMITTERS file lists all committers, both
full and partial, and says the domains for each partial committer.
How full commit access is granted:
After someone has successfully contributed a few non-trivial patches,
some full committer, usually whoever has reviewed and applied the most
patches from that contributor, proposes them for commit access. This
proposal is sent only to the other full committers -- the ensuing
discussion is private, so that everyone can feel comfortable speaking
their minds. Assuming there are no objections, the contributor is
granted commit access. The decision is made by consensus; there are
no formal rules governing the procedure, though generally if someone
strongly objects the access is not offered, or is offered on a
provisional basis.
The criteria for commit access are that the person's patches adhere to
the guidelines in this file, adhere to all the usual unquantifiable
rules of coding (code should be readable, robust, maintainable, etc),
and that the person respects the "Hippocratic Principle": first, do no
harm. In other words, what is significant is not the size or quantity
of patches submitted, but the degree of care shown in avoiding bugs
and minimizing unnecessary impact on the rest of the code. Many full
committers are people who have not made major code contributions, but
rather lots of small, clean fixes, each of which was an unambiguous
improvement to the code.
How partial commit access is granted:
A full committer sponsors the partial committer. Usually this means
the full committer has applied several patches to the same area from
the proposed partial committer, and realizes things would be easier if
the person were just committing directly. Approval is not required
from the full committers; it is assumed that sponsors know what
they're doing and will watch the partial committer's first few commits
to make sure everything's going smoothly.
The configuration/build system under unix
=========================================
Greg Stein wrote a custom build system for Subversion, which had been
using `automake' and recursive Makefiles. Now it uses a single,
top-level Makefile, generated from Makefile.in (which is kept under
revision control). `Makefile.in' in turn includes `build-outputs.mk',
which is automatically generated from `build.conf' by the
`gen-make.py' script. Thus, the latter two are under revision
control, but `build-outputs.mk' is not.
Here is Greg's original mail describing the system, followed by some
advice about hacking it:
From: Greg Stein <gstein@lyra.org>
Subject: new build system (was: Re: CVS update: MODIFIED: ac-helpers ...)
To: dev@subversion.tigris.org
Date: Thu, 24 May 2001 07:20:55 -0700
Message-ID: <20010524072055.F5402@lyra.org>
On Thu, May 24, 2001 at 01:40:17PM -0000, gstein@tigris.org wrote:
> User: gstein
> Date: 01/05/24 06:40:17
>
> Modified: ac-helpers .cvsignore svn-apache.m4
> Added: . Makefile.in
> Log:
> Switch over to the new non-recursive build system.
>...
Okay... this is it. We're now on the build system.
"It works on my machine."
I suspect there may be some tweaks to make on different OSs. I'd be
interested to hear if Ben can really build with normal BSD make. It
should be possible.
The code supports building, installation, checking, and
dependencies. It does *NOT* yet deal with the doc/ subdirectory. That
is next; I figured this could be rolled out and get the kinks worked
out while I do the doc/ stuff. Oh, it doesn't build Neon or APR yet
either. I also saw a problem where libsvn_fs wasn't getting built
before linking one of the test proggies (see below).
Basic operation: same as before.
$ ./autogen.sh
$ ./configure OPTIONS
$ make
$ make check
$ make install
There are some "make check" scripts that need to be fixed up. That'll
happen RSN. Some of them create their own log, rather than spewing to
stdout (where the top-level make will place the output into
[TOP]/tests.log).
The old Makefile.am files are still around, but I'll be tossing those
along with a bunch of tweaks to all the .cvsignore files. There are a
few other cleanups, too. But that can happen as a step two.
[ $ cvs rm -f `find . -name Makefile.rm`
See the mistake in that line? I didn't when I typed it. The find
returned nothing, so cvs rm -f proceeded to delete my entire
tree. And the -f made sure to delete all my source files, too. Good
fugging thing that I had my mods in some Emacs buffers, or I'd be
bitching.
I am *so* glad that Ben coded SVN to *not* delete locally modified
files *and* that we have an "undel" command. I had to go and tweak a
bazillion Entries files to undo the delete...
]
The top-level make has a number of shortcuts in it (well, actually in
build-outputs.mk):
$ make subversion/libsvn_fs/libsvn_fs.la
or
$ make libsvn_fs
The two are the same. So... when your test proggie fails to link
because libsvn_fs isn't around, just run "make libsvn_fs" to build it
immediately, then go back to the regular "make".
Note that the system still conditionally builds the FS stuff based
on whether DB (See 'Building on Unix' below) is available, and
mod_dav_svn if Apache is available.
Handy hint: if you don't like dependencies, then you can do:
$ ./autogen.sh -s
That will skip the dependency generation that goes into
build-outputs.mk. It makes the script run quite a bit faster (48 secs
vs 2 secs on my poor little Pentium 120).
Note that if you change build.conf, you can simply run:
$ ./gen-make.py build.conf
to regen build-outputs.mk. You don't have to go back through the whole
autogen.sh / configure process.
You should also note that autogen.sh and configure run much faster now
that we don't have the automake crap. Oh, and our makefiles never
re-run configure on you out of the blue (gawd, I hated when automake
did that to me).
Obviously, there are going to be some tweaky things going on. I also
think that the "shadow" builds or whatever they're called (different
source and build dirs) are totally broken. Something tweaky will have
to happen there. But, thankfully, we only have one Makefile to deal
with.
Note that I arrange things so that we have one generated file
(build-outputs.mk), and one autoconf-generated file (Makefile from
.in). I also tried to shove as much logic/rules into
Makefile.in. Keeping build-outputs.mk devoid of rules (thus, implying
gen-make.py devoid of rules in its output generation) manes that
tweaking rules in Makefile.in is much more approachable to people.
I think that is about it. Send problems to the dev@ list and/or feel
free to dig in and fix them yourself. My next steps are mostly
cleanup. After that, I'm going to toss out our use of libtool and rely
on APR's libtool setup (no need for us to replicate what APR already
did).
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
And here is some advice for those changing or testing the
configuration/build system:
From: Karl Fogel <kfogel@collab.net>
To: dev@subversion.tigris.org
Subject: when changing build/config stuff, always do this first
Date: Wed 28 Nov 2001
Yo everyone: if you change part of the configuration/build system,
please make sure to clean out any old installed Subversion libs
*before* you try building with your changes. If you don't do this,
your changes may appear to work fine, when in fact they would fail if
run on a truly pristine system.
This script demonstrates what I mean by "clean out". This is
`/usr/local/cleanup.sh' on my system. It cleans out the Subversion
libs (and the installed httpd-2.0 libs, since I'm often reinstalling
that too):
#!/bin/sh
# Take care of libs
cd /usr/local/lib
rm -f APRVARS
rm -f libapr*
rm -f libexpat*
rm -f libneon*
rm -f libsvn*
# Take care of headers
cd /usr/local/include
rm -f apr*
rm -f svn*
rm -f neon/*
# Take care of headers
cd /usr/local/apache2/lib
rm -f *
When someone reports a configuration bug and you're trying to
reproduce it, run this first. :-)
The voice of experience,
-Karl
How to release a distribution tarball
====================================
See notes/releases.txt.
Release numbering
=================
Please see http://apr.apache.org/versioning.html -- Subversion uses
the same guidelines.
Compatibility
=============
Before 1.0, we require only that each interim release be compatible
with the one before it. For example, a 0.20 server must work with
both 0.19 and 0.21 clients (and, obviously, a 0.20 client as well),
but may be incompatible with a 0.18 client (though gratuitous
incompatibility should be avoided whenever possible, of course).
Patch levels don't count: 0.20.1, 0.20.2, etc, must be compatible with
0.19.*.
The purpose of this policy is to help early adopters make good
decisions about when to upgrade, without impeding development too
much. After 1.0 is released, we will probably adopt a more
conservative policy.
Changes to the ra_lib, editor, or reporter API may result in
incompatibilities in ra_dav or ra_svn. Observe these principles when
updating the ra_svn code:
(1) Fields can be added to any tuple; old clients will simply ignore
them. (Right now, the marshalling implementation does not let
you put number or boolean values in the optional part of a tuple,
but changing that will not affect the protocol.)
We can use this mechanism when information is added to an API
call.
(2) At connection establishment time, clients and servers exchange a
list of capability keywords.
We can use this mechanism for more complicated changes, like
introducing pipelining or removing information from API calls.
(3) New commands can be added; trying to use an unsupported command
will result in an error which can be checked and dealt with.
(4) The protocol version number can be bumped to allow graceful
refusal of old clients or servers, or to allow a client or
server to detect when it has to do things the old way.
This mechanism is a last resort, to be used when capability
keywords would be too hard to manage.