| -*-text-*- |
| |
| If you are contributing code to the Subversion project, please read |
| this first. |
| |
| ============================ |
| HACKER'S GUIDE TO SUBVERSION |
| ============================ |
| |
| $LastChangedDate$ |
| |
| TABLE OF CONTENTS |
| |
| * Participating in the community |
| * Theory and documentation |
| * What to read |
| * Directory layout |
| * Coding style |
| * Secure coding guidelines |
| * Document everything |
| * Using page breaks |
| * Other conventions |
| * APR pool usage conventions |
| * APR status codes |
| * Automated tests |
| * Writing test cases before code |
| * Debugging the server |
| * Tracking down memory leaks |
| * Writing log messages |
| * Patch submission guidelines |
| * Filing bugs / issues |
| * Commit access |
| * The configuration/build system under unix |
| * How to release a distribution tarball |
| * Release numbering |
| * Compatibility |
| |
| |
| Participating in the community |
| ============================== |
| |
| Although Subversion is originally sponsored and hosted by CollabNet |
| (http://www.collab.net), it's a true open-source project under a |
| BSD-style license. A number of developers work for CollabNet, some |
| work for other large companies (such as RedHat), and many others are |
| simply excellent volunteers who are interested in building a better |
| version control system. |
| |
| The community exists mainly through mailing lists and a Subversion |
| repository. To participate: |
| |
| Go to http://subversion.tigris.org/ and |
| |
| * Join the "dev", "svn", and "announce" mailing lists. |
| The dev list, dev@subversion.tigris.org, is where almost all |
| discussion takes place. All questions should go there, though |
| you might want to check the list archives first. The "svn" |
| list receives automated commit emails. |
| |
| * Print out and digest the Subversion design specification. |
| This document can be found in the "Documents & files" area of the |
| website in the Documentation/Design folder. A PDF and an HTML |
| version are available on the web. The design spec will give you |
| a theoretical overview of Subversion's design. |
| |
| There are many ways to join the project, either by writing code, or by |
| testing and/or helping to manage the bug database. If you'd like to |
| contribute, then look at: |
| |
| * The bugs/issues database |
| http://subversion.tigris.org/project_issues.html |
| |
| * The bite-sized tasks page |
| http://subversion.tigris.org/project_tasks.html |
| |
| To submit code, simply send your patches to dev@subversion.tigris.org. |
| No, wait, first read the rest of this file, _then_ start sending |
| patches to dev@subversion.tigris.org. :-) |
| |
| To help manage the issues database, read over the issue summaries, |
| looking and testing for issues that are either invalid, or are |
| duplicates of other issues. Both kinds are very common, the first |
| because bugs often get unknowingly fixed as side effects of other |
| changes in the code, and the second because people sometimes file an |
| issue without noticing that it has already been reported. If you are |
| not sure about an issue, post a question to dev@subversion.tigris.org. |
| ("Subversion: We're here to help you help us!") |
| |
| Another way to help is to set up automated builds and test suite runs |
| of Subversion on some platform, and have the output sent to the |
| svn-breakage@subversion.tigris.org mailing list. See more details at |
| http://subversion.tigris.org/servlets/ProjectMailingListList, in the |
| description for the svn-breakage list. |
| |
| |
| |
| Theory and documentation |
| ======================== |
| |
| 1. DESIGN |
| A design spec was written in June 2000, and is a bit out of |
| date. But it still gives a good theoretical introduction to the |
| inner workings of the repository, and to Subversion's various |
| layers. The Texinfo source is in 'doc/programmer/design/', or |
| look at one of the versions posted on the Subversion website. |
| |
| |
| 2. WEBDAV |
| Greg Stein has written an introduction to Subversion's network |
| protocol, which is an extended dialect of HTTP. The document is |
| 'www/webdav-usage.html', and is also posted on the website in |
| the "File sharing" section. |
| |
| |
| 3. USER MANUAL |
| Subversion: The Definitive Guide is an upcoming title to be |
| published by O'Reilly that shows in detail how to effectively |
| use Subversion. This book is still being written, so the |
| contents are still in flux. You may find it in the |
| Documentation/Book folder on the "File sharing" section of the |
| Subversion website. The XML source is in 'doc/book/book'. |
| |
| |
| 4. SYSTEM NOTES |
| A lot of the design ideas for particular aspects of the system |
| have been documented in individual files in the 'notes' |
| directory. |
| |
| |
| |
| What to read |
| ============ |
| |
| Before you can contribute code, you'll need to familiarize yourself |
| with the existing code base and interfaces. |
| |
| Check out a copy of Subversion (anonymously, if you don't yet have an |
| account with commit-access) -- so you can look at the code base. |
| |
| Within 'subversion/include/' are a bunch of header files with huge |
| doc comments. If you read through these, you'll have a pretty good |
| understanding of the implementation details. Here's a suggested |
| perusal order: |
| |
| * the basic building blocks: svn_string.h, svn_error.h, svn_types.h |
| |
| * useful utilities: svn_io.h, svn_path.h, svn_hash.h, svn_xml.h |
| |
| * the critical interface: svn_delta.h |
| |
| * client-side interfaces: svn_ra.h, svn_wc.h, svn_client.h |
| |
| * the repository and versioned filesystem: svn_repos.h, svn_fs.h |
| |
| |
| Subversion tries to stay portable by using only ANSI/ISO C and by |
| using the Apache Portable Runtime (APR) library. APR is the |
| portability layer used by the Apache httpd server, and more |
| information can be found at http://apr.apache.org. |
| |
| Because Subversion depends so heavily on APR, it may be hard to |
| understand Subversion without first glancing over certain header files |
| in APR (look in 'apr/include/'): |
| |
| * memory pools: apr_pools.h |
| |
| * filesystem access: apr_file_io.h |
| |
| * hashes and arrays: apr_hash.h, apr_tables.h |
| |
| |
| Subversion also tries to deliver reliable and secure software. This can |
| only be achieved by educating the developers on secure programming in |
| the C programming language. Please see 'notes/assurance.txt' for the |
| full rationale behind this. Specifically, you should make it a point to |
| carefully read David Wheeler's Secure Programming (as mentioned in |
| 'notes/assurance.txt'). If at any point you have questions about the |
| security implications of a change, you are urged to ask for review on |
| the developer mailing list. |
| |
| |
| Directory layout |
| ================ |
| |
| A rough guide to the source tree: |
| |
| doc/ |
| User and Developer documentation. |
| www/ |
| Subversion web pages (live content at http://subversion.tigris.org/). |
| tools/ |
| Stuff that works with Subversion, but that Subversion doesn't depend on. |
| packages/ |
| Stuff to help packaging systems, like rpm and dpkg. |
| subversion/ |
| Source code to subversion itself (as opposed to external libraries). |
| subversion/include/ |
| Public header files for users of subversion libraries. |
| subversion/libsvn_fs/ |
| The versioning "filesystem" API. |
| subversion/libsvn_repos/ |
| Repository functionality built around the `libsvn_fs' core. |
| subversion/libsvn_delta/ |
| Common code for tree deltas, text deltas, and property deltas. |
| subversion/libsvn_wc/ |
| Common code for working copies. |
| subversion/libsvn_ra/ |
| Common code for repository access. |
| subversion/libsvn_client/ |
| Common code for client operations. |
| subversion/clients/cmdline/ |
| The command line client. |
| subversion/tests/ |
| Automated test suite. |
| apr/ |
| Apache Portable Runtime library. |
| *Note*: This is not in the same repository as Subversion. Read |
| INSTALL for instructions on how to get it if you don't |
| already have it. |
| neon/ |
| Neon library from Joe Orton. |
| *Note*: This is not in the same repository as Subversion. Read |
| INSTALL for instructions on how to get it if you don't |
| already have it. |
| |
| |
| |
| Coding style |
| ============ |
| |
| To understand how things work, read doc/svn-design.{texi,info,ps,pdf}, |
| and read the header files, which tend to have thoroughly-commented |
| data structures. |
| |
| We're using ANSI C, and following the GNU coding standards. Emacs |
| users can just load svn-dev.el to get the right indentation behavior |
| (most source files here will load it automatically, if |
| `enable-local-eval' is set appropriately). |
| |
| Read http://www.gnu.org/prep/standards.html for a full description of |
| the GNU coding standards; but here is a short example demonstrating |
| the most important formatting guidelines: |
| |
| char * /* func type on own line */ |
| argblarg (char *arg1, int arg2) /* func name on own line */ |
| { /* first brace on own line */ |
| if ((some_very_long_condition && arg2) /* indent 2 cols */ |
| || remaining_condition) /* new line before operator */ |
| { /* brace on own line, indent 2 */ |
| arg1 = some_func (arg1, arg2); /* space before opening paren */ |
| } /* close brace on own line */ |
| else |
| { |
| do /* format do-while like this */ |
| { |
| arg1 = another_func (arg1); |
| } |
| while (*arg1); |
| } |
| } |
| |
| In general, be generous with parentheses even when you're sure about |
| the operator precedence, and be willing to add spaces and newlines to |
| avoid "code crunch". Don't worry too much about vertical density; |
| it's more important to make code readable than to fit that extra line |
| on the screen. |
| |
| The controversial GNU convention of putting a space between a function |
| name and its opening parenthesis is optional in Subversion. If you're |
| editing an area of code that already seems to have a consistent |
| preference about this, then just stick with that; otherwise, pick |
| whichever way you like. |
| |
| |
| |
| Secure coding guidelines |
| ======================== |
| |
| Just like almost any other programming language, C has undesirable |
| features which enables an attacker to make your program fail in |
| predictable ways, often to the attacker's benefit. The goal of these |
| guidelines is to make you aware of the pitfalls of C as they apply to |
| the Subversion project. You are encouraged to keep these pitfalls in |
| mind when reviewing code of your peers, as even the most skilled and |
| paranoid programmers make occasional mistakes. |
| |
| Input validation is the act of defining legal input and rejecting |
| everything else. The code must perform input validation on all untrusted |
| input. |
| |
| |
| Security boundaries |
| |
| A security boundary in the Subversion server code must be identified as |
| such as this enables auditors to quickly determine the quality of the |
| boundary. Security boundaries exist where the running code has access |
| to information the user does not or where the code runs with privileges |
| above those of the user making the request. Typical examples of such is |
| code that does access control or an application with the SUID bit set. |
| |
| Functions which make calls to a security boundary must include |
| validation checks of the arguments passed. Functions which themselves |
| are security boundaries should audit the input received and alarm when |
| invoked with improper values. |
| |
| [### todo: need some examples from Subversion here...] |
| |
| |
| String operations |
| |
| Use only the string functions provided in apr_strings.h. These are |
| replacements for standard C library string functions, except safer |
| because they do bounds-checking and dest allocation automatically. |
| Although there may be circumstances where it's theoretically safe to |
| use plain C string functions (such as when you already know the |
| lengths of the source and dest), please use the APR functions anyway, |
| so the code is less brittle and more reviewable. |
| |
| |
| Password storage |
| |
| Help users keep their passwords secret: When the client reads or writes |
| password locally, it should ensure that the file is mode 0600. If the |
| file is readable by other users, the client should exit with a message |
| that tells the user to change the filemode due to the risk of exposure. |
| |
| |
| |
| Document everything |
| =================== |
| |
| Every function, whether public or internal, must start out with a |
| documentation comment that describes what the function does. The |
| documentation should mention every parameter received by the function, |
| every possible return value, and (if not obvious) the conditions under |
| which the function could return an error. Put the parameter names in |
| upper case in the doc string, even when they are not upper case in the |
| actual declaration, so that they stand out to human readers. |
| |
| For public or semi-public API functions, the doc string should go |
| above the function in the .h file where it is declared; otherwise, it |
| goes above the function definition in the .c file. |
| |
| For structure types, document each individual member of the structure |
| as well as the structure itself. |
| |
| Use the doxygen (www.doxygen.org) format for interface documentation. |
| There is still some legacy documentation in Subversion that uses a |
| different style, from before we adopted doxygen, but it will |
| eventually be converted. New documentation should start out in |
| doxygen format. |
| |
| Read over the Subversion code to get an overview of how documentation |
| looks in practice; in particular, see subversion/include/*.h for |
| doxygen examples. |
| |
| |
| |
| Using page breaks |
| ================= |
| |
| We're using page breaks (the Ctrl-L character, ASCII 12) for section |
| boundaries in both code and plaintext prose files. This file is a |
| good example of how it's done: each section starts with a page break, |
| and immediately after the page break comes the title of the section. |
| |
| This helps out people who use the Emacs page commands, such as |
| `pages-directory' and `narrow-to-page'. Such people are not as scarce |
| as you might think, and if you'd like to become one of them, then type |
| C-x C-p C-h in Emacs sometime. |
| |
| |
| |
| Other conventions |
| ================= |
| |
| In addition to the GNU standards, Subversion uses these conventions: |
| |
| * Use only spaces for indenting code, never tabs. Tab display |
| width is not standardized enough, and anyway it's easier to |
| manually adjust indentation that uses spaces. |
| |
| * Stay within 80 columns, the width of a minimal standard display |
| window. |
| |
| * Signify internal variables by two underscores after the prefix. |
| That is, when a symbol must (for technical reasons) reside in the |
| global namespace despite not being part of a published interface, |
| then use two underscores following the module prefix. For |
| example: |
| |
| svn_fs_get_rev_prop () /* Part of published API. */ |
| svn_fs__parse_props () /* For internal use only. */ |
| |
| * In text strings that might be printed out (or otherwise made |
| available) to users, use only forward quotes around paths and |
| other quotable things. For example: |
| |
| $ svn revert foo |
| svn: warning: svn_wc_is_wc_root: 'foo' is not a versioned resource |
| $ |
| |
| There used to be a lot of strings that used a backtick for the |
| first quote (`foo' instead of 'foo'), but that looked bad in |
| some fonts, and also messed up some people's auto-highlighting, |
| so we settled on the convention of always using forward quotes. |
| |
| * If you use Emacs, put something like this in your .emacs file, |
| so you get svn-dev.el and svnbook.el when needed: |
| |
| ;;; Begin Subversion development section |
| (defun my-find-file-hook () |
| (if (string-match "projects/svn/" buffer-file-name) |
| (load "~/projects/svn/tools/dev/svn-dev")) |
| (if (string-match "projects/svn/doc/book" buffer-file-name) |
| (load "~/projects/svn/doc/book/tools/svnbook"))) |
| |
| (add-hook 'find-file-hooks 'my-find-file-hook) |
| ;;; End Subversion development section |
| |
| You'll need to customize the paths for your setup, of course. |
| You can also make the regexp to string-match more selective; for |
| example, one developer says: |
| |
| > Here's the regexp I'm using: |
| > |
| > "src/svn/[^/]*/\\(subversion\\|tools\\|build\\)/" |
| > |
| > Two things to notice there: (1) I sometimes have several |
| > working copies checked out under ...src/svn, and I want the |
| > regexp to match all of them; (2) I want the hook to catch only |
| > in "our" directories within the working copy, so I match |
| > "subversion", "tools" and "build" explicitly; I don't want to |
| > use GNU style in the APR that's checked out into my repo. :-) |
| |
| * We have a tradition of not marking files with the names of |
| individual authors (i.e., we don't put lines like "Author: foo" |
| or "@author foo" in a special position at the top of a source |
| file). This is to discourage territoriality -- even when a file |
| has only one author, we want to make sure others feel free to |
| make changes. People might be unnecessarily hesitant if someone |
| appears to have staked ownership on the file. |
| |
| * There are many other unspoken conventions maintained throughout |
| the code, that are only noticed when someone unintentionally |
| fails to follow them. Just try to have a sensitive eye for the |
| way things are done, and when in doubt, ask. |
| |
| |
| |
| APR pool usage conventions |
| ========================== |
| |
| (This assumes you already basically understand how APR pools work; see |
| apr_pools.h for details.) |
| |
| Applications using the Subversion libraries must call apr_initialize() |
| before calling any Subversion functions. |
| |
| Subversion's general pool usage strategy can be summed up in two |
| principles: |
| |
| 1. The call level that created a pool is the only place to clear or |
| destroy that pool. |
| |
| 2. When iterating an unbounded number of times, create a subpool |
| before entering the iteration, use it inside the loop and clear |
| it at the start of each iteration, then destroy it after the loop |
| is done, like so: |
| |
| subpool = svn_pool_create(pool); |
| |
| for (i = 0; i < n; ++i) |
| { |
| svn_pool_clear(subpool); |
| do_operation(..., subpool); |
| } |
| |
| svn_pool_destroy(subpool); |
| |
| Except for some legacy code, which was written before these principles |
| were fully understood, virtually all pool usage in Subversion follows |
| the above guidelines. |
| |
| One such legacy pattern is a tendency to allocate an object inside a |
| pool, store the pool in the object, and then free that pool (either |
| directly or through a close_foo() function) to destroy the object. |
| |
| For example: |
| |
| /*** Example of how NOT to use pools. Don't be like this. ***/ |
| |
| static foo_t * |
| make_foo_object (arg1, arg2, apr_pool_t *pool) |
| { |
| apr_pool_t *subpool = svn_pool_create (pool); |
| foo_t *foo == apr_palloc (subpool, sizeof (*foo)); |
| |
| foo->field1 = arg1; |
| foo->field2 = arg2; |
| foo->pool = subpool; |
| } |
| |
| [...] |
| |
| [Now some function calls make_foo_object() and returns, passing |
| back a new foo object.] |
| |
| [...] |
| |
| [Now someone, at some random call level, decides that the foo's |
| lifetime is over, and calls svn_pool_destroy(foo->pool).] |
| |
| This is tempting, but it defeats the point of using pools, which is to |
| not worry so much about individual allocations, but rather about |
| overall performance and lifetime groups. Instead, foo_t generally |
| should not have a `pool' field. Just allocate as many foo objects as |
| you need in the current pool -- when that pool gets cleared or |
| destroyed, they will all go away simultaneously. |
| |
| In summary: |
| |
| - Objects should not have their own pools. An object is allocated |
| into a pool defined by the constructor's caller. The caller |
| knows the lifetime of the object and will manage it via the pool. |
| |
| - Functions should not create/destroy pools for their operation; |
| they should use a pool provided by the caller. Again, the |
| caller knows more about how the function will be used, how often, |
| how many times, etc. thus, it should be in charge of the |
| function's memory usage. |
| |
| For example, the caller might know that the app will exit upon |
| the function's return. Thus, the function would create extra |
| work if it built/destroyed a pool. Instead, it should use the |
| passed-in pool, which the caller is going to be tossing as part |
| of app-exit anyway. |
| |
| - Whenever an unbounded iteration occurs, an iteration subpool |
| should be used. |
| |
| - Given all of the above, it is pretty well mandatory to pass a |
| pool to every function. Since objects are not recording pools |
| for themselves, and the caller is always supposed to be managing |
| memory, then each function needs a pool, rather than relying on |
| some hidden magic pool. In limited cases, objects may record the |
| pool used for their construction so that they can construct |
| sub-parts, but these cases should be examined carefully. |
| |
| See also the section "Tracking down memory leaks" for tips on |
| diagnosing pool usage problems. |
| |
| |
| |
| APR status codes |
| ================= |
| |
| Always check for APR status codes (except APR_SUCCESS) with the |
| APR_STATUS_IS_...() macros, not by direct comparison. This is |
| required for portability to non-Unix platforms. |
| |
| |
| |
| Automated tests |
| =============== |
| |
| For a description of how to use and add tests to Subversion's |
| automated test framework, please read 'subversion/tests/README'. |
| |
| |
| |
| Writing test cases before code |
| ============================== |
| |
| From: Karl Fogel <kfogel@collab.net> |
| Subject: writing test cases |
| To: dev@subversion.tigris.org |
| Date: Mon, 5 Mar 2001 15:58:46 -0600 |
| |
| Many of us implementing the filesystem interface have now gotten into |
| the habit of writing the test cases (see fs-test.c) *before* writing |
| the actual code. It's really helping us out a lot -- for one thing, |
| it forces one to define the task precisely in advance, and also it |
| speedily reveals the bugs in one's first try (and second, and |
| third...). |
| |
| I'd like to recommend this practice to everyone. If you're |
| implementing an interface, or adding an entirely new feature, or even |
| just fixing a bug, a test for it is a good idea. And if you're going |
| to write the test anyway, you might as well write it first. :-) |
| |
| Yoshiki Hayashi's been sending test cases with all his patches lately, |
| which is what inspired me to write this mail to encourage everyone to |
| do the same. Having those test cases makes patches easier to examine, |
| because they show the patch's purpose very clearly. It's like having |
| a second log message, one whose accuracy is verified at run-time. |
| |
| That said, I don't think we want a rigid policy about this, at least |
| not yet. If you encounter a bug somewhere in the code, but you only |
| have time to write a patch with no test case, that's okay -- having |
| the patch is still useful; someone else can write the test case. |
| |
| As Subversion gets more complex, though, the automated test suite gets |
| more crucial, so let's all get in the habit of using it early. |
| |
| -K |
| |
| |
| |
| Debugging the ra_dav server |
| =========================== |
| |
| 'mod_dav_svn.so' contains the main Subversion server logic; it runs as |
| a module within mod_dav, which runs as a module within httpd. Since |
| httpd is probably using dynamic shared modules, you normally won't be |
| able to set breakpoints in advance when you start Apache in a debugger |
| such as GDB. Instead, you'll need to start up, then interrupt httpd, |
| set your breakpoint, and continue: |
| |
| % gdb httpd |
| (gdb) run -X |
| ^C |
| (gdb) break some_func_in_mod_dav_svn |
| (gdb) continue |
| |
| The -X switch is equivalent to -DONE_PROCESS and -DNO_DETACH, which |
| ensure that httpd runs as a single thread and remains attached to the |
| tty, respectively. As soon as it starts, it sits and waits for |
| requests; that's when you hit control-C and set your breakpoint. |
| |
| You'll probably want to watch Apache's run-time logs |
| |
| /usr/local/apache2/logs/error_log |
| /usr/local/apache2/logs/access_log |
| |
| to help determine what might be going wrong and where to set |
| breakpoints. |
| |
| |
| |
| Debugging the ra_svn client and server, on Unix |
| =============================================== |
| |
| Bugs in ra_svn usually manifest themselves with one of the following |
| cryptic error messages: |
| |
| svn: Malformed network data |
| svn: Connection closed unexpectedly |
| |
| (The first message can also mean the data stream was corrupted in |
| tunnel mode by user dotfiles or hook scripts; see issue 1145.) The |
| first message generally means you to have to debug the client; the |
| second message generally means you have to debug the server. |
| |
| It is easiest to debug ra_svn using a build with --disable-shared |
| --enable-maintainer-mode. With the latter option, the error message |
| will tell you exactly what line to set a breakpoint at; otherwise, |
| look up the line number at the end of marshal.c:vparse_tuple() where |
| it returns the "Malformed network data" error. |
| |
| To debug the client, simply pull it up in gdb, set a breakpoint, and |
| run the offending command: |
| |
| % gdb svn |
| (gdb) break marshal.c:NNN |
| (gdb) run ARGS |
| Breakpoint 1, vparse_tuple (list=___, pool=___, fmt=___, |
| ap=___) at subversion/libsvn_ra_svn/marshal.c:NNN |
| NNN "Malformed network data"); |
| |
| There are several bits of useful information: |
| |
| * A backtrace will tell you exactly what protocol exchange is |
| failing. |
| |
| * "print *conn" will show you the connection buffer. read_buf, |
| read_ptr, and read_end represent the read buffer, which can show |
| you the data the marshaller is looking at. (Since read_buf isn't |
| generally 0-terminated at read_end, be careful of falsely assuming |
| that there's garbage data in the buffer.) |
| |
| * The format string determines what the marshaller was expecting to |
| see. |
| |
| To debug the server in daemon mode, pull it up in gdb, set a |
| breakpoint (usually a "Connection closed unexpectedly" error on the |
| client indicates a "Malformed network data" error on the server, |
| although it can also indicate a core dump), and run it with the "-X" |
| option to serve a single connection: |
| |
| % gdb svnserve |
| (gdb) break marshal.c:NNN |
| (gdb) run -X |
| |
| Then run the offending client command. From there, it's just like |
| debugging the client. |
| |
| Debugging the server in tunnel mode is more of a pain. You'll need to |
| stick something like "{ int x = 1; while (x); }" near the top of |
| svnserve's main() and put the resulting svnserve in the user path on |
| the server. Then start an operation, gdb attach the process on the |
| server, "set x = 0", and step through the code as desired. |
| |
| |
| |
| Tracking down memory leaks |
| ========================== |
| |
| Our use of APR pools makes it unusual for us to have memory leaks in |
| the strictest sense; all the memory we allocate will be cleaned up |
| eventually. But sometimes an operation takes more memory than it |
| should; for instance, a checkout of a large source tree should not use |
| much more memory than a checkout of a small source tree. When that |
| happens, it generally means we're allocating memory from a pool whose |
| lifetime is too long. |
| |
| If you have a favorite memory leak tracking tool, you can configure |
| with --enable-pool-debug (which will make every pool allocation use |
| its own malloc()), arrange to exit in the middle of the operation, and |
| go to it. If not, here's another way: |
| |
| * Configure with --enable-pool-debug=verbose-alloc. Make sure to |
| rebuild all of APR and Subversion so that every allocation gets |
| file-and-line information. |
| |
| * Run the operation, piping stderr to a file. Hopefully you have |
| lots of disk space. |
| |
| * In the file, you'll see lots of lines which look like: |
| |
| POOL DEBUG: [5383/1024] PCALLOC ( 2763/ 2763/ 5419) 0x08102D48 "subversion/clients/cmdline/main.c:612" <subversion/libsvn_subr/auth.c:122> (118/118/0) |
| |
| What you care about most is the tenth field (the one in quotes), |
| which gives you the file and line number where the pool for this |
| allocation was created. Go to that file and line and determine |
| the lifetime of the pool. In the example above, main.c:612 |
| indicates that this allocation was made in the top-level pool of |
| the svn client. If this were an allocation which was repeated |
| many times during the course of an operation, that would indicate |
| a source of a memory leak. The eleventh field (the one in |
| brackets) gives the file and line number of the allocation itself. |
| |
| |
| |
| Writing log messages |
| ==================== |
| |
| Certain guidelines should be adhered to when writing log messages: |
| |
| Make a log message for every change. The value of the log becomes |
| much less if developers cannot rely on its completeness. Even if |
| you've only changed comments, write a log that says "Doc fix." or |
| something. |
| |
| Use full sentences, not sentence fragments. Fragments are more often |
| ambiguous, and it takes only a few more seconds to write out what you |
| mean. Fragments like "Doc fix", "New file", or "New function" are |
| acceptable because they are standard idioms, and all further details |
| should appear in the source code. |
| |
| The log message should name every affected function, variable, macro, |
| makefile target, grammar rule, etc, including the names of symbols |
| that are being removed in this commit. This helps people searching |
| through the logs later. Don't hide names in wildcards, because the |
| globbed portion may be what someone searches for later. For example, |
| this is bad: |
| |
| * twirl.c |
| (twirling_baton_*): Removed these obsolete structures. |
| (handle_parser_warning): Pass data directly to callees, instead |
| of storing in twirling_baton_*. |
| |
| * twirl.h: Fix indentation. |
| |
| Later on, when someone is trying to figure out what happened to |
| `twirling_baton_fast', they may not find it if they just search for |
| "_fast". A better entry would be: |
| |
| * twirl.c |
| (twirling_baton_fast, twirling_baton_slow): Removed these |
| obsolete structures. |
| (handle_parser_warning): Pass data directly to callees, instead |
| of storing in twirling_baton_*. |
| |
| * twirl.h: Fix indentation. |
| |
| The wildcard is okay in the description for `handle_parser_warning', |
| but only because the two structures were mentioned by full name |
| elsewhere in the log entry. |
| |
| Note how each file gets its own entry prefixed with an "*", and the |
| changes within a file are grouped by symbol, with the symbols listed |
| in parentheses followed by a colon, followed by text describing the |
| change. Please adhere to this format -- not only does consistency aid |
| readability, it also allows software to colorize log entries |
| automatically. |
| |
| If your change is related to a specific issue in the issue tracker, |
| then include a string like "issue #N" in the log message. For |
| example, if a patch resolves issue 1729, then the log message might |
| be: |
| |
| Fix issue #1729: Don't crash because of a missing file. |
| |
| * get_editor.c |
| (frobnicate_file): Check that file exists before frobnicating. |
| |
| For large changes or change groups, group the log entry into |
| paragraphs separated by blank lines. Each paragraph should be a set |
| of changes that accomplishes a single goal, and each group should |
| start with a sentence or two summarizing the change. Truly |
| independent changes should be made in separate commits, of course. |
| |
| One should never need the log entries to understand the current code. |
| If you find yourself writing a significant explanation in the log, you |
| should consider carefully whether your text doesn't actually belong in |
| a comment, alongside the code it explains. Here's an example of doing |
| it right: |
| |
| (consume_count): If `count' is unreasonable, return 0 and don't |
| advance input pointer. |
| |
| And then, in `consume_count' in `cplus-dem.c': |
| |
| while (isdigit ((unsigned char)**type)) |
| { |
| count *= 10; |
| count += **type - '0'; |
| /* A sanity check. Otherwise a symbol like |
| `_Utf390_1__1_9223372036854775807__9223372036854775' |
| can cause this function to return a negative value. |
| In this case we just consume until the end of the string. */ |
| if (count > strlen (*type)) |
| { |
| *type = save; |
| return 0; |
| } |
| |
| This is why a new function, for example, needs only a log entry saying |
| "New Function" --- all the details should be in the source. |
| |
| There are some common-sense exceptions to the need to name everything |
| that was changed: |
| |
| * If you have made a change which requires trivial changes |
| throughout the rest of the program (e.g., renaming a variable), |
| you needn't name all the functions affected, you can just say |
| "All callers changed". |
| |
| * If you have rewritten a file completely, the reader understands |
| that everything in it has changed, so your log entry may simply |
| give the file name, and say "Rewritten". |
| |
| * If your change was only to one file, or was the same change to |
| multiple files, then there's no need to list their paths in the |
| log message (because "svn log" can show the changed paths for |
| that revision anyway). Only when you need to describe how the |
| change affected different areas in different ways is it |
| necessary to organize the log message by paths and symbols, as |
| in the examples above. |
| |
| * If your change was to multiple files, provide a brief summary of |
| the change at the top of the log message (before the paths and |
| symbols list). |
| |
| In general, there is a tension between making entries easy to find by |
| searching for identifiers, and wasting time or producing unreadable |
| entries by being exhaustive. Use your best judgment --- and be |
| considerate of your fellow developers. (Also, run "svn log" to see |
| how others have been writing their log entries.) |
| |
| |
| |
| Patch submission guidelines |
| =========================== |
| |
| Mail patches to `dev@subversion.tigris.org', with a subject line |
| that contains the word "PATCH" in all uppercase, for example |
| |
| Subject: [PATCH] fix for rev printing bug in svn status |
| |
| A patch submission should contain one logical change; please don't mix |
| N unrelated changes in one submission -- send N separate emails |
| instead. |
| |
| The email message should start off with a log message, as described in |
| "Writing log messages" above. The patch itself should be in unified |
| diff format (e.g., with "svn diff"), preferably inserted directly into |
| the body of your message (rather than MIME-attached, uuencoded, or |
| otherwise opaqified). If your mailer wraps long lines, then you will |
| need to attach your patch. Please ensure the MIME type of the attachment |
| is text/plain (some mailers allow you to set the MIME type; for some |
| others, you might have to use a .txt extension on your patch file). Do |
| not compress or otherwise encode the attached patch. |
| |
| If the patch implements a new feature, make sure to describe the |
| feature completely in your mail; if the patch fixes a bug, describe |
| the bug in detail and give a reproduction recipe. An exception to |
| these guidelines is when the patch addresses a specific issue in the |
| issues database -- in that case, just make sure to refer to the issue |
| number in your log message, as described in "Writing log messages". |
| |
| It is normal for patches to undergo several rounds of feedback and |
| change before being applied. Don't be discouraged if your patch is |
| not accepted immediately -- it doesn't mean you goofed, it just means |
| that there are a *lot* of eyes looking at every code submission, and |
| it's a rare patch that doesn't have at least a little room for |
| improvement. After reading people's responses to your patch, make the |
| appropriate changes and resubmit, wait for the next round of feedback, |
| and lather, rinse, repeat, until some committer applies it. |
| |
| If you don't get a response for a while, and don't see the patch |
| applied, it may just mean that people are really busy. Go ahead and |
| repost, and don't hesitate to point out that you're still waiting for |
| a response. One way to think of it is that patch management is highly |
| parallizable, and we need you to shoulder your share of the management |
| as well as the coding. Every patch needs someone to shepherd it |
| through the process, and the person best qualified to do that is the |
| original submitter. |
| |
| |
| |
| Filing bugs / issues |
| ==================== |
| |
| This pretty much says it all: |
| |
| From: Karl Fogel <kfogel@collab.net> |
| Subject: Please ask on the list before filing a new issue. |
| To: dev@subversion.tigris.org |
| Date: Tue, 30 Jul 2002 10:51:24 (CDT) |
| |
| Folks, we're getting tons of new issues, which is a Good Thing in |
| general, but some of them don't really belong in the issue tracker. |
| They're things that would be better solved by a quick conversation |
| here on the dev list. Compilation problems, behavior questions, |
| feature ideas that have been discussed before, that sort of thing. |
| |
| *Please* be more conservative about filing issues. The issues |
| database is physically much more cumbersome than email. It wastes |
| people's time to have conversations in the issues database that should |
| be had in email. (This is not a libel against the issue tracker, it's |
| just a result of the fact that the issues database is for permanent |
| storage and flow annotation, not for real-time conversation.) |
| |
| If you encounter a situation where Subversion is clearly behaving |
| wrongly, or behaving opposite to what the documentation says, then |
| it's okay to file the issue right away (after searching to make sure |
| it isn't already filed, of course!). But if you're |
| |
| a) Requesting a new feature, or |
| b) Having build problems, or |
| c) Not sure what the behavior should be, or |
| d) Disagreeing with current intended behavior, or |
| e) Not TOTALLY sure that others would agree this is a bug, or |
| f) For any reason at all not sure this should be filed, |
| |
| ...then please post to the dev list first. You'll get a faster |
| response, and others won't be forced to use the issues database to |
| have the initial real-time conversations. |
| |
| Nothing is lost this way. If we eventually conclude that it should be |
| in the issue tracker, then we can still file it later, after the |
| description and reproduction recipe have been honed on the dev list. |
| |
| Thank you, |
| -Karl |
| |
| |
| |
| Commit access |
| ============= |
| |
| There are two types of commit access: full and partial. Full means |
| anywhere in the tree, partial means only in that committer's specific |
| area(s) of expertise. The COMMITTERS file lists all committers, both |
| full and partial, and says the domains for each partial committer. |
| |
| How full commit access is granted: |
| |
| After someone has successfully contributed a few non-trivial patches, |
| some full committer, usually whoever has reviewed and applied the most |
| patches from that contributor, proposes them for commit access. This |
| proposal is sent only to the other full committers -- the ensuing |
| discussion is private, so that everyone can feel comfortable speaking |
| their minds. Assuming there are no objections, the contributor is |
| granted commit access. The decision is made by consensus; there are |
| no formal rules governing the procedure, though generally if someone |
| strongly objects the access is not offered, or is offered on a |
| provisional basis. |
| |
| The criteria for commit access are that the person's patches adhere to |
| the guidelines in this file, adhere to all the usual unquantifiable |
| rules of coding (code should be readable, robust, maintainable, etc), |
| and that the person respects the "Hippocratic Principle": first, do no |
| harm. In other words, what is significant is not the size or quantity |
| of patches submitted, but the degree of care shown in avoiding bugs |
| and minimizing unnecessary impact on the rest of the code. Many full |
| committers are people who have not made major code contributions, but |
| rather lots of small, clean fixes, each of which was an unambiguous |
| improvement to the code. |
| |
| How partial commit access is granted: |
| |
| A full committer sponsors the partial committer. Usually this means |
| the full committer has applied several patches to the same area from |
| the proposed partial committer, and realizes things would be easier if |
| the person were just committing directly. Approval is not required |
| from the full committers; it is assumed that sponsors know what |
| they're doing and will watch the partial committer's first few commits |
| to make sure everything's going smoothly. |
| |
| |
| |
| The configuration/build system under unix |
| ========================================= |
| |
| Greg Stein wrote a custom build system for Subversion, which had been |
| using `automake' and recursive Makefiles. Now it uses a single, |
| top-level Makefile, generated from Makefile.in (which is kept under |
| revision control). `Makefile.in' in turn includes `build-outputs.mk', |
| which is automatically generated from `build.conf' by the |
| `gen-make.py' script. Thus, the latter two are under revision |
| control, but `build-outputs.mk' is not. |
| |
| Here is Greg's original mail describing the system, followed by some |
| advice about hacking it: |
| |
| From: Greg Stein <gstein@lyra.org> |
| Subject: new build system (was: Re: CVS update: MODIFIED: ac-helpers ...) |
| To: dev@subversion.tigris.org |
| Date: Thu, 24 May 2001 07:20:55 -0700 |
| Message-ID: <20010524072055.F5402@lyra.org> |
| |
| On Thu, May 24, 2001 at 01:40:17PM -0000, gstein@tigris.org wrote: |
| > User: gstein |
| > Date: 01/05/24 06:40:17 |
| > |
| > Modified: ac-helpers .cvsignore svn-apache.m4 |
| > Added: . Makefile.in |
| > Log: |
| > Switch over to the new non-recursive build system. |
| >... |
| |
| Okay... this is it. We're now on the build system. |
| |
| "It works on my machine." |
| |
| I suspect there may be some tweaks to make on different OSs. I'd be |
| interested to hear if Ben can really build with normal BSD make. It |
| should be possible. |
| |
| The code supports building, installation, checking, and |
| dependencies. It does *NOT* yet deal with the doc/ subdirectory. That |
| is next; I figured this could be rolled out and get the kinks worked |
| out while I do the doc/ stuff. Oh, it doesn't build Neon or APR yet |
| either. I also saw a problem where libsvn_fs wasn't getting built |
| before linking one of the test proggies (see below). |
| |
| Basic operation: same as before. |
| |
| $ ./autogen.sh |
| $ ./configure OPTIONS |
| $ make |
| $ make check |
| $ make install |
| |
| There are some "make check" scripts that need to be fixed up. That'll |
| happen RSN. Some of them create their own log, rather than spewing to |
| stdout (where the top-level make will place the output into |
| [TOP]/tests.log). |
| |
| The old Makefile.am files are still around, but I'll be tossing those |
| along with a bunch of tweaks to all the .cvsignore files. There are a |
| few other cleanups, too. But that can happen as a step two. |
| |
| [ $ cvs rm -f `find . -name Makefile.rm` |
| |
| See the mistake in that line? I didn't when I typed it. The find |
| returned nothing, so cvs rm -f proceeded to delete my entire |
| tree. And the -f made sure to delete all my source files, too. Good |
| fugging thing that I had my mods in some Emacs buffers, or I'd be |
| bitching. |
| |
| I am *so* glad that Ben coded SVN to *not* delete locally modified |
| files *and* that we have an "undel" command. I had to go and tweak a |
| bazillion Entries files to undo the delete... |
| ] |
| |
| The top-level make has a number of shortcuts in it (well, actually in |
| build-outputs.mk): |
| |
| $ make subversion/libsvn_fs/libsvn_fs.la |
| |
| or |
| |
| $ make libsvn_fs |
| |
| The two are the same. So... when your test proggie fails to link |
| because libsvn_fs isn't around, just run "make libsvn_fs" to build it |
| immediately, then go back to the regular "make". |
| |
| Note that the system still conditionally builds the FS stuff based |
| on whether DB (See 'Building on Unix' below) is available, and |
| mod_dav_svn if Apache is available. |
| |
| Handy hint: if you don't like dependencies, then you can do: |
| |
| $ ./autogen.sh -s |
| |
| That will skip the dependency generation that goes into |
| build-outputs.mk. It makes the script run quite a bit faster (48 secs |
| vs 2 secs on my poor little Pentium 120). |
| |
| Note that if you change build.conf, you can simply run: |
| |
| $ ./gen-make.py build.conf |
| |
| to regen build-outputs.mk. You don't have to go back through the whole |
| autogen.sh / configure process. |
| |
| You should also note that autogen.sh and configure run much faster now |
| that we don't have the automake crap. Oh, and our makefiles never |
| re-run configure on you out of the blue (gawd, I hated when automake |
| did that to me). |
| |
| Obviously, there are going to be some tweaky things going on. I also |
| think that the "shadow" builds or whatever they're called (different |
| source and build dirs) are totally broken. Something tweaky will have |
| to happen there. But, thankfully, we only have one Makefile to deal |
| with. |
| |
| Note that I arrange things so that we have one generated file |
| (build-outputs.mk), and one autoconf-generated file (Makefile from |
| .in). I also tried to shove as much logic/rules into |
| Makefile.in. Keeping build-outputs.mk devoid of rules (thus, implying |
| gen-make.py devoid of rules in its output generation) manes that |
| tweaking rules in Makefile.in is much more approachable to people. |
| |
| I think that is about it. Send problems to the dev@ list and/or feel |
| free to dig in and fix them yourself. My next steps are mostly |
| cleanup. After that, I'm going to toss out our use of libtool and rely |
| on APR's libtool setup (no need for us to replicate what APR already |
| did). |
| |
| Cheers, |
| -g |
| |
| -- |
| Greg Stein, http://www.lyra.org/ |
| |
| |
| And here is some advice for those changing or testing the |
| configuration/build system: |
| |
| From: Karl Fogel <kfogel@collab.net> |
| To: dev@subversion.tigris.org |
| Subject: when changing build/config stuff, always do this first |
| Date: Wed 28 Nov 2001 |
| |
| Yo everyone: if you change part of the configuration/build system, |
| please make sure to clean out any old installed Subversion libs |
| *before* you try building with your changes. If you don't do this, |
| your changes may appear to work fine, when in fact they would fail if |
| run on a truly pristine system. |
| |
| This script demonstrates what I mean by "clean out". This is |
| `/usr/local/cleanup.sh' on my system. It cleans out the Subversion |
| libs (and the installed httpd-2.0 libs, since I'm often reinstalling |
| that too): |
| |
| #!/bin/sh |
| |
| # Take care of libs |
| cd /usr/local/lib |
| rm -f APRVARS |
| rm -f libapr* |
| rm -f libexpat* |
| rm -f libneon* |
| rm -f libsvn* |
| |
| # Take care of headers |
| cd /usr/local/include |
| rm -f apr* |
| rm -f svn* |
| rm -f neon/* |
| |
| # Take care of headers |
| cd /usr/local/apache2/lib |
| rm -f * |
| |
| When someone reports a configuration bug and you're trying to |
| reproduce it, run this first. :-) |
| |
| The voice of experience, |
| -Karl |
| |
| |
| |
| How to release a distribution tarball |
| ==================================== |
| |
| See notes/releases.txt. |
| |
| |
| |
| Release numbering |
| ================= |
| |
| Please see http://apr.apache.org/versioning.html -- Subversion uses |
| the same guidelines. |
| |
| |
| |
| Compatibility |
| ============= |
| |
| Before 1.0, we require only that each interim release be compatible |
| with the one before it. For example, a 0.20 server must work with |
| both 0.19 and 0.21 clients (and, obviously, a 0.20 client as well), |
| but may be incompatible with a 0.18 client (though gratuitous |
| incompatibility should be avoided whenever possible, of course). |
| Patch levels don't count: 0.20.1, 0.20.2, etc, must be compatible with |
| 0.19.*. |
| |
| The purpose of this policy is to help early adopters make good |
| decisions about when to upgrade, without impeding development too |
| much. After 1.0 is released, we will probably adopt a more |
| conservative policy. |
| |
| Changes to the ra_lib, editor, or reporter API may result in |
| incompatibilities in ra_dav or ra_svn. Observe these principles when |
| updating the ra_svn code: |
| |
| (1) Fields can be added to any tuple; old clients will simply ignore |
| them. (Right now, the marshalling implementation does not let |
| you put number or boolean values in the optional part of a tuple, |
| but changing that will not affect the protocol.) |
| |
| We can use this mechanism when information is added to an API |
| call. |
| |
| (2) At connection establishment time, clients and servers exchange a |
| list of capability keywords. |
| |
| We can use this mechanism for more complicated changes, like |
| introducing pipelining or removing information from API calls. |
| |
| (3) New commands can be added; trying to use an unsupported command |
| will result in an error which can be checked and dealt with. |
| |
| (4) The protocol version number can be bumped to allow graceful |
| refusal of old clients or servers, or to allow a client or |
| server to detect when it has to do things the old way. |
| |
| This mechanism is a last resort, to be used when capability |
| keywords would be too hard to manage. |