|  | <html><head> | 
|  | <title>Apache API notes</title> | 
|  | </head> | 
|  | <body> | 
|  | <!--#include virtual="header.html" --> | 
|  | <h1>Apache API notes</h1> | 
|  |  | 
|  | These are some notes on the Apache API and the data structures you | 
|  | have to deal with, etc.  They are not yet nearly complete, but | 
|  | hopefully, they will help you get your bearings.  Keep in mind that | 
|  | the API is still subject to change as we gain experience with it. | 
|  | (See the TODO file for what <em>might</em> be coming).  However, | 
|  | it will be easy to adapt modules to any changes that are made. | 
|  | (We have more modules to adapt than you do). | 
|  | <p> | 
|  |  | 
|  | A few notes on general pedagogical style here.  In the interest of | 
|  | conciseness, all structure declarations here are incomplete --- the | 
|  | real ones have more slots that I'm not telling you about.  For the | 
|  | most part, these are reserved to one component of the server core or | 
|  | another, and should be altered by modules with caution.  However, in | 
|  | some cases, they really are things I just haven't gotten around to | 
|  | yet.  Welcome to the bleeding edge.<p> | 
|  |  | 
|  | Finally, here's an outline, to give you some bare idea of what's | 
|  | coming up, and in what order: | 
|  |  | 
|  | <ul> | 
|  | <li> <a href="#basics">Basic concepts.</a> | 
|  | <menu> | 
|  | <li> <a href="#HMR">Handlers, Modules, and Requests</a> | 
|  | <li> <a href="#moduletour">A brief tour of a module</a> | 
|  | </menu> | 
|  | <li> <a href="#handlers">How handlers work</a> | 
|  | <menu> | 
|  | <li> <a href="#req_tour">A brief tour of the <code>request_rec</code></a> | 
|  | <li> <a href="#req_orig">Where request_rec structures come from</a> | 
|  | <li> <a href="#req_return">Handling requests, declining, and returning error codes</a> | 
|  | <li> <a href="#resp_handlers">Special considerations for response handlers</a> | 
|  | <li> <a href="#auth_handlers">Special considerations for authentication handlers</a> | 
|  | <li> <a href="#log_handlers">Special considerations for logging handlers</a> | 
|  | </menu> | 
|  | <li> <a href="#pools">Resource allocation and resource pools</a> | 
|  | <li> <a href="#config">Configuration, commands and the like</a> | 
|  | <menu> | 
|  | <li> <a href="#per-dir">Per-directory configuration structures</a> | 
|  | <li> <a href="#commands">Command handling</a> | 
|  | <li> <a href="#servconf">Side notes --- per-server configuration, virtual servers, etc.</a> | 
|  | </menu> | 
|  | </ul> | 
|  |  | 
|  | <h2><a name="basics">Basic concepts.</a></h2> | 
|  |  | 
|  | We begin with an overview of the basic concepts behind the | 
|  | API, and how they are manifested in the code. | 
|  |  | 
|  | <h3><a name="HMR">Handlers, Modules, and Requests</a></h3> | 
|  |  | 
|  | Apache breaks down request handling into a series of steps, more or | 
|  | less the same way the Netscape server API does (although this API has | 
|  | a few more stages than NetSite does, as hooks for stuff I thought | 
|  | might be useful in the future).  These are: | 
|  |  | 
|  | <ul> | 
|  | <li> URI -> Filename translation | 
|  | <li> Auth ID checking [is the user who they say they are?] | 
|  | <li> Auth access checking [is the user authorized <em>here</em>?] | 
|  | <li> Access checking other than auth | 
|  | <li> Determining MIME type of the object requested | 
|  | <li> `Fixups' --- there aren't any of these yet, but the phase is | 
|  | intended as a hook for possible extensions like | 
|  | <code>SetEnv</code>, which don't really fit well elsewhere. | 
|  | <li> Actually sending a response back to the client. | 
|  | <li> Logging the request | 
|  | </ul> | 
|  |  | 
|  | These phases are handled by looking at each of a succession of | 
|  | <em>modules</em>, checking whether each of them has a handler for the | 
|  | phase, and attempting to invoke it if so.  The handler can typically do | 
|  | one of three things: | 
|  |  | 
|  | <ul> | 
|  | <li> <em>Handle</em> the request, and indicate that it has done so | 
|  | by returning the magic constant <code>OK</code>. | 
|  | <li> <em>Decline</em> to handle the request, by returning the magic | 
|  | integer constant <code>DECLINED</code>.  In this case, the | 
|  | server behaves in all respects as if the handler simply hadn't | 
|  | been there. | 
|  | <li> Signal an error, by returning one of the HTTP error codes. | 
|  | This terminates normal handling of the request, although an | 
|  | ErrorDocument may be invoked to try to mop up, and it will be | 
|  | logged in any case. | 
|  | </ul> | 
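The three outcomes above can be sketched in plain C.  This is only a toy model, not the server's real dispatch code: every <code>toy_</code> name, the numeric values given to the status constants, and the cut-down request structure are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status values, chosen for this sketch only. */
enum { TOY_DECLINED = -1, TOY_OK = 0, TOY_NOT_FOUND = 404 };

/* Hypothetical, cut-down request record. */
typedef struct { const char *uri; } toy_request;

/* A handler takes the request and returns a status. */
typedef int (*toy_handler)(toy_request *);

/* Try each module's handler in turn.  The first one that does not
 * decline ends the phase, whether it succeeded or signalled an error. */
int toy_run_first(toy_handler *handlers, size_t n, toy_request *r)
{
    size_t i;

    assert(handlers != NULL && r != NULL);
    for (i = 0; i < n; ++i) {
        int status;

        if (handlers[i] == NULL)
            continue;               /* module has no handler for this phase */
        status = handlers[i](r);
        if (status != TOY_DECLINED)
            return status;          /* handled, or an error code */
    }
    return TOY_DECLINED;            /* nobody wanted it */
}

/* Three trivial handlers, one per outcome. */
int toy_decline(toy_request *r)   { (void)r; return TOY_DECLINED; }
int toy_handle(toy_request *r)    { (void)r; return TOY_OK; }
int toy_not_found(toy_request *r) { (void)r; return TOY_NOT_FOUND; }
```

A declining handler is skipped exactly as if the slot were empty, which is the behaviour described for <code>DECLINED</code>.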
|  |  | 
|  | Most phases are terminated by the first module that handles them; | 
|  | however, for logging, `fixups', and non-access authentication | 
|  | checking, all handlers always run (barring an error).  Also, the | 
|  | response phase is unique in that modules may declare multiple handlers | 
|  | for it, via a dispatch table keyed on the MIME type of the requested | 
|  | object.  Modules may declare a response-phase handler which can handle | 
|  | <em>any</em> request, by giving it the key <code>*/*</code> (i.e., a | 
|  | wildcard MIME type specification).  However, wildcard handlers are | 
|  | only invoked if the server has already tried and failed to find a more | 
|  | specific response handler for the MIME type of the requested object | 
|  | (either none existed, or they all declined).<p> | 
|  |  | 
|  | The handlers themselves are functions of one argument (a | 
|  | <code>request_rec</code> structure; vide infra), which return an | 
|  | integer, as above.<p> | 
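The wildcard rule for response handlers can be illustrated with a small, self-contained sketch.  It assumes a single NULL-terminated dispatch table (the real server consults the tables of all modules), and all of the <code>toy_</code> names and status values are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in status values, chosen for this sketch only. */
enum { TOY_DECLINED = -1, TOY_OK = 0 };

/* Hypothetical request: just a MIME type and a note of who served it. */
typedef struct {
    const char *content_type;
    const char *served_by;
} toy_request;

typedef int (*toy_handler)(toy_request *);

/* One dispatch-table entry, keyed on MIME type; NULL key ends the table. */
typedef struct {
    const char *content_type;
    toy_handler handler;
} toy_handler_rec;

/* Pass 0 tries exact MIME-type matches; pass 1 tries wildcard entries,
 * so a wildcard only runs after every specific handler has declined. */
int toy_invoke_response(const toy_handler_rec *table, toy_request *r)
{
    int pass;

    assert(table != NULL && r != NULL && r->content_type != NULL);
    for (pass = 0; pass < 2; ++pass) {
        const toy_handler_rec *h;

        for (h = table; h->content_type != NULL; ++h) {
            int is_wild = strcmp(h->content_type, "*/*") == 0;
            int status;

            if (pass == 0 &&
                (is_wild || strcmp(h->content_type, r->content_type) != 0))
                continue;
            if (pass == 1 && !is_wild)
                continue;
            status = h->handler(r);
            if (status != TOY_DECLINED)
                return status;
        }
    }
    return TOY_DECLINED;
}

int toy_cgi(toy_request *r)     { r->served_by = "cgi"; return TOY_OK; }
int toy_default(toy_request *r) { r->served_by = "default"; return TOY_OK; }
int toy_picky(toy_request *r)   { (void)r; return TOY_DECLINED; }
```

Note that the position of the wildcard entry in the table does not matter; it is still consulted last.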
|  |  | 
|  | <h3><a name="moduletour">A brief tour of a module</a></h3> | 
|  |  | 
|  | At this point, we need to explain the structure of a module.  Our | 
|  | candidate will be one of the messier ones, the CGI module --- this | 
|  | handles both CGI scripts and the <code>ScriptAlias</code> config file | 
|  | command.  It's actually a great deal more complicated than most | 
|  | modules, but if we're going to have only one example, it might as well | 
|  | be the one with its fingers in every place.<p> | 
|  |  | 
|  | Let's begin with handlers.  In order to handle the CGI scripts, the | 
|  | module declares a response handler for them. Because of | 
|  | <code>ScriptAlias</code>, it also has handlers for the name | 
|  | translation phase (to recognise <code>ScriptAlias</code>ed URIs) and the | 
|  | type-checking phase (any <code>ScriptAlias</code>ed request is typed | 
|  | as a CGI script).<p> | 
|  |  | 
|  | The module needs to maintain some per (virtual) | 
|  | server information, namely, the <code>ScriptAlias</code>es in effect; | 
|  | the module structure therefore contains pointers to a function which | 
|  | builds these structures, and to another which combines two of them (in | 
|  | case the main server and a virtual server both have | 
|  | <code>ScriptAlias</code>es declared).<p> | 
|  |  | 
|  | Finally, this module contains code to handle the | 
|  | <code>ScriptAlias</code> command itself.  This particular module only | 
|  | declares one command, but there could be more, so modules have | 
|  | <em>command tables</em> which declare their commands, and describe | 
|  | where they are permitted, and how they are to be invoked.  <p> | 
|  |  | 
|  | A final note on the declared types of the arguments of some of these | 
|  | commands: a <code>pool</code> is a pointer to a <em>resource pool</em> | 
|  | structure; these are used by the server to keep track of the memory | 
|  | which has been allocated, files opened, etc., either to service a | 
|  | particular request, or to handle the process of configuring itself. | 
|  | That way, when the request is over (or, for the configuration pool, | 
|  | when the server is restarting), the memory can be freed, and the files | 
|  | closed, <i>en masse</i>, without anyone having to write explicit code to | 
|  | track them all down and dispose of them.  Also, a | 
|  | <code>cmd_parms</code> structure contains various information about | 
|  | the config file being read, and other status information, which is | 
|  | sometimes of use to the function which processes a config-file command | 
|  | (such as <code>ScriptAlias</code>). | 
|  |  | 
|  | With no further ado, the module itself: | 
|  |  | 
|  | <pre> | 
|  | /* Declarations of handlers. */ | 
|  |  | 
|  | int translate_scriptalias (request_rec *); | 
|  | int type_scriptalias (request_rec *); | 
|  | int cgi_handler (request_rec *); | 
|  |  | 
|  | /* Subsidiary dispatch table for response-phase handlers, by MIME type */ | 
|  |  | 
|  | handler_rec cgi_handlers[] = { | 
|  | { "application/x-httpd-cgi", cgi_handler }, | 
|  | { NULL } | 
|  | }; | 
|  |  | 
|  | /* Declarations of routines to manipulate the module's configuration | 
|  | * info.  Note that these are returned, and passed in, as void *'s; | 
|  | * the server core keeps track of them, but it doesn't, and can't, | 
|  | * know their internal structure. | 
|  | */ | 
|  |  | 
|  | void *make_cgi_server_config (pool *); | 
|  | void *merge_cgi_server_config (pool *, void *, void *); | 
|  |  | 
|  | /* Declarations of routines to handle config-file commands */ | 
|  |  | 
|  | extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, | 
|  | char *real); | 
|  |  | 
|  | command_rec cgi_cmds[] = { | 
|  | { "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, | 
|  | "a fakename and a realname"}, | 
|  | { NULL } | 
|  | }; | 
|  |  | 
|  | module cgi_module = { | 
|  | STANDARD_MODULE_STUFF, | 
|  | NULL,                     /* initializer */ | 
|  | NULL,                     /* dir config creator */ | 
|  | NULL,                     /* dir merger --- default is to override */ | 
|  | make_cgi_server_config,   /* server config */ | 
|  | merge_cgi_server_config,  /* merge server config */ | 
|  | cgi_cmds,                 /* command table */ | 
|  | cgi_handlers,             /* handlers */ | 
|  | translate_scriptalias,    /* filename translation */ | 
|  | NULL,                     /* check_user_id */ | 
|  | NULL,                     /* check auth */ | 
|  | NULL,                     /* check access */ | 
|  | type_scriptalias,         /* type_checker */ | 
|  | NULL,                     /* fixups */ | 
|  | NULL                      /* logger */ | 
|  | }; | 
|  | </pre> | 
|  |  | 
|  | <h2><a name="handlers">How handlers work</a></h2> | 
|  |  | 
|  | The sole argument to handlers is a <code>request_rec</code> structure. | 
|  | This structure describes a particular request which has been made to | 
|  | the server, on behalf of a client.  In most cases, each connection to | 
|  | the client generates only one <code>request_rec</code> structure.<p> | 
|  |  | 
|  | <h3><a name="req_tour">A brief tour of the <code>request_rec</code></a></h3> | 
|  |  | 
|  | The <code>request_rec</code> contains pointers to a resource pool | 
|  | which will be cleared when the server is finished handling the | 
|  | request; to structures containing per-server and per-connection | 
|  | information, and most importantly, information on the request itself.<p> | 
|  |  | 
|  | The most important such information is a small set of character | 
|  | strings describing attributes of the object being requested, including | 
|  | its URI, filename, content-type and content-encoding (these being filled | 
|  | in by the translation and type-check handlers which handle the | 
|  | request, respectively). <p> | 
|  |  | 
|  | Other commonly used data items are tables giving the MIME headers on | 
|  | the client's original request, MIME headers to be sent back with the | 
|  | response (which modules can add to at will), and environment variables | 
|  | for any subprocesses which are spawned off in the course of servicing | 
|  | the request.  These tables are manipulated using the | 
|  | <code>table_get</code> and <code>table_set</code> routines. <p> | 
|  |  | 
|  | Finally, there are pointers to two data structures which, in turn, | 
|  | point to per-module configuration structures.  Specifically, these | 
|  | hold pointers to the data structures which the module has built to | 
|  | describe the way it has been configured to operate in a given | 
|  | directory (via <code>.htaccess</code> files or | 
|  | <code>&lt;Directory&gt;</code> sections), and for private data it has | 
|  | built in the course of servicing the request (so modules' handlers for | 
|  | one phase can pass `notes' to their handlers for other phases).  There | 
|  | is another such configuration vector in the <code>server_rec</code> | 
|  | data structure pointed to by the <code>request_rec</code>, which | 
|  | contains per (virtual) server configuration data.<p> | 
|  |  | 
|  | Here is an abridged declaration, giving the fields most commonly used:<p> | 
|  |  | 
|  | <pre> | 
|  | struct request_rec { | 
|  |  | 
|  | pool *pool; | 
|  | conn_rec *connection; | 
|  | server_rec *server; | 
|  |  | 
|  | /* What object is being requested */ | 
|  |  | 
|  | char *uri; | 
|  | char *filename; | 
|  | char *path_info; | 
|  | char *args;           /* QUERY_ARGS, if any */ | 
|  | struct stat finfo;    /* Set by server core; | 
|  | * st_mode set to zero if no such file */ | 
|  |  | 
|  | char *content_type; | 
|  | char *content_encoding; | 
|  |  | 
|  | /* MIME header environments, in and out.  Also, an array containing | 
|  | * environment variables to be passed to subprocesses, so people can | 
|  | * write modules to add to that environment. | 
|  | * | 
|  | * The difference between headers_out and err_headers_out is that | 
|  | * the latter are printed even on error, and persist across internal | 
|  | * redirects (so the headers printed for ErrorDocument handlers will | 
|  | * have them). | 
|  | */ | 
|  |  | 
|  | table *headers_in; | 
|  | table *headers_out; | 
|  | table *err_headers_out; | 
|  | table *subprocess_env; | 
|  |  | 
|  | /* Info about the request itself... */ | 
|  |  | 
|  | int header_only;     /* HEAD request, as opposed to GET */ | 
|  | char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */ | 
|  | char *method;        /* GET, HEAD, POST, etc. */ | 
|  | int method_number;   /* M_GET, M_POST, etc. */ | 
|  |  | 
|  | /* Info for logging */ | 
|  |  | 
|  | char *the_request; | 
|  | int bytes_sent; | 
|  |  | 
|  | /* A flag which modules can set, to indicate that the data being | 
|  | * returned is volatile, and clients should be told not to cache it. | 
|  | */ | 
|  |  | 
|  | int no_cache; | 
|  |  | 
|  | /* Various other config info which may change with .htaccess files | 
|  | * These are config vectors, with one void* pointer for each module | 
|  | * (the thing pointed to being the module's business). | 
|  | */ | 
|  |  | 
|  | void *per_dir_config;   /* Options set in config files, etc. */ | 
|  | void *request_config;   /* Notes on *this* request */ | 
|  |  | 
|  | }; | 
|  |  | 
|  | </pre> | 
|  |  | 
|  | <h3><a name="req_orig">Where request_rec structures come from</a></h3> | 
|  |  | 
|  | Most <code>request_rec</code> structures are built by reading an HTTP | 
|  | request from a client, and filling in the fields.  However, there are | 
|  | a few exceptions: | 
|  |  | 
|  | <ul> | 
|  | <li> If the request is to an imagemap, a type map (i.e., a | 
|  | <code>*.var</code> file), or a CGI script which returned a | 
|  | local `Location:', then the resource which the user requested | 
|  | is going to be ultimately located by some URI other than what | 
|  | the client originally supplied.  In this case, the server does | 
|  | an <em>internal redirect</em>, constructing a new | 
|  | <code>request_rec</code> for the new URI, and processing it | 
|  | almost exactly as if the client had requested the new URI | 
|  | directly. <p> | 
|  |  | 
|  | <li> If some handler signaled an error, and an | 
|  | <code>ErrorDocument</code> is in scope, the same internal | 
|  | redirect machinery comes into play.<p> | 
|  |  | 
|  | <li> Finally, a handler occasionally needs to investigate `what | 
|  | would happen if' some other request were run.  For instance, | 
|  | the directory indexing module needs to know what MIME type | 
|  | would be assigned to a request for each directory entry, in | 
|  | order to figure out what icon to use.<p> | 
|  |  | 
|  | Such handlers can construct a <em>sub-request</em>, using the | 
|  | functions <code>sub_req_lookup_file</code> and | 
|  | <code>sub_req_lookup_uri</code>; this constructs a new | 
|  | <code>request_rec</code> structure and processes it as you | 
|  | would expect, up to but not including the point of actually | 
|  | sending a response.  (These functions skip over the access | 
|  | checks if the sub-request is for a file in the same directory | 
|  | as the original request).<p> | 
|  |  | 
|  | (Server-side includes work by building sub-requests and then | 
|  | actually invoking the response handler for them, via the | 
|  | function <code>run_sub_request</code>). | 
|  | </ul> | 
|  |  | 
|  | <h3><a name="req_return">Handling requests, declining, and returning error codes</a></h3> | 
|  |  | 
|  | As discussed above, each handler, when invoked to handle a particular | 
|  | <code>request_rec</code>, has to return an <code>int</code> to | 
|  | indicate what happened.  That can either be | 
|  |  | 
|  | <ul> | 
|  | <li> OK --- the request was handled successfully.  This may or may | 
|  | not terminate the phase. | 
|  | <li> DECLINED --- no erroneous condition exists, but the module | 
|  | declines to handle the phase; the server tries to find another. | 
|  | <li> an HTTP error code, which aborts handling of the request. | 
|  | </ul> | 
|  |  | 
|  | Note that if the error code returned is <code>REDIRECT</code>, then | 
|  | the module should put a <code>Location</code> in the request's | 
|  | <code>headers_out</code>, to indicate where the client should be | 
|  | redirected <em>to</em>. <p> | 
|  |  | 
|  | <h3><a name="resp_handlers">Special considerations for response handlers</a></h3> | 
|  |  | 
|  | Handlers for most phases do their work by simply setting a few fields | 
|  | in the <code>request_rec</code> structure (or, in the case of access | 
|  | checkers, simply by returning the correct error code).  However, | 
|  | response handlers have to actually send a response back to the client. <p> | 
|  |  | 
|  | They should begin by sending an HTTP response header, using the | 
|  | function <code>send_http_header</code>.  (You don't have to do | 
|  | anything special to skip sending the header for HTTP/0.9 requests; the | 
|  | function figures out on its own that it shouldn't do anything).  If | 
|  | the request is marked <code>header_only</code>, that's all they should | 
|  | do; they should return after that, without attempting any further | 
|  | output.  <p> | 
|  |  | 
|  | Otherwise, they should produce a response body appropriate to the | 
|  | client's request.  The primitives for this are <code>rputc</code> | 
|  | and <code>rprintf</code>, for internally generated output, and | 
|  | <code>send_fd</code>, to copy the contents of some <code>FILE *</code> | 
|  | straight to the client.  <p> | 
|  |  | 
|  | At this point, you should more or less understand the following piece | 
|  | of code, which is the handler which handles <code>GET</code> requests | 
|  | which have no more specific handler; it also shows how conditional | 
|  | <code>GET</code>s can be handled, if it's desirable to do so in a | 
|  | particular response handler --- <code>set_last_modified</code> checks | 
|  | against the <code>If-modified-since</code> value supplied by the | 
|  | client, if any, and returns an appropriate code (which will, if | 
|  | nonzero, be USE_LOCAL_COPY).   No similar considerations apply for | 
|  | <code>set_content_length</code>, but it returns an error code for | 
|  | symmetry.<p> | 
|  |  | 
|  | <pre> | 
|  | int default_handler (request_rec *r) | 
|  | { | 
|  | int errstatus; | 
|  | FILE *f; | 
|  |  | 
|  | if (r->method_number != M_GET) return DECLINED; | 
|  | if (r->finfo.st_mode == 0) return NOT_FOUND; | 
|  |  | 
|  | if ((errstatus = set_content_length (r, r->finfo.st_size)) | 
|  | || (errstatus = set_last_modified (r, r->finfo.st_mtime))) | 
|  | return errstatus; | 
|  |  | 
|  | f = pfopen (r->pool, r->filename, "r"); | 
|  |  | 
|  | if (f == NULL) { | 
|  | log_reason("file permissions deny server access", | 
|  | r->filename, r); | 
|  | return FORBIDDEN; | 
|  | } | 
|  |  | 
|  | register_timeout ("send", r); | 
|  | send_http_header (r); | 
|  |  | 
|  | if (!r->header_only) send_fd (f, r); | 
|  | pfclose (r->pool, f); | 
|  | return OK; | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | Finally, if all of this is too much of a challenge, there are a few | 
|  | ways out of it.  First off, as shown above, a response handler which | 
|  | has not yet produced any output can simply return an error code, in | 
|  | which case the server will automatically produce an error response. | 
|  | Secondly, it can punt to some other handler by invoking | 
|  | <code>internal_redirect</code>, which is how the internal redirection | 
|  | machinery discussed above is invoked.  A response handler which has | 
|  | internally redirected should always return <code>OK</code>. <p> | 
|  |  | 
|  | (Invoking <code>internal_redirect</code> from handlers which are | 
|  | <em>not</em> response handlers will lead to serious confusion). | 
|  |  | 
|  | <h3><a name="auth_handlers">Special considerations for authentication handlers</a></h3> | 
|  |  | 
|  | Stuff that should be discussed here in detail: | 
|  |  | 
|  | <ul> | 
|  | <li> Authentication-phase handlers not invoked unless auth is | 
|  | configured for the directory. | 
|  | <li> Common auth configuration stored in the core per-dir | 
|  | configuration; it has accessors <code>auth_type</code>, | 
|  | <code>auth_name</code>, and <code>requires</code>. | 
|  | <li> Common routines, to handle the protocol end of things, at least | 
|  | for HTTP basic authentication (<code>get_basic_auth_pw</code>, | 
|  | which sets the <code>connection->user</code> structure field | 
|  | automatically, and <code>note_basic_auth_failure</code>, which | 
|  | arranges for the proper <code>WWW-Authenticate:</code> header | 
|  | to be sent back). | 
|  | </ul> | 
|  |  | 
|  | <h3><a name="log_handlers">Special considerations for logging handlers</a></h3> | 
|  |  | 
|  | When a request has internally redirected, there is the question of | 
|  | what to log.  Apache handles this by bundling the entire chain of | 
|  | redirects into a list of <code>request_rec</code> structures which are | 
|  | threaded through the <code>r->prev</code> and <code>r->next</code> | 
|  | pointers.  The <code>request_rec</code> which is passed to the logging | 
|  | handlers in such cases is the one which was originally built for the | 
|  | initial request from the client; note that the bytes_sent field will | 
|  | only be correct in the last request in the chain (the one for which a | 
|  | response was actually sent). | 
|  |  | 
|  | <h2><a name="pools">Resource allocation and resource pools</a></h2> | 
|  |  | 
|  | One of the problems of writing and designing a server-pool server is | 
|  | that of preventing leakage, that is, allocating resources (memory, | 
|  | open files, etc.), without subsequently releasing them.  The resource | 
|  | pool machinery is designed to make it easy to prevent this from | 
|  | happening, by allowing resources to be allocated in such a way that | 
|  | they are <em>automatically</em> released when the server is done with | 
|  | them. <p> | 
|  |  | 
|  | The way this works is as follows:  the memory which is allocated, files | 
|  | opened, etc., to deal with a particular request are tied to a | 
|  | <em>resource pool</em> which is allocated for the request.  The pool | 
|  | is a data structure which itself tracks the resources in question. <p> | 
|  |  | 
|  | When the request has been processed, the pool is <em>cleared</em>.  At | 
|  | that point, all the memory associated with it is released for reuse, | 
|  | all files associated with it are closed, and any other clean-up | 
|  | functions which are associated with the pool are run.  When this is | 
|  | over, we can be confident that all the resources tied to the pool have | 
|  | been released, and that none of them have leaked. <p> | 
|  |  | 
|  | Server restarts, and allocation of memory and resources for per-server | 
|  | configuration, are handled in a similar way.  There is a | 
|  | <em>configuration pool</em>, which keeps track of resources which were | 
|  | allocated while reading the server configuration files, and handling | 
|  | the commands therein (for instance, the memory that was allocated for | 
|  | per-server module configuration, log files and other files that were | 
|  | opened, and so forth).  When the server restarts, and has to reread | 
|  | the configuration files, the configuration pool is cleared, and so the | 
|  | memory and file descriptors which were taken up by reading them the | 
|  | last time are made available for reuse. <p> | 
|  |  | 
|  | It should be noted that use of the pool machinery isn't generally | 
|  | obligatory, except for situations like logging handlers, where you | 
|  | really need to register cleanups to make sure that the log file gets | 
|  | closed when the server restarts (this is most easily done by using the | 
|  | function <code><a href="#pool-files">pfopen</a></code>, which also | 
|  | arranges for the underlying file descriptor to be closed before any | 
|  | child processes, such as for CGI scripts, are <code>exec</code>ed), or | 
|  | in case you are using the timeout machinery (which isn't yet even | 
|  | documented here).  However, there are two benefits to using it: | 
|  | resources allocated to a pool never leak (even if you allocate a | 
|  | scratch string, and just forget about it); also, for memory | 
|  | allocation, <code>palloc</code> is generally faster than | 
|  | <code>malloc</code>.<p> | 
|  |  | 
|  | We begin here by describing how memory is allocated to pools, and then | 
|  | discuss how other resources are tracked by the resource pool | 
|  | machinery. | 
|  |  | 
|  | <h3>Allocation of memory in pools</h3> | 
|  |  | 
|  | Memory is allocated to pools by calling the function | 
|  | <code>palloc</code>, which takes two arguments, one being a pointer to | 
|  | a resource pool structure, and the other being the amount of memory to | 
|  | allocate (in <code>char</code>s).  Within handlers for handling | 
|  | requests, the most common way of getting a resource pool structure is | 
|  | by looking at the <code>pool</code> slot of the relevant | 
|  | <code>request_rec</code>; hence the repeated appearance of the | 
|  | following idiom in module code: | 
|  |  | 
|  | <pre> | 
|  | int my_handler(request_rec *r) | 
|  | { | 
|  | struct my_structure *foo; | 
|  | ... | 
|  |  | 
|  | foo = (struct my_structure *)palloc (r->pool, sizeof(struct my_structure)); | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | Note that <em>there is no <code>pfree</code></em> --- | 
|  | <code>palloc</code>ed memory is freed only when the associated | 
|  | resource pool is cleared.  This means that <code>palloc</code> does not | 
|  | have to do as much accounting as <code>malloc()</code>; all it does in | 
|  | the typical case is to round up the size, bump a pointer, and do a | 
|  | range check.<p> | 
|  |  | 
|  | (It also raises the possibility that heavy use of <code>palloc</code> | 
|  | could cause a server process to grow excessively large.  There are | 
|  | two ways to deal with this, which are dealt with below; briefly, you | 
|  | can use <code>malloc</code>, and try to be sure that all of the memory | 
|  | gets explicitly <code>free</code>d, or you can allocate a sub-pool of | 
|  | the main pool, allocate your memory in the sub-pool, and clear it out | 
|  | periodically.  The latter technique is discussed in the section on | 
|  | sub-pools below, and is used in the directory-indexing code, in order | 
|  | to avoid excessive storage allocation when listing directories with | 
|  | thousands of files). | 
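To make the "round up the size, bump a pointer, and do a range check" description concrete, here is a toy bump allocator.  It is a sketch under stated assumptions, not the server's allocator: the <code>toy_</code> names and the 8-byte alignment are invented for the example, and where this version returns <code>NULL</code> on exhaustion, a real pool would be expected to obtain more memory instead.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ALIGN 8   /* assumed alignment, for this sketch only */

/* Hypothetical pool: one caller-supplied block and a high-water mark. */
typedef struct {
    char  *base;   /* start of the block backing the pool */
    size_t used;   /* bytes handed out so far */
    size_t size;   /* total bytes in the block */
} toy_pool;

void toy_pool_init(toy_pool *p, void *block, size_t size)
{
    assert(p != NULL && block != NULL);
    p->base = block;
    p->used = 0;
    p->size = size;
}

/* Round up, range check, bump.  There is no per-allocation bookkeeping,
 * which is why there is nothing for a "free" call to undo. */
void *toy_palloc(toy_pool *p, size_t nbytes)
{
    size_t rounded;
    void *result;

    assert(p != NULL);
    rounded = (nbytes + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
    if (rounded < nbytes || rounded > p->size - p->used)
        return NULL;            /* arithmetic overflow, or pool exhausted */
    result = p->base + p->used;
    p->used += rounded;
    return result;
}

/* Clearing releases everything at once by resetting the mark. */
void toy_clear_pool(toy_pool *p)
{
    assert(p != NULL);
    p->used = 0;
}
```

Since individual allocations cannot be returned, the only way to get memory back is to clear the whole pool, which is what motivates the sub-pool technique mentioned above.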
|  |  | 
|  | <h3>Allocating initialized memory</h3> | 
|  |  | 
|  | There are functions which allocate initialized memory, and are | 
|  | frequently useful.  The function <code>pcalloc</code> has the same | 
|  | interface as <code>palloc</code>, but clears out the memory it | 
|  | allocates before it returns it.  The function <code>pstrdup</code> | 
|  | takes a resource pool and a <code>char *</code> as arguments, and | 
|  | allocates memory for a copy of the string the pointer points to, | 
|  | returning a pointer to the copy.  Finally <code>pstrcat</code> is a | 
|  | varargs-style function, which takes a pointer to a resource pool, and | 
|  | at least two <code>char *</code> arguments, the last of which must be | 
|  | <code>NULL</code>.  It allocates enough memory to fit copies of each | 
|  | of the strings, as a unit; for instance: | 
|  |  | 
|  | <pre> | 
|  | pstrcat (r->pool, "foo", "/", "bar", NULL); | 
|  | </pre> | 
|  |  | 
|  | returns a pointer to 8 bytes worth of memory, initialized to | 
|  | <code>"foo/bar"</code>. | 
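For readers who have not met the NULL-terminated varargs convention, the following sketch shows one way such a function can be written.  It is not the server's <code>pstrcat</code>: the name <code>toy_strcat_all</code> is hypothetical, and it takes its memory from <code>malloc</code> rather than from a pool, so the caller has to <code>free</code> the result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate all the string arguments, up to a terminating NULL,
 * into one freshly allocated buffer. */
char *toy_strcat_all(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *result, *out;

    /* Pass 1: add up the lengths so the copy can be allocated as a unit. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        len += strlen(s);
    va_end(ap);

    result = malloc(len + 1);       /* +1 for the terminating NUL */
    if (result == NULL)
        return NULL;

    /* Pass 2: copy each piece into place. */
    out = result;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *)) {
        size_t n = strlen(s);
        memcpy(out, s, n);
        out += n;
    }
    va_end(ap);

    assert(out == result + len);    /* both passes saw the same strings */
    *out = '\0';
    return result;
}
```

The two passes are why the final <code>NULL</code> is mandatory: it is the only way the function can tell where the argument list ends.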
|  |  | 
|  | <h3>Tracking open files, etc.</h3> | 
|  |  | 
|  | As indicated above, resource pools are also used to track other sorts | 
|  | of resources besides memory.  The most common are open files.  The | 
|  | routine which is typically used for this is <code>pfopen</code>, which | 
|  | takes a resource pool and two strings as arguments; the strings are | 
|  | the same as the typical arguments to <code>fopen</code>, e.g., | 
|  |  | 
|  | <pre> | 
|  | ... | 
|  | FILE *f = pfopen (r->pool, r->filename, "r"); | 
|  |  | 
|  | if (f == NULL) { ... } else { ... } | 
|  | </pre> | 
|  |  | 
|  | There is also a <code>popenf</code> routine, which parallels the | 
|  | lower-level <code>open</code> system call.  Both of these routines | 
|  | arrange for the file to be closed when the resource pool in question | 
|  | is cleared.  <p> | 
|  |  | 
|  | Unlike the case for memory, there <em>are</em> functions to close | 
|  | files allocated with <code>pfopen</code>, and <code>popenf</code>, | 
|  | namely <code>pfclose</code> and <code>pclosef</code>.  (This is | 
|  | because, on many systems, the number of files which a single process | 
|  | can have open is quite limited).  It is important to use these | 
|  | functions, rather than plain <code>fclose</code> or <code>close</code>, | 
|  | for files allocated with <code>pfopen</code> and | 
|  | <code>popenf</code>: otherwise the pool would close the file a second | 
|  | time when it is cleared, which could cause fatal errors on | 
|  | systems such as Linux, which react badly if the same | 
|  | <code>FILE*</code> is closed more than once. <p> | 
|  |  | 
|  | (Using the <code>close</code> functions is not mandatory, since the | 
|  | file will eventually be closed regardless, but you should consider it | 
|  | in cases where your module is opening, or could open, a lot of files). | 
|  |  | 
|  | <h3>Other sorts of resources --- cleanup functions</h3> | 
|  |  | 
|  | More text goes here.  Describe the cleanup primitives in terms of | 
|  | which the file stuff is implemented; also, <code>spawn_process</code>. | 
|  |  | 
|  | <h3>Fine control --- creating and dealing with sub-pools, with a note | 
|  | on sub-requests</h3> | 
|  |  | 
|  | On rare occasions, too-free use of <code>palloc()</code> and the | 
|  | associated primitives may result in undesirably profligate resource | 
|  | allocation.  You can deal with such a case by creating a | 
|  | <em>sub-pool</em>, allocating within the sub-pool rather than the main | 
|  | pool, and clearing or destroying the sub-pool, which releases the | 
|  | resources which were associated with it.  (This really <em>is</em> a | 
|  | rare situation; the only case in which it comes up in the standard | 
|  | module set is in case of listing directories, and then only with | 
|  | <em>very</em> large directories.  Unnecessary use of the primitives | 
|  | discussed here can hair up your code quite a bit, with very little | 
|  | gain). <p> | 
|  |  | 
|  | The primitive for creating a sub-pool is <code>make_sub_pool</code>, | 
|  | which takes another pool (the parent pool) as an argument.  When the | 
|  | main pool is cleared, the sub-pool will be destroyed.  The sub-pool | 
|  | may also be cleared or destroyed at any time, by calling the functions | 
|  | <code>clear_pool</code> and <code>destroy_pool</code>, respectively. | 
|  | (The difference is that <code>clear_pool</code> frees resources | 
|  | associated with the pool, while <code>destroy_pool</code> also | 
|  | deallocates the pool itself.  In the former case, you can allocate new | 
|  | resources within the pool, and clear it again, and so forth; in the | 
|  | latter case, it is simply gone). <p> | 
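To make the clear/destroy distinction concrete, here is a minimal sketch of a pool tree. Everything in it (<code>toy_pool</code>, <code>toy_make_sub_pool</code>, <code>toy_palloc</code>, and so on) is a hypothetical toy written for this note; it only mimics the behavior described above and says nothing about how the real allocator is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the server's real pool implementation. */
typedef union toy_block {
    union toy_block *next;
    max_align_t align;          /* keeps the payload suitably aligned */
} toy_block;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *sibling;
    toy_block *blocks;          /* memory handed out by toy_palloc */
    int live_blocks;
} toy_pool;

/* Create a pool; a NULL parent makes a top-level pool. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof *p);
    if (p == NULL) abort();
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

void *toy_palloc(toy_pool *p, size_t n)
{
    toy_block *b = malloc(sizeof *b + n);
    if (b == NULL) abort();
    b->next = p->blocks;
    p->blocks = b;
    p->live_blocks++;
    return b + 1;               /* payload follows the header */
}

void toy_destroy_pool(toy_pool *p);

/* Release what the pool holds (sub-pools included); the pool survives. */
void toy_clear_pool(toy_pool *p)
{
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);   /* unlinks itself from p */
    while (p->blocks != NULL) {
        toy_block *b = p->blocks;
        p->blocks = b->next;
        free(b);
        p->live_blocks--;
    }
}

/* Clear the pool, detach it from its parent, and free the pool itself. */
void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}
```

A typical use in this toy model is a loop which allocates scratch data in a sub-pool and clears it on each iteration, so that the parent pool never grows.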
|  |  | 
|  | One final note --- sub-requests have their own resource pools, which | 
|  | are sub-pools of the resource pool for the main request.  The polite | 
|  | way to reclaim the resources associated with a sub-request which you | 
|  | have allocated (using the <code>sub_req_lookup_...</code> functions) | 
|  | is <code>destroy_sub_request</code>, which frees the resource pool. | 
|  | Before calling this function, be sure to copy anything that you care | 
|  | about which might be allocated in the sub-request's resource pool into | 
|  | someplace a little less volatile (for instance, the filename in its | 
|  | <code>request_rec</code> structure). <p> | 
|  |  | 
|  | (Again, under most circumstances, you shouldn't feel obliged to call | 
|  | this function; only 2K of memory or so are allocated for a typical | 
|  | sub-request, and it will be freed anyway when the main request pool is | 
|  | cleared.  It is only when you are allocating many, many sub-requests | 
|  | for a single main request that you should seriously consider the | 
|  | <code>destroy...</code> functions). | 
|  |  | 
|  | <h2><a name="config">Configuration, commands and the like</a></h2> | 
|  |  | 
|  | One of the design goals for this server was to maintain external | 
|  | compatibility with the NCSA 1.3 server --- that is, to read the same | 
|  | configuration files, to process all the directives therein correctly, | 
|  | and in general to be a drop-in replacement for NCSA.  On the other | 
|  | hand, another design goal was to move as much of the server's | 
|  | functionality into modules which have as little as possible to do with | 
|  | the monolithic server core.  The only way to reconcile these goals is | 
|  | to move the handling of most commands from the central server into the | 
|  | modules.  <p> | 
|  |  | 
|  | However, just giving the modules command tables is not enough to | 
|  | divorce them completely from the server core.  The server has to | 
|  | remember the commands in order to act on them later.  That involves | 
|  | maintaining data which is private to the modules, and which can be | 
|  | either per-server, or per-directory.  Most things are per-directory, | 
|  | including in particular access control and authorization information, | 
|  | but also information on how to determine file types from suffixes, | 
|  | which can be modified by <code>AddType</code> and | 
|  | <code>DefaultType</code> directives, and so forth.  In general, the | 
|  | governing philosophy is that anything which <em>can</em> be made | 
|  | configurable by directory should be; per-server information is | 
|  | generally used in the standard set of modules for information like | 
|  | <code>Alias</code>es and <code>Redirect</code>s which come into play | 
|  | before the request is tied to a particular place in the underlying | 
|  | file system. <p> | 
|  |  | 
|  | Another requirement for emulating the NCSA server is being able to | 
|  | handle the per-directory configuration files, generally called | 
|  | <code>.htaccess</code> files, though even in the NCSA server they can | 
|  | contain directives which have nothing at all to do with access | 
|  | control.  Accordingly, after URI -> filename translation, but before | 
|  | performing any other phase, the server walks down the directory | 
|  | hierarchy of the underlying filesystem, following the translated | 
|  | pathname, to read any <code>.htaccess</code> files which might be | 
|  | present.  The information which is read in then has to be | 
|  | <em>merged</em> with the applicable information from the server's own | 
|  | config files (either from the <code>&lt;Directory&gt;</code> sections | 
|  | in <code>access.conf</code>, or from defaults in | 
|  | <code>srm.conf</code>, which actually behaves for most purposes almost | 
|  | exactly like <code>&lt;Directory /&gt;</code>).<p> | 
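The order of that walk can be sketched as follows. This is a toy of my own, not the server's code: <code>toy_walk</code> and its callback type are hypothetical names, and I am assuming the translated pathname is absolute and names a directory. It visits the root first and then each longer prefix, which is the order in which a server would want to look for <code>.htaccess</code> files and merge what it finds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once per directory, root first. */
typedef void (*toy_visit)(const char *dir, void *ctx);

/*
 * Visit "/" and then each successively longer directory prefix of an
 * absolute path.  Returns the number of directories visited, or -1 if
 * the path is not absolute or does not fit the local buffer.
 */
int toy_walk(const char *path, toy_visit visit, void *ctx)
{
    char buf[256];
    size_t len = strlen(path);
    size_t i;
    int count = 0;

    if (path[0] != '/' || len >= sizeof buf)
        return -1;

    visit("/", ctx);
    count++;

    for (i = 1; i <= len; i++) {
        /* A component ends at a slash or at the end of the string. */
        if ((path[i] == '/' || path[i] == '\0') && path[i - 1] != '/') {
            memcpy(buf, path, i);
            buf[i] = '\0';
            visit(buf, ctx);
            count++;
        }
    }
    return count;
}

/* Example visitor: records the directories seen, separated by spaces. */
typedef struct { char seen[512]; } toy_trace;

void toy_note(const char *dir, void *ctx)
{
    toy_trace *t = ctx;
    if (strlen(t->seen) + strlen(dir) + 2 < sizeof t->seen) {
        strcat(t->seen, dir);
        strcat(t->seen, " ");
    }
}
```

A real visitor would read the <code>.htaccess</code> file in each directory, if there is one, and merge it into the configuration accumulated so far.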
|  |  | 
|  | Finally, after having served a request which involved reading | 
|  | <code>.htaccess</code> files, we need to discard the storage allocated | 
|  | for handling them.  That is solved the same way it is solved wherever | 
|  | else similar problems come up, by tying those structures to the | 
|  | per-transaction resource pool.  <p> | 
|  |  | 
|  | <h3><a name="per-dir">Per-directory configuration structures</a></h3> | 
|  |  | 
|  | Let's look at how all of this plays out in <code>mod_mime.c</code>, | 
|  | which defines the file typing handler which emulates the NCSA server's | 
|  | behavior of determining file types from suffixes.  What we'll be | 
|  | looking at, here, is the code which implements the | 
|  | <code>AddType</code> and <code>AddEncoding</code> commands.  These | 
|  | commands can appear in <code>.htaccess</code> files, so they must be | 
|  | handled in the module's private per-directory data, which in fact | 
|  | consists of two separate <code>table</code>s for MIME types and | 
|  | encoding information, and is declared as follows: | 
|  |  | 
|  | <pre> | 
|  | typedef struct { | 
|  |     table *forced_types;      /* Additional AddTyped stuff */ | 
|  |     table *encoding_types;    /* Added with AddEncoding... */ | 
|  | } mime_dir_config; | 
|  | </pre> | 
|  |  | 
|  | When the server is reading a configuration file, or | 
|  | <code>&lt;Directory&gt;</code> section, which includes one of the MIME | 
|  | module's commands, it needs to create a <code>mime_dir_config</code> | 
|  | structure, so those commands have something to act on.  It does this | 
|  | by invoking the function it finds in the module's `create per-dir | 
|  | config slot', with two arguments: the name of the directory to which | 
|  | this configuration information applies (or <code>NULL</code> for | 
|  | <code>srm.conf</code>), and a pointer to a resource pool in which the | 
|  | allocation should happen. <p> | 
|  |  | 
|  | (If we are reading a <code>.htaccess</code> file, that resource pool | 
|  | is the per-request resource pool for the request; otherwise it is a | 
|  | resource pool which is used for configuration data, and cleared on | 
|  | restarts.  Either way, it is important for the structure being created | 
|  | to vanish when the pool is cleared, by registering a cleanup on the | 
|  | pool if necessary). <p> | 
|  |  | 
|  | For the MIME module, the per-dir config creation function just | 
|  | <code>palloc</code>s the structure above, and creates a couple of | 
|  | <code>table</code>s to fill it.  That looks like this: | 
|  |  | 
|  | <pre> | 
|  | void *create_mime_dir_config (pool *p, char *dummy) | 
|  | { | 
|  |     mime_dir_config *new = | 
|  |         (mime_dir_config *) palloc (p, sizeof(mime_dir_config)); | 
|  |  | 
|  |     new->forced_types = make_table (p, 4); | 
|  |     new->encoding_types = make_table (p, 4); | 
|  |  | 
|  |     return new; | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | Now, suppose we've just read in a <code>.htaccess</code> file.  We | 
|  | already have the per-directory configuration structure for the next | 
|  | directory up in the hierarchy.  If the <code>.htaccess</code> file we | 
|  | just read in didn't have any <code>AddType</code> or | 
|  | <code>AddEncoding</code> commands, its per-directory config structure | 
|  | for the MIME module is still valid, and we can just use it. | 
|  | Otherwise, we need to merge the two structures somehow. <p> | 
|  |  | 
|  | To do that, the server invokes the module's per-directory config merge | 
|  | function, if one is present.  That function takes three arguments: | 
|  | the two structures being merged, and a resource pool in which to | 
|  | allocate the result.  For the MIME module, all that needs to be done | 
|  | is overlay the tables from the new per-directory config structure with | 
|  | those from the parent: | 
|  |  | 
|  | <pre> | 
|  | void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) | 
|  | { | 
|  |     mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; | 
|  |     mime_dir_config *subdir = (mime_dir_config *)subdirv; | 
|  |     mime_dir_config *new = | 
|  |         (mime_dir_config *)palloc (p, sizeof(mime_dir_config)); | 
|  |  | 
|  |     new->forced_types = overlay_tables (p, subdir->forced_types, | 
|  |                                         parent_dir->forced_types); | 
|  |     new->encoding_types = overlay_tables (p, subdir->encoding_types, | 
|  |                                           parent_dir->encoding_types); | 
|  |  | 
|  |     return new; | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | As a note --- if there is no per-directory merge function present, the | 
|  | server will just use the subdirectory's configuration info, and ignore | 
|  | the parent's.  For some modules, that works just fine (e.g., for the | 
|  | includes module, whose per-directory configuration information | 
|  | consists solely of the state of the <code>XBITHACK</code>), and for | 
|  | those modules, you can just not declare one, and leave the | 
|  | corresponding structure slot in the module itself <code>NULL</code>.<p> | 
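The difference between merging and not merging can be shown with a toy table. The names here (<code>toy_table</code>, <code>toy_overlay</code>, <code>toy_effective</code>) are hypothetical, and the sketch assumes that an overlay simply puts the subdirectory's entries ahead of the parent's, with lookups returning the first match; treat that as my assumption about the semantics rather than a description of the real <code>table</code> code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Hypothetical toy table; lookups return the first matching key. */
typedef struct { const char *key; const char *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; int n; } toy_table;

void toy_add(toy_table *t, const char *k, const char *v)
{
    assert(t->n < TOY_MAX);
    t->e[t->n].key = k;
    t->e[t->n].val = v;
    t->n++;
}

const char *toy_get(const toy_table *t, const char *k)
{
    int i;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->e[i].key, k) == 0)
            return t->e[i].val;
    return NULL;
}

/* Overlay entries come first, so they shadow base entries. */
toy_table toy_overlay(const toy_table *overlay, const toy_table *base)
{
    toy_table out = { .n = 0 };
    int i;
    for (i = 0; i < overlay->n; i++)
        toy_add(&out, overlay->e[i].key, overlay->e[i].val);
    for (i = 0; i < base->n; i++)
        toy_add(&out, base->e[i].key, base->e[i].val);
    return out;
}

/*
 * With a merge function the two levels are combined; without one, the
 * subdirectory's table is used alone and the parent's is ignored.
 */
toy_table toy_effective(const toy_table *parent, const toy_table *subdir,
                        int have_merge)
{
    return have_merge ? toy_overlay(subdir, parent) : *subdir;
}
```

With a merge, a type set only in the parent is still visible in the subdirectory; without one, it silently disappears as soon as the subdirectory sets anything at all.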
|  |  | 
|  | <h3><a name="commands">Command handling</a></h3> | 
|  |  | 
|  | Now that we have these structures, we need to be able to figure out | 
|  | how to fill them.  That involves processing the actual | 
|  | <code>AddType</code> and <code>AddEncoding</code> commands.  To find | 
|  | commands, the server looks in the module's <code>command table</code>. | 
|  | That table contains information on how many arguments the commands | 
|  | take, and in what formats, where they are permitted, and so forth.  That | 
|  | information is sufficient to allow the server to invoke most | 
|  | command-handling functions with pre-parsed arguments.  Without further | 
|  | ado, let's look at the <code>AddType</code> command handler, which | 
|  | looks like this (the <code>AddEncoding</code> command looks basically | 
|  | the same, and won't be shown here): | 
|  |  | 
|  | <pre> | 
|  | char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) | 
|  | { | 
|  |     if (*ext == '.') ++ext; | 
|  |     table_set (m->forced_types, ext, ct); | 
|  |     return NULL; | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | This command handler is unusually simple.  As you can see, it takes | 
|  | four arguments: a pointer to a <code>cmd_parms</code> structure, the | 
|  | per-directory configuration structure for the module in question, and | 
|  | two pre-parsed arguments.  The <code>cmd_parms</code> structure | 
|  | contains a bunch of arguments which are frequently of | 
|  | use to some, but not all, commands, including a resource pool (from | 
|  | which memory can be allocated, and to which cleanups should be tied), | 
|  | and the (virtual) server being configured, from which the module's | 
|  | per-server configuration data can be obtained if required.<p> | 
|  |  | 
|  | Another way in which this particular command handler is unusually | 
|  | simple is that there are no error conditions which it can encounter. | 
|  | If there were, it could return an error message instead of | 
|  | <code>NULL</code>; this causes an error to be printed out on the | 
|  | server's <code>stderr</code>, followed by a quick exit, if it is in | 
|  | the main config files; for a <code>.htaccess</code> file, the syntax | 
|  | error is logged in the server error log (along with an indication of | 
|  | where it came from), and the request is bounced with a server error | 
|  | response (HTTP error status, code 500). <p> | 
|  |  | 
|  | The MIME module's command table has entries for these commands, which | 
|  | look like this: | 
|  |  | 
|  | <pre> | 
|  | command_rec mime_cmds[] = { | 
|  |     { "AddType", add_type, NULL, OR_FILEINFO, TAKE2, | 
|  |       "a mime type followed by a file extension" }, | 
|  |     { "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, | 
|  |       "an encoding (e.g., gzip), followed by a file extension" }, | 
|  |     { NULL } | 
|  | }; | 
|  | </pre> | 
|  |  | 
|  | The entries in these tables are: | 
|  |  | 
|  | <ul> | 
|  | <li> The name of the command | 
|  | <li> The function which handles it | 
|  | <li> A <code>(void *)</code> pointer, which is passed in the | 
|  | <code>cmd_parms</code> structure to the command handler --- | 
|  | this is useful in case many similar commands are handled by the | 
|  | same function. | 
|  | <li> A bit mask indicating where the command may appear.  There are | 
|  | mask bits corresponding to each <code>AllowOverride</code> | 
|  | option, and an additional mask bit, <code>RSRC_CONF</code>, | 
|  | indicating that the command may appear in the server's own | 
|  | config files, but <em>not</em> in any <code>.htaccess</code> | 
|  | file. | 
|  | <li> A flag indicating how many arguments the command handler wants | 
|  | pre-parsed, and how they should be passed in. | 
|  | <code>TAKE2</code> indicates two pre-parsed arguments.  Other | 
|  | options are <code>TAKE1</code>, which indicates one pre-parsed | 
|  | argument; <code>FLAG</code>, which indicates that the argument | 
|  | should be <code>On</code> or <code>Off</code>, and is passed in | 
|  | as a boolean flag; and <code>RAW_ARGS</code>, which causes the | 
|  | server to give the command the raw, unparsed arguments | 
|  | (everything but the command name itself).  There is also | 
|  | <code>ITERATE</code>, which means that the handler looks the | 
|  | same as <code>TAKE1</code>, but that if multiple arguments are | 
|  | present, it should be called multiple times, and finally | 
|  | <code>ITERATE2</code>, which indicates that the command handler | 
|  | looks like a <code>TAKE2</code>, but if more arguments are | 
|  | present, then it should be called multiple times, holding the | 
|  | first argument constant. | 
|  | <li> Finally, we have a string which describes the arguments that | 
|  | should be present.  If the arguments in the actual config file | 
|  | are not as required, this string will be used to help give a | 
|  | more specific error message.  (You can safely leave this | 
|  | <code>NULL</code>). | 
|  | </ul> | 
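How the argument-style flag drives the calls to a handler can be sketched like this. It is a toy under my own assumptions: <code>toy_invoke</code>, the <code>TOY_</code> constants and the uniform three-argument handler type are all hypothetical (a real one-argument handler would not take the spare parameter), and the error strings are made up. What it does show is the calling pattern described in the list above, in particular <code>ITERATE2</code> holding the first argument constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument styles mirroring the ones listed above. */
enum toy_how { TOY_TAKE1, TOY_TAKE2, TOY_ITERATE, TOY_ITERATE2 };

/* Uniform handler type for the sketch; returns NULL or an error string. */
typedef const char *(*toy_handler)(void *conf, const char *a, const char *b);

/* Call fn according to the style.  Returns NULL on success. */
const char *toy_invoke(enum toy_how how, toy_handler fn, void *conf,
                       int argc, const char **argv)
{
    const char *err;
    int i;

    switch (how) {
    case TOY_TAKE1:
        if (argc != 1) return "takes one argument";
        return fn(conf, argv[0], NULL);
    case TOY_TAKE2:
        if (argc != 2) return "takes two arguments";
        return fn(conf, argv[0], argv[1]);
    case TOY_ITERATE:
        /* One call per argument. */
        if (argc < 1) return "takes at least one argument";
        for (i = 0; i < argc; i++)
            if ((err = fn(conf, argv[i], NULL)) != NULL) return err;
        return NULL;
    case TOY_ITERATE2:
        /* One call per extra argument, first argument held constant. */
        if (argc < 2) return "takes at least two arguments";
        for (i = 1; i < argc; i++)
            if ((err = fn(conf, argv[0], argv[i])) != NULL) return err;
        return NULL;
    }
    return "unknown argument style";
}

/* Example handler: records how it was called. */
typedef struct { int calls; const char *last_a; const char *last_b; } toy_log;

const char *toy_record(void *conf, const char *a, const char *b)
{
    toy_log *log = conf;
    log->calls++;
    log->last_a = a;
    log->last_b = b;
    return NULL;
}
```

Given the arguments <code>text/html html htm shtml</code> and the <code>ITERATE2</code> style, the handler in this sketch is called three times, each time with <code>text/html</code> as its first argument.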
|  |  | 
|  | Finally, having set this all up, we have to use it.  This is | 
|  | ultimately done in the module's handlers, specifically in its | 
|  | file-typing handler, which looks more or less like this.  Note that the | 
|  | per-directory configuration structure is extracted from the | 
|  | <code>request_rec</code>'s per-directory configuration vector by using | 
|  | the <code>get_module_config</code> function. | 
|  |  | 
|  | <pre> | 
|  | int find_ct(request_rec *r) | 
|  | { | 
|  |     int i; | 
|  |     char *fn = pstrdup (r->pool, r->filename); | 
|  |     mime_dir_config *conf = (mime_dir_config *) | 
|  |         get_module_config(r->per_dir_config, &mime_module); | 
|  |     char *type; | 
|  |  | 
|  |     if (S_ISDIR(r->finfo.st_mode)) { | 
|  |         r->content_type = DIR_MAGIC_TYPE; | 
|  |         return OK; | 
|  |     } | 
|  |  | 
|  |     if((i=rind(fn,'.')) < 0) return DECLINED; | 
|  |     ++i; | 
|  |  | 
|  |     if ((type = table_get (conf->encoding_types, &fn[i]))) | 
|  |     { | 
|  |         r->content_encoding = type; | 
|  |  | 
|  |         /* go back to previous extension to try to use it as a type */ | 
|  |  | 
|  |         fn[i-1] = '\0'; | 
|  |         if((i=rind(fn,'.')) < 0) return OK; | 
|  |         ++i; | 
|  |     } | 
|  |  | 
|  |     if ((type = table_get (conf->forced_types, &fn[i]))) | 
|  |     { | 
|  |         r->content_type = type; | 
|  |     } | 
|  |  | 
|  |     return OK; | 
|  | } | 
|  |  | 
|  | </pre> | 
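The idea of a configuration vector with one private slot per module can be sketched as below. The names (<code>toy_module</code>, <code>toy_get_module_config</code>, <code>toy_set_module_config</code>) are hypothetical, and the assumption that each module owns the slot at its own index is mine; the real structures have more to them than this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MODULES 8

/* Hypothetical module record: each module owns one slot in the vector. */
typedef struct { const char *name; int module_index; } toy_module;

/* Fetch the module's private structure; the caller casts it back. */
void *toy_get_module_config(void **conf_vector, const toy_module *m)
{
    assert(m->module_index >= 0 && m->module_index < TOY_MAX_MODULES);
    return conf_vector[m->module_index];
}

void toy_set_module_config(void **conf_vector, const toy_module *m,
                           void *val)
{
    assert(m->module_index >= 0 && m->module_index < TOY_MAX_MODULES);
    conf_vector[m->module_index] = val;
}

/* Two unrelated private structures sharing one vector. */
typedef struct { const char *default_type; } toy_mime_conf;
typedef struct { int redirect_count; } toy_alias_conf;
```

Because the vector holds untyped pointers, nothing stops a module from casting another module's slot to the wrong type, so each module should only ever ask for its own.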
|  |  | 
|  | <h3><a name="servconf">Side notes --- per-server configuration, virtual servers, etc.</a></h3> | 
|  |  | 
|  | The ideas behind per-server module configuration are basically | 
|  | the same as those for per-directory configuration; there is a creation | 
|  | function and a merge function, the latter being invoked where a | 
|  | virtual server has partially overridden the base server configuration, | 
|  | and a combined structure must be computed.  (As with per-directory | 
|  | configuration, the default if no merge function is specified, and a | 
|  | module is configured in some virtual server, is that the base | 
|  | configuration is simply ignored). <p> | 
|  |  | 
|  | The only substantial difference is that when a command needs to | 
|  | configure the per-server private module data, it needs to go to the | 
|  | <code>cmd_parms</code> data to get at it.  Here's an example, from the | 
|  | alias module, which also indicates how a syntax error can be returned | 
|  | (note that the per-directory configuration argument to the command | 
|  | handler is declared as a dummy, since the module doesn't actually have | 
|  | per-directory config data): | 
|  |  | 
|  | <pre> | 
|  | char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) | 
|  | { | 
|  |     server_rec *s = cmd->server; | 
|  |     alias_server_conf *conf = (alias_server_conf *) | 
|  |         get_module_config(s->module_config,&alias_module); | 
|  |     alias_entry *new = push_array (conf->redirects); | 
|  |  | 
|  |     if (!is_url (url)) return "Redirect to non-URL"; | 
|  |  | 
|  |     new->fake = f; new->real = url; | 
|  |     return NULL; | 
|  | } | 
|  | </pre> | 
|  | <!--#include virtual="footer.html" --> | 
|  | </body></html> |