| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> | 
 | <HTML><HEAD> | 
 | <TITLE>Apache API notes</TITLE> | 
 | </HEAD> | 
 | <!-- Background white, links blue (unvisited), navy (visited), red (active) --> | 
 | <BODY | 
 |  BGCOLOR="#FFFFFF" | 
 |  TEXT="#000000" | 
 |  LINK="#0000FF" | 
 |  VLINK="#000080" | 
 |  ALINK="#FF0000" | 
 | > | 
 | <!--#include virtual="header.html" --> | 
 | <H1 ALIGN="CENTER">Apache API notes</H1> | 
 |  | 
 | These are some notes on the Apache API and the data structures you | 
 | have to deal with, <EM>etc.</EM>  They are not yet nearly complete, but | 
 | hopefully, they will help you get your bearings.  Keep in mind that | 
 | the API is still subject to change as we gain experience with it. | 
 | (See the TODO file for what <EM>might</EM> be coming).  However, | 
 | it will be easy to adapt modules to any changes that are made. | 
 | (We have more modules to adapt than you do). | 
 | <P> | 
 |  | 
 | A few notes on general pedagogical style here.  In the interest of | 
 | conciseness, all structure declarations here are incomplete --- the | 
 | real ones have more slots that I'm not telling you about.  For the | 
 | most part, these are reserved to one component of the server core or | 
 | another, and should be altered by modules with caution.  However, in | 
 | some cases, they really are things I just haven't gotten around to | 
 | yet.  Welcome to the bleeding edge.<P> | 
 |  | 
 | Finally, here's an outline, to give you some bare idea of what's | 
 | coming up, and in what order: | 
 |  | 
 | <UL> | 
 | <LI> <A HREF="#basics">Basic concepts.</A> | 
 | <MENU> | 
 |  <LI> <A HREF="#HMR">Handlers, Modules, and Requests</A> | 
 |  <LI> <A HREF="#moduletour">A brief tour of a module</A> | 
 | </MENU> | 
 | <LI> <A HREF="#handlers">How handlers work</A> | 
 | <MENU> | 
 |  <LI> <A HREF="#req_tour">A brief tour of the <CODE>request_rec</CODE></A> | 
 |  <LI> <A HREF="#req_orig">Where request_rec structures come from</A> | 
 |  <LI> <A HREF="#req_return">Handling requests, declining, and returning error | 
 |   codes</A> | 
 |  <LI> <A HREF="#resp_handlers">Special considerations for response handlers</A> | 
 |  <LI> <A HREF="#auth_handlers">Special considerations for authentication | 
 |   handlers</A> | 
 |  <LI> <A HREF="#log_handlers">Special considerations for logging handlers</A> | 
 | </MENU> | 
 | <LI> <A HREF="#pools">Resource allocation and resource pools</A> | 
 | <LI> <A HREF="#config">Configuration, commands and the like</A> | 
 | <MENU> | 
 |  <LI> <A HREF="#per-dir">Per-directory configuration structures</A> | 
 |  <LI> <A HREF="#commands">Command handling</A> | 
 |  <LI> <A HREF="#servconf">Side notes --- per-server configuration, | 
 |   virtual servers, <EM>etc</EM>.</A> | 
 | </MENU> | 
 | </UL> | 
 |  | 
 | <H2><A NAME="basics">Basic concepts.</A></H2> | 
 |  | 
 | We begin with an overview of the basic concepts behind the | 
 | API, and how they are manifested in the code. | 
 |  | 
 | <H3><A NAME="HMR">Handlers, Modules, and Requests</A></H3> | 
 |  | 
 | Apache breaks down request handling into a series of steps, more or | 
 | less the same way the Netscape server API does (although this API has | 
 | a few more stages than NetSite does, as hooks for stuff I thought | 
 | might be useful in the future).  These are: | 
 |  | 
 | <UL> | 
 |   <LI> URI -> Filename translation | 
 |   <LI> Auth ID checking [is the user who they say they are?] | 
 |   <LI> Auth access checking [is the user authorized <EM>here</EM>?] | 
 |   <LI> Access checking other than auth | 
 |   <LI> Determining MIME type of the object requested | 
 |   <LI> `Fixups' --- there aren't any of these yet, but the phase is | 
 |        intended as a hook for possible extensions like | 
 |        <CODE>SetEnv</CODE>, which don't really fit well elsewhere. | 
 |   <LI> Actually sending a response back to the client. | 
 |   <LI> Logging the request | 
 | </UL> | 
 |  | 
 | These phases are handled by looking at each of a succession of | 
 | <EM>modules</EM>, looking to see if each of them has a handler for the | 
 | phase, and attempting invoking it if so.  The handler can typically do | 
 | one of three things: | 
 |  | 
 | <UL> | 
 |   <LI> <EM>Handle</EM> the request, and indicate that it has done so | 
 |        by returning the magic constant <CODE>OK</CODE>. | 
 |   <LI> <EM>Decline</EM> to handle the request, by returning the magic | 
 |        integer constant <CODE>DECLINED</CODE>.  In this case, the | 
 |        server behaves in all respects as if the handler simply hadn't | 
 |        been there. | 
 |   <LI> Signal an error, by returning one of the HTTP error codes. | 
 |        This terminates normal handling of the request, although an | 
 |        ErrorDocument may be invoked to try to mop up, and it will be | 
 |        logged in any case. | 
 | </UL> | 
 |  | 
 | Most phases are terminated by the first module that handles them; | 
 | however, for logging, `fixups', and non-access authentication | 
 | checking, all handlers always run (barring an error).  Also, the | 
 | response phase is unique in that modules may declare multiple handlers | 
 | for it, via a dispatch table keyed on the MIME type of the requested | 
 | object.  Modules may declare a response-phase handler which can handle | 
 | <EM>any</EM> request, by giving it the key <CODE>*/*</CODE> (<EM>i.e.</EM>, a | 
 | wildcard MIME type specification).  However, wildcard handlers are | 
 | only invoked if the server has already tried and failed to find a more | 
 | specific response handler for the MIME type of the requested object | 
 | (either none existed, or they all declined).<P> | 
 |  | 
 | The handlers themselves are functions of one argument (a | 
 | <CODE>request_rec</CODE> structure. vide infra), which returns an | 
 | integer, as above.<P> | 
 |  | 
 | <H3><A NAME="moduletour">A brief tour of a module</A></H3> | 
 |  | 
 | At this point, we need to explain the structure of a module.  Our | 
 | candidate will be one of the messier ones, the CGI module --- this | 
 | handles both CGI scripts and the <CODE>ScriptAlias</CODE> config file | 
 | command.  It's actually a great deal more complicated than most | 
 | modules, but if we're going to have only one example, it might as well | 
 | be the one with its fingers in every place.<P> | 
 |  | 
 | Let's begin with handlers.  In order to handle the CGI scripts, the | 
 | module declares a response handler for them. Because of | 
 | <CODE>ScriptAlias</CODE>, it also has handlers for the name | 
 | translation phase (to recognize <CODE>ScriptAlias</CODE>ed URIs), the | 
 | type-checking phase (any <CODE>ScriptAlias</CODE>ed request is typed | 
 | as a CGI script).<P> | 
 |  | 
 | The module needs to maintain some per (virtual) | 
 | server information, namely, the <CODE>ScriptAlias</CODE>es in effect; | 
 | the module structure therefore contains pointers to a functions which | 
 | builds these structures, and to another which combines two of them (in | 
 | case the main server and a virtual server both have | 
 | <CODE>ScriptAlias</CODE>es declared).<P> | 
 |  | 
 | Finally, this module contains code to handle the | 
 | <CODE>ScriptAlias</CODE> command itself.  This particular module only | 
 | declares one command, but there could be more, so modules have | 
 | <EM>command tables</EM> which declare their commands, and describe | 
 | where they are permitted, and how they are to be invoked.  <P> | 
 |  | 
 | A final note on the declared types of the arguments of some of these | 
 | commands: a <CODE>pool</CODE> is a pointer to a <EM>resource pool</EM> | 
 | structure; these are used by the server to keep track of the memory | 
 | which has been allocated, files opened, <EM>etc.</EM>, either to service a | 
 | particular request, or to handle the process of configuring itself. | 
 | That way, when the request is over (or, for the configuration pool, | 
 | when the server is restarting), the memory can be freed, and the files | 
 | closed, <EM>en masse</EM>, without anyone having to write explicit code to | 
 | track them all down and dispose of them.  Also, a | 
 | <CODE>cmd_parms</CODE> structure contains various information about | 
 | the config file being read, and other status information, which is | 
 | sometimes of use to the function which processes a config-file command | 
 | (such as <CODE>ScriptAlias</CODE>). | 
 |  | 
 | With no further ado, the module itself: | 
 |  | 
 | <PRE> | 
 | /* Declarations of handlers. */ | 
 |  | 
 | int translate_scriptalias (request_rec *); | 
 | int type_scriptalias (request_rec *); | 
 | int cgi_handler (request_rec *); | 
 |  | 
 | /* Subsidiary dispatch table for response-phase handlers, by MIME type */ | 
 |  | 
 | handler_rec cgi_handlers[] = { | 
 | { "application/x-httpd-cgi", cgi_handler }, | 
 | { NULL } | 
 | }; | 
 |  | 
 | /* Declarations of routines to manipulate the module's configuration | 
 |  * info.  Note that these are returned, and passed in, as void *'s; | 
 |  * the server core keeps track of them, but it doesn't, and can't, | 
 |  * know their internal structure. | 
 |  */ | 
 |  | 
 | void *make_cgi_server_config (pool *); | 
 | void *merge_cgi_server_config (pool *, void *, void *); | 
 |  | 
 | /* Declarations of routines to handle config-file commands */ | 
 |  | 
 | extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, | 
 |                           char *real); | 
 |  | 
 | command_rec cgi_cmds[] = { | 
 | { "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, | 
 |     "a fakename and a realname"}, | 
 | { NULL } | 
 | }; | 
 |  | 
 | module cgi_module = { | 
 |    STANDARD_MODULE_STUFF, | 
 |    NULL,                     /* initializer */ | 
 |    NULL,                     /* dir config creator */ | 
 |    NULL,                     /* dir merger --- default is to override */ | 
 |    make_cgi_server_config,   /* server config */ | 
 |    merge_cgi_server_config,  /* merge server config */ | 
 |    cgi_cmds,                 /* command table */ | 
 |    cgi_handlers,             /* handlers */ | 
 |    translate_scriptalias,    /* filename translation */ | 
 |    NULL,                     /* check_user_id */ | 
 |    NULL,                     /* check auth */ | 
 |    NULL,                     /* check access */ | 
 |    type_scriptalias,         /* type_checker */ | 
 |    NULL,                     /* fixups */ | 
 |    NULL,                     /* logger */ | 
 |    NULL                      /* header parser */ | 
 | }; | 
 | </PRE> | 
 |  | 
 | <H2><A NAME="handlers">How handlers work</A></H2> | 
 |  | 
 | The sole argument to handlers is a <CODE>request_rec</CODE> structure. | 
 | This structure describes a particular request which has been made to | 
 | the server, on behalf of a client.  In most cases, each connection to | 
 | the client generates only one <CODE>request_rec</CODE> structure.<P> | 
 |  | 
 | <H3><A NAME="req_tour">A brief tour of the <CODE>request_rec</CODE></A></H3> | 
 |  | 
 | The <CODE>request_rec</CODE> contains pointers to a resource pool | 
 | which will be cleared when the server is finished handling the | 
 | request; to structures containing per-server and per-connection | 
 | information, and most importantly, information on the request itself.<P> | 
 |  | 
 | The most important such information is a small set of character | 
 | strings describing attributes of the object being requested, including | 
 | its URI, filename, content-type and content-encoding (these being filled | 
 | in by the translation and type-check handlers which handle the | 
 | request, respectively). <P> | 
 |  | 
 | Other commonly used data items are tables giving the MIME headers on | 
 | the client's original request, MIME headers to be sent back with the | 
 | response (which modules can add to at will), and environment variables | 
 | for any subprocesses which are spawned off in the course of servicing | 
 | the request.  These tables are manipulated using the | 
 | <CODE>ap_table_get</CODE> and <CODE>ap_table_set</CODE> routines. <P> | 
 | <BLOCKQUOTE> | 
 |  Note that the <SAMP>Content-type</SAMP> header value <EM>cannot</EM> be | 
 |  set by module content-handlers using the <SAMP>ap_table_*()</SAMP> | 
 |  routines.  Rather, it is set by pointing the <SAMP>content_type</SAMP> | 
 |  field in the <SAMP>request_rec</SAMP> structure to an appropriate | 
 |  string.  <EM>E.g.</EM>, | 
 |  <PRE> | 
 |   r->content_type = "text/html"; | 
 |  </PRE> | 
 | </BLOCKQUOTE> | 
 | Finally, there are pointers to two data structures which, in turn, | 
 | point to per-module configuration structures.  Specifically, these | 
 | hold pointers to the data structures which the module has built to | 
 | describe the way it has been configured to operate in a given | 
 | directory (via <CODE>.htaccess</CODE> files or | 
 | <CODE><Directory></CODE> sections), for private data it has | 
 | built in the course of servicing the request (so modules' handlers for | 
 | one phase can pass `notes' to their handlers for other phases).  There | 
 | is another such configuration vector in the <CODE>server_rec</CODE> | 
 | data structure pointed to by the <CODE>request_rec</CODE>, which | 
 | contains per (virtual) server configuration data.<P> | 
 |  | 
 | Here is an abridged declaration, giving the fields most commonly used:<P> | 
 |  | 
 | <PRE> | 
 | struct request_rec { | 
 |  | 
 |   pool *pool; | 
 |   conn_rec *connection; | 
 |   server_rec *server; | 
 |  | 
 |   /* What object is being requested */ | 
 |  | 
 |   char *uri; | 
 |   char *filename; | 
 |   char *path_info; | 
 |   char *args;           /* QUERY_ARGS, if any */ | 
 |   struct stat finfo;    /* Set by server core; | 
 |                          * st_mode set to zero if no such file */ | 
 |  | 
 |   char *content_type; | 
 |   char *content_encoding; | 
 |  | 
 |   /* MIME header environments, in and out.  Also, an array containing | 
 |    * environment variables to be passed to subprocesses, so people can | 
 |    * write modules to add to that environment. | 
 |    * | 
 |    * The difference between headers_out and err_headers_out is that | 
 |    * the latter are printed even on error, and persist across internal | 
 |    * redirects (so the headers printed for ErrorDocument handlers will | 
 |    * have them). | 
 |    */ | 
 |  | 
 |   table *headers_in; | 
 |   table *headers_out; | 
 |   table *err_headers_out; | 
 |   table *subprocess_env; | 
 |  | 
 |   /* Info about the request itself... */ | 
 |  | 
 |   int header_only;     /* HEAD request, as opposed to GET */ | 
 |   char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */ | 
 |   char *method;        /* GET, HEAD, POST, <EM>etc.</EM> */ | 
 |   int method_number;   /* M_GET, M_POST, <EM>etc.</EM> */ | 
 |  | 
 |   /* Info for logging */ | 
 |  | 
 |   char *the_request; | 
 |   int bytes_sent; | 
 |  | 
 |   /* A flag which modules can set, to indicate that the data being | 
 |    * returned is volatile, and clients should be told not to cache it. | 
 |    */ | 
 |  | 
 |   int no_cache; | 
 |  | 
 |   /* Various other config info which may change with .htaccess files | 
 |    * These are config vectors, with one void* pointer for each module | 
 |    * (the thing pointed to being the module's business). | 
 |    */ | 
 |  | 
 |   void *per_dir_config;   /* Options set in config files, <EM>etc.</EM> */ | 
 |   void *request_config;   /* Notes on *this* request */ | 
 |  | 
 | }; | 
 |  | 
 | </PRE> | 
 |  | 
 | <H3><A NAME="req_orig">Where request_rec structures come from</A></H3> | 
 |  | 
 | Most <CODE>request_rec</CODE> structures are built by reading an HTTP | 
 | request from a client, and filling in the fields.  However, there are | 
 | a few exceptions: | 
 |  | 
 | <UL> | 
 |   <LI> If the request is to an imagemap, a type map (<EM>i.e.</EM>, a | 
 |        <CODE>*.var</CODE> file), or a CGI script which returned a | 
 |        local `Location:', then the resource which the user requested | 
 |        is going to be ultimately located by some URI other than what | 
 |        the client originally supplied.  In this case, the server does | 
 |        an <EM>internal redirect</EM>, constructing a new | 
 |        <CODE>request_rec</CODE> for the new URI, and processing it | 
 |        almost exactly as if the client had requested the new URI | 
 |        directly. <P> | 
 |  | 
 |   <LI> If some handler signaled an error, and an | 
 |        <CODE>ErrorDocument</CODE> is in scope, the same internal | 
 |        redirect machinery comes into play.<P> | 
 |  | 
 |   <LI> Finally, a handler occasionally needs to investigate `what | 
 |        would happen if' some other request were run.  For instance, | 
 |        the directory indexing module needs to know what MIME type | 
 |        would be assigned to a request for each directory entry, in | 
 |        order to figure out what icon to use.<P> | 
 |  | 
 |        Such handlers can construct a <EM>sub-request</EM>, using the | 
 |        functions <CODE>ap_sub_req_lookup_file</CODE>, | 
 |        <CODE>ap_sub_req_lookup_uri</CODE>, and | 
 |        <CODE>ap_sub_req_method_uri</CODE>; these construct a new | 
 |        <CODE>request_rec</CODE> structure and processes it as you | 
 |        would expect, up to but not including the point of actually | 
 |        sending a response.  (These functions skip over the access | 
 |        checks if the sub-request is for a file in the same directory | 
 |        as the original request).<P> | 
 |  | 
 |        (Server-side includes work by building sub-requests and then | 
 |        actually invoking the response handler for them, via the | 
 |        function <CODE>ap_run_sub_req</CODE>). | 
 | </UL> | 
 |  | 
 | <H3><A NAME="req_return">Handling requests, declining, and returning error | 
 |  codes</A></H3> | 
 |  | 
 | As discussed above, each handler, when invoked to handle a particular | 
 | <CODE>request_rec</CODE>, has to return an <CODE>int</CODE> to | 
 | indicate what happened.  That can either be | 
 |  | 
 | <UL> | 
 |   <LI> OK --- the request was handled successfully.  This may or may | 
 |        not terminate the phase. | 
 |   <LI> DECLINED --- no erroneous condition exists, but the module | 
 |        declines to handle the phase; the server tries to find another. | 
 |   <LI> an HTTP error code, which aborts handling of the request. | 
 | </UL> | 
 |  | 
 | Note that if the error code returned is <CODE>REDIRECT</CODE>, then | 
 | the module should put a <CODE>Location</CODE> in the request's | 
 | <CODE>headers_out</CODE>, to indicate where the client should be | 
 | redirected <EM>to</EM>. <P> | 
 |  | 
 | <H3><A NAME="resp_handlers">Special considerations for response | 
 |  handlers</A></H3> | 
 |  | 
 | Handlers for most phases do their work by simply setting a few fields | 
 | in the <CODE>request_rec</CODE> structure (or, in the case of access | 
 | checkers, simply by returning the correct error code).  However, | 
 | response handlers have to actually send a request back to the client. <P> | 
 |  | 
 | They should begin by sending an HTTP response header, using the | 
 | function <CODE>ap_send_http_header</CODE>.  (You don't have to do | 
 | anything special to skip sending the header for HTTP/0.9 requests; the | 
 | function figures out on its own that it shouldn't do anything).  If | 
 | the request is marked <CODE>header_only</CODE>, that's all they should | 
 | do; they should return after that, without attempting any further | 
 | output.  <P> | 
 |  | 
 | Otherwise, they should produce a request body which responds to the | 
 | client as appropriate.  The primitives for this are <CODE>ap_rputc</CODE> | 
 | and <CODE>ap_rprintf</CODE>, for internally generated output, and | 
 | <CODE>ap_send_fd</CODE>, to copy the contents of some <CODE>FILE *</CODE> | 
 | straight to the client.  <P> | 
 |  | 
 | At this point, you should more or less understand the following piece | 
 | of code, which is the handler which handles <CODE>GET</CODE> requests | 
 | which have no more specific handler; it also shows how conditional | 
 | <CODE>GET</CODE>s can be handled, if it's desirable to do so in a | 
 | particular response handler --- <CODE>ap_set_last_modified</CODE> checks | 
 | against the <CODE>If-modified-since</CODE> value supplied by the | 
 | client, if any, and returns an appropriate code (which will, if | 
 | nonzero, be USE_LOCAL_COPY).   No similar considerations apply for | 
 | <CODE>ap_set_content_length</CODE>, but it returns an error code for | 
 | symmetry.<P> | 
 |  | 
 | <PRE> | 
 | int default_handler (request_rec *r) | 
 | { | 
 |     int errstatus; | 
 |     FILE *f; | 
 |  | 
 |     if (r->method_number != M_GET) return DECLINED; | 
 |     if (r->finfo.st_mode == 0) return NOT_FOUND; | 
 |  | 
 |     if ((errstatus = ap_set_content_length (r, r->finfo.st_size)) | 
 | 	|| (errstatus = ap_set_last_modified (r, r->finfo.st_mtime))) | 
 |         return errstatus; | 
 |  | 
 |     f = fopen (r->filename, "r"); | 
 |  | 
 |     if (f == NULL) { | 
 |         log_reason("file permissions deny server access", | 
 |                    r->filename, r); | 
 |         return FORBIDDEN; | 
 |     } | 
 |  | 
 |     register_timeout ("send", r); | 
 |     ap_send_http_header (r); | 
 |  | 
 |     if (!r->header_only) send_fd (f, r); | 
 |     ap_pfclose (r->pool, f); | 
 |     return OK; | 
 | } | 
 | </PRE> | 
 |  | 
 | Finally, if all of this is too much of a challenge, there are a few | 
 | ways out of it.  First off, as shown above, a response handler which | 
 | has not yet produced any output can simply return an error code, in | 
 | which case the server will automatically produce an error response. | 
 | Secondly, it can punt to some other handler by invoking | 
 | <CODE>ap_internal_redirect</CODE>, which is how the internal redirection | 
 | machinery discussed above is invoked.  A response handler which has | 
 | internally redirected should always return <CODE>OK</CODE>. <P> | 
 |  | 
 | (Invoking <CODE>ap_internal_redirect</CODE> from handlers which are | 
 | <EM>not</EM> response handlers will lead to serious confusion). | 
 |  | 
 | <H3><A NAME="auth_handlers">Special considerations for authentication | 
 |  handlers</A></H3> | 
 |  | 
 | Stuff that should be discussed here in detail: | 
 |  | 
 | <UL> | 
 |   <LI> Authentication-phase handlers not invoked unless auth is | 
 |        configured for the directory. | 
 |   <LI> Common auth configuration stored in the core per-dir | 
 |        configuration; it has accessors <CODE>ap_auth_type</CODE>, | 
 |        <CODE>ap_auth_name</CODE>, and <CODE>ap_requires</CODE>. | 
 |   <LI> Common routines, to handle the protocol end of things, at least | 
 |        for HTTP basic authentication (<CODE>ap_get_basic_auth_pw</CODE>, | 
 |        which sets the <CODE>connection->user</CODE> structure field | 
 |        automatically, and <CODE>ap_note_basic_auth_failure</CODE>, which | 
 |        arranges for the proper <CODE>WWW-Authenticate:</CODE> header | 
 |        to be sent back). | 
 | </UL> | 
 |  | 
 | <H3><A NAME="log_handlers">Special considerations for logging handlers</A></H3> | 
 |  | 
 | When a request has internally redirected, there is the question of | 
 | what to log.  Apache handles this by bundling the entire chain of | 
 | redirects into a list of <CODE>request_rec</CODE> structures which are | 
 | threaded through the <CODE>r->prev</CODE> and <CODE>r->next</CODE> | 
 | pointers.  The <CODE>request_rec</CODE> which is passed to the logging | 
 | handlers in such cases is the one which was originally built for the | 
 | initial request from the client; note that the bytes_sent field will | 
 | only be correct in the last request in the chain (the one for which a | 
 | response was actually sent). | 
 |  | 
 | <H2><A NAME="pools">Resource allocation and resource pools</A></H2> | 
 | <P> | 
 | One of the problems of writing and designing a server-pool server is | 
 | that of preventing leakage, that is, allocating resources (memory, | 
 | open files, <EM>etc.</EM>), without subsequently releasing them.  The resource | 
 | pool machinery is designed to make it easy to prevent this from | 
 | happening, by allowing resource to be allocated in such a way that | 
 | they are <EM>automatically</EM> released when the server is done with | 
 | them. | 
 | </P> | 
 | <P> | 
 | The way this works is as follows:  the memory which is allocated, file | 
 | opened, <EM>etc.</EM>, to deal with a particular request are tied to a | 
 | <EM>resource pool</EM> which is allocated for the request.  The pool | 
 | is a data structure which itself tracks the resources in question. | 
 | </P> | 
 | <P> | 
 | When the request has been processed, the pool is <EM>cleared</EM>.  At | 
 | that point, all the memory associated with it is released for reuse, | 
 | all files associated with it are closed, and any other clean-up | 
 | functions which are associated with the pool are run.  When this is | 
 | over, we can be confident that all the resource tied to the pool have | 
 | been released, and that none of them have leaked. | 
 | </P> | 
 | <P> | 
 | Server restarts, and allocation of memory and resources for per-server | 
 | configuration, are handled in a similar way.  There is a | 
 | <EM>configuration pool</EM>, which keeps track of resources which were | 
 | allocated while reading the server configuration files, and handling | 
 | the commands therein (for instance, the memory that was allocated for | 
 | per-server module configuration, log files and other files that were | 
 | opened, and so forth).  When the server restarts, and has to reread | 
 | the configuration files, the configuration pool is cleared, and so the | 
 | memory and file descriptors which were taken up by reading them the | 
 | last time are made available for reuse. | 
 | </P> | 
 | <P> | 
 | It should be noted that use of the pool machinery isn't generally | 
 | obligatory, except for situations like logging handlers, where you | 
 | really need to register cleanups to make sure that the log file gets | 
 | closed when the server restarts (this is most easily done by using the | 
 | function <CODE><A HREF="#pool-files">ap_pfopen</A></CODE>, which also | 
 | arranges for the underlying file descriptor to be closed before any | 
 | child processes, such as for CGI scripts, are <CODE>exec</CODE>ed), or | 
 | in case you are using the timeout machinery (which isn't yet even | 
 | documented here).  However, there are two benefits to using it: | 
 | resources allocated to a pool never leak (even if you allocate a | 
 | scratch string, and just forget about it); also, for memory | 
 | allocation, <CODE>ap_palloc</CODE> is generally faster than | 
 | <CODE>malloc</CODE>. | 
 | </P> | 
 | <P> | 
 | We begin here by describing how memory is allocated to pools, and then | 
 | discuss how other resources are tracked by the resource pool | 
 | machinery. | 
 | </P> | 
 | <H3>Allocation of memory in pools</H3> | 
 | <P> | 
 | Memory is allocated to pools by calling the function | 
 | <CODE>ap_palloc</CODE>, which takes two arguments, one being a pointer to | 
 | a resource pool structure, and the other being the amount of memory to | 
 | allocate (in <CODE>char</CODE>s).  Within handlers for handling | 
 | requests, the most common way of getting a resource pool structure is | 
 | by looking at the <CODE>pool</CODE> slot of the relevant | 
 | <CODE>request_rec</CODE>; hence the repeated appearance of the | 
 | following idiom in module code: | 
 | </P> | 
 | <PRE> | 
 | int my_handler(request_rec *r) | 
 | { | 
 |     struct my_structure *foo; | 
 |     ... | 
 |  | 
 |     foo = (foo *)ap_palloc (r->pool, sizeof(my_structure)); | 
 | } | 
 | </PRE> | 
 | <P> | 
 | Note that <EM>there is no <CODE>ap_pfree</CODE></EM> --- | 
 | <CODE>ap_palloc</CODE>ed memory is freed only when the associated | 
 | resource pool is cleared.  This means that <CODE>ap_palloc</CODE> does not | 
 | have to do as much accounting as <CODE>malloc()</CODE>; all it does in | 
 | the typical case is to round up the size, bump a pointer, and do a | 
 | range check. | 
 | </P> | 
 | <P> | 
 | (It also raises the possibility that heavy use of <CODE>ap_palloc</CODE> | 
 | could cause a server process to grow excessively large.  There are | 
 | two ways to deal with this, which are dealt with below; briefly, you | 
 | can use <CODE>malloc</CODE>, and try to be sure that all of the memory | 
 | gets explicitly <CODE>free</CODE>d, or you can allocate a sub-pool of | 
 | the main pool, allocate your memory in the sub-pool, and clear it out | 
 | periodically.  The latter technique is discussed in the section on | 
 | sub-pools below, and is used in the directory-indexing code, in order | 
 | to avoid excessive storage allocation when listing directories with | 
 | thousands of files). | 
 | </P> | 
 | <H3>Allocating initialized memory</H3> | 
 | <P> | 
 | There are functions which allocate initialized memory, and are | 
 | frequently useful.  The function <CODE>ap_pcalloc</CODE> has the same | 
 | interface as <CODE>ap_palloc</CODE>, but clears out the memory it | 
 | allocates before it returns it.  The function <CODE>ap_pstrdup</CODE> | 
 | takes a resource pool and a <CODE>char *</CODE> as arguments, and | 
 | allocates memory for a copy of the string the pointer points to, | 
 | returning a pointer to the copy.  Finally <CODE>ap_pstrcat</CODE> is a | 
 | varargs-style function, which takes a pointer to a resource pool, and | 
 | at least two <CODE>char *</CODE> arguments, the last of which must be | 
 | <CODE>NULL</CODE>.  It allocates enough memory to fit copies of each | 
 | of the strings, as a unit; for instance: | 
 | </P> | 
 | <PRE> | 
 |      ap_pstrcat (r->pool, "foo", "/", "bar", NULL); | 
 | </PRE> | 
 | <P> | 
 | returns a pointer to 8 bytes worth of memory, initialized to | 
 | <CODE>"foo/bar"</CODE>. | 
 | </P> | 
 | <H3><A NAME="pools-used">Commonly-used pools in the Apache Web server</A></H3> | 
 | <P> | 
 | A pool is really defined by its lifetime more than anything else.  There | 
 | are some static pools in http_main which are passed to various | 
 | non-http_main functions as arguments at opportune times.  Here they are: | 
 | </P> | 
 | <DL COMPACT> | 
 |  <DT>permanent_pool | 
 |  </DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>never passed to anything else, this is the ancestor of all pools | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 |  <DT>pconf | 
 |  </DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>subpool of permanent_pool | 
 |    </LI> | 
 |    <LI>created at the beginning of a config "cycle"; exists until the | 
 |     server is terminated or restarts; passed to all config-time | 
 |     routines, either via cmd->pool, or as the "pool *p" argument on | 
 |     those which don't take pools | 
 |    </LI> | 
 |    <LI>passed to the module init() functions | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 |  <DT>ptemp | 
 |  </DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>sorry I lie, this pool isn't called this currently in 1.3, I | 
 |     renamed it this in my pthreads development.  I'm referring to | 
 |     the use of ptrans in the parent... contrast this with the later | 
 |     definition of ptrans in the child. | 
 |    </LI> | 
 |    <LI>subpool of permanent_pool | 
 |    </LI> | 
 |    <LI>created at the beginning of a config "cycle"; exists until the | 
 |     end of config parsing; passed to config-time routines <EM>via</EM> | 
 |     cmd->temp_pool.  Somewhat of a "bastard child" because it isn't | 
 |     available everywhere.  Used for temporary scratch space which | 
 |     may be needed by some config routines but which is deleted at | 
 |     the end of config. | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 |  <DT>pchild | 
 |  </DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>subpool of permanent_pool | 
 |    </LI> | 
 |    <LI>created when a child is spawned (or a thread is created); lives | 
 |     until that child (thread) is destroyed | 
 |    </LI> | 
 |    <LI>passed to the module child_init functions | 
 |    </LI> | 
 |    <LI>destruction happens right after the child_exit functions are | 
 |     called... (which may explain why I think child_exit is redundant | 
 |     and unneeded) | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 |  <DT>ptrans | 
 |  <DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>should be a subpool of pchild, but currently is a subpool of | 
 |     permanent_pool, see above | 
 |    </LI> | 
 |    <LI>cleared by the child before going into the accept() loop to receive | 
 |     a connection | 
 |    </LI> | 
 |    <LI>used as connection->pool | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 |  <DT>r->pool | 
 |  </DT> | 
 |  <DD> | 
 |   <UL> | 
 |    <LI>for the main request this is a subpool of connection->pool; for | 
 |     subrequests it is a subpool of the parent request's pool. | 
 |    </LI> | 
 |    <LI>exists until the end of the request (<EM>i.e.</EM>, | 
 |     ap_destroy_sub_req, or | 
 |     in child_main after process_request has finished) | 
 |    </LI> | 
 |    <LI>note that r itself is allocated from r->pool; <EM>i.e.</EM>, | 
 |     r->pool is | 
 |     first created and then r is the first thing palloc()d from it | 
 |    </LI> | 
 |   </UL> | 
 |  </DD> | 
 | </DL> | 
 | <P> | 
 | For almost everything folks do, r->pool is the pool to use.  But you | 
 | can see how other lifetimes, such as pchild, are useful to some | 
 | modules... such as modules that need to open a database connection once | 
 | per child, and wish to clean it up when the child dies. | 
 | </P> | 
 | <P> | 
 | You can also see how some bugs have manifested themself, such as setting | 
 | connection->user to a value from r->pool -- in this case | 
 | connection exists | 
 | for the lifetime of ptrans, which is longer than r->pool (especially if | 
 | r->pool is a subrequest!).  So the correct thing to do is to allocate | 
 | from connection->pool. | 
 | </P> | 
 | <P> | 
 | And there was another interesting bug in mod_include/mod_cgi.  You'll see | 
 | in those that they do this test to decide if they should use r->pool | 
 | or r->main->pool.  In this case the resource that they are registering | 
 | for cleanup is a child process.  If it were registered in r->pool, | 
 | then the code would wait() for the child when the subrequest finishes. | 
 | With mod_include this could be any old #include, and the delay can be up | 
 | to 3 seconds... and happened quite frequently.  Instead the subprocess | 
 | is registered in r->main->pool which causes it to be cleaned up when | 
 | the entire request is done -- <EM>i.e.</EM>, after the output has been sent to | 
 | the client and logging has happened. | 
 | </P> | 
 | <H3><A NAME="pool-files">Tracking open files, etc.</A></H3> | 
 | <P> | 
 | As indicated above, resource pools are also used to track other sorts | 
 | of resources besides memory.  The most common are open files.  The | 
 | routine which is typically used for this is <CODE>ap_pfopen</CODE>, which | 
 | takes a resource pool and two strings as arguments; the strings are | 
 | the same as the typical arguments to <CODE>fopen</CODE>, <EM>e.g.</EM>, | 
 | </P> | 
 | <PRE> | 
 |      ... | 
 |      FILE *f = ap_pfopen (r->pool, r->filename, "r"); | 
 |  | 
 |      if (f == NULL) { ... } else { ... } | 
 | </PRE> | 
 | <P> | 
 | There is also a <CODE>ap_popenf</CODE> routine, which parallels the | 
 | lower-level <CODE>open</CODE> system call.  Both of these routines | 
 | arrange for the file to be closed when the resource pool in question | 
 | is cleared. | 
 | </P> | 
 | <P> | 
 | Unlike the case for memory, there <EM>are</EM> functions to close | 
 | files allocated with <CODE>ap_pfopen</CODE>, and <CODE>ap_popenf</CODE>, | 
 | namely <CODE>ap_pfclose</CODE> and <CODE>ap_pclosef</CODE>.  (This is | 
 | because, on many systems, the number of files which a single process | 
 | can have open is quite limited).  It is important to use these | 
 | functions to close files allocated with <CODE>ap_pfopen</CODE> and | 
 | <CODE>ap_popenf</CODE>, since to do otherwise could cause fatal errors on | 
 | systems such as Linux, which react badly if the same | 
 | <CODE>FILE*</CODE> is closed more than once. | 
 | </P> | 
 | <P> | 
 | (Using the <CODE>close</CODE> functions is not mandatory, since the | 
 | file will eventually be closed regardless, but you should consider it | 
 | in cases where your module is opening, or could open, a lot of files). | 
 | </P> | 
 | <H3>Other sorts of resources --- cleanup functions</H3> | 
 | <BLOCKQUOTE> | 
 | More text goes here.  Describe the the cleanup primitives in terms of | 
 | which the file stuff is implemented; also, <CODE>spawn_process</CODE>. | 
 | </BLOCKQUOTE> | 
 | <P> | 
 | Pool cleanups live until clear_pool() is called:  clear_pool(a) recursively | 
 | calls destroy_pool() on all subpools of a; then calls all the cleanups for a;  | 
 | then releases all the memory for a.  destroy_pool(a) calls clear_pool(a)  | 
 | and then releases the pool structure itself.  <EM>i.e.</EM>, clear_pool(a) doesn't | 
 | delete a, it just frees up all the resources and you can start using it | 
 | again immediately.  | 
 | </P> | 
 | <H3>Fine control --- creating and dealing with sub-pools, with a note | 
 | on sub-requests</H3> | 
 |  | 
 | On rare occasions, too-free use of <CODE>ap_palloc()</CODE> and the | 
 | associated primitives may result in undesirably profligate resource | 
 | allocation.  You can deal with such a case by creating a | 
 | <EM>sub-pool</EM>, allocating within the sub-pool rather than the main | 
 | pool, and clearing or destroying the sub-pool, which releases the | 
 | resources which were associated with it.  (This really <EM>is</EM> a | 
 | rare situation; the only case in which it comes up in the standard | 
 | module set is in case of listing directories, and then only with | 
 | <EM>very</EM> large directories.  Unnecessary use of the primitives | 
 | discussed here can hair up your code quite a bit, with very little | 
 | gain). <P> | 
 |  | 
 | The primitive for creating a sub-pool is <CODE>ap_make_sub_pool</CODE>, | 
 | which takes another pool (the parent pool) as an argument.  When the | 
 | main pool is cleared, the sub-pool will be destroyed.  The sub-pool | 
 | may also be cleared or destroyed at any time, by calling the functions | 
 | <CODE>ap_clear_pool</CODE> and <CODE>ap_destroy_pool</CODE>, respectively. | 
 | (The difference is that <CODE>ap_clear_pool</CODE> frees resources | 
 | associated with the pool, while <CODE>ap_destroy_pool</CODE> also | 
 | deallocates the pool itself.  In the former case, you can allocate new | 
 | resources within the pool, and clear it again, and so forth; in the | 
 | latter case, it is simply gone). <P> | 
 |  | 
 | One final note --- sub-requests have their own resource pools, which | 
 | are sub-pools of the resource pool for the main request.  The polite | 
 | way to reclaim the resources associated with a sub request which you | 
 | have allocated (using the <CODE>ap_sub_req_...</CODE> functions) | 
 | is <CODE>ap_destroy_sub_req</CODE>, which frees the resource pool. | 
 | Before calling this function, be sure to copy anything that you care | 
 | about which might be allocated in the sub-request's resource pool into | 
 | someplace a little less volatile (for instance, the filename in its | 
 | <CODE>request_rec</CODE> structure). <P> | 
 |  | 
 | (Again, under most circumstances, you shouldn't feel obliged to call | 
 | this function; only 2K of memory or so are allocated for a typical sub | 
 | request, and it will be freed anyway when the main request pool is | 
 | cleared.  It is only when you are allocating many, many sub-requests | 
 | for a single main request that you should seriously consider the | 
 | <CODE>ap_destroy_...</CODE> functions). | 
 |  | 
 | <H2><A NAME="config">Configuration, commands and the like</A></H2> | 
 |  | 
 | One of the design goals for this server was to maintain external | 
 | compatibility with the NCSA 1.3 server --- that is, to read the same | 
 | configuration files, to process all the directives therein correctly, | 
 | and in general to be a drop-in replacement for NCSA.  On the other | 
 | hand, another design goal was to move as much of the server's | 
 | functionality into modules which have as little as possible to do with | 
 | the monolithic server core.  The only way to reconcile these goals is | 
 | to move the handling of most commands from the central server into the | 
 | modules.  <P> | 
 |  | 
 | However, just giving the modules command tables is not enough to | 
 | divorce them completely from the server core.  The server has to | 
 | remember the commands in order to act on them later.  That involves | 
 | maintaining data which is private to the modules, and which can be | 
 | either per-server, or per-directory.  Most things are per-directory, | 
 | including in particular access control and authorization information, | 
 | but also information on how to determine file types from suffixes, | 
 | which can be modified by <CODE>AddType</CODE> and | 
 | <CODE>DefaultType</CODE> directives, and so forth.  In general, the | 
 | governing philosophy is that anything which <EM>can</EM> be made | 
 | configurable by directory should be; per-server information is | 
 | generally used in the standard set of modules for information like | 
 | <CODE>Alias</CODE>es and <CODE>Redirect</CODE>s which come into play | 
 | before the request is tied to a particular place in the underlying | 
 | file system. <P> | 
 |  | 
 | Another requirement for emulating the NCSA server is being able to | 
 | handle the per-directory configuration files, generally called | 
 | <CODE>.htaccess</CODE> files, though even in the NCSA server they can | 
 | contain directives which have nothing at all to do with access | 
 | control.  Accordingly, after URI -> filename translation, but before | 
 | performing any other phase, the server walks down the directory | 
 | hierarchy of the underlying filesystem, following the translated | 
 | pathname, to read any <CODE>.htaccess</CODE> files which might be | 
 | present.  The information which is read in then has to be | 
 | <EM>merged</EM> with the applicable information from the server's own | 
 | config files (either from the <CODE><Directory></CODE> sections | 
 | in <CODE>access.conf</CODE>, or from defaults in | 
 | <CODE>srm.conf</CODE>, which actually behaves for most purposes almost | 
 | exactly like <CODE><Directory /></CODE>).<P> | 
 |  | 
 | Finally, after having served a request which involved reading | 
 | <CODE>.htaccess</CODE> files, we need to discard the storage allocated | 
 | for handling them.  That is solved the same way it is solved wherever | 
 | else similar problems come up, by tying those structures to the | 
 | per-transaction resource pool.  <P> | 
 |  | 
 | <H3><A NAME="per-dir">Per-directory configuration structures</A></H3> | 
 |  | 
 | Let's look out how all of this plays out in <CODE>mod_mime.c</CODE>, | 
 | which defines the file typing handler which emulates the NCSA server's | 
 | behavior of determining file types from suffixes.  What we'll be | 
 | looking at, here, is the code which implements the | 
 | <CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands.  These | 
 | commands can appear in <CODE>.htaccess</CODE> files, so they must be | 
 | handled in the module's private per-directory data, which in fact, | 
 | consists of two separate <CODE>table</CODE>s for MIME types and | 
 | encoding information, and is declared as follows: | 
 |  | 
 | <PRE> | 
 | typedef struct { | 
 |     table *forced_types;      /* Additional AddTyped stuff */ | 
 |     table *encoding_types;    /* Added with AddEncoding... */ | 
 | } mime_dir_config; | 
 | </PRE> | 
 |  | 
 | When the server is reading a configuration file, or | 
 | <CODE><Directory></CODE> section, which includes one of the MIME | 
 | module's commands, it needs to create a <CODE>mime_dir_config</CODE> | 
 | structure, so those commands have something to act on.  It does this | 
 | by invoking the function it finds in the module's `create per-dir | 
 | config slot', with two arguments: the name of the directory to which | 
 | this configuration information applies (or <CODE>NULL</CODE> for | 
 | <CODE>srm.conf</CODE>), and a pointer to a resource pool in which the | 
 | allocation should happen. <P> | 
 |  | 
 | (If we are reading a <CODE>.htaccess</CODE> file, that resource pool | 
 | is the per-request resource pool for the request; otherwise it is a | 
 | resource pool which is used for configuration data, and cleared on | 
 | restarts.  Either way, it is important for the structure being created | 
 | to vanish when the pool is cleared, by registering a cleanup on the | 
 | pool if necessary). <P> | 
 |  | 
 | For the MIME module, the per-dir config creation function just | 
 | <CODE>ap_palloc</CODE>s the structure above, and a creates a couple of | 
 | <CODE>table</CODE>s to fill it.  That looks like this: | 
 |  | 
 | <PRE> | 
 | void *create_mime_dir_config (pool *p, char *dummy) | 
 | { | 
 |     mime_dir_config *new = | 
 |       (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config)); | 
 |  | 
 |     new->forced_types = ap_make_table (p, 4); | 
 |     new->encoding_types = ap_make_table (p, 4); | 
 |  | 
 |     return new; | 
 | } | 
 | </PRE> | 
 |  | 
 | Now, suppose we've just read in a <CODE>.htaccess</CODE> file.  We | 
 | already have the per-directory configuration structure for the next | 
 | directory up in the hierarchy.  If the <CODE>.htaccess</CODE> file we | 
 | just read in didn't have any <CODE>AddType</CODE> or | 
 | <CODE>AddEncoding</CODE> commands, its per-directory config structure | 
 | for the MIME module is still valid, and we can just use it. | 
 | Otherwise, we need to merge the two structures somehow. <P> | 
 |  | 
 | To do that, the server invokes the module's per-directory config merge | 
 | function, if one is present.  That function takes three arguments: | 
 | the two structures being merged, and a resource pool in which to | 
 | allocate the result.  For the MIME module, all that needs to be done | 
 | is overlay the tables from the new per-directory config structure with | 
 | those from the parent: | 
 |  | 
 | <PRE> | 
 | void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) | 
 | { | 
 |     mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; | 
 |     mime_dir_config *subdir = (mime_dir_config *)subdirv; | 
 |     mime_dir_config *new = | 
 |       (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config)); | 
 |  | 
 |     new->forced_types = ap_overlay_tables (p, subdir->forced_types, | 
 |                                         parent_dir->forced_types); | 
 |     new->encoding_types = ap_overlay_tables (p, subdir->encoding_types, | 
 |                                           parent_dir->encoding_types); | 
 |  | 
 |     return new; | 
 | } | 
 | </PRE> | 
 |  | 
 | As a note --- if there is no per-directory merge function present, the | 
 | server will just use the subdirectory's configuration info, and ignore | 
 | the parent's.  For some modules, that works just fine (<EM>e.g.</EM>, for the | 
 | includes module, whose per-directory configuration information | 
 | consists solely of the state of the <CODE>XBITHACK</CODE>), and for | 
 | those modules, you can just not declare one, and leave the | 
 | corresponding structure slot in the module itself <CODE>NULL</CODE>.<P> | 
 |  | 
 | <H3><A NAME="commands">Command handling</A></H3> | 
 |  | 
 | Now that we have these structures, we need to be able to figure out | 
 | how to fill them.  That involves processing the actual | 
 | <CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands.  To find | 
 | commands, the server looks in the module's <CODE>command table</CODE>. | 
 | That table contains information on how many arguments the commands | 
 | take, and in what formats, where it is permitted, and so forth.  That | 
 | information is sufficient to allow the server to invoke most | 
 | command-handling functions with pre-parsed arguments.  Without further | 
 | ado, let's look at the <CODE>AddType</CODE> command handler, which | 
 | looks like this (the <CODE>AddEncoding</CODE> command looks basically | 
 | the same, and won't be shown here): | 
 |  | 
 | <PRE> | 
 | char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) | 
 | { | 
 |     if (*ext == '.') ++ext; | 
 |     ap_table_set (m->forced_types, ext, ct); | 
 |     return NULL; | 
 | } | 
 | </PRE> | 
 |  | 
 | This command handler is unusually simple.  As you can see, it takes | 
 | four arguments, two of which are pre-parsed arguments, the third being | 
 | the per-directory configuration structure for the module in question, | 
 | and the fourth being a pointer to a <CODE>cmd_parms</CODE> structure. | 
 | That structure contains a bunch of arguments which are frequently of | 
 | use to some, but not all, commands, including a resource pool (from | 
 | which memory can be allocated, and to which cleanups should be tied), | 
 | and the (virtual) server being configured, from which the module's | 
 | per-server configuration data can be obtained if required.<P> | 
 |  | 
 | Another way in which this particular command handler is unusually | 
 | simple is that there are no error conditions which it can encounter. | 
 | If there were, it could return an error message instead of | 
 | <CODE>NULL</CODE>; this causes an error to be printed out on the | 
 | server's <CODE>stderr</CODE>, followed by a quick exit, if it is in | 
 | the main config files; for a <CODE>.htaccess</CODE> file, the syntax | 
 | error is logged in the server error log (along with an indication of | 
 | where it came from), and the request is bounced with a server error | 
 | response (HTTP error status, code 500). <P> | 
 |  | 
 | The MIME module's command table has entries for these commands, which | 
 | look like this: | 
 |  | 
 | <PRE> | 
 | command_rec mime_cmds[] = { | 
 | { "AddType", add_type, NULL, OR_FILEINFO, TAKE2, | 
 |     "a mime type followed by a file extension" }, | 
 | { "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, | 
 |     "an encoding (<EM>e.g.</EM>, gzip), followed by a file extension" }, | 
 | { NULL } | 
 | }; | 
 | </PRE> | 
 |  | 
 | The entries in these tables are: | 
 |  | 
 | <UL> | 
 |   <LI> The name of the command | 
 |   <LI> The function which handles it | 
 |   <LI> a <CODE>(void *)</CODE> pointer, which is passed in the | 
 |        <CODE>cmd_parms</CODE> structure to the command handler --- | 
 |        this is useful in case many similar commands are handled by the | 
 |        same function. | 
 |   <LI> A bit mask indicating where the command may appear.  There are | 
 |        mask bits corresponding to each <CODE>AllowOverride</CODE> | 
 |        option, and an additional mask bit, <CODE>RSRC_CONF</CODE>, | 
 |        indicating that the command may appear in the server's own | 
 |        config files, but <EM>not</EM> in any <CODE>.htaccess</CODE> | 
 |        file. | 
 |   <LI> A flag indicating how many arguments the command handler wants | 
 |        pre-parsed, and how they should be passed in. | 
 |        <CODE>TAKE2</CODE> indicates two pre-parsed arguments.  Other | 
 |        options are <CODE>TAKE1</CODE>, which indicates one pre-parsed | 
 |        argument, <CODE>FLAG</CODE>, which indicates that the argument | 
 |        should be <CODE>On</CODE> or <CODE>Off</CODE>, and is passed in | 
 |        as a boolean flag, <CODE>RAW_ARGS</CODE>, which causes the | 
 |        server to give the command the raw, unparsed arguments | 
 |        (everything but the command name itself).  There is also | 
 |        <CODE>ITERATE</CODE>, which means that the handler looks the | 
 |        same as <CODE>TAKE1</CODE>, but that if multiple arguments are | 
 |        present, it should be called multiple times, and finally | 
 |        <CODE>ITERATE2</CODE>, which indicates that the command handler | 
 |        looks like a <CODE>TAKE2</CODE>, but if more arguments are | 
 |        present, then it should be called multiple times, holding the | 
 |        first argument constant. | 
 |   <LI> Finally, we have a string which describes the arguments that | 
 |        should be present.  If the arguments in the actual config file | 
 |        are not as required, this string will be used to help give a | 
 |        more specific error message.  (You can safely leave this | 
 |        <CODE>NULL</CODE>). | 
 | </UL> | 
 |  | 
 | Finally, having set this all up, we have to use it.  This is | 
 | ultimately done in the module's handlers, specifically for its | 
 | file-typing handler, which looks more or less like this; note that the | 
 | per-directory configuration structure is extracted from the | 
 | <CODE>request_rec</CODE>'s per-directory configuration vector by using | 
 | the <CODE>ap_get_module_config</CODE> function. | 
 |  | 
 | <PRE> | 
 | int find_ct(request_rec *r) | 
 | { | 
 |     int i; | 
 |     char *fn = ap_pstrdup (r->pool, r->filename); | 
 |     mime_dir_config *conf = (mime_dir_config *) | 
 |              ap_get_module_config(r->per_dir_config, &mime_module); | 
 |     char *type; | 
 |  | 
 |     if (S_ISDIR(r->finfo.st_mode)) { | 
 |         r->content_type = DIR_MAGIC_TYPE; | 
 |         return OK; | 
 |     } | 
 |  | 
 |     if((i=ap_rind(fn,'.')) < 0) return DECLINED; | 
 |     ++i; | 
 |  | 
 |     if ((type = ap_table_get (conf->encoding_types, &fn[i]))) | 
 |     { | 
 |         r->content_encoding = type; | 
 |  | 
 |         /* go back to previous extension to try to use it as a type */ | 
 |  | 
 |         fn[i-1] = '\0'; | 
 |         if((i=ap_rind(fn,'.')) < 0) return OK; | 
 |         ++i; | 
 |     } | 
 |  | 
 |     if ((type = ap_table_get (conf->forced_types, &fn[i]))) | 
 |     { | 
 |         r->content_type = type; | 
 |     } | 
 |  | 
 |     return OK; | 
 | } | 
 |  | 
 | </PRE> | 
 |  | 
 | <H3><A NAME="servconf">Side notes --- per-server configuration, virtual | 
 |  servers, <EM>etc</EM>.</A></H3> | 
 |  | 
 | The basic ideas behind per-server module configuration are basically | 
 | the same as those for per-directory configuration; there is a creation | 
 | function and a merge function, the latter being invoked where a | 
 | virtual server has partially overridden the base server configuration, | 
 | and a combined structure must be computed.  (As with per-directory | 
 | configuration, the default if no merge function is specified, and a | 
 | module is configured in some virtual server, is that the base | 
 | configuration is simply ignored). <P> | 
 |  | 
 | The only substantial difference is that when a command needs to | 
 | configure the per-server private module data, it needs to go to the | 
 | <CODE>cmd_parms</CODE> data to get at it.  Here's an example, from the | 
 | alias module, which also indicates how a syntax error can be returned | 
 | (note that the per-directory configuration argument to the command | 
 | handler is declared as a dummy, since the module doesn't actually have | 
 | per-directory config data): | 
 |  | 
 | <PRE> | 
 | char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) | 
 | { | 
 |     server_rec *s = cmd->server; | 
 |     alias_server_conf *conf = (alias_server_conf *) | 
 |             ap_get_module_config(s->module_config,&alias_module); | 
 |     alias_entry *new = ap_push_array (conf->redirects); | 
 |  | 
 |     if (!ap_is_url (url)) return "Redirect to non-URL"; | 
 |  | 
 |     new->fake = f; new->real = url; | 
 |     return NULL; | 
 | } | 
 | </PRE> | 
 | <!--#include virtual="footer.html" --> | 
 | </BODY></HTML> |