| <html><head> |
| <title>Apache API notes</title> |
| </head> |
| <body> |
| <!--#include virtual="header.html" --> |
| <h1>Apache API notes</h1> |
| |
| These are some notes on the Apache API and the data structures you |
| have to deal with, etc. They are not yet nearly complete, but |
| hopefully, they will help you get your bearings. Keep in mind that |
| the API is still subject to change as we gain experience with it. |
| (See the TODO file for what <em>might</em> be coming). However, |
| it will be easy to adapt modules to any changes that are made. |
| (We have more modules to adapt than you do). |
| <p> |
| |
| A few notes on general pedagogical style here. In the interest of |
| conciseness, all structure declarations here are incomplete --- the |
| real ones have more slots that I'm not telling you about. For the |
| most part, these are reserved to one component of the server core or |
| another, and should be altered by modules with caution. However, in |
| some cases, they really are things I just haven't gotten around to |
| yet. Welcome to the bleeding edge.<p> |
| |
| Finally, here's an outline, to give you some bare idea of what's |
| coming up, and in what order: |
| |
| <ul> |
| <li> <a href="#basics">Basic concepts.</a> |
| <menu> |
| <li> <a href="#HMR">Handlers, Modules, and Requests</a> |
| <li> <a href="#moduletour">A brief tour of a module</a> |
| </menu> |
| <li> <a href="#handlers">How handlers work</a> |
| <menu> |
| <li> <a href="#req_tour">A brief tour of the <code>request_rec</code></a> |
| <li> <a href="#req_orig">Where request_rec structures come from</a> |
| <li> <a href="#req_return">Handling requests, declining, and returning error codes</a> |
| <li> <a href="#resp_handlers">Special considerations for response handlers</a> |
| <li> <a href="#auth_handlers">Special considerations for authentication handlers</a> |
| <li> <a href="#log_handlers">Special considerations for logging handlers</a> |
| </menu> |
| <li> <a href="#pools">Resource allocation and resource pools</a> |
| <li> <a href="#config">Configuration, commands and the like</a> |
| <menu> |
| <li> <a href="#per-dir">Per-directory configuration structures</a> |
| <li> <a href="#commands">Command handling</a> |
| <li> <a href="#servconf">Side notes --- per-server configuration, virtual servers, etc.</a> |
| </menu> |
| </ul> |
| |
| <h2><a name="basics">Basic concepts.</a></h2> |
| |
| We begin with an overview of the basic concepts behind the |
| API, and how they are manifested in the code. |
| |
| <h3><a name="HMR">Handlers, Modules, and Requests</a></h3> |
| |
| Apache breaks down request handling into a series of steps, more or |
| less the same way the Netscape server API does (although this API has |
| a few more stages than NetSite does, as hooks for stuff I thought |
| might be useful in the future). These are: |
| |
| <ul> |
| <li> URI -> Filename translation |
| <li> Auth ID checking [is the user who they say they are?] |
| <li> Auth access checking [is the user authorized <em>here</em>?] |
| <li> Access checking other than auth |
| <li> Determining MIME type of the object requested |
| <li> `Fixups' --- there aren't any of these yet, but the phase is |
| intended as a hook for possible extensions like |
| <code>SetEnv</code>, which don't really fit well elsewhere. |
| <li> Actually sending a response back to the client. |
| <li> Logging the request |
| </ul> |
| |
| These phases are handled by looking at each of a succession of |
| <em>modules</em>, checking whether each of them has a handler for the |
| phase, and attempting to invoke it if so. The handler can typically do |
| one of three things: |
| |
| <ul> |
| <li> <em>Handle</em> the request, and indicate that it has done so |
| by returning the magic constant <code>OK</code>. |
| <li> <em>Decline</em> to handle the request, by returning the magic |
| integer constant <code>DECLINED</code>. In this case, the |
| server behaves in all respects as if the handler simply hadn't |
| been there. |
| <li> Signal an error, by returning one of the HTTP error codes. |
| This terminates normal handling of the request, although an |
| ErrorDocument may be invoked to try to mop up, and it will be |
| logged in any case. |
| </ul> |
| |
| Most phases are terminated by the first module that handles them; |
| however, for logging, `fixups', and non-access authentication |
| checking, all handlers always run (barring an error). Also, the |
| response phase is unique in that modules may declare multiple handlers |
| for it, via a dispatch table keyed on the MIME type of the requested |
| object. Modules may declare a response-phase handler which can handle |
| <em>any</em> request, by giving it the key <code>*/*</code> (i.e., a |
| wildcard MIME type specification). However, wildcard handlers are |
| only invoked if the server has already tried and failed to find a more |
| specific response handler for the MIME type of the requested object |
| (either none existed, or they all declined).<p> |
| |
| The handlers themselves are functions of one argument (a |
| <code>request_rec</code> structure, vide infra), which return an |
| integer, as above.<p> |
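The dispatch rules above can be sketched roughly as follows. This is not the server's actual code: <code>run_phase</code>, the flat handler array, and the cut-down <code>request_rec</code> are stand-ins invented for illustration, and the numeric values given to <code>OK</code> and <code>DECLINED</code> are assumptions.<p>

```c
#include <stddef.h>

/* Stand-in constants and types; the real ones live in the server headers. */
#define OK        0
#define DECLINED -1

typedef struct request_rec { const char *uri; } request_rec;
typedef int (*handler_fn)(request_rec *);

/* Run one phase over a NULL-terminated array of handlers (hypothetical helper).
 * run_all == 0: the first handler that returns OK ends the phase.
 * run_all != 0: every handler runs, unless one of them signals an error.
 */
int run_phase(handler_fn *handlers, request_rec *r, int run_all)
{
    int result = DECLINED;
    size_t i;

    for (i = 0; handlers[i] != NULL; ++i) {
        int rc = handlers[i](r);

        if (rc == DECLINED) continue;   /* as if the handler hadn't been there */
        if (rc != OK) return rc;        /* HTTP error code: stop right here */
        result = OK;
        if (!run_all) break;            /* first taker terminates the phase */
    }
    return result;
}

/* Three trivial handlers, one for each possible outcome. */
int decline_it(request_rec *r) { (void)r; return DECLINED; }
int handle_it(request_rec *r)  { (void)r; return OK; }
int forbid_it(request_rec *r)  { (void)r; return 403; }
```

With the handlers in the order <code>decline_it</code>, <code>handle_it</code>, <code>forbid_it</code>, a first-taker phase stops at <code>handle_it</code>, while an all-handlers phase goes on to reach <code>forbid_it</code> and returns its error code.<p>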
| |
| <h3><a name="moduletour">A brief tour of a module</a></h3> |
| |
| At this point, we need to explain the structure of a module. Our |
| candidate will be one of the messier ones, the CGI module --- this |
| handles both CGI scripts and the <code>ScriptAlias</code> config file |
| command. It's actually a great deal more complicated than most |
| modules, but if we're going to have only one example, it might as well |
| be the one with its fingers in every place.<p> |
| |
| Let's begin with handlers. In order to handle the CGI scripts, the |
| module declares a response handler for them. Because of |
| <code>ScriptAlias</code>, it also has handlers for the name |
| translation phase (to recognise <code>ScriptAlias</code>ed URIs) and the |
| type-checking phase (any <code>ScriptAlias</code>ed request is typed |
| as a CGI script).<p> |
| |
| The module needs to maintain some per (virtual) |
| server information, namely, the <code>ScriptAlias</code>es in effect; |
| the module structure therefore contains pointers to a function which |
| builds these structures, and to another which combines two of them (in |
| case the main server and a virtual server both have |
| <code>ScriptAlias</code>es declared).<p> |
| |
| Finally, this module contains code to handle the |
| <code>ScriptAlias</code> command itself. This particular module only |
| declares one command, but there could be more, so modules have |
| <em>command tables</em> which declare their commands, and describe |
| where they are permitted, and how they are to be invoked. <p> |
| |
| A final note on the declared types of the arguments of some of these |
| commands: a <code>pool</code> is a pointer to a <em>resource pool</em> |
| structure; these are used by the server to keep track of the memory |
| which has been allocated, files opened, etc., either to service a |
| particular request, or to handle the process of configuring itself. |
| That way, when the request is over (or, for the configuration pool, |
| when the server is restarting), the memory can be freed, and the files |
| closed, <i>en masse</i>, without anyone having to write explicit code to |
| track them all down and dispose of them. Also, a |
| <code>cmd_parms</code> structure contains various information about |
| the config file being read, and other status information, which is |
| sometimes of use to the function which processes a config-file command |
| (such as <code>ScriptAlias</code>). |
| |
| With no further ado, the module itself: |
| |
| <pre> |
| /* Declarations of handlers. */ |
| |
| int translate_scriptalias (request_rec *); |
| int type_scriptalias (request_rec *); |
| int cgi_handler (request_rec *); |
| |
| /* Subsidiary dispatch table for response-phase handlers, by MIME type */ |
| |
| handler_rec cgi_handlers[] = { |
| { "application/x-httpd-cgi", cgi_handler }, |
| { NULL } |
| }; |
| |
| /* Declarations of routines to manipulate the module's configuration |
| * info. Note that these are returned, and passed in, as void *'s; |
| * the server core keeps track of them, but it doesn't, and can't, |
| * know their internal structure. |
| */ |
| |
| void *make_cgi_server_config (pool *); |
| void *merge_cgi_server_config (pool *, void *, void *); |
| |
| /* Declarations of routines to handle config-file commands */ |
| |
| extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, |
| char *real); |
| |
| command_rec cgi_cmds[] = { |
| { "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, |
| "a fakename and a realname"}, |
| { NULL } |
| }; |
| |
| module cgi_module = { |
| STANDARD_MODULE_STUFF, |
| NULL, /* initializer */ |
| NULL, /* dir config creator */ |
| NULL, /* dir merger --- default is to override */ |
| make_cgi_server_config, /* server config */ |
| merge_cgi_server_config, /* merge server config */ |
| cgi_cmds, /* command table */ |
| cgi_handlers, /* handlers */ |
| translate_scriptalias, /* filename translation */ |
| NULL, /* check_user_id */ |
| NULL, /* check auth */ |
| NULL, /* check access */ |
| type_scriptalias, /* type_checker */ |
| NULL, /* fixups */ |
| NULL /* logger */ |
| }; |
| </pre> |
| |
| <h2><a name="handlers">How handlers work</a></h2> |
| |
| The sole argument to handlers is a <code>request_rec</code> structure. |
| This structure describes a particular request which has been made to |
| the server, on behalf of a client. In most cases, each connection to |
| the client generates only one <code>request_rec</code> structure.<p> |
| |
| <h3><a name="req_tour">A brief tour of the <code>request_rec</code></a></h3> |
| |
| The <code>request_rec</code> contains pointers to a resource pool |
| which will be cleared when the server is finished handling the |
| request, to structures containing per-server and per-connection |
| information, and, most importantly, to information on the request itself.<p> |
| |
| The most important such information is a small set of character |
| strings describing attributes of the object being requested, including |
| its URI, filename, content-type and content-encoding (these being filled |
| in by the translation and type-check handlers which handle the |
| request, respectively). <p> |
| |
| Other commonly used data items are tables giving the MIME headers on |
| the client's original request, MIME headers to be sent back with the |
| response (which modules can add to at will), and environment variables |
| for any subprocesses which are spawned off in the course of servicing |
| the request. These tables are manipulated using the |
| <code>table_get</code> and <code>table_set</code> routines. <p> |
| |
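As a rough illustration of how a handler might read one of these tables and write to another, here is a sketch. The <code>table</code> type below is a toy fixed-size stand-in, not the server's real one (it compares keys case-sensitively and does not copy strings), the signatures of <code>table_get</code> and <code>table_set</code> are assumed, and <code>note_agent</code> and the <code>X-Seen-Agent</code> header are made up for the example.<p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16   /* arbitrary limit for this toy table */

typedef struct { const char *key; const char *val; } table_entry;
typedef struct { table_entry elts[MAX_ENTRIES]; int nelts; } table;

/* Return the value stored under key, or NULL if there is none. */
const char *table_get(const table *t, const char *key)
{
    int i;

    for (i = 0; i < t->nelts; ++i)
        if (strcmp(t->elts[i].key, key) == 0) return t->elts[i].val;
    return NULL;
}

/* Store val under key, replacing any earlier value for that key. */
void table_set(table *t, const char *key, const char *val)
{
    int i;

    for (i = 0; i < t->nelts; ++i)
        if (strcmp(t->elts[i].key, key) == 0) { t->elts[i].val = val; return; }
    if (t->nelts < MAX_ENTRIES) {       /* silently drop entries past the limit */
        t->elts[t->nelts].key = key;
        t->elts[t->nelts].val = val;
        t->nelts++;
    }
}

/* Only the two slots this example needs. */
typedef struct { table *headers_in; table *headers_out; } request_rec;

/* Hypothetical handler fragment: copy the client's User-Agent request
 * header into a made-up response header.
 */
void note_agent(request_rec *r)
{
    const char *agent = table_get(r->headers_in, "User-Agent");

    if (agent != NULL) table_set(r->headers_out, "X-Seen-Agent", agent);
}
```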
| Finally, there are pointers to two data structures which, in turn, |
| point to per-module configuration structures. Specifically, these |
| hold pointers to the data structures which the module has built to |
| describe the way it has been configured to operate in a given |
| directory (via <code>.htaccess</code> files or |
| <code>&lt;Directory&gt;</code> sections), and to private data it has |
| built in the course of servicing the request (so modules' handlers for |
| one phase can pass `notes' to their handlers for other phases). There |
| is another such configuration vector in the <code>server_rec</code> |
| data structure pointed to by the <code>request_rec</code>, which |
| contains per (virtual) server configuration data.<p> |
| |
| Here is an abridged declaration, giving the fields most commonly used:<p> |
| |
| <pre> |
| struct request_rec { |
| |
| pool *pool; |
| conn_rec *connection; |
| server_rec *server; |
| |
| /* What object is being requested */ |
| |
| char *uri; |
| char *filename; |
| char *path_info; |
| char *args; /* QUERY_ARGS, if any */ |
| struct stat finfo; /* Set by server core; |
| * st_mode set to zero if no such file */ |
| |
| char *content_type; |
| char *content_encoding; |
| |
| /* MIME header environments, in and out. Also, an array containing |
| * environment variables to be passed to subprocesses, so people can |
| * write modules to add to that environment. |
| * |
| * The difference between headers_out and err_headers_out is that |
| * the latter are printed even on error, and persist across internal |
| * redirects (so the headers printed for ErrorDocument handlers will |
| * have them). |
| */ |
| |
| table *headers_in; |
| table *headers_out; |
| table *err_headers_out; |
| table *subprocess_env; |
| |
| /* Info about the request itself... */ |
| |
| int header_only; /* HEAD request, as opposed to GET */ |
| char *protocol; /* Protocol, as given to us, or HTTP/0.9 */ |
| char *method; /* GET, HEAD, POST, etc. */ |
| int method_number; /* M_GET, M_POST, etc. */ |
| |
| /* Info for logging */ |
| |
| char *the_request; |
| int bytes_sent; |
| |
| /* A flag which modules can set, to indicate that the data being |
| * returned is volatile, and clients should be told not to cache it. |
| */ |
| |
| int no_cache; |
| |
| /* Various other config info which may change with .htaccess files |
| * These are config vectors, with one void* pointer for each module |
| * (the thing pointed to being the module's business). |
| */ |
| |
| void *per_dir_config; /* Options set in config files, etc. */ |
| void *request_config; /* Notes on *this* request */ |
| |
| }; |
| |
| </pre> |
| |
| <h3><a name="req_orig">Where request_rec structures come from</a></h3> |
| |
| Most <code>request_rec</code> structures are built by reading an HTTP |
| request from a client, and filling in the fields. However, there are |
| a few exceptions: |
| |
| <ul> |
| <li> If the request is to an imagemap, a type map (i.e., a |
| <code>*.var</code> file), or a CGI script which returned a |
| local `Location:', then the resource which the user requested |
| is going to be ultimately located by some URI other than what |
| the client originally supplied. In this case, the server does |
| an <em>internal redirect</em>, constructing a new |
| <code>request_rec</code> for the new URI, and processing it |
| almost exactly as if the client had requested the new URI |
| directly. <p> |
| |
| <li> If some handler signaled an error, and an |
| <code>ErrorDocument</code> is in scope, the same internal |
| redirect machinery comes into play.<p> |
| |
| <li> Finally, a handler occasionally needs to investigate `what |
| would happen if' some other request were run. For instance, |
| the directory indexing module needs to know what MIME type |
| would be assigned to a request for each directory entry, in |
| order to figure out what icon to use.<p> |
| |
| Such handlers can construct a <em>sub-request</em>, using the |
| functions <code>sub_req_lookup_file</code> and |
| <code>sub_req_lookup_uri</code>; this constructs a new |
| <code>request_rec</code> structure and processes it as you |
| would expect, up to but not including the point of actually |
| sending a response. (These functions skip over the access |
| checks if the sub-request is for a file in the same directory |
| as the original request).<p> |
| |
| (Server-side includes work by building sub-requests and then |
| actually invoking the response handler for them, via the |
| function <code>run_sub_request</code>). |
| </ul> |
| |
| <h3><a name="req_return">Handling requests, declining, and returning error codes</a></h3> |
| |
| As discussed above, each handler, when invoked to handle a particular |
| <code>request_rec</code>, has to return an <code>int</code> to |
| indicate what happened. That can either be |
| |
| <ul> |
| <li> OK --- the request was handled successfully. This may or may |
| not terminate the phase. |
| <li> DECLINED --- no erroneous condition exists, but the module |
| declines to handle the phase; the server tries to find another. |
| <li> an HTTP error code, which aborts handling of the request. |
| </ul> |
| |
| Note that if the error code returned is <code>REDIRECT</code>, then |
| the module should put a <code>Location</code> in the request's |
| <code>headers_out</code>, to indicate where the client should be |
| redirected <em>to</em>. <p> |
| |
| <h3><a name="resp_handlers">Special considerations for response handlers</a></h3> |
| |
| Handlers for most phases do their work by simply setting a few fields |
| in the <code>request_rec</code> structure (or, in the case of access |
| checkers, simply by returning the correct error code). However, |
| response handlers have to actually send a response back to the client. <p> |
| |
| They should begin by sending an HTTP response header, using the |
| function <code>send_http_header</code>. (You don't have to do |
| anything special to skip sending the header for HTTP/0.9 requests; the |
| function figures out on its own that it shouldn't do anything). If |
| the request is marked <code>header_only</code>, that's all they should |
| do; they should return after that, without attempting any further |
| output. <p> |
| |
| Otherwise, they should produce a response body which responds to the |
| client as appropriate. The primitives for this are <code>rputc</code> |
| and <code>rprintf</code>, for internally generated output, and |
| <code>send_fd</code>, to copy the contents of some <code>FILE *</code> |
| straight to the client. <p> |
| |
| At this point, you should more or less understand the following piece |
| of code, which is the handler which handles <code>GET</code> requests |
| which have no more specific handler; it also shows how conditional |
| <code>GET</code>s can be handled, if it's desirable to do so in a |
| particular response handler --- <code>set_last_modified</code> checks |
| against the <code>If-modified-since</code> value supplied by the |
| client, if any, and returns an appropriate code (which will, if |
| nonzero, be USE_LOCAL_COPY). No similar considerations apply for |
| <code>set_content_length</code>, but it returns an error code for |
| symmetry.<p> |
| |
| <pre> |
| int default_handler (request_rec *r) |
| { |
| int errstatus; |
| FILE *f; |
| |
| if (r->method_number != M_GET) return DECLINED; |
| if (r->finfo.st_mode == 0) return NOT_FOUND; |
| |
| if ((errstatus = set_content_length (r, r->finfo.st_size)) |
| || (errstatus = set_last_modified (r, r->finfo.st_mtime))) |
| return errstatus; |
| |
| f = pfopen (r->pool, r->filename, "r"); |
| |
| if (f == NULL) { |
| log_reason("file permissions deny server access", |
| r->filename, r); |
| return FORBIDDEN; |
| } |
| |
| register_timeout ("send", r); |
| send_http_header (r); |
| |
| if (!r->header_only) send_fd (f, r); |
| pfclose (r->pool, f); |
| return OK; |
| } |
| </pre> |
| |
| Finally, if all of this is too much of a challenge, there are a few |
| ways out of it. First off, as shown above, a response handler which |
| has not yet produced any output can simply return an error code, in |
| which case the server will automatically produce an error response. |
| Secondly, it can punt to some other handler by invoking |
| <code>internal_redirect</code>, which is how the internal redirection |
| machinery discussed above is invoked. A response handler which has |
| internally redirected should always return <code>OK</code>. <p> |
| |
| (Invoking <code>internal_redirect</code> from handlers which are |
| <em>not</em> response handlers will lead to serious confusion). |
| |
| <h3><a name="auth_handlers">Special considerations for authentication handlers</a></h3> |
| |
| Stuff that should be discussed here in detail: |
| |
| <ul> |
| <li> Authentication-phase handlers not invoked unless auth is |
| configured for the directory. |
| <li> Common auth configuration stored in the core per-dir |
| configuration; it has accessors <code>auth_type</code>, |
| <code>auth_name</code>, and <code>requires</code>. |
| <li> Common routines, to handle the protocol end of things, at least |
| for HTTP basic authentication (<code>get_basic_auth_pw</code>, |
| which sets the <code>connection->user</code> structure field |
| automatically, and <code>note_basic_auth_failure</code>, which |
| arranges for the proper <code>WWW-Authenticate:</code> header |
| to be sent back). |
| </ul> |
| |
| <h3><a name="log_handlers">Special considerations for logging handlers</a></h3> |
| |
| When a request has internally redirected, there is the question of |
| what to log. Apache handles this by bundling the entire chain of |
| redirects into a list of <code>request_rec</code> structures which are |
| threaded through the <code>r->prev</code> and <code>r->next</code> |
| pointers. The <code>request_rec</code> which is passed to the logging |
| handlers in such cases is the one which was originally built for the |
| initial request from the client; note that the bytes_sent field will |
| only be correct in the last request in the chain (the one for which a |
| response was actually sent). |
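<p>
A logging handler that wants the byte count therefore has to walk to the end of the chain first. A minimal sketch, assuming a cut-down <code>request_rec</code> with only the slots needed here; <code>last_in_chain</code> and <code>logged_bytes</code> are hypothetical helpers, not part of the API:<p>

```c
#include <stddef.h>

/* Cut-down stand-in for request_rec, with only the slots used below. */
typedef struct request_rec {
    struct request_rec *prev;   /* request we were redirected from, if any */
    struct request_rec *next;   /* request we were redirected to, if any */
    const char *uri;
    int bytes_sent;
} request_rec;

/* Follow the next pointers to the request that actually sent a response. */
request_rec *last_in_chain(request_rec *r)
{
    while (r->next != NULL) r = r->next;
    return r;
}

/* The byte count worth logging, starting from any request in the chain. */
int logged_bytes(request_rec *r)
{
    return last_in_chain(r)->bytes_sent;
}
```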
| |
| <h2><a name="pools">Resource allocation and resource pools</a></h2> |
| |
| One of the problems of writing and designing a server-pool server is |
| that of preventing leakage, that is, allocating resources (memory, |
| open files, etc.), without subsequently releasing them. The resource |
| pool machinery is designed to make it easy to prevent this from |
| happening, by allowing resources to be allocated in such a way that |
| they are <em>automatically</em> released when the server is done with |
| them. <p> |
| |
| The way this works is as follows: the memory which is allocated, files |
| opened, etc., to deal with a particular request are tied to a |
| <em>resource pool</em> which is allocated for the request. The pool |
| is a data structure which itself tracks the resources in question. <p> |
| |
| When the request has been processed, the pool is <em>cleared</em>. At |
| that point, all the memory associated with it is released for reuse, |
| all files associated with it are closed, and any other clean-up |
| functions which are associated with the pool are run. When this is |
| over, we can be confident that all the resources tied to the pool have |
| been released, and that none of them have leaked. <p> |
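To make the idea concrete, here is a toy pool that tracks nothing but clean-up functions. It is a sketch only, not the server's implementation: <code>toy_pool</code>, <code>toy_register_cleanup</code> and <code>toy_clear_pool</code> are invented names, and bookkeeping is done with plain <code>malloc</code>.<p>

```c
#include <stdlib.h>

/* One registered clean-up: a function and the datum to hand it. */
typedef struct cleanup {
    void (*fn)(void *);
    void *data;
    struct cleanup *next;
} cleanup;

typedef struct { cleanup *cleanups; } toy_pool;

/* Tie a clean-up to the pool. Returns 0 on success, -1 if out of memory. */
int toy_register_cleanup(toy_pool *p, void *data, void (*fn)(void *))
{
    cleanup *c = malloc(sizeof *c);

    if (c == NULL) return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;      /* most recently registered goes first */
    p->cleanups = c;
    return 0;
}

/* Run every clean-up, most recent first, and leave the pool empty
 * so that it can be used again.
 */
void toy_clear_pool(toy_pool *p)
{
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;

        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example clean-up: bump a counter of released resources. */
void count_release(void *data) { ++*(int *)data; }
```

Clearing an already empty pool does nothing, which is what lets one pool be cleared and reused request after request.<p>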
| |
| Server restarts, and allocation of memory and resources for per-server |
| configuration, are handled in a similar way. There is a |
| <em>configuration pool</em>, which keeps track of resources which were |
| allocated while reading the server configuration files, and handling |
| the commands therein (for instance, the memory that was allocated for |
| per-server module configuration, log files and other files that were |
| opened, and so forth). When the server restarts, and has to reread |
| the configuration files, the configuration pool is cleared, and so the |
| memory and file descriptors which were taken up by reading them the |
| last time are made available for reuse. <p> |
| |
| It should be noted that use of the pool machinery isn't generally |
| obligatory, except for situations like logging handlers, where you |
| really need to register cleanups to make sure that the log file gets |
| closed when the server restarts (this is most easily done by using the |
| function <code><a href="#pool-files">pfopen</a></code>, which also |
| arranges for the underlying file descriptor to be closed before any |
| child processes, such as for CGI scripts, are <code>exec</code>ed), or |
| in case you are using the timeout machinery (which isn't yet even |
| documented here). However, there are two benefits to using it: |
| resources allocated to a pool never leak (even if you allocate a |
| scratch string, and just forget about it); also, for memory |
| allocation, <code>palloc</code> is generally faster than |
| <code>malloc</code>.<p> |
| |
| We begin here by describing how memory is allocated to pools, and then |
| discuss how other resources are tracked by the resource pool |
| machinery. |
| |
| <h3>Allocation of memory in pools</h3> |
| |
| Memory is allocated to pools by calling the function |
| <code>palloc</code>, which takes two arguments, one being a pointer to |
| a resource pool structure, and the other being the amount of memory to |
| allocate (in <code>char</code>s). Within handlers for handling |
| requests, the most common way of getting a resource pool structure is |
| by looking at the <code>pool</code> slot of the relevant |
| <code>request_rec</code>; hence the repeated appearance of the |
| following idiom in module code: |
| |
| <pre> |
| int my_handler(request_rec *r) |
| { |
| struct my_structure *foo; |
| ... |
| |
| foo = (struct my_structure *)palloc (r->pool, sizeof(struct my_structure)); |
| } |
| </pre> |
| |
| Note that <em>there is no <code>pfree</code></em> --- |
| <code>palloc</code>ed memory is freed only when the associated |
| resource pool is cleared. This means that <code>palloc</code> does not |
| have to do as much accounting as <code>malloc()</code>; all it does in |
| the typical case is to round up the size, bump a pointer, and do a |
| range check.<p> |
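The round-up, bump, and range-check steps might look something like the sketch below. It assumes a single fixed block whose start is suitably aligned and an 8-byte alignment; <code>toy_pool</code> and <code>toy_palloc</code> are invented names, and where this toy returns <code>NULL</code> a real pool would presumably go and fetch more memory.<p>

```c
#include <stddef.h>

#define ALIGNMENT 8   /* assumed alignment; must be a power of two */

/* A toy pool: one fixed block and a pointer to its first free byte. */
typedef struct {
    char *first_avail;   /* next free byte */
    char *endp;          /* one past the end of the block */
} toy_pool;

/* Hand out nbytes from the block, or NULL if they don't fit. */
void *toy_palloc(toy_pool *p, size_t nbytes)
{
    /* Round the size up to a multiple of ALIGNMENT. */
    size_t rounded = (nbytes + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    char *result = p->first_avail;

    /* Range check (the first test catches overflow in the rounding). */
    if (rounded < nbytes || rounded > (size_t)(p->endp - p->first_avail))
        return NULL;

    p->first_avail += rounded;   /* bump the pointer */
    return result;
}
```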
| |
| (It also raises the possibility that heavy use of <code>palloc</code> |
| could cause a server process to grow excessively large. There are |
| two ways to deal with this, which are dealt with below; briefly, you |
| can use <code>malloc</code>, and try to be sure that all of the memory |
| gets explicitly <code>free</code>d, or you can allocate a sub-pool of |
| the main pool, allocate your memory in the sub-pool, and clear it out |
| periodically. The latter technique is discussed in the section on |
| sub-pools below, and is used in the directory-indexing code, in order |
| to avoid excessive storage allocation when listing directories with |
| thousands of files). |
| |
| <h3>Allocating initialized memory</h3> |
| |
| There are functions which allocate initialized memory, and are |
| frequently useful. The function <code>pcalloc</code> has the same |
| interface as <code>palloc</code>, but clears out the memory it |
| allocates before it returns it. The function <code>pstrdup</code> |
| takes a resource pool and a <code>char *</code> as arguments, and |
| allocates memory for a copy of the string the pointer points to, |
| returning a pointer to the copy. Finally <code>pstrcat</code> is a |
| varargs-style function, which takes a pointer to a resource pool, and |
| at least two <code>char *</code> arguments, the last of which must be |
| <code>NULL</code>. It allocates enough memory to fit copies of each |
| of the strings, as a unit; for instance: |
| |
| <pre> |
| pstrcat (r->pool, "foo", "/", "bar", NULL); |
| </pre> |
| |
| returns a pointer to 8 bytes worth of memory, initialized to |
| <code>"foo/bar"</code>. |
| |
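For the curious, a function of this shape can be written with two passes over the argument list, one to measure and one to copy. The sketch below is not the server's code: <code>toy_strcat</code> is an invented name, it takes no pool argument, and it uses <code>malloc</code> in place of <code>palloc</code>, so unlike the real thing the caller has to <code>free</code> the result.<p>

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate a NULL-terminated list of strings into fresh memory.
 * Returns NULL if the allocation fails.
 */
char *toy_strcat(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *result, *cp;

    /* Pass one: add up the lengths. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    result = malloc(len + 1);           /* one extra byte for the '\0' */
    if (result == NULL) return NULL;

    /* Pass two: copy each string into place. */
    cp = result;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *)) {
        size_t n = strlen(s);

        memcpy(cp, s, n);
        cp += n;
    }
    va_end(ap);

    *cp = '\0';
    return result;
}
```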
| <h3>Tracking open files, etc.</h3> |
| |
| As indicated above, resource pools are also used to track other sorts |
| of resources besides memory. The most common are open files. The |
| routine which is typically used for this is <code>pfopen</code>, which |
| takes a resource pool and two strings as arguments; the strings are |
| the same as the typical arguments to <code>fopen</code>, e.g., |
| |
| <pre> |
| ... |
| FILE *f = pfopen (r->pool, r->filename, "r"); |
| |
| if (f == NULL) { ... } else { ... } |
| </pre> |
| |
| There is also a <code>popenf</code> routine, which parallels the |
| lower-level <code>open</code> system call. Both of these routines |
| arrange for the file to be closed when the resource pool in question |
| is cleared. <p> |
| |
| Unlike the case for memory, there <em>are</em> functions to close |
| files allocated with <code>pfopen</code>, and <code>popenf</code>, |
| namely <code>pfclose</code> and <code>pclosef</code>. (This is |
| because, on many systems, the number of files which a single process |
| can have open is quite limited). It is important to use these |
| functions to close files allocated with <code>pfopen</code> and |
| <code>popenf</code>, since to do otherwise could cause fatal errors on |
| systems such as Linux, which react badly if the same |
| <code>FILE*</code> is closed more than once. <p> |
| |
| (Using the <code>close</code> functions is not mandatory, since the |
| file will eventually be closed regardless, but you should consider it |
| in cases where your module is opening, or could open, a lot of files). |
| |
| <h3>Other sorts of resources --- cleanup functions</h3> |
| |
| More text goes here. Describe the cleanup primitives in terms of |
| which the file stuff is implemented; also, <code>spawn_process</code>. |
| |
| <h3>Fine control --- creating and dealing with sub-pools, with a note |
| on sub-requests</h3> |
| |
| On rare occasions, too-free use of <code>palloc()</code> and the |
| associated primitives may result in undesirably profligate resource |
| allocation. You can deal with such a case by creating a |
| <em>sub-pool</em>, allocating within the sub-pool rather than the main |
| pool, and clearing or destroying the sub-pool, which releases the |
| resources which were associated with it. (This really <em>is</em> a |
| rare situation; the only case in which it comes up in the standard |
| module set is in case of listing directories, and then only with |
| <em>very</em> large directories. Unnecessary use of the primitives |
| discussed here can hair up your code quite a bit, with very little |
| gain). <p> |
| |
| The primitive for creating a sub-pool is <code>make_sub_pool</code>, |
| which takes another pool (the parent pool) as an argument. When the |
| main pool is cleared, the sub-pool will be destroyed. The sub-pool |
| may also be cleared or destroyed at any time, by calling the functions |
| <code>clear_pool</code> and <code>destroy_pool</code>, respectively. |
| (The difference is that <code>clear_pool</code> frees resources |
| associated with the pool, while <code>destroy_pool</code> also |
| deallocates the pool itself. In the former case, you can allocate new |
| resources within the pool, and clear it again, and so forth; in the |
| latter case, it is simply gone). <p> |
| |
| One final note --- sub-requests have their own resource pools, which |
| are sub-pools of the resource pool for the main request. The polite |
| way to reclaim the resources associated with a sub-request which you |
| have allocated (using the <code>sub_req_lookup_...</code> functions) |
| is <code>destroy_sub_request</code>, which frees the resource pool. |
| Before calling this function, be sure to copy anything that you care |
| about which might be allocated in the sub-request's resource pool into |
| someplace a little less volatile (for instance, the filename in its |
| <code>request_rec</code> structure). <p> |
| |
| (Again, under most circumstances, you shouldn't feel obliged to call |
| this function; only 2K of memory or so are allocated for a typical |
| sub-request, and it will be freed anyway when the main request pool is |
| cleared. It is only when you are allocating many, many sub-requests |
| for a single main request that you should seriously consider the |
| <code>destroy...</code> functions). |
| |
| <h2><a name="config">Configuration, commands and the like</a></h2> |
| |
| One of the design goals for this server was to maintain external |
| compatibility with the NCSA 1.3 server --- that is, to read the same |
| configuration files, to process all the directives therein correctly, |
| and in general to be a drop-in replacement for NCSA. On the other |
| hand, another design goal was to move as much of the server's |
| functionality into modules which have as little as possible to do with |
| the monolithic server core. The only way to reconcile these goals is |
| to move the handling of most commands from the central server into the |
| modules. <p> |
| |
| However, just giving the modules command tables is not enough to |
| divorce them completely from the server core. The server has to |
| remember the commands in order to act on them later. That involves |
| maintaining data which is private to the modules, and which can be |
| either per-server, or per-directory. Most things are per-directory, |
| including in particular access control and authorization information, |
| but also information on how to determine file types from suffixes, |
| which can be modified by <code>AddType</code> and |
| <code>DefaultType</code> directives, and so forth. In general, the |
| governing philosophy is that anything which <em>can</em> be made |
| configurable by directory should be; per-server information is |
| generally used in the standard set of modules for information like |
| <code>Alias</code>es and <code>Redirect</code>s which come into play |
| before the request is tied to a particular place in the underlying |
| file system. <p> |
| |
| Another requirement for emulating the NCSA server is being able to |
| handle the per-directory configuration files, generally called |
| <code>.htaccess</code> files, though even in the NCSA server they can |
| contain directives which have nothing at all to do with access |
| control. Accordingly, after URI -&gt; filename translation, but before |
| performing any other phase, the server walks down the directory |
| hierarchy of the underlying filesystem, following the translated |
| pathname, to read any <code>.htaccess</code> files which might be |
| present. The information which is read in then has to be |
| <em>merged</em> with the applicable information from the server's own |
| config files (either from the <code>&lt;Directory&gt;</code> sections |
| in <code>access.conf</code>, or from defaults in |
| <code>srm.conf</code>, which actually behaves for most purposes almost |
| exactly like <code>&lt;Directory /&gt;</code>).<p> |
| |
| Finally, after having served a request which involved reading |
| <code>.htaccess</code> files, we need to discard the storage allocated |
| for handling them. That is solved the same way it is solved wherever |
| else similar problems come up, by tying those structures to the |
| per-transaction resource pool. <p> |
| |
| <h3><a name="per-dir">Per-directory configuration structures</a></h3> |
| |
| Let's look at how all of this plays out in <code>mod_mime.c</code>, |
| which defines the file typing handler which emulates the NCSA server's |
| behavior of determining file types from suffixes. What we'll be |
| looking at, here, is the code which implements the |
| <code>AddType</code> and <code>AddEncoding</code> commands. These |
| commands can appear in <code>.htaccess</code> files, so they must be |
| handled in the module's private per-directory data, which, in fact, |
| consists of two separate <code>table</code>s for MIME types and |
| encoding information, and is declared as follows: |
| |
| <pre> |
| typedef struct { |
| table *forced_types; /* Additional AddTyped stuff */ |
| table *encoding_types; /* Added with AddEncoding... */ |
| } mime_dir_config; |
| </pre> |
| |
| When the server is reading a configuration file, or |
| <code>&lt;Directory&gt;</code> section, which includes one of the MIME |
| module's commands, it needs to create a <code>mime_dir_config</code> |
| structure, so those commands have something to act on. It does this |
| by invoking the function it finds in the module's "create per-dir |
| config" slot, with two arguments: the name of the directory to which |
| this configuration information applies (or <code>NULL</code> for |
| <code>srm.conf</code>), and a pointer to a resource pool in which the |
| allocation should happen. <p> |
| |
| (If we are reading a <code>.htaccess</code> file, that resource pool |
| is the per-request resource pool for the request; otherwise it is a |
| resource pool which is used for configuration data, and cleared on |
| restarts. Either way, it is important for the structure being created |
| to vanish when the pool is cleared, by registering a cleanup on the |
| pool if necessary). <p> |
| |
| For the MIME module, the per-dir config creation function just |
| <code>palloc</code>s the structure above, and creates a couple of |
| <code>table</code>s to fill it. That looks like this: |
| |
| <pre> |
| void *create_mime_dir_config (pool *p, char *dummy) |
| { |
| mime_dir_config *new = |
| (mime_dir_config *) palloc (p, sizeof(mime_dir_config)); |
| |
| new->forced_types = make_table (p, 4); |
| new->encoding_types = make_table (p, 4); |
| |
| return new; |
| } |
| </pre> |
| |
| Now, suppose we've just read in a <code>.htaccess</code> file. We |
| already have the per-directory configuration structure for the next |
| directory up in the hierarchy. If the <code>.htaccess</code> file we |
| just read in didn't have any <code>AddType</code> or |
| <code>AddEncoding</code> commands, its per-directory config structure |
| for the MIME module is still valid, and we can just use it. |
| Otherwise, we need to merge the two structures somehow. <p> |
| |
| To do that, the server invokes the module's per-directory config merge |
| function, if one is present. That function takes three arguments: |
| the two structures being merged, and a resource pool in which to |
| allocate the result. For the MIME module, all that needs to be done |
| is overlay the tables from the new per-directory config structure with |
| those from the parent: |
| |
| <pre> |
| void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) |
| { |
| mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; |
| mime_dir_config *subdir = (mime_dir_config *)subdirv; |
| mime_dir_config *new = |
| (mime_dir_config *)palloc (p, sizeof(mime_dir_config)); |
| |
| new->forced_types = overlay_tables (p, subdir->forced_types, |
| parent_dir->forced_types); |
| new->encoding_types = overlay_tables (p, subdir->encoding_types, |
| parent_dir->encoding_types); |
| |
| return new; |
| } |
| </pre> |
| |
| As a note --- if there is no per-directory merge function present, the |
| server will just use the subdirectory's configuration info, and ignore |
| the parent's. For some modules, that works just fine (e.g., for the |
| includes module, whose per-directory configuration information |
| consists solely of the state of the <code>XBITHACK</code>), and for |
| those modules, you can just not declare one, and leave the |
| corresponding structure slot in the module itself <code>NULL</code>.<p> |
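To make the precedence in the merge concrete, here is a minimal sketch of what a lookup against the merged configuration amounts to: the subdirectory's entries are consulted first, and the parent's only as a fallback. The <code>toy_entry</code>, <code>toy_table</code>, <code>toy_table_get</code> and <code>toy_merged_get</code> names are hypothetical, the toy compares keys case-sensitively for simplicity, and nothing here is claimed to match how <code>overlay_tables</code> is actually implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy table: a fixed array of key/value pairs. */
typedef struct {
    const char *key;
    const char *val;
} toy_entry;

typedef struct {
    const toy_entry *entries;
    size_t n;
} toy_table;

/* Return the first value stored under key, or NULL if there is none. */
const char *toy_table_get(const toy_table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->n; ++i)
        if (strcmp(t->entries[i].key, key) == 0)
            return t->entries[i].val;
    return NULL;
}

/* Lookup against the "merged" view: the subdirectory's setting wins,
 * and the parent's is used only when the subdirectory is silent. */
const char *toy_merged_get(const toy_table *subdir, const toy_table *parent,
                           const char *key)
{
    const char *v = toy_table_get(subdir, key);
    return v ? v : toy_table_get(parent, key);
}
```

So if the parent maps <code>txt</code> to <code>text/plain</code> and a subdirectory remaps it, requests under the subdirectory see the remapped type, while suffixes the subdirectory never mentions still resolve through the parent. <p>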
| |
| <h3><a name="commands">Command handling</a></h3> |
| |
| Now that we have these structures, we need to be able to figure out |
| how to fill them. That involves processing the actual |
| <code>AddType</code> and <code>AddEncoding</code> commands. To find |
| commands, the server looks in the module's <code>command table</code>. |
| That table contains information on how many arguments the commands |
| take, and in what formats, where they are permitted, and so forth. That |
| information is sufficient to allow the server to invoke most |
| command-handling functions with pre-parsed arguments. Without further |
| ado, let's look at the <code>AddType</code> command handler, which |
| looks like this (the <code>AddEncoding</code> command looks basically |
| the same, and won't be shown here): |
| |
| <pre> |
| char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) |
| { |
| if (*ext == '.') ++ext; |
| table_set (m->forced_types, ext, ct); |
| return NULL; |
| } |
| </pre> |
| |
| This command handler is unusually simple. As you can see, it takes |
| four arguments: first, a pointer to a <code>cmd_parms</code> |
| structure; second, the per-directory configuration structure for the |
| module in question; and last, the two pre-parsed arguments. The |
| <code>cmd_parms</code> structure contains a bunch of arguments which |
| are frequently of use to some, but not all, commands, including a |
| resource pool (from which memory can be allocated, and to which |
| cleanups should be tied), and the (virtual) server being configured, |
| from which the module's per-server configuration data can be obtained |
| if required.<p> |
| |
| Another way in which this particular command handler is unusually |
| simple is that there are no error conditions which it can encounter. |
| If there were, it could return an error message instead of |
| <code>NULL</code>. If the command is in the main config files, this |
| causes an error to be printed out on the server's |
| <code>stderr</code>, followed by a quick exit. For a |
| <code>.htaccess</code> file, the syntax error is logged in the server |
| error log (along with an indication of where it came from), and the |
| request is bounced with a server error response (HTTP error status, |
| code 500). <p> |
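For comparison, here is a sketch of what a handler with error conditions might look like. It is a hypothetical variant, not the MIME module's code: the <code>toy_mime_config</code> structure holds a single mapping instead of a <code>table</code>, the <code>cmd_parms</code> argument is left out, and both validation rules are invented purely to show the convention of returning a message on failure and <code>NULL</code> on success.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot stand-in for the module's per-directory data. */
typedef struct {
    const char *ext;
    const char *type;
} toy_mime_config;

/* Returns NULL on success, or a message describing the syntax error.
 * The two checks below are made up for illustration. */
const char *toy_add_type(toy_mime_config *m, const char *ct, const char *ext)
{
    if (strchr(ct, '/') == NULL)
        return "AddType requires a type of the form major/minor";

    if (*ext == '.') ++ext;          /* accept ".html" as well as "html" */
    if (*ext == '\0')
        return "AddType requires a non-empty file extension";

    m->type = ct;
    m->ext = ext;
    return NULL;
}
```

Note that the configuration structure is only modified once every check has passed, so a rejected command leaves no partial state behind. <p>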
| |
| The MIME module's command table has entries for these commands, which |
| look like this: |
| |
| <pre> |
| command_rec mime_cmds[] = { |
| { "AddType", add_type, NULL, OR_FILEINFO, TAKE2, |
| "a mime type followed by a file extension" }, |
| { "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, |
| "an encoding (e.g., gzip), followed by a file extension" }, |
| { NULL } |
| }; |
| </pre> |
| |
| The entries in these tables are: |
| |
| <ul> |
| <li> The name of the command |
| <li> The function which handles it |
| <li> a <code>(void *)</code> pointer, which is passed in the |
| <code>cmd_parms</code> structure to the command handler --- |
| this is useful in case many similar commands are handled by the |
| same function. |
| <li> A bit mask indicating where the command may appear. There are |
| mask bits corresponding to each <code>AllowOverride</code> |
| option, and an additional mask bit, <code>RSRC_CONF</code>, |
| indicating that the command may appear in the server's own |
| config files, but <em>not</em> in any <code>.htaccess</code> |
| file. |
| <li> A flag indicating how many arguments the command handler wants |
| pre-parsed, and how they should be passed in. |
| <code>TAKE2</code> indicates two pre-parsed arguments. Other |
| options are <code>TAKE1</code>, which indicates one pre-parsed |
| argument; <code>FLAG</code>, which indicates that the argument |
| should be <code>On</code> or <code>Off</code>, and is passed in |
| as a boolean flag; and <code>RAW_ARGS</code>, which causes the |
| server to give the command the raw, unparsed arguments |
| (everything but the command name itself). There is also |
| <code>ITERATE</code>, which means that the handler looks the |
| same as <code>TAKE1</code>, but that if multiple arguments are |
| present, it should be called multiple times, once per argument. |
| Finally, <code>ITERATE2</code> indicates that the command handler |
| looks like a <code>TAKE2</code>, but if more arguments are |
| present, then it should be called multiple times, holding the |
| first argument constant. |
| <li> Finally, we have a string which describes the arguments that |
| should be present. If the arguments in the actual config file |
| are not as required, this string will be used to help give a |
| more specific error message. (You can safely leave this |
| <code>NULL</code>). |
| </ul> |
| |
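The <code>ITERATE2</code> calling convention described in the list above can be sketched as a small dispatch loop. This is an illustration under assumptions, not the server's dispatcher: <code>toy_take2_fn</code>, <code>toy_iterate2</code>, <code>toy_log</code> and <code>toy_record</code> are hypothetical names, and the behavior when fewer than two arguments are present (returning an error string) is my own guess.

```c
#include <stddef.h>

/* Hypothetical shape of a two-argument handler: returns NULL on
 * success or an error message. */
typedef const char *(*toy_take2_fn)(void *conf, const char *a, const char *b);

/* Call the handler once for each argument after the first, holding
 * args[0] constant; stop at the first error. */
const char *toy_iterate2(toy_take2_fn handler, void *conf,
                         const char **args, size_t nargs)
{
    size_t i;

    if (nargs < 2)
        return "requires at least two arguments";   /* assumed behavior */

    for (i = 1; i < nargs; ++i) {
        const char *err = handler(conf, args[0], args[i]);
        if (err) return err;
    }
    return NULL;
}

/* A sample handler which just records how it was called. */
typedef struct {
    size_t calls;
    const char *first;
    const char *last;
} toy_log;

const char *toy_record(void *conf, const char *a, const char *b)
{
    toy_log *log = conf;
    log->calls++;
    log->first = a;
    log->last = b;
    return NULL;
}
```

Given the arguments <code>image/jpeg jpg jpeg jpe</code>, the loop calls the handler three times, each time with <code>image/jpeg</code> as the first argument and one of the suffixes as the second. <p>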
| Finally, having set this all up, we have to use it. This is |
| ultimately done in the module's handlers, specifically in its |
| file-typing handler, which looks more or less like this. Note that the |
| per-directory configuration structure is extracted from the |
| <code>request_rec</code>'s per-directory configuration vector by using |
| the <code>get_module_config</code> function. |
| |
| <pre> |
| int find_ct(request_rec *r) |
| { |
| int i; |
| char *fn = pstrdup (r->pool, r->filename); |
| mime_dir_config *conf = (mime_dir_config *) |
| get_module_config(r->per_dir_config, &mime_module); |
| char *type; |
| |
| if (S_ISDIR(r->finfo.st_mode)) { |
| r->content_type = DIR_MAGIC_TYPE; |
| return OK; |
| } |
| |
| if((i=rind(fn,'.')) < 0) return DECLINED; |
| ++i; |
| |
| if ((type = table_get (conf->encoding_types, &fn[i]))) |
| { |
| r->content_encoding = type; |
| |
| /* go back to previous extension to try to use it as a type */ |
| |
| fn[i-1] = '\0'; |
| if((i=rind(fn,'.')) < 0) return OK; |
| ++i; |
| } |
| |
| if ((type = table_get (conf->forced_types, &fn[i]))) |
| { |
| r->content_type = type; |
| } |
| |
| return OK; |
| } |
| |
| </pre> |
| |
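The suffix logic in the handler above is easier to follow in isolation. Here is a standalone sketch of the same two-step lookup (try the last suffix as an encoding, then fall back to the previous suffix for the type), using <code>strrchr</code> and small fixed arrays in place of the server's <code>rind</code> and <code>table</code>s. The <code>toy_*</code> names and the sample mappings are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed mappings standing in for the module's tables. */
typedef struct {
    const char *ext;
    const char *val;
} toy_map;

static const toy_map toy_encodings[] = {
    { "gz", "x-gzip" },
    { "Z",  "x-compress" }
};

static const toy_map toy_types[] = {
    { "html", "text/html" },
    { "tar",  "application/x-tar" }
};

#define TOY_COUNT(a) (sizeof (a) / sizeof (a)[0])

static const char *toy_lookup(const toy_map *map, size_t n, const char *ext)
{
    size_t i;
    for (i = 0; i < n; ++i)
        if (strcmp(map[i].ext, ext) == 0)
            return map[i].val;
    return NULL;
}

/* Sets *type and *encoding (each NULL if not determined).  Returns 0
 * when the name has no suffix at all (the DECLINED case), 1 otherwise.
 * fn is modified in place, so pass a private copy. */
int toy_find_ct(char *fn, const char **type, const char **encoding)
{
    char *dot = strrchr(fn, '.');
    const char *v;

    *type = NULL;
    *encoding = NULL;
    if (dot == NULL) return 0;

    v = toy_lookup(toy_encodings, TOY_COUNT(toy_encodings), dot + 1);
    if (v != NULL) {
        *encoding = v;

        /* chop the encoding suffix and retry with the one before it */
        *dot = '\0';
        dot = strrchr(fn, '.');
        if (dot == NULL) return 1;
    }

    *type = toy_lookup(toy_types, TOY_COUNT(toy_types), dot + 1);
    return 1;
}
```

With these sample mappings, a name ending in <code>.tar.gz</code> gets the encoding from <code>gz</code> and the type from <code>tar</code>. Because the name is truncated in place, the real handler works on a <code>pstrdup</code>ed copy of <code>r->filename</code> rather than on the original. <p>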
| <h3><a name="servconf">Side notes --- per-server configuration, virtual servers, etc.</a></h3> |
| |
| The basic ideas behind per-server module configuration are much |
| the same as those for per-directory configuration; there is a creation |
| function and a merge function, the latter being invoked where a |
| virtual server has partially overridden the base server configuration, |
| and a combined structure must be computed. (As with per-directory |
| configuration, the default if no merge function is specified, and a |
| module is configured in some virtual server, is that the base |
| configuration is simply ignored). <p> |
| |
| The only substantial difference is that when a command needs to |
| configure the per-server private module data, it needs to go to the |
| <code>cmd_parms</code> data to get at it. Here's an example, from the |
| alias module, which also indicates how a syntax error can be returned |
| (note that the per-directory configuration argument to the command |
| handler is declared as a dummy, since the module doesn't actually have |
| per-directory config data): |
| |
| <pre> |
| char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) |
| { |
| server_rec *s = cmd->server; |
| alias_server_conf *conf = (alias_server_conf *) |
| get_module_config(s->module_config,&alias_module); |
| alias_entry *new = push_array (conf->redirects); |
| |
| if (!is_url (url)) return "Redirect to non-URL"; |
| |
| new->fake = f; new->real = url; |
| return NULL; |
| } |
| </pre> |
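Both examples lean on <code>get_module_config</code> to pull one module's private data out of a shared configuration vector. One simple way such a lookup could work is sketched below, on the assumption that each module is assigned a distinct index and the vector is just an array of <code>void *</code> slots. The <code>toy_module</code>, <code>toy_get_module_config</code> and <code>toy_set_module_config</code> names are hypothetical, and I am not claiming this is how the server does it.

```c
#include <stddef.h>

/* Hypothetical module descriptor: all we need here is a unique index. */
typedef struct {
    int module_index;
} toy_module;

/* Fetch the slot belonging to module m from the configuration vector. */
void *toy_get_module_config(void *conf_vector, const toy_module *m)
{
    void **confv = conf_vector;
    return confv[m->module_index];
}

/* Store module m's private data in its slot of the vector. */
void toy_set_module_config(void *conf_vector, const toy_module *m, void *val)
{
    void **confv = conf_vector;
    confv[m->module_index] = val;
}
```

Under that assumption, the same pair of functions serves per-directory and per-server data alike; only the vector that is passed in differs. <p>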
| <!--#include virtual="footer.html" --> |
| </body></html> |
| |