blob: 9aefdcac470cd4073fcb9428b20ce7758dcab797 [file] [log] [blame]
The following document was written by Robert S. Thau (rst@ai.mit.edu) on the
release of Apache 1.0. Some details may have changed since then regarding the
functions and names of modules, but the basic ideas are still intact.
=================================================
The basic idea of the new Apache release is to make a modular
"tinkertoy" server, to which people can easily add code which is
valuable to them (even if it isn't universally useful) without hairing
up a monolithic server. Applications for this idea include database
integration, support for experimental search and scripting extensions,
new authentication modes (digest authentication, for instance, could
be done entirely as a module), and so forth. All modules have the
same interface to the server core, and through it, to each other.
In particular, the following are modules in the current code base:
common log format (other loggers can easily coexist with it), auth and
dbm auth (although both use common code in http_protocol.c to parse
the Authorization: line), directory handling (which can be added or
replaced), handling of aliases and access control, content
negotiation, CGI, includes, aliases, and so forth. (What's left in
the basic server? Not a whole lot). The configuration file commands
which configure these things are defined, for the most part, by the
modules themselves, and not by the server core (each module has, or
can have, a command dispatch table).
Besides carving up the base code into modules, this release makes a
few other fairly pervasive changes. Most of the global variables are
gone; most of the MAX_STRING_LENGTH char arrays are gone (the few that
are left being sprintf() targets, or I/O buffers of various sorts),
and unmunge_name has vanished. The most drastic change is the use of
a "compool" strategy to manage resources allocated for a request ---
the code in alloc.c keeps track of it all and allows it to be freed en
bloc at the end of the request. This strategy seems to be effective
in stanching memory and descriptor leaks.
Additional third-party modules can be found at
<URL:http://www.apache.org/dist/contrib/modules/>.
A brief code review:
The code here can be divided into the server core (the http_* files,
along with alloc.c and the various utility files), and several modules
(the mod_* files).
The core interfaces to modules through the "module" structure which
describes each one. There's a linked list of these things rooted at
top_module, through which http_config.c dispatches when necessary. The
module structures themselves are defined at the bottom of the mod_foo
files. (Loading new modules dynamically at runtime should be simple;
just push them onto the linked list. The only complication is what to
do with AddModule commands when the config files are reread,
particularly if you find a module has been taken out).
In addition to the core itself (which does have a module structure to
hold its command tables, and the handlers for various phases of
request handling which make it *barely* a web server on its own),
the modules included here are the following:
mod_mime.c --- deduction of MIME types and content-encodings from
filename extensions. This module defines the AddType, AddEncoding,
and TypesConfig config-file directives. This code is off in a
module by itself so that people who want to experiment with other
meta-information schemes can replace it, and still have content
negotiation work.
mod_log_config.c --- logging in configurable or common log format.
mod_auth.c --- HTTP authentication. Defines the AuthUserFile and
AuthGroupFile directives (other auth-related commands are handled by
the core itself, so it knows which requests require it to poll the
modules for authentication handlers).
mod_auth_dbm.c --- DBM auth. Untested, and left out of the modules
list in modules.c because of that, but it does at least compile.
Grump.
mod_access.c --- access checking by DNS name or IP address; defines
the "order", "allow" and "deny" config-file commands. (If this
module is compiled out, the server fails safe --- any attempt to
configure access control will die on a config file syntax error when
the relevant commands go unrecognized).
mod_negotiation.c --- Content negotiation. Defines the
CacheNegotiatedDocs config-file command. Making this a module is
perhaps going overboard, but I wanted to see how far I could push
it.
mod_alias.c --- Alias command and file translation.
mod_userdir.c --- ditto for Userdir.
mod_cgi.c --- Common Gateway Interface. Also defines ScriptAlias,
because scripts are treated slightly differently depending on
whether they are ScriptAliased or not (in particular, ExecCGI is not
required in the former case).
mod_includes.c --- server-side includes.
mod_dir.c --- defines a whole *raft* of commands; handles directories.
mod_asis.c --- ASIS file handling.
mod_dld.c --- the experimental runtime-code-loader described above.
You'll have to alter the makefile and modules.c to make this active
if you want it.
As to the core, here's a brief review of what's where:
http_protocol.c --- functions for dealing directly with the client.
Reading requests, writing replies of various sorts. I've tried to
route all data transfer between server and client through here, so
there's a single piece of code to change if we want to add, say,
HTTP-NG packetization. The major glaring exception is NPH- CGI
scripts; what *will* we do with those for HTTP-NG?
http_request.c --- functions which direct the processing of requests,
including error handling. Generally responsible for making sure
that the right module handlers get invoked, in the right order.
(This includes the "sub-request" mechanism, which is used by
includes and other stuff to ask about the status of particular
subfiles).
http_core.c ---
Contains the core module structure, its command table, and the
command handlers, also the filename translation routine, and the
like for the core. (Basically, this is all of the core module stuff
which looks more or less like the boilerplate from the other modules).
http_config.c --- Functions to read config files and dispatch to the
command handlers; also, routines to manage configuration vectors,
and to dispatch to modules' handlers for the various phases of
handling a request.
http_log.c --- just the error log. Error handling is split between
http_protocol.c (for generating the default error responses) and
http_request.c (for executive handling, including ErrorDocument
invocation); transaction logging is in the modules.
http_main.c --- System startup, restart, and accepting connections;
also timeout handling (which is pretty grotesque right now; ideas?)
alloc.c --- allocation of all resources which might have to be reclaimed
eventually, including memory, files, and child processes.