|  | APACHE 2.x ROADMAP | 
|  | ================== | 
|  | Last modified at [$Date$] | 
|  |  | 
|  |  | 
|  | WORKS IN PROGRESS | 
|  | ----------------- | 
|  |  | 
|  | * Source code should follow style guidelines. | 
|  | OK, we all agree pretty code is good.  Probably best to clean this | 
|  | up by hand immediately upon branching a 2.1 tree. | 
|  | Status: Justin volunteers to hand-edit the entire source tree ;) | 
|  |  | 
|  | Justin says: | 
|  | Recall when the release plan for 2.0 was written: | 
|  | Absolute Enforcement of an "Apache Style" for code. | 
|  | Watch this slip into 3.0. | 
|  |  | 
|  | David says: | 
|  | The style guide needs to be reviewed before this can be done. | 
|  | http://httpd.apache.org/dev/styleguide.html | 
|  | The current file is dated April 20th 1998! | 
|  |  | 
|  | OtherBill offers: | 
|  | It's survived since '98 because it's welldone :-)  Suggest we | 
|  | simply follow whatever is documented in styleguide.html as we | 
|  | branch the next tree.  Really sort of straightforward, if you | 
|  | dislike a bit within that doc, bring it up on the dev@httpd | 
|  | list prior to the next branch. | 
|  |  | 
|  | So Bill sums up ... let's get the code cleaned up in CVS head. | 
|  | Remember, it just takes cvs diff -b (that is, --ignore-space-change) | 
|  | to see the code changes and ignore that cruft.  Get editing Justin :) | 
|  |  | 
|  | * Replace stat [deferred open] with open/fstat in directory_walk. | 
|  | Justin, Ian, OtherBill all interested in this.  Implies setting up | 
|  | the apr_file_t member in request_rec, and having all modules use | 
|  | that file, and allow the cleanup to close it [if it isn't a shared, | 
|  | cached file handle.] | 
|  |  | 
|  | * The Async Apache Server implemented in terms of APR. | 
|  | [Bill Stoddard's pet project.] | 
|  | Message-ID: <008301c17d42$9b446970$01000100@sashimi> (dev@apr) | 
|  |  | 
|  | OtherBill notes that this can proceed in two parts... | 
|  |  | 
|  | Async accept, setup, and tear-down of the request | 
|  | e.g. dealing with the incoming request headers, prior to | 
|  | dispatching the request to a thread for processing. | 
|  | This doesn't need to wait for a 2.x/3.0 bump. | 
|  |  | 
|  | Async delegation of the entire request processing chain | 
|  | Too many handlers use stack storage and presume it is | 
|  | available for the life of the request, so a complete | 
|  | async implementation would need to happen 3.0 release. | 
|  |  | 
|  | Brian notes that async writes will provide a bigger | 
|  | scalability win than async reads for most servers. | 
|  | We may want to try a hybrid sync-read/async-write MPM | 
|  | as a next step.  This should be relatively easy to | 
|  | build: start with the current worker or leader/followers | 
|  | model, but hand off each response brigade to a "completion | 
|  | thread" that multiplexes writes on many connections, so | 
|  | that the worker thread doesn't have to wait around for | 
|  | the sendfile to complete. | 
|  |  | 
|  |  | 
|  | MAKING APACHE REPOSITORY-AGNOSTIC | 
|  | (or: remove knowledge of the filesystem) | 
|  |  | 
|  | [ 2002/10/01: discussion in progress on items below; this isn't | 
|  | planned yet ] | 
|  |  | 
|  | * dav_resource concept for an HTTP resource ("ap_resource") | 
|  |  | 
|  | * r->filename, r->canonical_filename, r->finfo need to | 
|  | disappear. All users need to use new APIs on the ap_resource | 
|  | object. | 
|  |  | 
|  | (backwards compat: today, when this occurs with mod_dav and a | 
|  | custom backend, the above items refer to the topmost directory | 
|  | mapped by a location; e.g. docroot) | 
|  |  | 
|  | Need to preserve a 'filename'-like string for mime-by-name | 
|  | sorts of operations.  But this only needs to be the name itself | 
|  | and not a full path. | 
|  |  | 
|  | Justin: Can we leverage the path info, or do we not trust the | 
|  | user? | 
|  |  | 
|  | gstein: well, it isn't the "path info", but the actual URI of | 
|  | the resource. And of course we trust the user... that is | 
|  | the resource they requested. | 
|  |  | 
|  | dav_resource->uri is the field you want. path_info might | 
|  | still exist, but that portion might be related to the | 
|  | CGI concept of "path translated" or some other further | 
|  | resolution. | 
|  |  | 
|  | To continue, I would suggest that "path translated" and | 
|  | having *any* path info is Badness. It means that you did | 
|  | not fully resolve a resource for the given URI. The | 
|  | "abs_path" in a URI identifies a resource, and that | 
|  | should get fully resolved. None of this "resolve to | 
|  | <here> and then we have a magical second resolution | 
|  | (inside the CGI script)" or somesuch. | 
|  |  | 
|  | Justin: Well, let's consider mod_mbox for a second.  It is sort of | 
|  | a virtual filesystem in its own right - as it introduces | 
|  | it's own notion of a URI space, but it is intrinsically | 
|  | tied to the filesystem to do the lookups.  But, for the | 
|  | portion that isn't resolved on the file system, it has | 
|  | its own addressing scheme.  Do we need the ability to | 
|  | layer resolution? | 
|  |  | 
|  | * The translate_name hook goes away | 
|  |  | 
|  | Wrowe altogether disagrees.  translate_name today even operates | 
|  | on URIs ... this mechanism needs to be preserved. | 
|  |  | 
|  | * The doc for map_to_storage is totally opaque to me. It has | 
|  | something to do with filesystems, but it also talks about | 
|  | security and per_dir_config and other stuff. I presume something | 
|  | needs to happen there -- at least better doc. | 
|  |  | 
|  | Wrowe agrees and will write it up. | 
|  |  | 
|  | * The directory_walk concept disappears. All configuration is | 
|  | tagged to Locations. The "mod_filesystem" module might have some | 
|  | internal concept of the same config appearing in multiple | 
|  | places, but that is handled internally rather than by Apache | 
|  | core. | 
|  |  | 
|  | Wrowe suggests this is wrong, instead it's private to filesystem | 
|  | requests, and is already invoked from map_to_storage, not the core | 
|  | handler.  <Directory > and <Files > blocks are preserved as-is, | 
|  | but <Directory > sections become specific to the filesystem handler | 
|  | alone.  Because alternate filesystem schemes could be loaded, this | 
|  | should be exposed, from the core, for other file-based stores to | 
|  | share. Consider an archive store where the layers become | 
|  | <Directory path> -> <Archive store> -> <File name> | 
|  |  | 
|  | Justin: How do we map Directory entries to Locations? | 
|  |  | 
|  | * The "Location tree" is an in-memory representation of the URL | 
|  | namespace. Nodes of the tree have configuration specific to that | 
|  | location in the namespace. | 
|  |  | 
|  | Something like: | 
|  |  | 
|  | typedef struct { | 
|  | const char *name;  /* name of this node relative to parent */ | 
|  |  | 
|  | struct ap_conf_vector_t *locn_config; | 
|  |  | 
|  | apr_hash_t *children; /* NULL if no child configs */ | 
|  | } ap_locn_node; | 
|  |  | 
|  | The following config: | 
|  |  | 
|  | <Location /server-status> | 
|  | SetHandler server-status | 
|  | Order deny,allow | 
|  | Deny from all | 
|  | Allow from 127.0.0.1 | 
|  | </Location> | 
|  |  | 
|  | Creates a node with name=="server_status", and the node is a | 
|  | child of the "/" node. (hmm. node->name is redundant with the | 
|  | hash key; maybe drop node->name) | 
|  |  | 
|  | In the config vector, mod_access has stored its Order, Deny, and | 
|  | Allow configs. mod_core has stored the SetHandler. | 
|  |  | 
|  | During the Location walk, we merge the config vectors normally. | 
|  |  | 
|  | Note that an Alias simply associates a filesystem path (in | 
|  | mod_filesystem) with that Location in the tree. Merging | 
|  | continues with child locations, but a merge is never done | 
|  | through filesystem locations. Config on a specific subdir needs | 
|  | to be mapped back into the corresponding point in the Location | 
|  | tree for proper merging. | 
|  |  | 
|  | * Config is parsed into a tree, as we did for the 2.0 timeframe, | 
|  | but that tree is just a representation of the config (for | 
|  | multiple runs and for in-memory manipulation and usage). It is | 
|  | unrelated to the "Location tree". | 
|  |  | 
|  | * Calls to apr_file_io functions generally need to be replaced | 
|  | with operations against the ap_resource. For example, rather | 
|  | than calling apr_dir_open/read/close(), a caller uses | 
|  | resource->repos->get_children() or somesuch. | 
|  |  | 
|  | Note that things like mod_dir, mod_autoindex, and mod_negotiation | 
|  | need to be converted to use these mechanisms so that their | 
|  | functions will work on logical repositories rather than just | 
|  | filesystems. | 
|  |  | 
|  | * How do we handle CGI scripts?  Especially when the resource may | 
|  | not be backed by a file?  Ideally, we should be able to come up | 
|  | with some mechanism to allow CGIs to work in a | 
|  | repository-independent manner. | 
|  |  | 
|  | - Writing the virtual data as a file and then executing it? | 
|  | - Can a shell be executed in a streamy manner?  (Portably?) | 
|  | - Have an 'execute_resource' hook/func that allows the | 
|  | repository to choose its manner - be it exec() or whatever. | 
|  | - Won't this approach lead to duplication of code?  Helper fns? | 
|  |  | 
|  | gstein: PHP, Perl, and Python scripts are nominally executed by | 
|  | a filter inserted by mod_php/perl/python. I'd suggest | 
|  | that shell/batch scripts are similar. | 
|  |  | 
|  | But to ask further: what if it is an executable | 
|  | *program* rather than just a script? Do we yank that out | 
|  | of the repository, drop it onto the filesystem, and run | 
|  | it? eeewwwww... | 
|  |  | 
|  | I'll vote -0.9 for CGIs as a filter. Keep 'em handlers. | 
|  |  | 
|  | Justin: So, do we give up executing CGIs from virtual repositories? | 
|  | That seems like a sad tradeoff to make.  I'd like to have | 
|  | my CGI scripts under DAV (SVN) control. | 
|  |  | 
|  | * How do we handle overlaying of Location and Directory entries? | 
|  | Right now, we have a problem when /cgi-bin/ is ScriptAlias'd and | 
|  | mod_dav has control over /.  Some people believe that /cgi-bin/ | 
|  | shouldn't be under DAV control, while others do believe it | 
|  | should be.  What's the right strategy? |