| APACHE 2.x ROADMAP |
| ================== |
| Last modified at [$Date$] |
| |
| |
| WORKS IN PROGRESS |
| ----------------- |
| |
| * Source code should follow style guidelines. |
| OK, we all agree pretty code is good. Probably best to clean this |
| up by hand immediately upon branching a 2.1 tree. |
| Status: Justin volunteers to hand-edit the entire source tree ;) |
| |
| Justin says: |
| Recall when the release plan for 2.0 was written: |
| Absolute Enforcement of an "Apache Style" for code. |
| Watch this slip into 3.0. |
| |
| David says: |
| The style guide needs to be reviewed before this can be done. |
| http://httpd.apache.org/dev/styleguide.html |
| The current file is dated April 20th 1998! |
| |
| OtherBill offers: |
| It's survived since '98 because it's well done :-) Suggest we |
| simply follow whatever is documented in styleguide.html as we |
| branch the next tree. Really sort of straightforward: if you |
| dislike a bit within that doc, bring it up on the dev@httpd |
| list prior to the next branch. |
| |
| So Bill sums up ... let's get the code cleaned up in CVS head. |
| Remember, it just takes cvs diff -b (that is, --ignore-space-change) |
| to see the code changes and ignore that cruft. Get editing Justin :) |
| |
| * Replace stat [deferred open] with open/fstat in directory_walk. |
| Justin, Ian, OtherBill all interested in this. Implies setting up |
| the apr_file_t member in request_rec, and having all modules use |
| that file, and allow the cleanup to close it [if it isn't a shared, |
| cached file handle.] |
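A minimal sketch of the open/fstat idea, using plain POSIX calls rather than APR so that it stands alone. The names `resource_handle`, `open_and_stat`, and `release_handle` are hypothetical; the item above would instead hang an apr_file_t off request_rec. The point shown is only the ordering: open once, take the metadata from the open descriptor, and let cleanup close it unless the handle is shared.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the file handle that would live in request_rec. */
typedef struct {
    int fd;             /* open descriptor, or -1 if not open */
    struct stat finfo;  /* metadata read from the open descriptor */
} resource_handle;

/* Open the file once and fstat() the descriptor, instead of stat()ing the
 * path now and opening it later. Returns 0 on success, -1 on failure. */
static int open_and_stat(const char *path, resource_handle *h)
{
    assert(path != NULL && h != NULL);

    h->fd = open(path, O_RDONLY);
    if (h->fd < 0) {
        return -1;
    }
    if (fstat(h->fd, &h->finfo) != 0) {
        close(h->fd);
        h->fd = -1;
        return -1;
    }
    return 0;
}

/* Cleanup: close the descriptor unless it is a shared, cached handle. */
static void release_handle(resource_handle *h, int is_shared)
{
    assert(h != NULL);
    if (!is_shared && h->fd >= 0) {
        close(h->fd);
        h->fd = -1;
    }
}
```

Because the metadata comes from the descriptor, later reads refer to the same object that was checked, and modules sharing the handle avoid a second path lookup.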
| |
| * The Async Apache Server implemented in terms of APR. |
| [Bill Stoddard's pet project.] |
| Message-ID: <008301c17d42$9b446970$01000100@sashimi> (dev@apr) |
| |
| OtherBill notes that this can proceed in two parts... |
| |
| Async accept, setup, and tear-down of the request |
| e.g. dealing with the incoming request headers, prior to |
| dispatching the request to a thread for processing. |
| This doesn't need to wait for a 2.x/3.0 bump. |
| |
| Async delegation of the entire request processing chain |
| Too many handlers use stack storage and presume it is |
| available for the life of the request, so a complete |
| async implementation would need to wait for a 3.0 release. |
| |
| Brian notes that async writes will provide a bigger |
| scalability win than async reads for most servers. |
| We may want to try a hybrid sync-read/async-write MPM |
| as a next step. This should be relatively easy to |
| build: start with the current worker or leader/followers |
| model, but hand off each response brigade to a "completion |
| thread" that multiplexes writes on many connections, so |
| that the worker thread doesn't have to wait around for |
| the sendfile to complete. |
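The completion-thread idea Brian describes might look roughly like the following. This is only a sketch under several assumptions: it uses POSIX poll() and write() on non-blocking descriptors, it does not model brigades or sendfile, and `pending_write`, `try_flush`, and `completion_pass` are hypothetical names. A worker would fill in a `pending_write` and hand it off; the completion thread would call `completion_pass` in a loop.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical record for one response handed off by a worker thread. */
typedef struct {
    int fd;            /* client descriptor, expected to be non-blocking */
    const char *buf;   /* response bytes, owned by the caller */
    size_t len;        /* total bytes to send */
    size_t sent;       /* bytes written so far */
} pending_write;

/* Write as much as the descriptor accepts right now.
 * Returns 1 when finished, 0 if more remains, -1 on error. */
static int try_flush(pending_write *w)
{
    assert(w != NULL);
    while (w->sent < w->len) {
        ssize_t n = write(w->fd, w->buf + w->sent, w->len - w->sent);
        if (n > 0) {
            w->sent += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;   /* would block: wait for the next POLLOUT */
        } else if (n < 0 && errno == EINTR) {
            continue;   /* interrupted: retry */
        } else {
            return -1;
        }
    }
    return 1;
}

/* One pass of the completion thread: wait until some descriptor is
 * writable, then advance every write that can make progress.
 * Returns the number of writes still unfinished, or -1 on error. */
static int completion_pass(pending_write *w, size_t count, int timeout_ms)
{
    struct pollfd fds[64];
    size_t i;
    int remaining = 0;

    assert(w != NULL && count <= 64);
    for (i = 0; i < count; i++) {
        /* poll() ignores entries whose fd is negative */
        fds[i].fd = (w[i].sent < w[i].len) ? w[i].fd : -1;
        fds[i].events = POLLOUT;
        fds[i].revents = 0;
    }
    if (poll(fds, (nfds_t)count, timeout_ms) < 0 && errno != EINTR) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        if (w[i].sent < w[i].len) {
            if ((fds[i].revents & POLLOUT) && try_flush(&w[i]) < 0) {
                return -1;
            }
            if (w[i].sent < w[i].len) {
                remaining++;
            }
        }
    }
    return remaining;
}
```

The worker thread returns to accepting requests as soon as the hand-off is done; only the completion thread ever waits on slow clients.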
| |
| |
| MAKING APACHE REPOSITORY-AGNOSTIC |
| (or: remove knowledge of the filesystem) |
| |
| [ 2002/10/01: discussion in progress on items below; this isn't |
| planned yet ] |
| |
| * dav_resource concept for an HTTP resource ("ap_resource") |
| |
| * r->filename, r->canonical_filename, r->finfo need to |
| disappear. All users need to use new APIs on the ap_resource |
| object. |
| |
| (backwards compat: today, when this occurs with mod_dav and a |
| custom backend, the above items refer to the topmost directory |
| mapped by a location; e.g. docroot) |
| |
| Need to preserve a 'filename'-like string for mime-by-name |
| sorts of operations. But this only needs to be the name itself |
| and not a full path. |
| |
| Justin: Can we leverage the path info, or do we not trust the |
| user? |
| |
| gstein: well, it isn't the "path info", but the actual URI of |
| the resource. And of course we trust the user... that is |
| the resource they requested. |
| |
| dav_resource->uri is the field you want. path_info might |
| still exist, but that portion might be related to the |
| CGI concept of "path translated" or some other further |
| resolution. |
| |
| To continue, I would suggest that "path translated" and |
| having *any* path info is Badness. It means that you did |
| not fully resolve a resource for the given URI. The |
| "abs_path" in a URI identifies a resource, and that |
| should get fully resolved. None of this "resolve to |
| <here> and then we have a magical second resolution |
| (inside the CGI script)" or somesuch. |
| |
| Justin: Well, let's consider mod_mbox for a second. It is sort of |
| a virtual filesystem in its own right - as it introduces |
| its own notion of a URI space, but it is intrinsically |
| tied to the filesystem to do the lookups. But, for the |
| portion that isn't resolved on the file system, it has |
| its own addressing scheme. Do we need the ability to |
| layer resolution? |
| |
| * The translate_name hook goes away |
| |
| Wrowe altogether disagrees. translate_name today even operates |
| on URIs ... this mechanism needs to be preserved. |
| |
| * The doc for map_to_storage is totally opaque to me. It has |
| something to do with filesystems, but it also talks about |
| security and per_dir_config and other stuff. I presume something |
| needs to happen there -- at least better doc. |
| |
| Wrowe agrees and will write it up. |
| |
| * The directory_walk concept disappears. All configuration is |
| tagged to Locations. The "mod_filesystem" module might have some |
| internal concept of the same config appearing in multiple |
| places, but that is handled internally rather than by Apache |
| core. |
| |
| Wrowe suggests this is wrong; instead, it's private to filesystem |
| requests, and is already invoked from map_to_storage, not the core |
| handler. <Directory > and <Files > blocks are preserved as-is, |
| but <Directory > sections become specific to the filesystem handler |
| alone. Because alternate filesystem schemes could be loaded, this |
| should be exposed, from the core, for other file-based stores to |
| share. Consider an archive store where the layers become |
| <Directory path> -> <Archive store> -> <File name> |
| |
| Justin: How do we map Directory entries to Locations? |
| |
| * The "Location tree" is an in-memory representation of the URL |
| namespace. Nodes of the tree have configuration specific to that |
| location in the namespace. |
| |
| Something like: |
| |
| typedef struct { |
| const char *name; /* name of this node relative to parent */ |
| |
| struct ap_conf_vector_t *locn_config; |
| |
| apr_hash_t *children; /* NULL if no child configs */ |
| } ap_locn_node; |
| |
| The following config: |
| |
| <Location /server-status> |
| SetHandler server-status |
| Order deny,allow |
| Deny from all |
| Allow from 127.0.0.1 |
| </Location> |
| |
| Creates a node with name=="server-status", and the node is a |
| child of the "/" node. (hmm. node->name is redundant with the |
| hash key; maybe drop node->name) |
| |
| In the config vector, mod_access has stored its Order, Deny, and |
| Allow configs. mod_core has stored the SetHandler. |
| |
| During the Location walk, we merge the config vectors normally. |
| |
| Note that an Alias simply associates a filesystem path (in |
| mod_filesystem) with that Location in the tree. Merging |
| continues with child locations, but a merge is never done |
| through filesystem locations. Config on a specific subdir needs |
| to be mapped back into the corresponding point in the Location |
| tree for proper merging. |
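The Location walk described above could be sketched as follows. To keep the example self-contained it makes two simplifying assumptions: a sibling list stands in for the apr_hash_t of children, and a one-field `locn_config` stands in for the ap_conf_vector_t. All names here (`locn_node`, `locn_walk`, and so on) are hypothetical variants of the ap_locn_node idea, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for ap_conf_vector_t: one setting only. */
typedef struct {
    const char *handler;   /* NULL means "not set at this node" */
} locn_config;

/* Variant of ap_locn_node; a sibling list replaces the apr_hash_t. */
typedef struct locn_node {
    const char *name;            /* segment name relative to parent */
    locn_config config;
    struct locn_node *children;  /* first child, NULL if none */
    struct locn_node *next;      /* next sibling */
} locn_node;

static void locn_add_child(locn_node *parent, locn_node *child)
{
    assert(parent != NULL && child != NULL);
    child->next = parent->children;
    parent->children = child;
}

/* Find the child whose name equals the first len bytes of seg. */
static locn_node *locn_find_child(const locn_node *parent,
                                  const char *seg, size_t len)
{
    locn_node *c;
    for (c = parent->children; c != NULL; c = c->next) {
        if (strlen(c->name) == len && strncmp(c->name, seg, len) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Merge: settings on the more specific node override inherited ones. */
static void locn_merge(locn_config *merged, const locn_config *add)
{
    if (add->handler != NULL) {
        merged->handler = add->handler;
    }
}

/* Walk the tree along uri, merging config from the root downward.
 * The walk stops at the deepest node that matches. */
static locn_config locn_walk(const locn_node *root, const char *uri)
{
    locn_config merged = { NULL };
    const locn_node *node = root;
    const char *p = uri;

    assert(root != NULL && uri != NULL);
    locn_merge(&merged, &root->config);

    while (node != NULL && *p != '\0') {
        size_t len;
        while (*p == '/') {
            p++;                 /* skip separators */
        }
        len = strcspn(p, "/");
        if (len == 0) {
            break;
        }
        node = locn_find_child(node, p, len);
        if (node != NULL) {
            locn_merge(&merged, &node->config);
        }
        p += len;
    }
    return merged;
}
```

A request for /server-status would pick up the handler stored on that node, while a URI with no matching node inherits whatever the "/" node set. Mapping an Alias'd filesystem subdirectory back into this tree is not covered by the sketch.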
| |
| * Config is parsed into a tree, as we did for the 2.0 timeframe, |
| but that tree is just a representation of the config (for |
| multiple runs and for in-memory manipulation and usage). It is |
| unrelated to the "Location tree". |
| |
| * Calls to apr_file_io functions generally need to be replaced |
| with operations against the ap_resource. For example, rather |
| than calling apr_dir_open/read/close(), a caller uses |
| resource->repos->get_children() or somesuch. |
| |
| Note that things like mod_dir, mod_autoindex, and mod_negotiation |
| need to be converted to use these mechanisms so that their |
| functions will work on logical repositories rather than just |
| filesystems. |
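One way to picture resource->repos->get_children() is a table of function pointers supplied by each backend. The sketch below is an assumption about shape only: `ap_repos_hooks`, the fields of `ap_resource`, the `get_children` signature, and the in-memory backend are all hypothetical, chosen to show how a mod_dir-style caller could look for an index document without calling apr_dir_open/read/close.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ap_resource;  /* hypothetical; named after the "ap_resource" above */

/* Hypothetical table of repository operations. */
typedef struct {
    /* Store up to max child names in names[]; return how many exist. */
    size_t (*get_children)(const struct ap_resource *res,
                           const char **names, size_t max);
} ap_repos_hooks;

typedef struct ap_resource {
    const char *uri;              /* URI of the resource */
    const ap_repos_hooks *repos;  /* backend that provides it */
    const void *backend_data;     /* private to the backend */
} ap_resource;

/* Example backend: children held in a NULL-terminated in-memory array. */
static size_t mem_get_children(const ap_resource *res,
                               const char **names, size_t max)
{
    const char *const *list = res->backend_data;
    size_t n;

    for (n = 0; list[n] != NULL; n++) {
        if (n < max) {
            names[n] = list[n];
        }
    }
    return n;
}

/* Caller in the style of mod_dir: asks the repository for children
 * instead of reading a filesystem directory. Returns 1 if found. */
static int has_child(const ap_resource *res, const char *name)
{
    const char *names[16];
    size_t i, n;

    assert(res != NULL && res->repos != NULL && name != NULL);
    n = res->repos->get_children(res, names, 16);
    if (n > 16) {
        n = 16;   /* only the first 16 names were stored */
    }
    for (i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A filesystem backend would implement the same hook with apr_dir_read, so callers such as mod_autoindex would not need to know which kind of repository they are listing.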
| |
| * How do we handle CGI scripts? Especially when the resource may |
| not be backed by a file? Ideally, we should be able to come up |
| with some mechanism to allow CGIs to work in a |
| repository-independent manner. |
| |
| - Writing the virtual data as a file and then executing it? |
| - Can a shell be executed in a streamy manner? (Portably?) |
| - Have an 'execute_resource' hook/func that allows the |
| repository to choose its manner - be it exec() or whatever. |
| - Won't this approach lead to duplication of code? Helper fns? |
| |
| gstein: PHP, Perl, and Python scripts are nominally executed by |
| a filter inserted by mod_php/perl/python. I'd suggest |
| that shell/batch scripts are similar. |
| |
| But to ask further: what if it is an executable |
| *program* rather than just a script? Do we yank that out |
| of the repository, drop it onto the filesystem, and run |
| it? eeewwwww... |
| |
| I'll vote -0.9 for CGIs as a filter. Keep 'em handlers. |
| |
| Justin: So, do we give up executing CGIs from virtual repositories? |
| That seems like a sad tradeoff to make. I'd like to have |
| my CGI scripts under DAV (SVN) control. |
| |
| * How do we handle overlaying of Location and Directory entries? |
| Right now, we have a problem when /cgi-bin/ is ScriptAlias'd and |
| mod_dav has control over /. Some people believe that /cgi-bin/ |
| shouldn't be under DAV control, while others do believe it |
| should be. What's the right strategy? |