| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" |
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta name="generator" content="HTML Tidy, see www.w3.org" /> |
| |
| <title>How filters work in Apache 2.0</title> |
| </head> |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| |
| <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" |
| vlink="#000080" alink="#FF0000"> |
| <!--#include virtual="header.html" --> |
| |
| <h1>How filters work in Apache 2.0</h1> |
| |
| <p>Warning - this is a cut 'n paste job from an email: |
| &lt;022501c1c529$f63a9550$7f00000a@KOJ&gt;</p> |
| |
| <pre> |
| There are three basic filter types (each of these is actually broken |
| down into two categories, but that comes later). |
| |
| CONNECTION: Filters of this type are valid for the lifetime of this |
| connection. |
| |
| PROTOCOL: Filters of this type are valid for the lifetime of this |
| request. From the point of view of the client, this means |
| that the request is valid from the time that the request |
| is sent until the time that the response is received. |
| |
| RESOURCE: Filters of this type are valid for the time that this |
| content is used to satisfy a request. For simple |
| requests, this is identical to PROTOCOL, but internal redirects |
| and sub-requests can change the content without ending |
| the request. |
| |
| It is important to make the distinction between a protocol and a |
| resource filter. A resource filter is tied to a specific resource; it |
| may also be tied to header information, but the main binding is to a |
| resource. If you are writing a filter and you want to know if it is |
| resource or protocol, the correct question to ask is: "Can this filter |
| be removed if the request is redirected to a different resource?" If |
| the answer is yes, then it is a resource filter. If it is no, then it |
| is most likely a protocol or connection filter. I won't go into |
| connection filters, because they seem to be well understood. |
| |
| With this definition, a few examples might help: |
| Byterange: We have coded it to be inserted for all |
| requests, and it is removed if not used. Because this filter is active |
| at the beginning of all requests, it cannot be removed if the request is |
| redirected, so this is a protocol filter. |
| |
| http_header: This filter actually writes the headers to the |
| network. This is obviously a required filter (except in the asis case |
| which is special and will be dealt with below) and so it is a protocol |
| filter. |
| |
| Deflate: The administrator configures this filter based on |
| which file has been requested. If we do an internal redirect from an |
| autoindex page to an index.html page, the deflate filter may be added or |
| removed based on config, so this is a resource filter. |
| |
| The further breakdown of each category into two more filter types is |
| strictly for ordering. We could remove it, and only allow for one |
| filter type, but the order would tend to be wrong, and we would need to |
| hack things to make it work. Currently, the RESOURCE filters only have |
| one filter type, but that should change. |
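| |
| To make the ordering point concrete, here is a minimal sketch in C. It is |
| not the httpd source: the ord_ names and the rank values are made up for |
| illustration, and it assumes a new filter goes after any existing filter |
| of the same rank. Each category gets two ranks, except RESOURCE. |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering ranks; lower ranks see the data first.
 * These are NOT the values used by httpd itself. */
enum ord_ftype {
    ORD_RESOURCE       = 10,  /* the single RESOURCE rank */
    ORD_PROTOCOL_EARLY = 30,  /* PROTOCOL, first ordering slot */
    ORD_PROTOCOL_LATE  = 40,  /* PROTOCOL, second ordering slot */
    ORD_CONN_EARLY     = 50,  /* CONNECTION, first ordering slot */
    ORD_CONN_LATE      = 60   /* CONNECTION, second ordering slot */
};

struct ord_filter {
    const char *name;
    enum ord_ftype type;
    struct ord_filter *next;
};

/* Insert f so that the chain stays sorted by rank. A filter is placed
 * after existing filters of the same rank (an assumption of this
 * sketch). Returns the head of the chain, which may be f. */
static struct ord_filter *ord_insert(struct ord_filter *head,
                                     struct ord_filter *f)
{
    struct ord_filter **link = &head;

    assert(f != NULL);
    while (*link != NULL && (*link)->type <= f->type) {
        link = &(*link)->next;
    }
    f->next = *link;
    *link = f;
    return head;
}
```
| With ranks like these, filters can be registered in any order and the |
| chain still comes out RESOURCE first, then PROTOCOL, then CONNECTION. |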
| |
| How are filters inserted? |
| This is actually rather simple in theory, but the code is |
| complex. First of all, it is important that everybody realize that |
| there are three filter lists for each request, but they are all |
| concatenated together. So, the first list is r->output_filters, then |
| r->proto_output_filters, and finally r->connection->output_filters. |
| These correspond to the RESOURCE, PROTOCOL, and CONNECTION filters |
| respectively. Previously, the problem was that we used a singly linked |
| list to create the filter stack, and we started from the "correct" |
| location. This means that if I had a RESOURCE filter on the stack, and |
| I added a CONNECTION filter, the CONNECTION filter would be ignored. |
| This should make sense, because we would insert the connection filter at |
| the top of the c->output_filters list, but the end of r->output_filters |
| pointed to the filter that used to be at the front of c->output_filters. |
| This is obviously wrong. The new insertion code uses a doubly linked |
| list. This has the advantage that we never lose a filter that has been |
| inserted. Unfortunately, it comes with a separate set of headaches. |
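| |
| A rough sketch of why the back pointer matters, assuming the three lists |
| are really one chain with three entry points. The cat_ names are |
| hypothetical, not httpd's, and the sketch assumes at least one connection |
| filter is already in place. |
```c
#include <assert.h>
#include <stddef.h>

struct cat_filter {
    const char *name;
    struct cat_filter *next;
    struct cat_filter *prev;
};

/* One chain, three entry points, standing in for r->output_filters,
 * r->proto_output_filters and r->connection->output_filters. */
struct cat_request {
    struct cat_filter *output_filters;        /* RESOURCE entry point */
    struct cat_filter *proto_output_filters;  /* PROTOCOL entry point */
    struct cat_filter *conn_output_filters;   /* CONNECTION entry point */
};

/* Put f at the front of the connection filters without hiding it from
 * the filters that run before it. */
static void cat_push_conn_filter(struct cat_request *r, struct cat_filter *f)
{
    struct cat_filter *old_front;

    assert(r != NULL && f != NULL);
    old_front = r->conn_output_filters;
    assert(old_front != NULL);  /* assumption: the list is never empty */

    f->next = old_front;
    f->prev = old_front->prev;
    if (f->prev != NULL) {
        /* The step a singly linked list cannot do: repoint the
         * filter that used to feed the old front. */
        f->prev->next = f;
    }
    old_front->prev = f;

    /* Entry points that referred to the old front move to f as well. */
    if (r->proto_output_filters == old_front) {
        r->proto_output_filters = f;
    }
    if (r->output_filters == old_front) {
        r->output_filters = f;
    }
    r->conn_output_filters = f;
}
```
| Without the prev pointer, the resource filter in front would keep |
| pointing at the old front and the new connection filter would be skipped. |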
| |
| The problem is that we have two different cases where we use subrequests. |
| The first is to insert more data into a response. The second is to |
| replace the existing response with an internal redirect. These are two |
| different cases and need to be treated as such. |
| |
| In the first case, we are creating the subrequest from within a handler |
| or filter. This means that the next filter should be passed to the |
| make_sub_request function, and the last resource filter in the |
| sub-request will point to the next filter in the main request. This |
| makes sense, because the sub-request's data needs to flow through the |
| same set of filters as the main request. A graphical representation |
| might help: |
| |
| Default_handler --> includes_filter --> byterange --> content_length -> etc |
| |
| If the includes filter creates a sub request, then we don't want the |
| data from that sub-request to go through the includes filter, because it |
| might not be SSI data. So, the subrequest adds the following: |
| |
| Default_handler --> includes_filter -/-> byterange --> content_length -> etc |
|                                     / |
| Default_handler --> sub_request_core |
| |
| What happens if the subrequest is SSI data? Well, that's easy, the |
| includes_filter is a resource filter, so it will be added to the sub |
| request in between the Default_handler and the sub_request_core filter. |
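| |
| The first case can be sketched roughly as follows. This is a simplified |
| model, not the real make_sub_request: the sub_ names and signatures are |
| invented for illustration, and only the next pointer is modelled. |
```c
#include <assert.h>
#include <stddef.h>

struct sub_filter {
    const char *name;
    struct sub_filter *next;
};

struct sub_request {
    struct sub_filter core;             /* stands in for sub_request_core */
    struct sub_filter *output_filters;  /* front of the sub-request chain */
};

/* Start a sub-request whose output rejoins the main request's chain
 * at next_filter, the filter after the one creating the sub-request. */
static void sub_make_request(struct sub_request *sub,
                             struct sub_filter *next_filter)
{
    assert(sub != NULL && next_filter != NULL);
    sub->core.name = "sub_request_core";
    sub->core.next = next_filter;
    sub->output_filters = &sub->core;
}

/* Add a resource filter to the sub-request. It lands in front of the
 * sub-request's core filter, so it never touches the main chain. */
static void sub_add_resource_filter(struct sub_request *sub,
                                    struct sub_filter *f)
{
    assert(sub != NULL && f != NULL);
    f->next = sub->output_filters;
    sub->output_filters = f;
}
```
| If the includes filter passes byterange as next_filter, the sub-request's |
| data skips the includes filter in the main request but still flows |
| through byterange and everything after it. |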
| |
| The second case for sub-requests is when one sub-request is going to |
| become the real request. This happens whenever a sub-request is created |
| outside of a handler or filter, and NULL is passed as the next filter to |
| the make_sub_request function. |
| |
| In this case, the resource filters no longer make sense for the new |
| request, because the resource has changed. So, instead of starting from |
| scratch, we simply point the front of the resource filters for the |
| sub-request to the front of the protocol filters for the old request. |
| This means that we won't lose any of the protocol filters, neither will |
| we try to send this data through a filter that shouldn't see it. |
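| |
| A minimal sketch of that pointer hand-off, under the same simplifying |
| assumptions as before. The redir_ names are hypothetical and the real |
| redirect code does a good deal more than this. |
```c
#include <assert.h>
#include <stddef.h>

struct redir_filter {
    const char *name;
    struct redir_filter *next;
};

struct redir_request {
    struct redir_filter *output_filters;        /* RESOURCE entry point */
    struct redir_filter *proto_output_filters;  /* PROTOCOL entry point */
};

/* Begin the replacement request: keep the old protocol filters and
 * start with no resource filters of its own. */
static void redir_start(const struct redir_request *old_r,
                        struct redir_request *new_r)
{
    assert(old_r != NULL && new_r != NULL);
    new_r->proto_output_filters = old_r->proto_output_filters;
    /* The front of the new resource list is the front of the old
     * protocol list, so the old resource filters are bypassed. */
    new_r->output_filters = old_r->proto_output_filters;
}
```
| Resource filters configured for the new resource would then be added in |
| front of new_r->output_filters, while the old request's chain is left |
| untouched. |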
| |
| The problem is that we are using a doubly-linked list for our filter |
| stacks now. But, you should notice that it is possible for two lists to |
| intersect in this model. So, how do you handle the previous pointer? |
| This is a very difficult question to answer, because there is no "right" |
| answer; either method is equally valid. I looked at why we use the |
| previous pointer. The only reason for it is to allow for easier |
| addition of new filters. With that being said, the solution I chose was |
| to make the previous pointer always stay on the original request. |
| |
| This causes some more complex logic, but it works for all cases. My |
| concern with having it move to the sub-request is that for the more |
| common case (where a sub-request is used to add data to a response), the |
| main filter chain would be wrong. That didn't seem like a good idea to |
| me. |
| |
| asis: |
| The final topic. :-) mod_asis is a bit of a hack, but the |
| handler needs to remove all filters except for connection filters, and |
| send the data. If you are using mod_asis, all other bets are off. |
| |
| The absolutely last point is that the reason this code was so hard to |
| get right was that we had hacked so much to force it to work. I |
| wrote most of the hacks originally, so I am very much to blame. |
| However, now that the code is right, I have started to remove some |
| hacks. Most people should have seen that the reset_filters and |
| add_required_filters functions are gone. Those inserted protocol-level |
| filters for error conditions. In fact, both functions did the same |
| thing, one after the other; it was really strange. Because we don't |
| lose protocol filters for error cases any more, those hacks went away. |
| The HTTP_HEADER, Content-length, and Byterange filters are all added in |
| the insert_filters phase, because if they were added earlier, we had |
| some interesting interactions. Now, those could all be moved to be |
| inserted with the HTTP_IN, CORE, and CORE_IN filters. That would make |
| the code easier to follow. |
| </pre> |
| |
| <!--#include virtual="footer.html" --> |
| </body> |
| </html> |
| |