 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta name="generator" content="HTML Tidy, see www.w3.org" />

     <title>How filters work in Apache 2.0</title>
   </head>
   <!-- Background white, links blue (unvisited), navy (visited), red (active) -->

   <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
   vlink="#000080" alink="#FF0000">
     <!--#include virtual="header.html" -->

     <h1>How filters work in Apache 2.0</h1>

     <p>Warning - this is a cut 'n paste job from an email:
        &lt;022501c1c529$f63a9550$7f00000a@KOJ&gt;</p>

 <pre>
 There are three basic filter types (each of these is actually broken
 down into two categories, but that comes later).

 CONNECTION:  Filters of this type are valid for the lifetime of this
              connection.

 PROTOCOL:    Filters of this type are valid for the lifetime of this
              request from the point of view of the client.  This means
              that the request is valid from the time that the request
              is sent until the time that the response is received.

 RESOURCE:    Filters of this type are valid for the time that this
              content is used to satisfy a request.  For simple
              requests, this is identical to PROTOCOL, but internal redirects
              and sub-requests can change the content without ending
              the request.

 It is important to make the distinction between a protocol and a
 resource filter.  A resource filter is tied to a specific resource; it
 may also be tied to header information, but the main binding is to a
 resource.  If you are writing a filter and you want to know if it is
 resource or protocol, the correct question to ask is:  "Can this filter
 be removed if the request is redirected to a different resource?"  If
 the answer is yes, then it is a resource filter.  If it is no, then it
 is most likely a protocol or connection filter.  I won't go into
 connection filters, because they seem to be well understood.

 With this definition, a few examples might help:
 Byterange:  We have coded it to be inserted for all
 requests, and it is removed if not used.  Because this filter is active
 at the beginning of all requests, it cannot be removed if the request
 is redirected, so this is a protocol filter.

 http_header:  This filter actually writes the headers to the
 network.  This is obviously a required filter (except in the asis case
 which is special and will be dealt with below) and so it is a protocol
 filter.

 Deflate:  The administrator configures this filter based on
 which file has been requested.  If we do an internal redirect from an
 autoindex page to an index.html page, the deflate filter may be added or
 removed based on config, so this is a resource filter.

 The further breakdown of each category into two more filter types is
 strictly for ordering.  We could remove it, and only allow for one
 filter type, but the order would tend to be wrong, and we would need to
 hack things to make it work.  Currently, the RESOURCE filters only have
 one filter type, but that should change.
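
 As a rough illustration of ordering by filter type, here is a toy model
 in Python.  It is not httpd code: the rank numbers, the filter named
 late_protocol, and the insert_by_type helper are all assumptions made
 for this sketch.  A finer rank simply slots in between the coarser ones.

```python
# Toy model only: rank numbers and helper names are made up for this
# sketch and are not copied from the httpd headers.
RESOURCE, PROTOCOL, CONNECTION = 10, 30, 50
PROTOCOL_LATE = 40  # hypothetical second rank inside the protocol group


def insert_by_type(chain, name, ftype):
    """Place the filter ahead of the first filter with a higher rank."""
    for pos, (_, rank) in enumerate(chain):
        if rank > ftype:
            chain.insert(pos, (name, ftype))
            return chain
    chain.append((name, ftype))
    return chain


chain = []
# Insertion order is deliberately scrambled; rank decides the position.
insert_by_type(chain, "core", CONNECTION)
insert_by_type(chain, "byterange", PROTOCOL)
insert_by_type(chain, "http_header", PROTOCOL)
insert_by_type(chain, "deflate", RESOURCE)
insert_by_type(chain, "late_protocol", PROTOCOL_LATE)
print([name for name, _ in chain])
```

 With only one rank per group, late_protocol could not be forced to run
 after http_header without some special-case code.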

 How are filters inserted?
 This is actually rather simple in theory, but the code is
 complex.  First of all, it is important that everybody realize that
 there are three filter lists for each request, but they are all
 concatenated together.  So, the first list is r->output_filters, then
 r->proto_output_filters, and finally r->connection->output_filters.
 These correspond to the RESOURCE, PROTOCOL, and CONNECTION filters
 respectively.  The problem previously was that we used a singly linked
 list to create the filter stack, and we started from the "correct"
 location.  This means that if I had a RESOURCE filter on the stack, and
 I added a CONNECTION filter, the CONNECTION filter would be ignored.
 This should make sense, because we would insert the connection filter at
 the top of the c->output_filters list, but the end of r->output_filters
 pointed to the filter that used to be at the front of c->output_filters.
 This is obviously wrong.  The new insertion code uses a doubly linked
 list.  This has the advantage that we never lose a filter that has been
 inserted.  Unfortunately, it comes with a separate set of headaches.
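
 The lost-filter problem can be reproduced with a toy model in Python.
 This is a sketch under assumptions, not the httpd implementation: the
 Filter class, the link, walk and insert_before helpers, and the filter
 named new_conn_filter are all hypothetical.

```python
class Filter:
    """Toy stand-in for one entry in a filter chain (not httpd's struct)."""

    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None


def link(*filters):
    """Chain the filters together in both directions; return the head."""
    for a, b in zip(filters, filters[1:]):
        a.next, b.prev = b, a
    return filters[0]


def walk(head):
    """Collect filter names by following next pointers from head."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return names


def insert_before(new, existing):
    """Doubly linked insert: the predecessor is relinked as well."""
    new.next, new.prev = existing, existing.prev
    if existing.prev is not None:
        existing.prev.next = new
    existing.prev = new


deflate, http_header, core = Filter("deflate"), Filter("http_header"), Filter("core")
resource_head = link(deflate, http_header, core)  # plays r->output_filters

# Old behaviour: the new filter becomes the head of the connection list,
# but http_header still points straight at core.
lost = Filter("new_conn_filter")
lost.next = core
connection_head = lost  # plays c->output_filters
old_view = walk(resource_head)

# New behaviour: core.prev lets the insert fix http_header.next too.
kept = Filter("new_conn_filter")
insert_before(kept, core)
new_view = walk(resource_head)

print(old_view)
print(new_view)
```

 In the old view the request never sees new_conn_filter; in the new view
 it sits between http_header and core.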

 The problem is that we have two different cases where we use subrequests.
 The first is to insert more data into a response.  The second is to
 replace the existing response with an internal redirect.  These are two
 different cases and need to be treated as such.

 In the first case, we are creating the subrequest from within a handler
 or filter.  This means that the next filter should be passed to the
 make_sub_request function, and the last resource filter in the
 sub-request will point to the next filter in the main request.  This
 makes sense, because the sub-request's data needs to flow through the
 same set of filters as the main request.  A graphical representation
 might help:

 Default_handler --> includes_filter --> byterange --> content_length -> etc

 If the includes filter creates a sub request, then we don't want the
 data from that sub-request to go through the includes filter, because it
 might not be SSI data.  So, the subrequest adds the following:

 Default_handler --> includes_filter -/-> byterange --> content_length -> etc
                                     /
 Default_handler --> sub_request_core

 What happens if the subrequest is SSI data?  Well, that's easy: the
 includes_filter is a resource filter, so it will be added to the sub
 request in between the Default_handler and the sub_request_core filter.
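
 The two diagrams above can be sketched as a toy model in Python.  The
 Filter class and the walk helper are assumptions made for this
 illustration; the real work is done by the make_sub_request function
 in C.

```python
class Filter:
    """Toy stand-in for one entry in a filter chain (not httpd's struct)."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


def walk(head):
    """Collect filter names by following next pointers from head."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return names


# Main request: includes_filter --> byterange --> content_length
content_length = Filter("content_length")
byterange = Filter("byterange", content_length)
includes = Filter("includes_filter", byterange)

# The includes filter hands its own next filter to the sub-request, so
# the sub-request's last filter joins the main chain just past
# includes_filter.
sub_core = Filter("sub_request_core", includes.next)
plain_sub = walk(sub_core)

# If the sub-request is itself SSI data, an includes filter of its own
# is added ahead of sub_request_core.
ssi_sub = walk(Filter("includes_filter", sub_core))

print(plain_sub)
print(ssi_sub)
```

 Neither sub-request chain passes through the main request's
 includes_filter, yet both still reach byterange and content_length.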

 The second case for sub-requests is when one sub-request is going to
 become the real request.  This happens whenever a sub-request is created
 outside of a handler or filter, and NULL is passed as the next filter to
 the make_sub_request function.

 In this case, the resource filters no longer make sense for the new
 request, because the resource has changed.  So, instead of starting from
 scratch, we simply point the front of the resource filters for the
 sub-request to the front of the protocol filters for the old request.
 This means that we won't lose any of the protocol filters, nor will
 we try to send this data through a filter that shouldn't see it.
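
 A toy model of the redirect case in Python, for illustration only.  The
 Request and Filter classes and the redirect helper are hypothetical;
 only the field names output_filters and proto_output_filters come from
 the text above.

```python
class Filter:
    """Toy stand-in for one entry in a filter chain (not httpd's struct)."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class Request:
    """Holds the heads of the resource and protocol filter lists."""

    def __init__(self, output_filters, proto_output_filters):
        self.output_filters = output_filters
        self.proto_output_filters = proto_output_filters


def walk(head):
    """Collect filter names by following next pointers from head."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return names


def redirect(old_request):
    """The new request starts where the old protocol filters start."""
    head = old_request.proto_output_filters
    return Request(output_filters=head, proto_output_filters=head)


# Old request: deflate (resource) --> byterange --> http_header (protocol)
http_header = Filter("http_header")
byterange = Filter("byterange", http_header)
deflate = Filter("deflate", byterange)
old = Request(output_filters=deflate, proto_output_filters=byterange)

new = redirect(old)
print(walk(old.output_filters))
print(walk(new.output_filters))
```

 The new request skips deflate, which belonged to the old resource, but
 keeps every protocol filter.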

 The problem is that we are using a doubly-linked list for our filter
 stacks now. But, you should notice that it is possible for two lists to
 intersect in this model.  So, how do you handle the previous pointer?
 This is a very difficult question to answer, because there is no "right"
 answer; either method is equally valid.  I looked at why we use the
 previous pointer.  The only reason for it is to allow for easier
 addition of new servers.  With that being said, the solution I chose was
 to make the previous pointer always stay on the original request.

 This causes some more complex logic, but it works for all cases.  My
 concern in having it move to the sub-request is that for the more
 common case (where a sub-request is used to add data to a response), the
 main filter chain would be wrong.  That didn't seem like a good idea to
 me.

 asis:
 The final topic.  :-)  Mod_Asis is a bit of a hack, but the
 handler needs to remove all filters except for connection filters, and
 send the data.  If you are using mod_asis, all other bets are off.

 The absolutely last point is that the reason this code was so hard to
 get right was that we had hacked so much to force it to work.  I
 wrote most of the hacks originally, so I am very much to blame.
 However, now that the code is right, I have started to remove some
 hacks.  Most people should have seen that the reset_filters and
 add_required_filters functions are gone.  Those inserted protocol-level
 filters for error conditions.  In fact, both functions did the same
 thing, one after the other; it was really strange.  Because we don't
 lose protocol filters for error cases any more, those hacks went away.
 The HTTP_HEADER, Content-length, and Byterange filters are all added in
 the insert_filters phase, because if they were added earlier, we had
 some interesting interactions.  Now, those could all be moved to be
 inserted with the HTTP_IN, CORE, and CORE_IN filters.  That would make
 the code easier to follow.
 </pre>

     <!--#include virtual="footer.html" -->
   </body>
 </html>