 <?xml version="1.0" encoding="UTF-8" ?>
 <!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
 <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
 <!-- $LastChangedRevision$ -->

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <manualpage metafile="output-filters.xml.meta">
   <parentdocument href="./">Developer Documentation</parentdocument>

   <title>Guide to writing output filters</title>

   <summary>
     <p>There are a number of common pitfalls encountered when writing
     output filters; this page aims to document best practice for
     authors of new or existing filters.</p>

     <p>This document is applicable to both version 2.0 and version 2.2
     of the Apache HTTP Server; it specifically targets
     <code>RESOURCE</code>-level or <code>CONTENT_SET</code>-level
     filters, though some advice applies to all types of filter.</p>
   </summary>

   <section id="basics">
     <title>Filters and bucket brigades</title>

     <p>Each time a filter is invoked, it is passed a <em>bucket
     brigade</em>, containing a sequence of <em>buckets</em> which
     represent both data content and metadata.  Every bucket has a
     <em>bucket type</em>; a number of bucket types are defined and
     used by the <code>httpd</code> core modules (and the
     <code>apr-util</code> library which provides the bucket brigade
     interface), but modules are free to define their own types.</p>

     <note type="hint">Output filters must be prepared to process
     buckets of non-standard types; with a few exceptions, a filter
     need not care about the types of buckets being filtered.</note>

     <p>A filter can tell whether a bucket represents either data or
     metadata using the <code>APR_BUCKET_IS_METADATA</code> macro.
     Generally, all metadata buckets should be passed down the filter
     chain by an output filter.  Filters may transform, delete, and
     insert data buckets as appropriate.</p>

     <p>There are two metadata bucket types which all filters must pay
     attention to: the <code>EOS</code> bucket type, and the
     <code>FLUSH</code> bucket type.  An <code>EOS</code> bucket
     indicates that the end of the response has been reached and no
     further buckets need be processed.  A <code>FLUSH</code> bucket
     indicates that the filter should flush any buffered buckets (if
     applicable) down the filter chain immediately.</p>

     <note type="hint"><code>FLUSH</code> buckets are sent when the
     content generator (or an upstream filter) knows that there may be
     a delay before more content can be sent.  By passing
     <code>FLUSH</code> buckets down the filter chain immediately,
     filters ensure that the client is not kept waiting for pending
     data longer than necessary.</note>

     <p>Filters can create <code>FLUSH</code> buckets and pass these
     down the filter chain if desired.  Generating <code>FLUSH</code>
     buckets unnecessarily, or too frequently, can harm network
     utilisation since it may force large numbers of small packets to
     be sent, rather than a small number of larger packets.  The
     section on <a href="#nonblock">Non-blocking bucket reads</a>
     covers a case where filters are encouraged to generate
     <code>FLUSH</code> buckets.</p>

     <example><title>Example bucket brigade</title>
     HEAP FLUSH FILE EOS</example>

     <p>This shows a bucket brigade which may be passed to a filter; it
     contains two metadata buckets (<code>FLUSH</code> and
     <code>EOS</code>), and two data buckets (<code>HEAP</code> and
     <code>FILE</code>).</p>
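
     <p>As a rough illustration of how a filter might react to the
     brigade above, the following self-contained sketch models a
     brigade as a plain array of type tags.  The names
     <code>toy_kind</code>, <code>toy_summary</code> and
     <code>toy_scan</code> are hypothetical and exist only for this
     sketch; they are not part of APR or httpd.  A real filter would
     use the bucket macros from <code>apr_buckets.h</code>
     instead.</p>

     <example><title>Toy model of scanning the example brigade</title>
     <highlight language="c">
 #include &lt;assert.h&gt;
 #include &lt;stddef.h&gt;

 /* Toy stand-ins for bucket types; not the real APR bucket types. */
 enum toy_kind { TOY_HEAP, TOY_FILE, TOY_FLUSH, TOY_EOS };

 struct toy_summary {
     int data_buckets;   /* data buckets seen before EOS */
     int flushes;        /* FLUSH buckets seen before EOS */
     int saw_eos;        /* non-zero once EOS has been reached */
 };

 static int toy_is_metadata(enum toy_kind k)
 {
     return k == TOY_FLUSH || k == TOY_EOS;
 }

 /* Scan a brigade modelled as an array, stopping at EOS. */
 static struct toy_summary toy_scan(const enum toy_kind *bb, size_t n)
 {
     struct toy_summary s = { 0, 0, 0 };
     size_t i;

     assert(bb != NULL || n == 0);
     for (i = 0; i &lt; n &amp;&amp; !s.saw_eos; i++) {
         if (!toy_is_metadata(bb[i])) {
             s.data_buckets++;      /* a real filter would transform this */
         }
         else if (bb[i] == TOY_FLUSH) {
             s.flushes++;           /* a real filter would flush its buffer */
         }
         else {
             s.saw_eos = 1;         /* anything after EOS is ignored */
         }
     }
     return s;
 }
     </highlight>
     </example>

     <p>Scanning the brigade <code>HEAP FLUSH FILE EOS</code> with this
     model counts two data buckets and one flush, and stops at the
     <code>EOS</code> tag.</p>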

   </section>

   <section id="invocation">
     <title>Filter invocation</title>

     <p>For any given request, an output filter might be invoked only
     once and be given a single brigade representing the entire response.
     It is also possible that the number of times a filter is invoked
     for a single response is proportional to the size of the content
     being filtered, with the filter being passed a brigade containing
     a single bucket each time.  Filters must operate correctly in
     either case.</p>

     <note type="warning">An output filter which allocates long-lived
     memory every time it is invoked may consume memory proportional to
     response size.  Output filters which need to allocate memory
     should do so once per response; see <a href="#state">Maintaining
     state</a> below.</note>

     <p>An output filter can distinguish the final invocation for a
     given response by the presence of an <code>EOS</code> bucket in
     the brigade.  Any buckets in the brigade after an EOS should be
     ignored.</p>

     <p>An output filter should never pass an empty brigade down the
     filter chain.  To be defensive, filters should be prepared to
     accept an empty brigade, and should return success without passing
     this brigade on down the filter chain.  The handling of an empty
     brigade should have no side effects (such as changing any state
     private to the filter).</p>

     <example><title>How to handle an empty brigade</title>
     <highlight language="c">
 apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
 {
     if (APR_BRIGADE_EMPTY(bb)) {
         return APR_SUCCESS;
     }
     ...
     </highlight>
     </example>

   </section>

   <section id="brigade">
     <title>Brigade structure</title>

     <p>A bucket brigade is a doubly-linked list of buckets.  The list
     is terminated (at both ends) by a <em>sentinel</em> which can be
     distinguished from a normal bucket by comparing it with the
     pointer returned by <code>APR_BRIGADE_SENTINEL</code>.  The list
     sentinel is in fact not a valid bucket structure; any attempt to
     call normal bucket functions (such as
     <code>apr_bucket_read</code>) on the sentinel has undefined
     behaviour and will typically crash the process.</p>
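
     <p>The sentinel idea can be sketched with a small stand-alone
     ring.  This is a simplified model under stated assumptions, not
     the APR implementation: the names <code>toy_node</code>,
     <code>toy_ring</code> and the <code>toy_ring_*</code> functions
     are hypothetical, and the sentinel here is an ordinary embedded
     node whose <code>value</code> is never read, whereas APR derives
     its sentinel pointer differently.</p>

     <example><title>Toy model of a ring terminated by a sentinel</title>
     <highlight language="c">
 #include &lt;assert.h&gt;
 #include &lt;stddef.h&gt;

 /* Toy ring node; the real apr_bucket structure is richer than this. */
 struct toy_node {
     struct toy_node *next;
     struct toy_node *prev;
     int value;
 };

 /* The ring head is the sentinel: it links the list but holds no data. */
 struct toy_ring {
     struct toy_node sentinel;
 };

 static void toy_ring_init(struct toy_ring *r)
 {
     r->sentinel.next = r->sentinel.prev = &amp;r->sentinel;
 }

 static void toy_ring_insert_tail(struct toy_ring *r, struct toy_node *n)
 {
     n->prev = r->sentinel.prev;
     n->next = &amp;r->sentinel;
     r->sentinel.prev->next = n;
     r->sentinel.prev = n;
 }

 /* Sum the values, stopping when the walk returns to the sentinel. */
 static int toy_ring_sum(struct toy_ring *r)
 {
     struct toy_node *e;
     int sum = 0;

     assert(r != NULL);
     for (e = r->sentinel.next; e != &amp;r->sentinel; e = e->next) {
         sum += e->value;   /* never executed for the sentinel itself */
     }
     return sum;
 }
     </highlight>
     </example>

     <p>The loop compares each element with the sentinel before
     touching it, which is the same discipline a filter follows when
     it compares a bucket pointer with
     <code>APR_BRIGADE_SENTINEL(bb)</code>.</p>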

     <p>There are a variety of functions and macros for traversing and
     manipulating bucket brigades; see the <a
     href="http://apr.apache.org/docs/apr-util/trunk/group___a_p_r___util___bucket___brigades.html">apr_buckets.h</a>
     header for complete coverage.  Commonly used macros include:</p>

     <dl>
       <dt><code>APR_BRIGADE_FIRST(bb)</code></dt>
       <dd>returns the first bucket in brigade bb</dd>

       <dt><code>APR_BRIGADE_LAST(bb)</code></dt>
       <dd>returns the last bucket in brigade bb</dd>

       <dt><code>APR_BUCKET_NEXT(e)</code></dt>
       <dd>gives the next bucket after bucket e</dd>

       <dt><code>APR_BUCKET_PREV(e)</code></dt>
       <dd>gives the bucket before bucket e</dd>

     </dl>

     <p>The <code>apr_bucket_brigade</code> structure itself is
     allocated out of a pool, so if a filter creates a new brigade, it
     must ensure that memory use is correctly bounded.  A filter which
     allocates a new brigade out of the request pool
     (<code>r->pool</code>) on every invocation, for example, will fall
     foul of the <a href="#invocation">warning above</a> concerning
     memory use.  Such a filter should instead create a brigade on the
     first invocation per request, and store that brigade in its <a
     href="#state">state structure</a>.</p>

     <note type="warning"><p>It is rarely advisable to use
     <code>apr_brigade_destroy</code> to "destroy" a brigade unless
     you know for certain that the brigade will never be used
     again; even then, it should be used sparingly.  The
     memory used by the brigade structure will not be released by
     calling this function (since it comes from a pool), but the
     associated pool cleanup is unregistered.  Using
     <code>apr_brigade_destroy</code> can in fact cause memory leaks;
     if a "destroyed" brigade contains buckets when its
     containing pool is destroyed, those buckets will <em>not</em> be
     immediately destroyed.</p>

     <p>In general, filters should use <code>apr_brigade_cleanup</code>
     in preference to <code>apr_brigade_destroy</code>.</p></note>

   </section>

   <section id="buckets">

     <title>Processing buckets</title>

     <p>When dealing with non-metadata buckets, it is important to
     understand that the "<code>apr_bucket *</code>" object is an
     abstract <em>representation</em> of data:</p>

     <ol>
       <li>The amount of data represented by the bucket may or may not
       have a determinate length; for a bucket which represents data of
       indeterminate length, the <code>->length</code> field is set to
       the value <code>(apr_size_t)-1</code>.  For example, buckets of
       the <code>PIPE</code> bucket type have an indeterminate length;
       they represent the output from a pipe.</li>

       <li>The data represented by a bucket may or may not be mapped
       into memory.  The <code>FILE</code> bucket type, for example,
       represents data stored in a file on disk.</li>
     </ol>

     <p>Filters read the data from a bucket using the
     <code>apr_bucket_read</code> function.  When this function is
     invoked, the bucket may <em>morph</em> into a different bucket
     type, and may also insert a new bucket into the bucket brigade.
     This must happen for buckets which represent data not mapped into
     memory.</p>

     <p>As an example, consider a bucket brigade containing a
     single <code>FILE</code> bucket representing an entire file, 24
     kilobytes in size:</p>

     <example>FILE(0K-24K)</example>

     <p>When this bucket is read, it will read a block of data from the
     file, morph into a <code>HEAP</code> bucket to represent that
     data, and return the data to the caller.  It also inserts a new
     <code>FILE</code> bucket representing the remainder of the file;
     after the <code>apr_bucket_read</code> call, the brigade looks
     like:</p>

     <example>HEAP(8K) FILE(8K-24K)</example>
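
     <p>The morphing step can be modelled in a few lines of
     stand-alone C.  This is only a sketch of the behaviour described
     above: the <code>toy_bucket</code> structure, the
     <code>toy_read</code> function and the <code>TOY_BLOCK</code>
     size of 8K are assumptions made for illustration, not the real
     <code>FILE</code> bucket implementation.</p>

     <example><title>Toy model of a FILE bucket morphing on read</title>
     <highlight language="c">
 #include &lt;assert.h&gt;
 #include &lt;stddef.h&gt;

 #define TOY_BLOCK 8192   /* assumed read block size, matching the 8K above */

 enum toy_type { TOY_FILE, TOY_HEAP };

 struct toy_bucket {
     enum toy_type type;
     size_t start;               /* offset of the data within the file */
     size_t length;              /* number of bytes represented */
     struct toy_bucket *next;    /* following bucket, or NULL */
 };

 /*
  * Model of reading a FILE bucket: the bucket morphs into a HEAP bucket
  * holding at most TOY_BLOCK bytes, and the remainder (if any) is placed
  * in *rest, which is linked in directly after it.
  * Returns 1 if a remainder bucket was inserted, 0 otherwise.
  */
 static int toy_read(struct toy_bucket *e, struct toy_bucket *rest)
 {
     assert(e != NULL &amp;&amp; rest != NULL);
     if (e->type != TOY_FILE) {
         return 0;               /* already in memory: nothing to do */
     }
     e->type = TOY_HEAP;
     if (e->length &lt;= TOY_BLOCK) {
         return 0;               /* whole file fitted in one block */
     }
     rest->type = TOY_FILE;
     rest->start = e->start + TOY_BLOCK;
     rest->length = e->length - TOY_BLOCK;
     rest->next = e->next;
     e->length = TOY_BLOCK;
     e->next = rest;
     return 1;
 }
     </highlight>
     </example>

     <p>Calling <code>toy_read</code> on a 24K <code>TOY_FILE</code>
     bucket leaves an 8K <code>TOY_HEAP</code> bucket followed by a
     <code>TOY_FILE</code> bucket covering the remaining 16K, mirroring
     the brigade shown above.</p>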

   </section>

   <section id="filtering">
     <title>Filtering brigades</title>

     <p>The basic function of any output filter will be to iterate
     through the passed-in brigade and transform (or simply examine)
     the content in some manner.  The implementation of the iteration
     loop is critical to producing a well-behaved output filter.</p>

     <p>Consider an example which loops through the entire brigade as
     follows:</p>

     <example><title>Bad output filter -- do not imitate!</title>
     <highlight language="c">
 apr_bucket *e = APR_BRIGADE_FIRST(bb);
 const char *data;
 apr_size_t length;

 while (e != APR_BRIGADE_SENTINEL(bb)) {
     apr_bucket_read(e, &amp;data, &amp;length, APR_BLOCK_READ);
     e = APR_BUCKET_NEXT(e);
 }

 return ap_pass_brigade(f->next, bb);
 </highlight>
     </example>

     <p>The above implementation would consume memory proportional to
     content size.  If passed a <code>FILE</code> bucket, for example,
     the entire file contents would be read into memory as each
     <code>apr_bucket_read</code> call morphed a <code>FILE</code>
     bucket into a <code>HEAP</code> bucket.</p>

     <p>In contrast, the implementation below will consume a fixed
     amount of memory to filter any brigade.  A temporary brigade is
     needed and must be allocated only once per response; see the <a
     href="#state">Maintaining state</a> section.</p>

     <example><title>Better output filter</title>
 <highlight language="c">
 apr_bucket *e;
 const char *data;
 apr_size_t length;
 apr_status_t rv;

 /* tmpbb is the temporary brigade kept in the filter's state structure. */
 while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
     rv = apr_bucket_read(e, &amp;data, &amp;length, APR_BLOCK_READ);
     if (rv) ...;
     /* Remove bucket e from bb. */
     APR_BUCKET_REMOVE(e);
     /* Insert it into the temporary brigade. */
     APR_BRIGADE_INSERT_HEAD(tmpbb, e);
     /* Pass brigade downstream. */
     rv = ap_pass_brigade(f->next, tmpbb);
     if (rv) ...;
     apr_brigade_cleanup(tmpbb);
 }
 </highlight>
     </example>

   </section>

   <section id="state">

     <title>Maintaining state</title>

     <p>A filter which needs to maintain state over multiple
     invocations per response can use the <code>->ctx</code> field of
     its <code>ap_filter_t</code> structure.  It is typical to store a
     temporary brigade in such a structure, to avoid having to allocate
     a new brigade per invocation as described in the <a
     href="#brigade">Brigade structure</a> section.</p>

     <example><title>Example code to maintain filter state</title>
 <highlight language="c">
 struct dummy_state {
     apr_bucket_brigade *tmpbb;
     int filter_state;
     ...
 };

 apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
 {
     struct dummy_state *state;

     state = f->ctx;
     if (state == NULL) {

         /* First invocation for this response: initialise state structure.
          */
         f->ctx = state = apr_palloc(f->r->pool, sizeof *state);

         state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
         state->filter_state = ...;
     }
     ...
 </highlight>
     </example>

   </section>

   <section id="buffer">
     <title>Buffering buckets</title>

     <p>If a filter decides to store buckets beyond the duration of a
     single filter function invocation (for example storing them in its
     <code>->ctx</code> state structure), those buckets must be <em>set
     aside</em>.  This is necessary because some bucket types provide
     buckets which represent temporary resources (such as stack memory)
     which will fall out of scope as soon as the filter chain completes
     processing the brigade.</p>

     <p>To set aside a bucket, the <code>apr_bucket_setaside</code>
     function can be called.  Not all bucket types can be set aside, but
     if successful, the bucket will have morphed to ensure it has a
     lifetime at least as long as the pool given as an argument to the
     <code>apr_bucket_setaside</code> function.</p>

     <p>Alternatively, the <code>ap_save_brigade</code> function can be
     used, which will move all the buckets into a separate brigade
     containing buckets with a lifetime as long as the given pool
     argument.  This function must be used with care, taking into
     account the following points:</p>

     <ol>
       <li>On return, <code>ap_save_brigade</code> guarantees that all
       the buckets in the returned brigade will represent data mapped
       into memory.  If given an input brigade containing, for example,
       a <code>PIPE</code> bucket, <code>ap_save_brigade</code> will
       consume an arbitrary amount of memory to store the entire output
       of the pipe.</li>

       <li>When <code>ap_save_brigade</code> reads from buckets which
       cannot be set aside, it will always perform blocking reads,
       removing the opportunity to use <a href="#nonblock">Non-blocking
       bucket reads</a>.</li>

       <li>If <code>ap_save_brigade</code> is used without passing a
       non-NULL "<code>saveto</code>" (destination) brigade parameter,
       the function will create a new brigade, which may cause memory
       use to be proportional to content size as described in the <a
       href="#brigade">Brigade structure</a> section.</li>
     </ol>

     <note type="warning">Filters must ensure that any buffered data is
     processed and passed down the filter chain during the last
     invocation for a given response (a brigade containing an EOS
     bucket).  Otherwise such data will be lost.</note>

   </section>

   <section id="nonblock">
     <title>Non-blocking bucket reads</title>

     <p>The <code>apr_bucket_read</code> function takes an
     <code>apr_read_type_e</code> argument which determines whether a
     <em>blocking</em> or <em>non-blocking</em> read will be performed
     from the data source.  A good filter will first attempt to read
     from every data bucket using a non-blocking read; if that fails
     with <code>APR_EAGAIN</code>, then send a <code>FLUSH</code>
     bucket down the filter chain, and retry using a blocking read.</p>

     <p>This mode of operation ensures that any filters further down the
     filter chain will flush any buffered buckets if a slow content
     source is being used.</p>

     <p>A CGI script is an example of a slow content source which is
     implemented as a bucket type. <module>mod_cgi</module> will send
     <code>PIPE</code> buckets which represent the output from a CGI
     script; reading from such a bucket will block when waiting for the
     CGI script to produce more output.</p>

     <example>
       <title>Example code using non-blocking bucket reads</title>
       <highlight language="c">
 apr_bucket *e;
 const char *data;
 apr_size_t length;
 apr_read_type_e mode = APR_NONBLOCK_READ;

 while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
     apr_status_t rv;

     rv = apr_bucket_read(e, &amp;data, &amp;length, mode);
     if (APR_STATUS_IS_EAGAIN(rv) &amp;&amp; mode == APR_NONBLOCK_READ) {

         /* Pass down a brigade containing a flush bucket: */
         APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
         rv = ap_pass_brigade(f->next, tmpbb);
         apr_brigade_cleanup(tmpbb);
         if (rv != APR_SUCCESS) return rv;

         /* Retry, using a blocking read. */
         mode = APR_BLOCK_READ;
         continue;
     }
     else if (rv != APR_SUCCESS) {
         /* handle errors */
     }

     /* Next time, try a non-blocking read first. */
     mode = APR_NONBLOCK_READ;
     ...
 }
 </highlight>
     </example>

   </section>

   <section id="rules">
     <title>Ten rules for output filters</title>

     <p>In summary, here is a set of rules for all output filters to
     follow:</p>

     <ol>
       <li>Output filters should not pass empty brigades down the filter
       chain, but should be tolerant of being passed empty
       brigades.</li>

       <li>Output filters must pass all metadata buckets down the filter
       chain; <code>FLUSH</code> buckets should be respected by passing
       any pending or buffered buckets down the filter chain.</li>

       <li>Output filters should ignore any buckets following an
       <code>EOS</code> bucket.</li>

       <li>Output filters must process a fixed amount of data at a
       time, to ensure that memory consumption is not proportional to
       the size of the content being filtered.</li>

       <li>Output filters should be agnostic with respect to bucket
       types, and must be able to process buckets of unfamiliar
       type.</li>

       <li>After calling <code>ap_pass_brigade</code> to pass a brigade
       down the filter chain, output filters should call
       <code>apr_brigade_cleanup</code> to ensure the brigade is empty
       before reusing that brigade structure; output filters should
       never use <code>apr_brigade_destroy</code> to "destroy"
       brigades.</li>

       <li>Output filters must <em>set aside</em> any buckets which are
       preserved beyond the duration of the filter function.</li>

       <li>Output filters must not ignore the return value of
       <code>ap_pass_brigade</code>, and must return appropriate errors
       back up the filter chain.</li>

       <li>Output filters must only create a fixed number of bucket
       brigades for each response, rather than one per invocation.</li>

       <li>Output filters should first attempt non-blocking reads from
       each data bucket, and send a <code>FLUSH</code> bucket down the
       filter chain if the read blocks, before retrying with a blocking
       read.</li>

     </ol>

   </section>

 </manualpage>
	<?xml version="1.0" encoding="UTF-8" ?>
	<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
	<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
	<!-- $LastChangedRevision$ -->

	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	<manualpage metafile="output-filters.xml.meta">
	<parentdocument href="./">Developer Documentation</parentdocument>

	<title>Guide to writing output filters</title>

	<summary>
	<p>There are a number of common pitfalls encountered when writing
	output filters; this page aims to document best practice for
	authors of new or existing filters.</p>

	<p>This document is applicable to both version 2.0 and version 2.2
	of the Apache HTTP Server; it specifically targets
	<code>RESOURCE</code>-level or <code>CONTENT_SET</code>-level
	filters though some advice is generic to all types of filter.</p>
	</summary>

	<section id="basics">
	<title>Filters and bucket brigades</title>

	<p>Each time a filter is invoked, it is passed a <em>bucket
	brigade</em>, containing a sequence of <em>buckets</em> which
	represent both data content and metadata. Every bucket has a
	<em>bucket type</em>; a number of bucket types are defined and
	used by the <code>httpd</code> core modules (and the
	<code>apr-util</code> library which provides the bucket brigade
	interface), but modules are free to define their own types.</p>

	<note type="hint">Output filters must be prepared to process
	buckets of non-standard types; with a few exceptions, a filter
	need not care about the types of buckets being filtered.</note>

	<p>A filter can tell whether a bucket represents either data or
	metadata using the <code>APR_BUCKET_IS_METADATA</code> macro.
	Generally, all metadata buckets should be passed down the filter
	chain by an output filter. Filters may transform, delete, and
	insert data buckets as appropriate.</p>

	<p>There are two metadata bucket types which all filters must pay
	attention to: the <code>EOS</code> bucket type, and the
	<code>FLUSH</code> bucket type. An <code>EOS</code> bucket
	indicates that the end of the response has been reached and no
	further buckets need be processed. A <code>FLUSH</code> bucket
	indicates that the filter should flush any buffered buckets (if
	applicable) down the filter chain immediately.</p>

	<note type="hint"><code>FLUSH</code> buckets are sent when the
	content generator (or an upstream filter) knows that there may be
	a delay before more content can be sent. By passing
	<code>FLUSH</code> buckets down the filter chain immediately,
	filters ensure that the client is not kept waiting for pending
	data longer than necessary.</note>

	<p>Filters can create <code>FLUSH</code> buckets and pass these
	down the filter chain if desired. Generating <code>FLUSH</code>
	buckets unnecessarily, or too frequently, can harm network
	utilisation since it may force large numbers of small packets to
	be sent, rather than a small number of larger packets. The
	section on <a href="#nonblock">Non-blocking bucket reads</a>
	covers a case where filters are encouraged to generate
	<code>FLUSH</code> buckets.</p>

	<example><title>Example bucket brigade</title>
	HEAP FLUSH FILE EOS</example>

	<p>This shows a bucket brigade which may be passed to a filter; it
	contains two metadata buckets (<code>FLUSH</code> and
	<code>EOS</code>), and two data buckets (<code>HEAP</code> and
	<code>FILE</code>).</p>

	</section>

	<section id="invocation">
	<title>Filter invocation</title>

	<p>For any given request, an output filter might be invoked only
	once and be given a single brigade representing the entire response.
	It is also possible that the number of times a filter is invoked
	for a single response is proportional to the size of the content
	being filtered, with the filter being passed a brigade containing
	a single bucket each time. Filters must operate correctly in
	either case.</p>

	<note type="warning">An output filter which allocates long-lived
	memory every time it is invoked may consume memory proportional to
	response size. Output filters which need to allocate memory
	should do so once per response; see <a href="#state">Maintaining
	state</a> below.</note>

	<p>An output filter can distinguish the final invocation for a
	given response by the presence of an <code>EOS</code> bucket in
	the brigade. Any buckets in the brigade after an EOS should be
	ignored.</p>

	<p>An output filter should never pass an empty brigade down the
	filter chain. To be defensive, filters should be prepared to
	accept an empty brigade, and should return success without passing
	this brigade on down the filter chain. The handling of an empty
	brigade should have no side effects (such as changing any state
	private to the filter).</p>

	<example><title>How to handle an empty brigade</title>
	<highlight language="c">
	apr_status_t dummy_filter(ap_filter_t f, apr_bucket_brigade bb)
	{
	if (APR_BRIGADE_EMPTY(bb)) {
	return APR_SUCCESS;
	}
	...
	</highlight>
	</example>

	</section>

	<section id="brigade">
	<title>Brigade structure</title>

	<p>A bucket brigade is a doubly-linked list of buckets. The list
	is terminated (at both ends) by a <em>sentinel</em> which can be
	distinguished from a normal bucket by comparing it with the
	pointer returned by <code>APR_BRIGADE_SENTINEL</code>. The list
	sentinel is in fact not a valid bucket structure; any attempt to
	call normal bucket functions (such as
	<code>apr_bucket_read</code>) on the sentinel will have undefined
	behaviour (i.e. will crash the process).</p>

	<p>There are a variety of functions and macros for traversing and
	manipulating bucket brigades; see the <a
	href="http://apr.apache.org/docs/apr-util/trunk/group___a_p_r___util___bucket___brigades.html">apr_buckets.h</a>
	header for complete coverage. Commonly used macros include:</p>

	<dl>
	<dt><code>APR_BRIGADE_FIRST(bb)</code></dt>
	<dd>returns the first bucket in brigade bb</dd>

	<dt><code>APR_BRIGADE_LAST(bb)</code></dt>
	<dd>returns the last bucket in brigade bb</dd>

	<dt><code>APR_BUCKET_NEXT(e)</code></dt>
	<dd>gives the next bucket after bucket e</dd>

	<dt><code>APR_BUCKET_PREV(e)</code></dt>
	<dd>gives the bucket before bucket e</dd>

	</dl>

	<p>The <code>apr_bucket_brigade</code> structure itself is
	allocated out of a pool, so if a filter creates a new brigade, it
	must ensure that memory use is correctly bounded. A filter which
	allocates a new brigade out of the request pool
	(<code>r->pool</code>) on every invocation, for example, will fall
	foul of the <a href="#invocation">warning above</a> concerning
	memory use. Such a filter should instead create a brigade on the
	first invocation per request, and store that brigade in its <a
	href="#state">state structure</a>.</p>

	<note type="warning"><p>It is generally never advisable to use
	<code>apr_brigade_destroy</code> to "destroy" a brigade unless
	you know for certain that the brigade will never be used
	again, even then, it should be used rarely. The
	memory used by the brigade structure will not be released by
	calling this function (since it comes from a pool), but the
	associated pool cleanup is unregistered. Using
	<code>apr_brigade_destroy</code> can in fact cause memory leaks;
	if a "destroyed" brigade contains buckets when its
	containing pool is destroyed, those buckets will <em>not</em> be
	immediately destroyed.</p>

	<p>In general, filters should use <code>apr_brigade_cleanup</code>
	in preference to <code>apr_brigade_destroy</code>.</p></note>

	</section>

	<section id="buckets">

	<title>Processing buckets</title>

	<p>When dealing with non-metadata buckets, it is important to
	understand that the "<code>apr_bucket *</code>" object is an
	abstract <em>representation</em> of data:</p>

	<ol>
	<li>The amount of data represented by the bucket may or may not
	have a determinate length; for a bucket which represents data of
	indeterminate length, the <code>->length</code> field is set to
	the value <code>(apr_size_t)-1</code>. For example, buckets of
	the <code>PIPE</code> bucket type have an indeterminate length;
	they represent the output from a pipe.</li>

	<li>The data represented by a bucket may or may not be mapped
	into memory. The <code>FILE</code> bucket type, for example,
	represents data stored in a file on disk.</li>
	</ol>

	<p>Filters read the data from a bucket using the
	<code>apr_bucket_read</code> function. When this function is
	invoked, the bucket may <em>morph</em> into a different bucket
	type, and may also insert a new bucket into the bucket brigade.
	This must happen for buckets which represent data not mapped into
	memory.</p>

	<p>To give an example; consider a bucket brigade containing a
	single <code>FILE</code> bucket representing an entire file, 24
	kilobytes in size:</p>

	<example>FILE(0K-24K)</example>

	<p>When this bucket is read, it will read a block of data from the
	file, morph into a <code>HEAP</code> bucket to represent that
	data, and return the data to the caller. It also inserts a new
	<code>FILE</code> bucket representing the remainder of the file;
	after the <code>apr_bucket_read</code> call, the brigade looks
	like:</p>

	<example>HEAP(8K) FILE(8K-24K)</example>

	</section>

	<section id="filtering">
	<title>Filtering brigades</title>

	<p>The basic function of any output filter will be to iterate
	through the passed-in brigade and transform (or simply examine)
	the content in some manner. The implementation of the iteration
	loop is critical to producing a well-behaved output filter.</p>

	<p>Taking an example which loops through the entire brigade as
	follows:</p>

	<example><title>Bad output filter -- do not imitate!</title>
	<highlight language="c">
	apr_bucket *e = APR_BRIGADE_FIRST(bb);
	const char *data;
	apr_size_t length;

	while (e != APR_BRIGADE_SENTINEL(bb)) {
	apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
	e = APR_BUCKET_NEXT(e);
	}

	return ap_pass_brigade(bb);
	</highlight>
	</example>

	<p>The above implementation would consume memory proportional to
	content size. If passed a <code>FILE</code> bucket, for example,
	the entire file contents would be read into memory as each
	<code>apr_bucket_read</code> call morphed a <code>FILE</code>
	bucket into a <code>HEAP</code> bucket.</p>

	<p>In contrast, the implementation below will consume a fixed
	amount of memory to filter any brigade; a temporary brigade is
	needed and must be allocated only once per response, see the <a
	href="#state">Maintaining state</a> section.</p>

	<example><title>Better output filter</title>
	<highlight language="c">
	apr_bucket *e;
	const char *data;
	apr_size_t length;
	apr_status_t rv;

	while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
	    rv = apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
	    if (rv) ...;
	    /* Remove bucket e from bb. */
	    APR_BUCKET_REMOVE(e);
	    /* Insert it into temporary brigade. */
	    APR_BRIGADE_INSERT_HEAD(tmpbb, e);
	    /* Pass brigade downstream. */
	    rv = ap_pass_brigade(f->next, tmpbb);
	    if (rv) ...;
	    /* Empty tmpbb so it can be reused for the next bucket. */
	    apr_brigade_cleanup(tmpbb);
	}
	</highlight>
	</example>
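
	<p>The difference between the two loops above can be made concrete
	with a small accounting model. The sketch below is plain C and
	does not use APR at all: <code>toy_mem</code>,
	<code>toy_read</code>, <code>toy_pass_and_cleanup</code> and
	<code>TOY_CHUNK</code> are hypothetical names introduced only for
	illustration, and the 8K chunk size is assumed from the
	<code>FILE</code> bucket example earlier on this page. It only
	tracks how many bytes would be held in memory at once under each
	strategy.</p>

```c
#include <stddef.h>

#define TOY_CHUNK 8192u /* assumed: bytes brought into memory by one read */

/* Hypothetical bookkeeping: bytes currently held and the highest value seen. */
typedef struct {
    size_t resident;
    size_t peak;
} toy_mem;

/* Models a bucket read that turns n bytes of file into heap data. */
static void toy_read(toy_mem *m, size_t n)
{
    m->resident += n;
    if (m->resident > m->peak) {
        m->peak = m->resident;
    }
}

/* Models passing the data downstream and then emptying the brigade. */
static void toy_pass_and_cleanup(toy_mem *m)
{
    m->resident = 0;
}

/* "Bad" pattern: read every chunk, pass once at the end. */
size_t toy_peak_read_all(size_t file_size)
{
    toy_mem m = {0, 0};
    size_t left = file_size;

    while (left > 0) {
        size_t n = left < TOY_CHUNK ? left : TOY_CHUNK;
        toy_read(&m, n);
        left -= n;
    }
    toy_pass_and_cleanup(&m);
    return m.peak;
}

/* "Better" pattern: read one chunk, pass it, release it, repeat. */
size_t toy_peak_one_at_a_time(size_t file_size)
{
    toy_mem m = {0, 0};
    size_t left = file_size;

    while (left > 0) {
        size_t n = left < TOY_CHUNK ? left : TOY_CHUNK;
        toy_read(&m, n);
        toy_pass_and_cleanup(&m);
        left -= n;
    }
    return m.peak;
}
```

	<p>Under this model, the first function's peak grows with the file
	size, while the second never exceeds one chunk, which is the
	property a well-behaved filter is aiming for.</p>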

	</section>

	<section id="state">

	<title>Maintaining state</title>

	<p>A filter which needs to maintain state over multiple
	invocations per response can use the <code>->ctx</code> field of
	its <code>ap_filter_t</code> structure. It is typical to store a
	temporary brigade in such a structure, to avoid having to allocate
	a new brigade per invocation as described in the <a
	href="#brigade">Brigade structure</a> section.</p>

	<example><title>Example code to maintain filter state</title>
	<highlight language="c">
	struct dummy_state {
	    apr_bucket_brigade *tmpbb;
	    int filter_state;
	    ...
	};

	apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
	{
	    struct dummy_state *state;

	    state = f->ctx;
	    if (state == NULL) {
	        /* First invocation for this response: initialise state structure. */
	        f->ctx = state = apr_palloc(f->r->pool, sizeof *state);

	        state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
	        state->filter_state = ...;
	    }
	    ...
	</highlight>
	</example>

	</section>

	<section id="buffer">
	<title>Buffering buckets</title>

	<p>If a filter decides to store buckets beyond the duration of a
	single filter function invocation (for example storing them in its
	<code>->ctx</code> state structure), those buckets must be <em>set
	aside</em>. This is necessary because some bucket types provide
	buckets which represent temporary resources (such as stack memory)
	which will fall out of scope as soon as the filter chain completes
	processing the brigade.</p>

	<p>To set aside a bucket, call the <code>apr_bucket_setaside</code>
	function. Not all bucket types can be set aside, but if the call
	succeeds, the bucket will have morphed to ensure it has a
	lifetime at least as long as the pool given as an argument to the
	<code>apr_bucket_setaside</code> function.</p>
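
	<p>As a rough illustration of why setting aside matters, the
	sketch below models a bucket that initially borrows transient
	memory and is then given its own copy. This is a simplified,
	self-contained model in plain C, not the APR implementation:
	<code>toy_bucket</code>, <code>toy_bucket_transient</code>,
	<code>toy_bucket_setaside</code> and
	<code>toy_bucket_destroy</code> are hypothetical names, and
	<code>malloc</code> stands in for the pool argument of the real
	function.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy bucket: borrows transient memory or owns a heap copy. */
typedef struct {
    const char *data;
    size_t length;
    char *owned; /* non-NULL once the bucket has been set aside */
} toy_bucket;

/* Wrap caller-owned memory without copying it (like a transient bucket). */
toy_bucket toy_bucket_transient(const char *buf, size_t len)
{
    toy_bucket b = { buf, len, NULL };
    return b;
}

/*
 * Model of setting a bucket aside: copy the borrowed bytes so the bucket
 * no longer depends on the lifetime of its source.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
int toy_bucket_setaside(toy_bucket *b)
{
    char *copy;

    if (b->owned != NULL) {
        return 0; /* already safe to keep */
    }
    copy = malloc(b->length ? b->length : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, b->data, b->length);
    b->owned = copy;
    b->data = copy;
    return 0;
}

/* Release the copy, if any, and reset the bucket. */
void toy_bucket_destroy(toy_bucket *b)
{
    free(b->owned);
    b->owned = NULL;
    b->data = NULL;
    b->length = 0;
}
```

	<p>In this model, a bucket stored in filter state without the
	set-aside step would keep pointing at the caller's buffer, whose
	contents may change or disappear before the next invocation.</p>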

	<p>Alternatively, the <code>ap_save_brigade</code> function can be
	used, which will move all the buckets into a separate brigade
	containing buckets with a lifetime as long as the given pool
	argument. This function must be used with care, taking into
	account the following points:</p>

	<ol>
	<li>On return, <code>ap_save_brigade</code> guarantees that all
	the buckets in the returned brigade will represent data mapped
	into memory. If given an input brigade containing, for example,
	a <code>PIPE</code> bucket, <code>ap_save_brigade</code> will
	consume an arbitrary amount of memory to store the entire output
	of the pipe.</li>

	<li>When <code>ap_save_brigade</code> reads from buckets which
	cannot be set aside, it will always perform blocking reads,
	removing the opportunity to use <a href="#nonblock">Non-blocking
	bucket reads</a>.</li>

	<li>If <code>ap_save_brigade</code> is called without an existing
	(non-NULL) "<code>saveto</code>" (destination) brigade, the
	function will create a new brigade on each such call, which may
	cause memory use to be proportional to content size as described
	in the <a href="#brigade">Brigade structure</a> section.</li>
	</ol>

	<note type="warning">Filters must ensure that any buffered data is
	processed and passed down the filter chain during the last
	invocation for a given response (a brigade containing an EOS
	bucket). Otherwise such data will be lost.</note>

	</section>

	<section id="nonblock">
	<title>Non-blocking bucket reads</title>

	<p>The <code>apr_bucket_read</code> function takes an
	<code>apr_read_type_e</code> argument which determines whether a
	<em>blocking</em> or <em>non-blocking</em> read will be performed
	from the data source. A good filter will first attempt to read
	from every data bucket using a non-blocking read; if that fails
	with <code>APR_EAGAIN</code>, it should send a <code>FLUSH</code>
	bucket down the filter chain and retry using a blocking read.</p>

	<p>This mode of operation ensures that any filters further down the
	filter chain will flush any buffered buckets if a slow content
	source is being used.</p>

	<p>A CGI script is an example of a slow content source which is
	implemented as a bucket type. <module>mod_cgi</module> will send
	<code>PIPE</code> buckets which represent the output from a CGI
	script; reading from such a bucket will block when waiting for the
	CGI script to produce more output.</p>

	<example>
	<title>Example code using non-blocking bucket reads</title>
	<highlight language="c">
	apr_bucket *e;
	const char *data;
	apr_size_t length;
	apr_read_type_e mode = APR_NONBLOCK_READ;

	while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
	    apr_status_t rv;

	    rv = apr_bucket_read(e, &data, &length, mode);
	    if (rv == APR_EAGAIN && mode == APR_NONBLOCK_READ) {

	        /* Pass down a brigade containing a flush bucket: */
	        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
	        rv = ap_pass_brigade(f->next, tmpbb);
	        apr_brigade_cleanup(tmpbb);
	        if (rv != APR_SUCCESS) return rv;

	        /* Retry, using a blocking read. */
	        mode = APR_BLOCK_READ;
	        continue;
	    }
	    else if (rv != APR_SUCCESS) {
	        /* handle errors */
	    }

	    /* Next time, try a non-blocking read first. */
	    mode = APR_NONBLOCK_READ;
	    ...
	}
	</highlight>
	</example>
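
	<p>The control flow of the loop above (try non-blocking, flush on
	<code>APR_EAGAIN</code>, retry blocking, then reset the mode) can
	be exercised in isolation with a toy model. The following sketch
	is plain C with no APR dependency; <code>toy_src</code>,
	<code>toy_src_read</code>, <code>toy_stats</code> and
	<code>toy_filter_run</code> are hypothetical names, and a "slow"
	source is assumed to have no data ready on its first non-blocking
	read, much like a <code>PIPE</code> bucket waiting on a CGI
	script.</p>

```c
#include <stddef.h>

typedef enum { TOY_NONBLOCK_READ, TOY_BLOCK_READ } toy_read_type;
typedef enum { TOY_SUCCESS = 0, TOY_EAGAIN = 1 } toy_status;

/* Hypothetical data source: a slow source has nothing ready immediately. */
typedef struct {
    int slow;
} toy_src;

/* A non-blocking read of a slow source reports EAGAIN; a blocking read
 * is modelled as always succeeding (it would wait for the data). */
static toy_status toy_src_read(const toy_src *s, toy_read_type mode)
{
    if (s->slow && mode == TOY_NONBLOCK_READ) {
        return TOY_EAGAIN;
    }
    return TOY_SUCCESS;
}

typedef struct {
    int flushes_sent;   /* FLUSH buckets that would be passed downstream */
    int buckets_passed; /* data buckets successfully read and passed on */
} toy_stats;

/* Walk the sources with the same mode-switching logic as the filter loop. */
toy_stats toy_filter_run(const toy_src *srcs, size_t n)
{
    toy_stats st = {0, 0};
    toy_read_type mode = TOY_NONBLOCK_READ;
    size_t i = 0;

    while (i < n) {
        toy_status rv = toy_src_read(&srcs[i], mode);

        if (rv == TOY_EAGAIN && mode == TOY_NONBLOCK_READ) {
            st.flushes_sent++;     /* flush what downstream has buffered */
            mode = TOY_BLOCK_READ; /* retry the same source, blocking */
            continue;
        }

        st.buckets_passed++;
        mode = TOY_NONBLOCK_READ;  /* next source: try non-blocking first */
        i++;
    }
    return st;
}
```

	<p>In this model, one flush is sent per slow source and none for
	sources whose data is immediately available, so downstream
	filters are only asked to flush when the filter is about to
	wait.</p>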

	</section>

	<section id="rules">
	<title>Ten rules for output filters</title>

	<p>In summary, here is a set of rules for all output filters to
	follow:</p>

	<ol>
	<li>Output filters should not pass empty brigades down the filter
	chain, but should be tolerant of being passed empty
	brigades.</li>

	<li>Output filters must pass all metadata buckets down the filter
	chain; <code>FLUSH</code> buckets should be respected by passing
	any pending or buffered buckets down the filter chain.</li>

	<li>Output filters should ignore any buckets following an
	<code>EOS</code> bucket.</li>

	<li>Output filters must process a fixed amount of data at a
	time, to ensure that memory consumption is not proportional to
	the size of the content being filtered.</li>

	<li>Output filters should be agnostic with respect to bucket
	types, and must be able to process buckets of unfamiliar
	type.</li>

	<li>After calling <code>ap_pass_brigade</code> to pass a brigade
	down the filter chain, output filters should call
	<code>apr_brigade_cleanup</code> to ensure the brigade is empty
	before reusing that brigade structure; output filters should
	never use <code>apr_brigade_destroy</code> to "destroy"
	brigades.</li>

	<li>Output filters must <em>set aside</em> any buckets which are
	preserved beyond the duration of the filter function.</li>

	<li>Output filters must not ignore the return value of
	<code>ap_pass_brigade</code>, and must return appropriate errors
	back up the filter chain.</li>

	<li>Output filters must only create a fixed number of bucket
	brigades for each response, rather than one per invocation.</li>

	<li>Output filters should first attempt non-blocking reads from
	each data bucket, and send a <code>FLUSH</code> bucket down the
	filter chain if the read blocks, before retrying with a blocking
	read.</li>

	</ol>

	</section>

	</manualpage>