 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../../dtd/document-v10.dtd">

 <document>
   <header>
     <title>Offline Page Generation</title>
     <version>1.0</version>
     <type>Technical document</type>
     <authors><person name="Upayavira" email="upayavira@apache.org"/>
     </authors>
     <abstract>This document explains the basic concepts of offline page generation with Apache Cocoon.</abstract>
   </header>
   <body>
     <s1 title="Overview">
       <p>As well as serving sites dynamically, Cocoon can generate static, 'offline' versions
          of web pages or whole web sites. This document covers the concepts involved in offline
          page and site generation.
       </p>
     </s1>
     <s1 title="Offline Page Generation">
       <p>Cocoon allows static versions of Cocoon web sites to be created.</p>
       <p>At present, this can be done in three ways:</p>
       <ul>
         <li><link href="cli.html">Command Line Interface</link></li>
         <li><link href="ant.html">Using Ant Task</link></li>
         <li><link href="bean.html">Cocoon Bean</link></li>
       </ul>
       <p>This document explains the general concepts that are shared by all of these approaches.
          The specific details for each method are explained on a separate page.</p>
       <p>Cocoon, when generating pages offline, can follow links in a page (whether that page
          is HTML, PDF or anything else), and can rewrite URIs to create filenames by checking
          the mime type of the generated page. All links to pages whose URIs change are updated
          too.
       </p>
     </s1>
     <s1 title="Configuration">
       <p>To use Cocoon in its offline mode, a servlet container (e.g. Tomcat or Jetty) is not
          needed. Cocoon can generate an offline site directly using the information available
          in the Cocoon <code>webapp</code> folder.</p>
       <p>Having said this, many choose to have a servlet container available locally for use
          whilst debugging, as this can speed up the development process significantly.</p>
       <s2 title="Directories and Files">
         <p>As all the information Cocoon needs to generate a site is stored in the Cocoon
            webapp directory, Cocoon must be told where that directory is, and where to find
            various other files and directories. These are:</p>
         <ul>
           <li>Context directory (the Cocoon Webapp directory)</li>
           <li>Configuration File (usually <code>${COCOON_WEBAPP}/WEB-INF/cocoon.xconf</code>)</li>
           <li>Work Directory (used by Cocoon to store temporary files; this can be anywhere you choose)</li>
         </ul>
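         <p>As a minimal sketch, these locations might be given in an xconf file of the kind
            described on the <link href="cli.html">Command Line Interface</link> page. The element
            names are assumed from that format, and the paths are purely illustrative:</p>
         <source>
<![CDATA[
<!-- Sketch only: element names assumed from the CLI xconf format; paths are examples -->
<cocoon>
  <!-- Context directory: the Cocoon webapp directory -->
  <context-dir>build/webapp</context-dir>
  <!-- Configuration file: usually the cocoon.xconf under WEB-INF -->
  <config-file>WEB-INF/cocoon.xconf</config-file>
  <!-- Work directory: temporary files, anywhere you choose -->
  <work-dir>build/work</work-dir>
</cocoon>
]]>
         </source>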
       </s2>
       <s2 title="Logging">
         <p>There are three options that need to be specified in relation to logging. These are:</p>
         <ul>
           <li>Log Kit (the logging configuration file, usually <code>${COCOON_WEBAPP}/WEB-INF/logkit.xconf</code>)</li>
           <li>Logger (a category used for logging, as configured in the configuration file)</li>
           <li>Log Level (a logging level, one of DEBUG, INFO, WARN, ERROR or FATAL_ERROR. This relates specifically
               to logging at startup, after which the log kit configuration takes over)</li>
         </ul>
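         <p>For illustration, the three logging options could be expressed in an xconf file roughly
            as follows. The element and attribute names are assumed from the format described on the
            <link href="cli.html">Command Line Interface</link> page; the category <code>cli</code>
            and the level shown are example values only:</p>
         <source>
<![CDATA[
<!-- Sketch only: attribute names assumed; "cli" and "ERROR" are example values -->
<logging log-kit="WEB-INF/logkit.xconf"
         logger="cli"
         level="ERROR"/>
]]>
         </source>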
       </s2>
       <s2 title="Other Configuration Options">
         <p>In online mode, a user agent string tells Cocoon what browser is being used to access a page. The
            user agent can be configured manually for offline generation.</p>
         <p>In online mode, an accept string is provided by a browser, telling the server what types of content
            the browser is capable of accepting. This will be a comma separated list of mime types. In offline
            mode, an accept string can also be specified.</p>
         <p>As Cocoon based sites can change the content they generate based upon the user agent string and the
            accept string, it can be necessary to specify both in order to have the correct content generated.</p>
         <p>In order to generate sites that make use of databases and database connections, it is necessary to load
            JDBC classes at startup. Cocoon allows for this.</p>
         <p>When, in offline mode, Cocoon generates a page whose URI ends in a <code>/</code>, the resultant file
            cannot be written to a filesystem, as its name would refer to a directory. Therefore, the user can
            specify a default filename which will be appended to the page's URI before saving to disk.</p>
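         <p>The options in this section might look like the following in an xconf file. This is a sketch
            only: the element names are assumed from the format described on the
            <link href="cli.html">Command Line Interface</link> page, and the user agent string, accept
            string and driver class name are hypothetical values:</p>
         <source>
<![CDATA[
<!-- Sketch only: element names assumed; all values are hypothetical examples -->

<!-- User agent string presented to the sitemap -->
<user-agent>ExampleBrowser/1.0</user-agent>

<!-- Accept string: a comma separated list of mime types -->
<accept>text/html, */*</accept>

<!-- JDBC driver class to load at startup (hypothetical class name) -->
<load-class>com.example.jdbc.Driver</load-class>

<!-- Appended to URIs that end in "/" before saving -->
<default-filename>index.html</default-filename>
]]>
         </source>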
       </s2>
     </s1>
     <s1 title="URIs and Targets">
       <s2 title="Source URIs">
         <p>A source URI (which may also have a source prefix prepended) is the part of the URI that is given
            to Cocoon for processing. For example, if you access a page online at
            <code>http://localhost:8080/cocoon/site/page.html</code>, then the source URI would be
            <code>site/page.html</code>.</p>
       </s2>
       <s2 title="Destinations and Modifiable Sources">
         <p>Most of the time, generated pages are simply written to disk.</p>
         <p>However, this is not the only option. Generated pages can be written anywhere for which a
            <code>ModifiableSource</code> exists. So, for example, it is possible to generate a site and
            have the pages written directly to a web server using FTP, by making use of the Avalon
            <code>FTPSource</code>.</p>
       </s2>
       <s2 title="Target Types">
         <p>When generating a page, Cocoon needs to know how to decide upon the URI of the generated page.
            This process could be described as 'URI arithmetic'.</p>
         <p>Source and destination URIs are made up of the following elements:</p>
         <ul>
           <li>Source Prefix: Part of a source URI used to request a page but excluded from the destination
               URI</li>
           <li>Source URI: Part of a source URI that is used when calculating the destination URI</li>
           <li>Destination URI: The base URI for a destination</li>
           <li>Type: The method used for merging the above elements (can be append, replace or
               insert)</li>
         </ul>
         <note>When combining elements to make a URI, it is the user's responsibility to include directory
               separators. For example, <code>foo</code> with <code>bar</code> appended will be
               <code>foobar</code>, whereas <code>foo/</code> with <code>bar</code> appended will be
               <code>foo/bar</code>.
         </note>
         <s3 title="Appending">
           <p>Here, when calculating the destination URI, the source prefix is ignored, and the destination
              URI is calculated by appending the source URI to the end of the destination URI. For example,
              with the following values:</p>
           <p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>, destination URI:
              <code>pages/</code></p>
           <p>A request will be made to Cocoon for a page at: <code>site/page.html</code>. This will be
              saved as <code>pages/page.html</code>.</p>
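           <p>Expressed as an xconf entry, the example above might be written as follows. The element
              and attribute names are assumed from the format described on the
              <link href="cli.html">Command Line Interface</link> page:</p>
           <source>
<![CDATA[
<!-- Sketch only: attribute names assumed from the CLI xconf format -->
<!-- Requested as site/page.html, saved as pages/page.html -->
<uri type="append" src-prefix="site/" src="page.html" dest="pages/"/>
]]>
           </source>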
         </s3>
         <s3 title="Replacing">
           <p>Here, when calculating the destination URI, the source prefix and the source URI are
              ignored, and the destination URI is used as is. This is useful when you wish to save the
              generated page with a filename that bears no relationship to the source URI. For example,
              with the following values:</p>
           <p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>, destination URI:
              <code>pages/simple.html</code></p>
           <p>A request will be made to Cocoon for a page at: <code>site/page.html</code>. This will be
              saved as <code>pages/simple.html</code>.</p>
           <note>Given the nature of this target type, it inherently cannot be used when following links
              (otherwise all pages will be written on top of each other).</note>
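           <p>As a sketch, the same example could be written as an xconf entry like this, again assuming
              the attribute names used on the <link href="cli.html">Command Line Interface</link>
              page:</p>
           <source>
<![CDATA[
<!-- Sketch only: attribute names assumed from the CLI xconf format -->
<!-- Requested as site/page.html, saved as pages/simple.html -->
<uri type="replace" src-prefix="site/" src="page.html" dest="pages/simple.html"/>
]]>
           </source>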
         </s3>
         <s3 title="Inserting">
           <p>Here, when calculating the destination URI, the source prefix is ignored, and the source URI
              is inserted into the destination URI at the point marked by an asterisk (*). This is intended
              for use with complex protocols where the source URI does not appear at the end of the
              destination URI.</p>
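           <p>A worked example may help. The values below are hypothetical and chosen only to show where
              the source URI lands; the attribute names are assumed from the format described on the
              <link href="cli.html">Command Line Interface</link> page:</p>
           <source>
<![CDATA[
<!-- Sketch only: hypothetical values, attribute names assumed -->
<!-- Source prefix: site/   source URI: page.html   destination URI: mirror/*.copy -->
<!-- Requested as site/page.html; the source URI replaces the asterisk, -->
<!-- so by the rule above the page would be saved as mirror/page.html.copy -->
<uri type="insert" src-prefix="site/" src="page.html" dest="mirror/*.copy"/>
]]>
           </source>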
         </s3>
       </s2>
       <s2 title="Mime Type Checking">
         <p>Cocoon can optionally test the mime type for a page, and, if the mime type doesn't match the page's
            extension, amend the destination URI to include the correct extension. This will ensure that pages
            will load correctly when served by a static web server.</p>
         <p>When Cocoon amends a destination URI, it also amends URIs for links in those pages, so that links
            will still work when a site has been crawled.</p>
         <note>This feature substantially slows down page generation, as each page must be generated three times
               (once to find links, once to find its mime type and once to collect the actual content). This
               can be avoided by ensuring that all URIs in the site are correct and do not need amending, in which
               case it is only necessary to generate a page once.</note>
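         <p>In an xconf file, mime type checking is assumed here to be switched on or off with a
            <code>confirm-extensions</code> attribute on the root element. Treat the attribute name as
            an assumption and check it against the page for the method you are using:</p>
         <source>
<![CDATA[
<!-- Sketch only: attribute name assumed from the CLI xconf format -->
<!-- true:  check each page's mime type and amend extensions and links -->
<!-- false: trust the URIs as they are, avoiding the extra generation pass -->
<cocoon confirm-extensions="true">
  <!-- remaining configuration goes here -->
</cocoon>
]]>
         </source>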
       </s2>
     </s1>
     <s1 title="Following Links and Site Crawling">
       <p>Cocoon can be configured to either follow, or ignore, links in pages that it generates. It has two methods
       of gathering links, 'link view' and 'link gathering'.</p>
       <s2 title="Link View Crawling">
         <p>With link view crawling, Cocoon gets the links by generating the 'link view' for a page. Using link view
            gives a significant degree of configurability in terms of which links are gathered, as it is possible to
            insert a transformer into the view to filter out links that should not be followed.</p>
         <p>The disadvantage with link view crawling is that each page must be generated twice, which doubles page
            generation time.</p>
         <p>Link view is usually configured in the root sitemap with:</p>
         <source>
<![CDATA[
<map:views>
  <map:view from-position="last" name="links">
    <map:serialize type="links"/>
  </map:view>
</map:views>
]]>
         </source>
         <p>If you have this in your root sitemap, you do not need it in your sub-sitemaps. However, you may choose
            to override it with one that carries out further processing - for example, with an XSLT transformer that
            removes links that should not be crawled.</p>
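         <p>Such an override might look like the sketch below, which places a transformer ahead of the
            links serializer. The stylesheet path <code>stylesheets/filterlinks.xsl</code> is a
            hypothetical name; the stylesheet itself would need to be written to drop the links you do
            not want crawled:</p>
         <source>
<![CDATA[
<!-- Sketch only: the stylesheet path is a hypothetical example -->
<map:views>
  <map:view from-position="last" name="links">
    <!-- Remove unwanted links before they are serialized -->
    <map:transform src="stylesheets/filterlinks.xsl"/>
    <map:serialize type="links"/>
  </map:view>
</map:views>
]]>
         </source>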
         <p>See <link href="../concepts/views.html">views</link> for more on views. </p>
         <p>You can see the link view yourself by appending <code>?cocoon-view=links</code> to the page's URI.</p>
       </s2>
       <s2 title="Link Gathering Crawling">
         <p>With link gathering crawling, links are gathered from the SAX stream right before the serializer. All
            <code>src</code>, <code>href</code> and <code>xlink:href</code> attributes are taken to be links, and are
            therefore followed.</p>
         <p>The benefit of link gathering crawling is that pages do not need to be generated twice. However, one
            loses the ability, available with link view crawling, to configure which links should be followed.</p>
       </s2>
     </s1>
     <s1 title="Broken Links">
       <p>When a page cannot be found at a URI that has either been specified, or has been found as a link in another
          page, it is considered 'broken'.</p>
       <p>Exactly what is done when a broken link is found depends upon the method used to invoke
          Cocoon. See related pages for specific details.</p>
       <s2 title="Broken Link Handling using xconf Configuration method">
         <p>The xconf method allows for more sophisticated broken link handling. The
              user can choose to have broken links reported to a file, in either
              plain text or XML.</p>
         <p>When this file is plain text, it will have one link URI per line.</p>
         <p>When this file is in XML, each entry gives the URI of the broken link
            together with a message explaining why the link is broken.</p>
         <p>It is also possible to specify whether an error page should be generated
            in the place of the broken page (based upon the configured
            <code>&lt;map:handle-errors&gt;</code> code in the sitemap). If required,
            an extension can be appended to the original file's URI to signify that
            it is an error page (e.g. <code>.error</code>).</p>
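         <p>A minimal sketch of such a configuration follows. The element and attribute names are
            assumptions based on the xconf format described on the
            <link href="cli.html">Command Line Interface</link> page and should be checked there; the
            report filename is an example value:</p>
         <source>
<![CDATA[
<!-- Sketch only: attribute names assumed; the filename is an example -->
<!-- type:      report format, text or xml -->
<!-- file:      where the report is written -->
<!-- generate:  whether an error page is written in place of the broken page -->
<!-- extension: appended to the URI of any generated error page -->
<broken-links type="xml"
              file="brokenlinks.xml"
              generate="false"
              extension=".error"/>
]]>
         </source>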
       </s2>
     </s1>
     <s1 title="Precompiling XSPs">
       <p>When used offline, Cocoon can precompile XSP pages. If no URIs are specified, it will scan all directories
          within the context directory looking for XSP files, each of which will be compiled. If URIs are specified,
          all links will be followed looking for pages that make use of XSP, compiling those XSP pages as they are
          found.</p>
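       <p>With the xconf method, precompilation is assumed here to be requested through a
          <code>precompile-only</code> attribute on the root element. The attribute name is an
          assumption and should be confirmed on the page for the method you are using:</p>
       <source>
<![CDATA[
<!-- Sketch only: attribute name assumed from the CLI xconf format -->
<!-- With no URIs listed, all XSP files under the context directory are compiled -->
<cocoon precompile-only="true">
  <context-dir>build/webapp</context-dir>
  <config-file>WEB-INF/cocoon.xconf</config-file>
  <work-dir>build/work</work-dir>
</cocoon>
]]>
       </source>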
     </s1>
   </body>
 </document>