<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.7">
<meta name="Forrest-skin-name" content="pelt">
<title>i18n</title>
<link type="text/css" href="skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
<link type="text/css" href="skin/profile.css" rel="stylesheet">
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<div class="breadtrail">
<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://lucene.apache.org/">Lucene</a> &gt; <a href="http://lucene.apache.org/nutch/">Nutch</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
</div>
<div class="header">
<div class="grouplogo">
<a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="http://lucene.apache.org/java/docs/images/lucene_green_150.gif" title="Apache Lucene"></a>
</div>
<div class="projectlogo">
<a href="http://lucene.apache.org/nutch/"><img class="logoImage" alt="Nutch" src="images/nutch-logo.gif" title="Open Source Web Search Software"></a>
</div>
<div class="searchbox">
<form action="http://www.google.com/search" method="get" class="roundtopsmall">
<input value="lucene.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp;
<input attr="value" name="Search" value="Search" type="submit">
</form>
</div>
<ul id="tabs">
<li class="current">
<a class="base-selected" href="index.html">Main</a>
</li>
<li>
<a class="base-not-selected" href="http://wiki.apache.org/nutch/">Wiki</a>
</li>
</ul>
</div>
</div>
<div id="main">
<div id="publishedStrip">
<div id="level2tabs"></div>
<script type="text/javascript"><!--
document.write("<text>Last Published:</text> " + document.lastModified);
// --></script>
</div>
<div class="breadtrail">
&nbsp;
</div>
<div id="menu">
<div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">Project</div>
<div id="menu_1.1" class="menuitemgroup">
<div class="menuitem">
<a href="index.html">News</a>
</div>
<div class="menuitem">
<a href="about.html">About</a>
</div>
<div class="menuitem">
<a href="credits.html">Credits</a>
</div>
<div class="menuitem">
<a href="http://www.cafepress.com/nutch/">Buy Stuff</a>
</div>
</div>
<div onclick="SwitchMenu('menu_selected_1.2', 'skin/')" id="menu_selected_1.2Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
<div id="menu_selected_1.2" class="selectedmenuitemgroup" style="display: block;">
<div class="menuitem">
<a href="http://wiki.apache.org/nutch/FAQ">FAQ</a>
</div>
<div class="menuitem">
<a href="http://wiki.apache.org/nutch/">Wiki</a>
</div>
<div class="menuitem">
<a href="tutorial.html">Tutorial ver. 0.7.2</a>
</div>
<div class="menuitem">
<a href="tutorial8.html">Tutorial ver. 0.8</a>
</div>
<div class="menuitem">
<a href="bot.html">Robot </a>
</div>
<div class="menupage">
<div class="menupagetitle">i18n</div>
</div>
<div class="menuitem">
<a href="apidocs/index.html">API Docs ver. 0.7.2</a>
</div>
<div class="menuitem">
<a href="nutch-nightly/docs/api/index.html">API Docs ver. 0.8</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.3', 'skin/')" id="menu_1.3Title" class="menutitle">Resources</div>
<div id="menu_1.3" class="menuitemgroup">
<div class="menuitem">
<a href="release/">Download</a>
</div>
<div class="menuitem">
<a href="nightly.html">Nightly builds</a>
</div>
<div class="menuitem">
<a href="mailing_lists.html">Mailing Lists</a>
</div>
<div class="menuitem">
<a href="issue_tracking.html">Issue Tracking</a>
</div>
<div class="menuitem">
<a href="version_control.html">Version Control</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.4', 'skin/')" id="menu_1.4Title" class="menutitle">Related Projects</div>
<div id="menu_1.4" class="menuitemgroup">
<div class="menuitem">
<a href="http://lucene.apache.org/java/">Lucene Java</a>
</div>
<div class="menuitem">
<a href="http://lucene.apache.org/hadoop/">Hadoop</a>
</div>
</div>
<div id="credit"></div>
<div id="roundbottom">
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
<div id="credit2"></div>
</div>
<div id="content">
<div title="Portable Document Format" class="pdflink">
<a class="dida" href="i18n.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
PDF</a>
</div>
<h1>i18n</h1>
<div id="minitoc-area">
<ul class="minitoc">
<li>
<a href="#Getting+Started">Getting Started</a>
</li>
<li>
<a href="#Page+Header">Page Header</a>
</li>
<li>
<a href="#Static+Page+Content">Static Page Content</a>
</li>
<li>
<a href="#Dynamic+Page+Content">Dynamic Page Content</a>
</li>
<li>
<a href="#Generating+Static+Pages">Generating Static Pages</a>
</li>
<li>
<a href="#Testing+Dynamic+Pages">Testing Dynamic Pages</a>
</li>
</ul>
</div>
<p>The Nutch search pages are easy to internationalize.</p>
<p>For each language, there are three kinds of things that must be
translated:</p>
<ol>
<li>
<b>page header</b>: This is a list of anchors included at the top of
every page.</li>
<li>
<b>static pages</b>: These include the "about" page, the "search"
page and the "help" page.</li>
<li>
<b>dynamic page text</b>: These are strings used when constructing
search result pages.</li>
</ol>
<p>Each of the above is described in more detail below.</p>
<a name="N10027"></a><a name="Getting+Started"></a>
<h2 class="h3">Getting Started</h2>
<div class="section">
<p>The things to translate are:</p>
<ol>
<li>the page header</li>
<li>the "about" page (<tt>src/web/pages/<i>lang</i>/about.xml</tt>)</li>
<li>the "search" page (<tt>src/web/pages/<i>lang</i>/search.xml</tt>)</li>
<li>the "help" page (<tt>src/web/pages/<i>lang</i>/help.xml</tt>)</li>
<li>text for search results (<tt>src/web/locale/org/nutch/jsp/search_<i>lang</i>.properties</tt>)</li>
</ol>
<p>If you'd like to provide a translation, post your translations of
these five files to <a href="mailto:nutch-dev@lucene.apache.org">nutch-dev@lucene.apache.org</a>
as attachments.</p>
</div>
<a name="N10061"></a><a name="Page+Header"></a>
<h2 class="h3">Page Header</h2>
<div class="section">
<p>The Nutch page header is included at the top of every page.</p>
<p>The header is filed as
<tt>src/web/include/<i>language</i>/header.xml</tt> where
<i>language</i> is the <a href="http://ftp.ics.uci.edu/pub/ietf/http/related/iso639.txt">ISO 639</a>
language code.</p>
<p>The format of the header file is:</p>
<pre>
&lt;header-menu&gt;
&lt;item&gt; ... &lt;/item&gt;
&lt;item&gt; ... &lt;/item&gt;
&lt;/header-menu&gt;
</pre>
<p>Each item typically includes an HTML anchor, one for each of the
top-level pages in the translation.</p>
<p>For example, the header file for an English translation is filed
as <a href="http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/web/include/en/header.xml"><tt>src/web/include/en/header.xml</tt></a>.</p>
</div>
<a name="N1008B"></a><a name="Static+Page+Content"></a>
<h2 class="h3">Static Page Content</h2>
<div class="section">
<p>Static pages make up most of the Nutch website and are also used
for project documentation. They are HTML files generated from XML
sources by XSLT. This process adds a standard header and footer,
and optionally a menu of sub-pages.</p>
<p>Static page content is filed as
<tt>src/web/pages/<i>language</i>/<i>page</i>.xml</tt> where
<i>language</i> is the ISO 639 language code, as above, and <i>page</i>
determines the name of the page generated:
<tt>docs/<i>page</i>.html</tt>.</p>
<p>The format of a static page xml file is:</p>
<pre>
&lt;page&gt;
&lt;title&gt; ... &lt;/title&gt;
&lt;menu&gt;
&lt;item&gt; ... &lt;/item&gt;
&lt;item&gt; ... &lt;/item&gt;
&lt;/menu&gt;
&lt;body&gt; ... &lt;/body&gt;
&lt;/page&gt;
</pre>
<p>The <tt>&lt;menu&gt;</tt> element is optional.</p>
<p>Note that if you use an encoding other than UTF-8 (the default for
XML data) then you need to declare that. Also, if you use HTML
entities in your data, you'll need to declare these too. Look at
existing translations for examples of this.</p>
<p>For example, the English language "about" page is filed
as <a href="http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/web/pages/en/about.xml"><tt>src/web/pages/en/about.xml</tt></a>.</p>
</div>
<a name="N100C0"></a><a name="Dynamic+Page+Content"></a>
<h2 class="h3">Dynamic Page Content</h2>
<div class="section">
<p>JavaServer Pages (JSP) are used to generate the Nutch search results and
a few other dynamic pages (cached content, score explanations, etc.).</p>
<p>These use Java's <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html">Locale</a>
mechanism for internationalization. For each page/language pair,
there is a Java property file containing the translated text of that
page.</p>
<p>These property files are filed as
<tt>src/web/locale/org/nutch/jsp/<i>page</i>_<i>language</i>.properties</tt>
where <i>page</i> is the name of the JSP page in <a href="http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/web/jsp/"><tt>src/web/jsp/</tt></a>
and <i>language</i> is the ISO 639 language code, as above.</p>
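<p>As a rough illustration of this naming scheme, the Java sketch below
derives the resource name for a page/language pair, relative to
<tt>src/web/locale/</tt>. It assumes the pages resolve their text through
Java's standard <tt>ResourceBundle</tt> lookup. The class and method names
(<tt>BundleName</tt>, <tt>resourceNameFor</tt>) are hypothetical and are
not part of Nutch.</p>
<pre>
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleName {
    // Hypothetical helper: maps a JSP page name and a language code to the
    // resource name that the default properties-based lookup would try.
    static String resourceNameFor(String page, String language) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        Locale locale = Locale.forLanguageTag(language);
        String bundleName = control.toBundleName("org.nutch.jsp." + page, locale);
        return control.toResourceName(bundleName, "properties");
    }

    public static void main(String[] args) {
        System.out.println(resourceNameFor("search", "en"));
    }
}
</pre>
<p>Run as written, this prints
<tt>org/nutch/jsp/search_en.properties</tt>, which matches the file name
pattern described above.</p>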
<p>For example, text for the English language search results page is filed
as <a href="http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/web/locale/org/nutch/jsp/search_en.properties"><tt>src/web/locale/org/nutch/jsp/search_en.properties</tt></a>.
This contains something like:</p>
<pre>
title = search results
search = Search
hits = Hits &lt;b&gt;{0}-{1}&lt;/b&gt; (out of {2} total matching documents):
cached = cached
explain = explain
anchors = anchors
next = Next
</pre>
<p>Each entry corresponds to a text fragment on the search results
page. The "hits" entry uses Java's <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/MessageFormat.html">MessageFormat</a>.</p>
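<p>To see how such a pattern is filled in, here is a minimal Java sketch
using <tt>MessageFormat</tt>. The class name <tt>HitsMessage</tt> and the
sample numbers are invented for illustration, and the pattern is a
simplified copy of the "hits" entry with the bold markup left out.</p>
<pre>
import java.text.MessageFormat;
import java.util.Locale;

public class HitsMessage {
    // Fills the numbered placeholders of a "hits"-style pattern:
    // {0} = first hit shown, {1} = last hit shown, {2} = total matches.
    static String format(String pattern, long first, long last, long total) {
        MessageFormat message = new MessageFormat(pattern, Locale.ENGLISH);
        return message.format(new Object[] { first, last, total });
    }

    public static void main(String[] args) {
        // Simplified pattern: the bold markup of the real entry is omitted.
        String pattern = "Hits {0}-{1} (out of {2} total matching documents):";
        System.out.println(format(pattern, 1, 10, 42));
    }
}
</pre>
<p>This prints <tt>Hits 1-10 (out of 42 total matching documents):</tt>.
A translation may place the placeholders in whatever order its grammar
requires. Note that <tt>MessageFormat</tt> treats a single quote as a
quoting character, so a literal apostrophe in a pattern must be written
as two single quotes.</p>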
<p>Note that property files must use the ISO 8859-1 encoding with
Unicode escapes. If you author them in a different encoding, please
use Java's <tt>native2ascii</tt> tool to convert them to this
encoding.</p>
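<p>The Java sketch below shows roughly what this conversion amounts to:
every character outside ASCII is rewritten as a <tt>\uXXXX</tt> escape.
It is only an illustration, not a replacement for <tt>native2ascii</tt>.
The class name <tt>PropertyEscaper</tt> and the sample German entry are
hypothetical.</p>
<pre>
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class PropertyEscaper {
    // Returns the text with every non-ASCII character replaced by a
    // backslash-u escape holding its four-digit hexadecimal code unit.
    static String escape(String text) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (ascii.canEncode(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical entry; (char) 0xE4 is a-umlaut, written as a cast so
        // this source file itself stays pure ASCII.
        String entry = "next = N" + (char) 0xE4 + "chste";
        System.out.println(escape(entry));
    }
}
</pre>
<p>This prints <tt>next = N\u00e4chste</tt>, a line that can be stored
safely in an ISO 8859-1 property file.</p>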
</div>
<a name="N100FF"></a><a name="Generating+Static+Pages"></a>
<h2 class="h3">Generating Static Pages</h2>
<div class="section">
<p>To generate the static pages you must have <a href="http://java.sun.com/j2se/downloads.html">Java</a>, <a href="http://ant.apache.org/">Ant</a> and Nutch installed. To
install Nutch, either download and unpack the latest <a href="http://lucene.apache.org/nutch/release/nightly/">release</a>, or check it
out from <a href="version_control.html">Subversion</a>.</p>
<p>Then give the command:</p>
<pre>
ant generate-docs
</pre>
<i>This documentation needs more detail. Could someone
please submit a list of the actual steps required here?</i>
<p>Once this is working, try adding directories and files to make your
own translation of the header and a few of the static pages.</p>
</div>
<a name="N10124"></a><a name="Testing+Dynamic+Pages"></a>
<h2 class="h3">Testing Dynamic Pages</h2>
<div class="section">
<p>To test the dynamic pages you must also have <a href="http://jakarta.apache.org/tomcat/">Tomcat</a> installed.</p>
<p>An index is also required. You can collect your own by working
through the <a href="http://lucene.apache.org/nutch/tutorial.html">tutorial</a>.
Once you have an index, follow the steps outlined at the end of the
tutorial for searching.</p>
<i>This documentation needs more detail. Could someone
please submit a list of the actual steps required here?</i>
</div>
</div>
<div class="clearboth">&nbsp;</div>
</div>
<div id="footer">
<div class="lastmodified">
<script type="text/javascript"><!--
document.write("<text>Last Published:</text> " + document.lastModified);
// --></script>
</div>
<div class="copyright">
Copyright &copy;
2005 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
</div>
</div>
</body>
</html>