| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| |
| <!-- |
| Copyright 1999-2004 The Apache Software Foundation |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| |
| <!-- Content Stylesheet for Site --> |
| |
| |
| <!-- start the processing --> |
| <!-- ====================================================================== --> |
| <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! --> |
| <!-- Main Page Section --> |
| <!-- ====================================================================== --> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/> |
| |
| <meta name="author" value="Andrew C. Oliver"> |
| <meta name="email" value="acoliver@apache.org"> |
| |
| |
| |
| |
| <title>Jakarta Lucene - Jakarta Lucene - Basic Demo Sources Walkthrough</title> |
| </head> |
| |
| <body bgcolor="#ffffff" text="#000000" link="#525D76"> |
| <table border="0" width="100%" cellspacing="0"> |
| <!-- TOP IMAGE --> |
| <tr> |
| <td align="left"> |
| <a href="http://jakarta.apache.org"><img src="http://jakarta.apache.org/images/jakarta-logo.gif" border="0"/></a> |
| </td> |
| <td align="right"> |
| <a href="http://jakarta.apache.org/lucene/"><img src="./images/lucene_green_300.gif" alt="Jakarta Lucene" border="0"/></a> |
| </td> |
| </tr> |
| </table> |
| <table border="0" width="100%" cellspacing="4"> |
| <tr><td colspan="2"> |
| <hr noshade="" size="1"/> |
| </td></tr> |
| |
| <tr> |
| <!-- LEFT SIDE NAVIGATION --> |
| <td width="20%" valign="top" nowrap="true"> |
| |
| <!-- ============================================================ --> |
| |
| <p><strong>About</strong></p> |
| <ul> |
| <li> <a href="./index.html">Overview</a> |
| </li> |
| <li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a> |
| </li> |
| <li> <a href="./whoweare.html">Who We Are</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/mail.html">Mailing Lists</a> |
| </li> |
| </ul> |
| <p><strong>Resources</strong></p> |
| <ul> |
| <li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a> |
| </li> |
| <li> <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a> |
| </li> |
| <li> <a href="http://www.jguru.com/faq/Lucene">jGuru FAQ</a> |
| </li> |
| <li> <a href="./gettingstarted.html">Getting Started</a> |
| </li> |
| <li> <a href="./queryparsersyntax.html">Query Syntax</a> |
| </li> |
| <li> <a href="./systemproperties.html">System Properties</a> |
| </li> |
| <li> <a href="./fileformats.html">File Formats</a> |
| </li> |
| <li> <a href="./api/index.html">Javadoc</a> |
| </li> |
| <li> <a href="./contributions.html">Contributions</a> |
| </li> |
| <li> <a href="./resources.html">Articles, etc.</a> |
| </li> |
| <li> <a href="./benchmarks.html">Benchmarks</a> |
| </li> |
| <li> <a href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=%5BPATCH%5D&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Patches</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a> |
| </li> |
| <li> <a href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a> |
| </li> |
| <li> <a href="http://issues.apache.org/eyebrowse/SummarizeList?listId=30">Lucene-user</a> |
| </li> |
| <li> <a href="http://issues.apache.org/eyebrowse/SummarizeList?listId=29">Lucene-dev</a> |
| </li> |
| <li> <a href="./lucene-sandbox/">Lucene Sandbox</a> |
| </li> |
| </ul> |
| <p><strong>Download</strong></p> |
| <ul> |
| <li> <a href="http://jakarta.apache.org/site/binindex.html">Binaries</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/sourceindex.html">Source Code</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/cvsindex.html">CVS Repositories</a> |
| </li> |
| </ul> |
| <p><strong>Jakarta</strong></p> |
| <ul> |
| <li> <a href="http://jakarta.apache.org/site/getinvolved.html">Get Involved</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/acknowledgements.html">Acknowledgements</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/contact.html">Contact</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/legal.html">Legal</a> |
| </li> |
| </ul> |
| </td> |
| <td width="80%" align="left" valign="top"> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="About the Code"><strong>About the Code</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| In this section we walk through the sources behind the basic Lucene demo such as where to |
| find it, its parts and their function. This section is intended for Java developers |
| wishing to understand how to use Jakarta Lucene in their applications. |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="Location of the source"><strong>Location of the source</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| Relative to the directory created when you extracted Lucene or retreived it from CVS, you |
| should see a directory called "src" which in turn contains a directory called "demo". |
| This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo, |
| this is where all the Java sources live. |
| </p> |
| <p> |
| Within this directory you should see the IndexFiles class we executed earlier. Bring that |
| up in vi or your alternative text editor and lets take a look at it. |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="IndexFiles"><strong>IndexFiles</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index. |
| Lets take a look at how it does this. |
| </p> |
| <p> |
| The first substantial thing the main function does is instantiate an instance |
| of IndexWriter. It passes a string called "index" and a new instance of a class called |
| "StandardAnalyzer". The "index" string is the name of the directory that all index information |
| should be stored in. Because we're not passing any path information, one must assume this |
| will be created as a subdirectory of the current directory (if does not already exist). On |
| some platforms this may actually result in it being created in other directories (such as |
| the user's home directory). |
| </p> |
| <p> |
| The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you |
| must instantiate it with a path that it can write the index into, if this path does not |
| exist it will create it, otherwise it will refresh the index living at that path. You |
| must a also pass an instance of <b>org.apache.analysis.Analyzer</b>. |
| </p> |
| <p> |
| The <b>Analyzer</b>, in this case, the <b>Stop Analyzer</b> is little more than a standard Java |
| Tokenizer, converting all strings to lowercase and filtering out useless words from the index. |
| By useless words I mean common language words such as articles (a,an,the) and other words that |
| would be useless for searching. It should be noted that there are different rules for every |
| language, and you should use the proper analyzer for each. Lucene currently provides Analyzers |
| for English and German. |
| </p> |
| <p> |
| Looking down further in the file, you should see the indexDocs() code. This recursive function |
| simply crawls the directories and uses FileDocument to create Document objects. The Document |
| is simply a data object to represent the content in the file as well as its creation time and |
| location. These instances are added to the indexWriter. Take a look inside FileDocument. Its |
| not particularly complicated, it just adds fields to the Document. |
| </p> |
| <p> |
| As you can see there isn't much to creating an index. The devil is in the details. You may also |
| wish to examine the other samples in this directory, particularly the IndexHTML class. It is |
| a bit more complex but builds upon this example. |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="Searching Files"><strong>Searching Files</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer |
| (which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed |
| with an analyzer used to interperate your query in the same way the Index was interperated: finding |
| the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the |
| results from the QueryParser which is passed to the searcher. The searcher results are returned in |
| a collection of Documents called "Hits" which is then iterated through and displayed to the user. |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="The Web example..."><strong>The Web example...</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| <a href="demo3.html">read on>>></a> |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| </td> |
| </tr> |
| |
| <!-- FOOTER --> |
| <tr><td colspan="2"> |
| <hr noshade="" size="1"/> |
| </td></tr> |
| <tr><td colspan="2"> |
| <div align="center"><font color="#525D76" size="-1"><em> |
| Copyright © 1999-2004, The Apache Software Foundation |
| </em></font></div> |
| </td></tr> |
| </table> |
| </body> |
| </html> |
| <!-- end the processing --> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |