| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| |
| <!-- |
| Copyright 1999-2004 The Apache Software Foundation |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| |
| <!-- Content Stylesheet for Site --> |
| |
| |
| <!-- start the processing --> |
| <!-- ====================================================================== --> |
| <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! --> |
| <!-- Main Page Section --> |
| <!-- ====================================================================== --> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/> |
| |
| <meta name="author" value="Andrew C. Oliver"> |
| <meta name="email" value="acoliver@apache.org"> |
| |
| |
| |
| |
| <title>Jakarta Lucene - Basic Demo Sources Walkthrough</title> |
| </head> |
| |
| <body bgcolor="#ffffff" text="#000000" link="#525D76"> |
| <table border="0" width="100%" cellspacing="0"> |
| <!-- TOP IMAGE --> |
| <tr> |
| <td align="left"> |
| <a href="http://jakarta.apache.org"><img src="http://jakarta.apache.org/images/jakarta-logo.gif" border="0"/></a> |
| </td> |
| <td align="right"> |
| <a href="http://jakarta.apache.org/lucene/"><img src="./images/lucene_green_300.gif" alt="Jakarta Lucene" border="0"/></a> |
| </td> |
| </tr> |
| </table> |
| <table border="0" width="100%" cellspacing="4"> |
| <tr><td colspan="2"> |
| <hr noshade="" size="1"/> |
| </td></tr> |
| |
| <tr> |
| <!-- LEFT SIDE NAVIGATION --> |
| <td width="20%" valign="top" nowrap="true"> |
| |
| <!-- ============================================================ --> |
| |
| <p><strong>About</strong></p> |
| <ul> |
| <li> <a href="./index.html">Overview</a> |
| </li> |
| <li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a> |
| </li> |
| <li> <a href="./whoweare.html">Who We Are</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/mail.html">Mailing Lists</a> |
| </li> |
| </ul> |
| <p><strong>Resources</strong></p> |
| <ul> |
| <li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a> |
| </li> |
| <li> <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a> |
| </li> |
| <li> <a href="http://www.jguru.com/faq/Lucene">jGuru FAQ</a> |
| </li> |
| <li> <a href="./gettingstarted.html">Getting Started</a> |
| </li> |
| <li> <a href="./queryparsersyntax.html">Query Syntax</a> |
| </li> |
| <li> <a href="./systemproperties.html">System Properties</a> |
| </li> |
| <li> <a href="./fileformats.html">File Formats</a> |
| </li> |
| <li> <a href="./api/index.html">Javadoc</a> |
| </li> |
| <li> <a href="./contributions.html">Contributions</a> |
| </li> |
| <li> <a href="./resources.html">Articles, etc.</a> |
| </li> |
| <li> <a href="./benchmarks.html">Benchmarks</a> |
| </li> |
| <li> <a href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=%5BPATCH%5D&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Patches</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a> |
| </li> |
| <li> <a href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a> |
| </li> |
| <li> <a href="http://issues.apache.org/eyebrowse/SummarizeList?listId=30">Lucene-user</a> |
| </li> |
| <li> <a href="http://issues.apache.org/eyebrowse/SummarizeList?listId=29">Lucene-dev</a> |
| </li> |
| <li> <a href="./lucene-sandbox/">Lucene Sandbox</a> |
| </li> |
| </ul> |
| <p><strong>Download</strong></p> |
| <ul> |
| <li> <a href="http://jakarta.apache.org/site/binindex.html">Binaries</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/sourceindex.html">Source Code</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/cvsindex.html">CVS Repositories</a> |
| </li> |
| </ul> |
| <p><strong>Jakarta</strong></p> |
| <ul> |
| <li> <a href="http://jakarta.apache.org/site/getinvolved.html">Get Involved</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/acknowledgements.html">Acknowledgements</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/contact.html">Contact</a> |
| </li> |
| <li> <a href="http://jakarta.apache.org/site/legal.html">Legal</a> |
| </li> |
| </ul> |
| </td> |
| <td width="80%" align="left" valign="top"> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="About the Code"><strong>About the Code</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| In this section we walk through the sources behind the basic Lucene Web Application demo: where to find |
| them, what the parts are, and what each part does. This section is intended for Java developers who want |
| to understand how to use Jakarta Lucene in their applications, and for those deploying web applications |
| based on Lucene. |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="Location of the source (developers/deployers)"><strong>Location of the source (developers/deployers)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| Relative to the directory created when you extracted Lucene or retrieved it from CVS, you should see a |
| directory called "src", which in turn contains a directory called "jsp". This is the root of the Lucene |
| web demo sources. |
| </p> |
| <p> |
| Within this directory you should see index.jsp. Open it in vi or your editor of choice. |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="index.jsp (developers/deployers)"><strong>index.jsp (developers/deployers)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| This JSP page is pretty boring by itself. All it does is include a header, display a form and include a |
| footer. The form has two fields: query, where you enter your search criteria, and maxresults, where you |
| specify the number of results per page. If you look at the form tag, you'll notice it uses the GET method |
| rather than POST. GET is the appropriate choice here: the HTML 4.01 specification recommends it for forms |
| whose processing has no side effects, such as searches, and it has the practical benefit that a search can |
| be bookmarked. Because of the way this JSP is structured, you can customize the page without editing this |
| file at all; changing the header and footer is enough. Let's look at header.jsp (located in the same |
| directory) next. |
| </p> |
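| <p> |
| Submitting the form with GET places both fields in the query string, and that is what makes a search |
| bookmarkable. The following sketch shows the URL a browser would build. It uses plain Java and no servlet API. |
| The form action "results.jsp" and the UTF-8 encoding are assumptions for illustration and are not taken |
| from index.jsp. |
| </p> |
```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: reproduces how a browser encodes a GET form submission.
// The action name passed in (e.g. "results.jsp") and UTF-8 are assumptions.
final class SearchUrl {
    private SearchUrl() {}

    /** Builds "action?query=...&maxresults=..." using form (x-www-form-urlencoded) encoding. */
    static String build(String action, String query, int maxResults) {
        try {
            return action
                    + "?query=" + URLEncoder.encode(query, "UTF-8")
                    + "&maxresults=" + maxResults;
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this branch is unreachable in practice.
            throw new IllegalStateException(e);
        }
    }
}
```
| <p> |
| For example, <code>SearchUrl.build("results.jsp", "apache lucene", 10)</code> yields |
| <code>results.jsp?query=apache+lucene&amp;maxresults=10</code>. You can bookmark and replay that URL. |
| </p> |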
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="header.jsp (developers/deployers)"><strong>header.jsp (developers/deployers)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| The header is also very simple. All it does is include configuration.jsp (which you looked at in the |
| previous section of this guide) and set the page title and a brief heading. This would be a good place to |
| put your own custom HTML to "pretty" things up a bit. We won't cover the footer, because all it does is |
| display the footer text and close the open tags. Next, let's look at results.jsp, the heart of this |
| application. |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="results.jsp (developers)"><strong>results.jsp (developers)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| results.jsp has a lot more functionality. Much of it is for paging the search results. We won't cover that |
| here, because the code is well commented. The page does not perform any optimizations, such as caching |
| results, as that would make this a more complex example. The first thing in this page is the imports for |
| the Lucene classes and the Lucene demo classes. These classes are loaded from the jars included in the |
| WEB-INF/lib directory of the final WAR file. |
| </p> |
| <p> |
| You'll notice that this file includes the same header and footer as index.jsp. From there the JSP |
| constructs an IndexSearcher with the "indexLocation" that was specified in configuration.jsp. If there is |
| an error of any kind in opening the index, it is displayed to the user and a boolean flag is set to tell |
| the remaining sections of the JSP not to continue. |
| </p> |
| <p> |
| Next, the JSP attempts to get the search criteria, the start index (used for paging) and the maximum |
| number of results per page. If the maximum results per page is missing or invalid, both it and the start |
| index are set to default values. If only the start index is invalid, it alone is set to a default value. |
| If no criteria are provided, a servlet error is thrown (this is assumed to be the result of URL tampering |
| or some form of browser malfunction). |
| </p> |
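| <p> |
| The parameter rules above fit in a small sketch that does not use the servlet API. A Map stands in for the |
| request parameters, and IllegalArgumentException stands in for the servlet error. The parameter name |
| "startindex" and the default values (0 and 10) are hypothetical. Check results.jsp for the names and values |
| it actually uses. |
| </p> |
```java
import java.util.Map;

// Sketch of the results page's parameter handling. The "startindex" name and
// both default values are hypothetical; "query" and "maxresults" come from the form.
final class SearchParams {
    static final int DEFAULT_START = 0;         // hypothetical default
    static final int DEFAULT_MAX_RESULTS = 10;  // hypothetical default

    final String query;
    final int startIndex;
    final int maxResults;

    private SearchParams(String query, int startIndex, int maxResults) {
        this.query = query;
        this.startIndex = startIndex;
        this.maxResults = maxResults;
    }

    static SearchParams parse(Map<String, String> params) {
        String query = params.get("query");
        if (query == null) {
            // The JSP throws a servlet error here (URL tampering or a browser fault).
            throw new IllegalArgumentException("missing search criteria");
        }
        Integer max = parseAtLeast(params.get("maxresults"), 1);
        if (max == null) {
            // Missing or invalid page size: fall back to defaults for both values.
            return new SearchParams(query, DEFAULT_START, DEFAULT_MAX_RESULTS);
        }
        Integer start = parseAtLeast(params.get("startindex"), 0);
        // If only the start index is invalid, reset it alone.
        return new SearchParams(query, start == null ? DEFAULT_START : start, max);
    }

    /** Returns the value as an int if it parses and is >= min, otherwise null. */
    private static Integer parseAtLeast(String value, int min) {
        if (value == null) {
            return null;
        }
        try {
            int parsed = Integer.parseInt(value.trim());
            return parsed >= min ? Integer.valueOf(parsed) : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```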
| <p> |
| The JSP then constructs a StandardAnalyzer, just as in the simple demo, to analyze the search criteria. |
| The analyzer is passed to the QueryParser along with the criteria to construct a Query object. You'll also |
| notice the string literal "contents" passed along with them. It names the default field to search, so that |
| the search covers the contents rather than the title, URL or some other field of the indexed documents. If |
| there is any error in constructing the Query object, an error is displayed to the user. |
| </p> |
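| <p> |
| The following deliberately simplified toy makes the roles of the analyzer and the "contents" field |
| concrete. It is not Lucene's QueryParser or StandardAnalyzer. It splits on whitespace, lower-cases each |
| term and drops a small hypothetical stop-word list. Every term without an explicit "field:" prefix goes to |
| the default field. |
| </p> |
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration, NOT Lucene's QueryParser or StandardAnalyzer: it splits on
// whitespace, honours an optional "field:" prefix, and sends unprefixed terms
// to the default field ("contents" in the demo).
final class ToyQueryParser {
    // Small hypothetical stop-word list standing in for the analyzer's own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    static List<String> parse(String criteria, String defaultField) {
        List<String> terms = new ArrayList<>();
        for (String token : criteria.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            String field = defaultField;
            String text = token;
            int colon = token.indexOf(':');
            if (colon > 0) {
                // An explicit prefix such as "title:" overrides the default field.
                field = token.substring(0, colon);
                text = token.substring(colon + 1);
            }
            // The "analysis" step: lower-case, then drop stop words.
            text = text.toLowerCase(Locale.ROOT);
            if (text.isEmpty() || STOP_WORDS.contains(text)) {
                continue;
            }
            terms.add(field + ":" + text);
        }
        return terms;
    }
}
```
| <p> |
| With this toy, <code>parse("The Apache title:Lucene", "contents")</code> gives |
| <code>[contents:apache, title:lucene]</code>. Only terms without a prefix are searched in "contents". |
| </p> |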
| <p> |
| In the next section of the JSP, the IndexSearcher is asked to search using the Query object. The results |
| are returned in a collection called "hits". If the length of the hits collection is 0, an error is |
| displayed to the user and the error flag is set. |
| </p> |
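| <p> |
| The zero-hit check and the paging described in this section both come down to choosing which slice of the |
| hits to show. The sketch below makes two assumptions. An empty Optional plays the role of the JSP's error |
| flag. Clamping a start index that lies past the last hit is a design choice of this sketch, and the text |
| does not describe it. |
| </p> |
```java
import java.util.Optional;

// Sketch: which hits a results page displays, as a half-open range [start, end).
final class PageWindow {
    final int start;  // index of the first hit shown (inclusive)
    final int end;    // index one past the last hit shown (exclusive)
    private final int totalHits;

    private PageWindow(int start, int end, int totalHits) {
        this.start = start;
        this.end = end;
        this.totalHits = totalHits;
    }

    /**
     * Returns empty when there are no hits (the JSP sets its error flag then).
     * maxResults is assumed to be positive, as enforced by parameter parsing.
     */
    static Optional<PageWindow> of(int totalHits, int startIndex, int maxResults) {
        if (totalHits <= 0) {
            return Optional.empty();
        }
        // Clamp so an out-of-range start index yields an empty page, not an exception.
        int start = Math.max(0, Math.min(startIndex, totalHits));
        // Long arithmetic so a very large maxResults cannot overflow.
        int end = (int) Math.min((long) start + maxResults, totalHits);
        return Optional.of(new PageWindow(start, end, totalHits));
    }

    /** True when a "next page" link is needed; the next page starts at end. */
    boolean hasMore() {
        return end < totalHits;
    }
}
```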
| <p> |
| Finally, the JSP iterates through the hits collection and displays properties of the "Document" objects we |
| talked about in the first walkthrough. These objects contain "known" fields specific to their indexer (in |
| this case, "IndexHTML" constructs a document with "url", "title" and "contents" fields). You'll notice that |
| although the results are paged, the search is repeated for every page. This is an area where optimization |
| could improve performance for large result sets. |
| </p> |
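| <p> |
| The demo deliberately omits the kind of optimization mentioned above. The following minimal sketch shows |
| one: a least-recently-used cache keyed by query string, built on the access-order mode of |
| java.util.LinkedHashMap. The sketch does not fix what to store as the cached value, for example the |
| document numbers of the hits. A real cache would also need to be cleared whenever the index changes. |
| </p> |
```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a small LRU cache so that paging through results does not
// re-run the same search. Not part of the Lucene demo.
final class ResultCache<V> {
    private final Map<String, V> entries;

    ResultCache(final int capacity) {
        // accessOrder = true: iteration runs from least- to most-recently used,
        // so the "eldest" entry is the least recently used one.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity; // evict once over capacity
            }
        };
    }

    /** Returns the cached value for the query, running search only on a miss. */
    synchronized V getOrCompute(String query, Function<String, V> search) {
        V value = entries.get(query); // a hit also marks the entry as recently used
        if (value == null) {
            value = search.apply(query);
            entries.put(query, value);
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```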
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="More sources (developers)"><strong>More sources (developers)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| There are additional sources used by the web app that were not specifically covered by either |
| walkthrough, for example the HTML parser and the IndexHTML and HTMLDocument classes. These are very |
| similar to the classes covered in the first example, but they have properties specific to parsing and |
| indexing HTML. Covering them is beyond our scope; however, by now you should feel like you're "getting |
| started" with Lucene. |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| There are a number of things this demo doesn't do, or doesn't do quite right. For instance, you may have |
| noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to support that |
| context or redirect to it). More generally, wherever the directory structure doesn't quite match the |
| context mapping, you'll get a broken link in your results. If you want to index non-local files, or have |
| other needs beyond this, the demo doesn't support them, and there may be security issues with running the |
| indexing application from your webapps directory. Much is left for you, the implementor or developer, to do. |
| </p> |
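| <p> |
| The broken-link problem comes from translating a file's location on disk into a URL. The following is one |
| hypothetical way to do that mapping and to detect files that fall outside the mapped directory. The |
| document root, the context path and the ContextMapper class are all illustrative assumptions and are not |
| part of the demo. |
| </p> |
```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: maps files under docRoot to URLs under contextPath.
final class ContextMapper {
    private final Path docRoot;
    private final String contextPath;

    ContextMapper(String docRoot, String contextPath) {
        this.docRoot = Paths.get(docRoot).toAbsolutePath().normalize();
        // Store the context path without a trailing slash, e.g. "/docs".
        this.contextPath = contextPath.endsWith("/")
                ? contextPath.substring(0, contextPath.length() - 1)
                : contextPath;
    }

    /**
     * Returns the URL for the given file, or null if the file is outside
     * docRoot, which is exactly the case that yields a broken link.
     */
    String toUrl(String file) {
        Path path = Paths.get(file).toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so ".../docs-old"
        // is not treated as being inside ".../docs".
        if (!path.startsWith(docRoot)) {
            return null;
        }
        StringBuilder url = new StringBuilder(contextPath);
        for (Path part : docRoot.relativize(path)) {
            url.append('/').append(part);
        }
        return url.toString();
    }
}
```
| <p> |
| The sketch does not URL-encode path segments. Files with spaces or other reserved characters in their |
| names would need that as well. |
| </p> |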
| <p> |
| In time, some of these things may be added to Lucene as features (if you've got a good idea, we'd love to |
| hear it!), but for now, this is where your work begins and the search engine/indexer ends. Lastly, you'll |
| probably want to follow the advice above and customize the application to look a little fancier than black |
| on white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists! |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| <table border="0" cellspacing="0" cellpadding="2" width="100%"> |
| <tr><td bgcolor="#525D76"> |
| <font color="#ffffff" face="arial,helvetica,sanserif"> |
| <a name="When to contact the Author"><strong>When to contact the Author</strong></a> |
| </font> |
| </td></tr> |
| <tr><td> |
| <blockquote> |
| <p> |
| Please resist the urge to contact the authors of this document (unless you come bearing bribes of fame and |
| fortune). Contact the <a href="http://jakarta.apache.org/site/mail.html">mailing lists</a> first. That |
| being said, feedback on and modifications to this document and the samples are greatly appreciated. They |
| are just best sent to the lists so that everyone can share in them. You'll also get the most help there. |
| Thanks for understanding. |
| </p> |
| </blockquote> |
| </td></tr> |
| <tr><td><br/></td></tr> |
| </table> |
| </td> |
| </tr> |
| |
| <!-- FOOTER --> |
| <tr><td colspan="2"> |
| <hr noshade="" size="1"/> |
| </td></tr> |
| <tr><td colspan="2"> |
| <div align="center"><font color="#525D76" size="-1"><em> |
| Copyright © 1999-2004, The Apache Software Foundation |
| </em></font></div> |
| </td></tr> |
| </table> |
| </body> |
| </html> |
| <!-- end the processing --> |