<html>
<head>
<title>Apache Lucene API</title>
</head>
<body>
<p>Apache Lucene is a high-performance, full-featured text search engine library.
Here's a simple example of how to use Lucene for indexing and searching (using JUnit
to check if the results are what we expect):</p>
<!-- ======================================================== -->
<!-- = Java Sourcecode to HTML automatically converted code = -->
<!-- = Java2Html Converter 5.0 [2006-02-26] by Markus Gebhard markus@jave.de = -->
<!-- = Further information: http://www.java2html.de = -->
<div align="left" class="java">
<table border="0" cellpadding="3" cellspacing="0" bgcolor="#ffffff">
<tr>
<!-- start source code -->
<td nowrap="nowrap" valign="top" align="left">
<code>
<font color="#ffffff"> </font><font color="#000000">Analyzer analyzer = </font><font color="#7f0055"><b>new </b></font><font color="#000000">StandardAnalyzer</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"></font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">// Store the index in memory:</font><br />
<font color="#ffffff"> </font><font color="#000000">Directory directory = </font><font color="#7f0055"><b>new </b></font><font color="#000000">RAMDirectory</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">// To store an index on disk, use this instead:</font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">//Directory directory = FSDirectory.getDirectory("/tmp/testindex");</font><br />
<font color="#ffffff"> </font><font color="#000000">IndexWriter iwriter = </font><font color="#7f0055"><b>new </b></font><font color="#000000">IndexWriter</font><font color="#000000">(</font><font color="#000000">directory, analyzer, </font><font color="#7f0055"><b>true</b></font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">iwriter.setMaxFieldLength</font><font color="#000000">(</font><font color="#990000">25000</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">Document doc = </font><font color="#7f0055"><b>new </b></font><font color="#000000">Document</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">String text = </font><font color="#2a00ff">"This is the text to be indexed."</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">doc.add</font><font color="#000000">(</font><font color="#7f0055"><b>new </b></font><font color="#000000">Field</font><font color="#000000">(</font><font color="#2a00ff">"fieldname"</font><font color="#000000">, text, Field.Store.YES,</font><br />
<font color="#ffffff"> </font><font color="#000000">Field.Index.TOKENIZED</font><font color="#000000">))</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">iwriter.addDocument</font><font color="#000000">(</font><font color="#000000">doc</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">iwriter.optimize</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">iwriter.close</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">// Now search the index:</font><br />
<font color="#ffffff"> </font><font color="#000000">IndexSearcher isearcher = </font><font color="#7f0055"><b>new </b></font><font color="#000000">IndexSearcher</font><font color="#000000">(</font><font color="#000000">directory</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">// Parse a simple query that searches for "text":</font><br />
<font color="#ffffff"> </font><font color="#000000">QueryParser parser = </font><font color="#7f0055"><b>new </b></font><font color="#000000">QueryParser</font><font color="#000000">(</font><font color="#2a00ff">"fieldname"</font><font color="#000000">, analyzer</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">Query query = parser.parse</font><font color="#000000">(</font><font color="#2a00ff">"text"</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">Hits hits = isearcher.search</font><font color="#000000">(</font><font color="#000000">query</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">assertEquals</font><font color="#000000">(</font><font color="#990000">1</font><font color="#000000">, hits.length</font><font color="#000000">())</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#3f7f5f">// Iterate through the results:</font><br />
<font color="#ffffff"> </font><font color="#7f0055"><b>for </b></font><font color="#000000">(</font><font color="#7f0055"><b>int </b></font><font color="#000000">i = </font><font color="#990000">0</font><font color="#000000">; i &lt; hits.length</font><font color="#000000">()</font><font color="#000000">; i++</font><font color="#000000">) {</font><br />
<font color="#ffffff"> </font><font color="#000000">Document hitDoc = hits.doc</font><font color="#000000">(</font><font color="#000000">i</font><font color="#000000">)</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">assertEquals</font><font color="#000000">(</font><font color="#2a00ff">"This is the text to be indexed."</font><font color="#000000">, hitDoc.get</font><font color="#000000">(</font><font color="#2a00ff">"fieldname"</font><font color="#000000">))</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">}</font><br />
<font color="#ffffff"> </font><font color="#000000">isearcher.close</font><font color="#000000">()</font><font color="#000000">;</font><br />
<font color="#ffffff"> </font><font color="#000000">directory.close</font><font color="#000000">()</font><font color="#000000">;</font></code>
</td>
<!-- end source code -->
</tr>
</table>
</div>
<!-- = END of automatically generated HTML code = -->
<!-- ======================================================== -->
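<p>The search above finds the document because the same <code>analyzer</code> is handed to both the <code>IndexWriter</code> and the <code>QueryParser</code>, so the stored text and the query string are reduced to comparable terms. As a rough plain-Java illustration of that idea (not Lucene code), the hypothetical <code>ToyAnalyzer</code> below runs a tokenizer followed by a chain of filters: it splits on non-alphanumeric characters, lowercases, and drops stop words. The short stop-word list is an assumption made for this sketch and is not <code>StandardAnalyzer</code>'s actual grammar or stop set.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical toy analyzer, NOT Lucene's StandardAnalyzer: it only mimics the
// "tokenizer followed by a chain of filters" structure of an Analyzer.
class ToyAnalyzer {
    // Assumed, deliberately tiny stop-word list (illustration only).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "be", "is", "the", "this", "to"));

    // Tokenizer: split raw text on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Filter 1: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter 2: drop stop words (expects already-lowercased tokens).
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Run the tokenizer, then apply each filter to the previous stage's output.
    static List<String> analyze(String text) {
        List<UnaryOperator<List<String>>> filters = new ArrayList<>();
        filters.add(ToyAnalyzer::lowerCase);
        filters.add(ToyAnalyzer::removeStopWords);
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is the text to be indexed.")); // [text, indexed]
        System.out.println(analyze("text"));                             // [text]
    }
}
```

<p>With this toy chain, <code>"This is the text to be indexed."</code> reduces to the terms <code>text</code> and <code>indexed</code>, so the query <code>text</code> shares a term with the stored field.</p>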
<p>The Lucene API is divided into several packages:</p>
<ul>
<li>
<b><a href = "org/apache/lucene/analysis/package-summary.html">org.apache.lucene.analysis</a></b>
defines an abstract <a href = "org/apache/lucene/analysis/Analyzer.html">Analyzer</a>
API for converting text from a <a href = "http://java.sun.com//products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>
into a <a href = "org/apache/lucene/analysis/TokenStream.html">TokenStream</a>,
an enumeration of <a href = "org/apache/lucene/analysis/Token.html">Token</a>s.
A TokenStream is built by applying <a href = "org/apache/lucene/analysis/TokenFilter.html">TokenFilter</a>s
to the output of a <a href = "org/apache/lucene/analysis/Tokenizer.html">Tokenizer</a>.
A few simple implementations are provided, including <a href = "org/apache/lucene/analysis/StopAnalyzer.html">StopAnalyzer</a>
and the grammar-based <a href = "org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>.</li>
<li>
<b><a href = "org/apache/lucene/document/package-summary.html">org.apache.lucene.document</a></b>
provides a simple <a href = "org/apache/lucene/document/Document.html">Document</a>
class. A document is a set of named <a href = "org/apache/lucene/document/Field.html">Field</a>s,
whose values may be strings or instances of <a href = "http://java.sun.com//products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>.</li>
<li>
<b><a href = "org/apache/lucene/index/package-summary.html">org.apache.lucene.index</a></b>
provides two primary classes: <a href = "org/apache/lucene/index/IndexWriter.html">IndexWriter</a>,
which creates indexes and adds documents to them; and <a href = "org/apache/lucene/index/IndexReader.html">IndexReader</a>,
which accesses the data in the index.</li>
<li>
<b><a href = "org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a></b>
provides data structures to represent queries (<a href = "org/apache/lucene/search/TermQuery.html">TermQuery</a>
for individual words, <a href = "org/apache/lucene/search/PhraseQuery.html">PhraseQuery</a>
for phrases, and <a href = "org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>
for boolean combinations of queries) and the abstract <a href = "org/apache/lucene/search/Searcher.html">Searcher</a>,
which turns queries into <a href = "org/apache/lucene/search/Hits.html">Hits</a>.
<a href = "org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
implements search over a single IndexReader.</li>
<li>
<b><a href = "org/apache/lucene/queryParser/package-summary.html">org.apache.lucene.queryParser</a></b>
uses <a href = "http://javacc.dev.java.net">JavaCC</a> to implement a
<a href = "org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>.</li>
<li>
<b><a href = "org/apache/lucene/store/package-summary.html">org.apache.lucene.store</a></b>
defines an abstract class for storing persistent data, the <a href = "org/apache/lucene/store/Directory.html">Directory</a>,
a collection of named files written by an <a href = "org/apache/lucene/store/IndexOutput.html">IndexOutput</a>
and read by an <a href = "org/apache/lucene/store/IndexInput.html">IndexInput</a>.
Two implementations are provided: <a href = "org/apache/lucene/store/FSDirectory.html">FSDirectory</a>,
which uses a file system directory to store files, and <a href = "org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>,
which implements files as memory-resident data structures.</li>
<li>
<b><a href = "org/apache/lucene/util/package-summary.html">org.apache.lucene.util</a></b>
contains a few handy data structures, e.g., <a href = "org/apache/lucene/util/BitVector.html">BitVector</a>
and <a href = "org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>.</li>
</ul>
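<p>To make the store package's model concrete, here is a minimal plain-Java sketch of a <code>Directory</code> as a flat collection of named files, with a memory-resident implementation in the spirit of RAMDirectory. The interface <code>ToyDirectory</code>, the class <code>ToyRAMDirectory</code>, and their method names are hypothetical. They return <code>java.io</code> streams where Lucene uses its own IndexOutput and IndexInput classes, and the real Directory class has further methods.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the idea behind a Directory:
// a flat namespace of named files. These names are NOT Lucene's API.
interface ToyDirectory {
    OutputStream createOutput(String name) throws IOException; // write a new named file
    InputStream openInput(String name) throws IOException;     // read an existing named file
    String[] list();                                            // names of all files
}

// Memory-resident implementation: each "file" is a byte array keyed by its name.
class ToyRAMDirectory implements ToyDirectory {
    private final Map<String, byte[]> files = new TreeMap<>();

    @Override
    public OutputStream createOutput(String name) {
        // Buffer the bytes and publish them under the file name when the stream is closed.
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                files.put(name, toByteArray());
            }
        };
    }

    @Override
    public InputStream openInput(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new IOException("no such file: " + name);
        }
        return new ByteArrayInputStream(data);
    }

    @Override
    public String[] list() {
        // TreeMap keeps names sorted, so the listing order is deterministic.
        return files.keySet().toArray(new String[0]);
    }
}
```

<p>In this sketch a file becomes visible under its name only once its output stream is closed; a file-system variant could implement the same interface on top of a directory on disk, much as FSDirectory does for Lucene.</p>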
To use Lucene, an application should:
<ol>
<li>
Create <a href = "org/apache/lucene/document/Document.html">Document</a>s by
adding <a href = "org/apache/lucene/document/Field.html">Field</a>s;</li>
<li>
Create an <a href = "org/apache/lucene/index/IndexWriter.html">IndexWriter</a>
and add documents to it with <a href = "org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document)">addDocument()</a>;</li>
<li>
Call <a href = "org/apache/lucene/queryParser/QueryParser.html#parse(java.lang.String)">QueryParser.parse()</a>
to build a query from a string; and</li>
<li>
Create an <a href = "org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
and pass the query to its <a href = "org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query)">search()</a>
method.</li>
</ol>
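<p>The four steps above can be mimicked without Lucene in a toy model, which may help show what each call is responsible for. In this sketch (plain Java collections; every name in it is hypothetical), a document is a map from field names to text, <code>addDocument</code> records which documents contain each term of each field, <code>parse</code> applies the same simple tokenization to the query string, and <code>search</code> returns the ids of documents whose field contains every query term. That all-terms rule is a simplification chosen here; QueryParser's default operator is OR, and Lucene's real index format, analysis, and scoring are far more elaborate.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy model of the four steps using plain Java collections.
// It is NOT how Lucene is implemented; it only mirrors the workflow.
class ToySearchEngine {
    // Step 1: a "document" is a set of named fields (field name -> text).
    static Map<String, String> document(String field, String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put(field, text);
        return doc;
    }

    // Stored documents by id, plus an inverted index from "field:term" to doc ids.
    private final List<Map<String, String>> stored = new ArrayList<>();
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Step 2: "addDocument" tokenizes each field and records which doc contains each term.
    int addDocument(Map<String, String> doc) {
        int id = stored.size();
        stored.add(doc);
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String term : terms(field.getValue())) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Step 3: "parse" a query string with the same toy analysis used at indexing time.
    static List<String> parse(String query) {
        return terms(query);
    }

    // Step 4: "search" returns ids of documents whose field contains ALL query terms.
    TreeSet<Integer> search(String field, List<String> queryTerms) {
        TreeSet<Integer> result = null;
        for (String term : queryTerms) {
            TreeSet<Integer> docs = postings.getOrDefault(field + ":" + term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Fetch a stored document by id (like reading a hit's stored fields).
    Map<String, String> doc(int id) {
        return stored.get(id);
    }

    // Toy analysis: lowercase, then split on anything that is not an ASCII letter or digit.
    private static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToySearchEngine engine = new ToySearchEngine();
        engine.addDocument(document("fieldname", "This is the text to be indexed."));
        engine.addDocument(document("fieldname", "Some other words."));
        TreeSet<Integer> hits = engine.search("fieldname", parse("text"));
        System.out.println(hits);                                      // [0]
        System.out.println(engine.doc(hits.first()).get("fieldname")); // This is the text to be indexed.
    }
}
```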
Some simple examples of code that does this are:
<ul>
<li>
<a href = "http://svn.apache.org//repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/FileDocument.java">FileDocument.java</a> contains
code to create a Document for a file.</li>
<li>
<a href = "http://svn.apache.org//repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/IndexFiles.java">IndexFiles.java</a> creates an
index for all the files contained in a directory.</li>
<li>
<a href = "http://svn.apache.org//repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/DeleteFiles.java">DeleteFiles.java</a> deletes some
of these files from the index.</li>
<li>
<a href = "http://svn.apache.org//repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/SearchFiles.java">SearchFiles.java</a> prompts for
queries and searches an index.</li>
</ul>
To demonstrate these, try something like:
<blockquote><tt>> <b>java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups</b></tt>
<br><tt>adding rec.food.recipes/soups/abalone-chowder</tt>
<br><tt> </tt>[ ... ]
<p><tt>> <b>java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles</b></tt>
<br><tt>Query: <b>chowder</b></tt>
<br><tt>Searching for: chowder</tt>
<br><tt>34 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/spam-chowder</tt>
<br><tt> </tt>[ ... thirty-four documents contain the word "chowder" ... ]
<p><tt>Query: <b>"clam chowder" AND Manhattan</b></tt>
<br><tt>Searching for: +"clam chowder" +manhattan</tt>
<br><tt>2 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/clam-chowder</tt>
<br><tt> </tt>[ ... two documents contain the phrase "clam chowder"
and the word "manhattan" ... ]
<br> [ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]</blockquote>
The <a href = "http://svn.apache.org//repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/IndexHTML.java">IndexHTML</a> demo is more sophisticated.
It incrementally maintains an index of HTML files, adding new files as
they appear, deleting old files as they disappear, and re-indexing files
as they change.
<blockquote><tt>> <b>java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes</b></tt>
<br><tt>adding java/jdk1.1.6/docs/relnotes/SMICopyright.html</tt>
<br><tt> </tt>[ ... create an index containing all the relnotes ]
<p><tt>> <b>rm java/jdk1.1.6/docs/relnotes/SMICopyright.html</b></tt>
<p><tt>> <b>java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes</b></tt>
<br><tt>deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html</tt></blockquote>
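<p>The bookkeeping behind this kind of incremental maintenance can be sketched as a comparison between what the index last recorded and what is on disk now. This is a hypothetical plain-Java sketch, not the IndexHTML source: it assumes each file is identified by its path plus a last-modified timestamp (which could come from <code>java.io.File.lastModified()</code>), and it only computes which files to add, delete, or re-index, leaving the actual index updates out.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of incremental-index bookkeeping (not the IndexHTML source):
// compare what the index recorded (path -> last-modified time) with what is on disk now.
class ToyIncrementalPlan {
    final List<String> toAdd = new ArrayList<>();     // on disk, never indexed
    final List<String> toDelete = new ArrayList<>();  // indexed, no longer on disk
    final List<String> toReindex = new ArrayList<>(); // on disk with a different timestamp

    static ToyIncrementalPlan plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        ToyIncrementalPlan p = new ToyIncrementalPlan();
        // TreeMap copies give a deterministic, sorted processing order.
        for (Map.Entry<String, Long> e : new TreeMap<>(onDisk).entrySet()) {
            Long recorded = indexed.get(e.getKey());
            if (recorded == null) {
                p.toAdd.add(e.getKey());
            } else if (!recorded.equals(e.getValue())) {
                p.toReindex.add(e.getKey());
            }
        }
        for (String path : new TreeMap<>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) {
                p.toDelete.add(path);
            }
        }
        return p;
    }
}
```

<p>Applied to the transcript above, removing <code>SMICopyright.html</code> from disk leaves it only in the recorded map, so it ends up in the delete list.</p>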
</body>
</html>