
 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <html>
 <head>
    <title>Apache Lucene API</title>
 </head>
 <body>

 <p>Apache Lucene is a high-performance, full-featured text search engine library.
 Here's a simple example of how to use Lucene for indexing and searching (using JUnit
 to check if the results are what we expect):</p>

 <!-- code comes from org.apache.lucene.TestDemo: -->
 <!-- ======================================================== -->
 <!-- = Java Sourcecode to HTML automatically converted code = -->
 <!-- =   Java2Html Converter 5.0 [2006-03-04] by Markus Gebhard  markus@jave.de   = -->
 <!-- =     Further information: http://www.java2html.de     = -->
 <pre class="prettyprint">
     Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

     // Store the index in memory:
     Directory directory = new RAMDirectory();
     // To store an index on disk, use this instead:
     //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
     IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                           new IndexWriter.MaxFieldLength(25000));
     Document doc = new Document();
     String text = "This is the text to be indexed.";
     doc.add(new Field("fieldname", text, Field.Store.YES,
         Field.Index.ANALYZED));
     iwriter.addDocument(doc);
     iwriter.close();

     // Now search the index:
     IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only=true
     // Parse a simple query that searches for "text":
     QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
     Query query = parser.parse("text");
     ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
     assertEquals(1, hits.length);
     // Iterate through the results:
     for (int i = 0; i &lt; hits.length; i++) {
       Document hitDoc = isearcher.doc(hits[i].doc);
       assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
     }
     isearcher.close();
     directory.close();</pre>
 <!-- =       END of automatically generated HTML code       = -->
 <!-- ======================================================== -->


 <p>The Lucene API is divided into several packages:</p>

 <ul>
 <li>
 <b><a href="org/apache/lucene/analysis/package-summary.html">org.apache.lucene.analysis</a></b>
 defines an abstract <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>
 API for converting text from a <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>
 into a <a href="org/apache/lucene/analysis/TokenStream.html">TokenStream</a>,
 an enumeration of token <a href="org/apache/lucene/util/Attribute.html">Attribute</a>s.&nbsp;
 A TokenStream can be composed by applying <a href="org/apache/lucene/analysis/TokenFilter.html">TokenFilter</a>s
 to the output of a <a href="org/apache/lucene/analysis/Tokenizer.html">Tokenizer</a>.&nbsp;
 Tokenizers and TokenFilters are strung together and applied with an <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>.&nbsp;
 A handful of Analyzer implementations are provided, including <a href="org/apache/lucene/analysis/StopAnalyzer.html">StopAnalyzer</a>
 and the grammar-based <a href="org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>.</li>

 <li>
 <b><a href="org/apache/lucene/document/package-summary.html">org.apache.lucene.document</a></b>
 provides a simple <a href="org/apache/lucene/document/Document.html">Document</a>
 class.&nbsp; A Document is simply a set of named <a href="org/apache/lucene/document/Field.html">Field</a>s,
 whose values may be strings or instances of <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>.</li>

 <li>
 <b><a href="org/apache/lucene/index/package-summary.html">org.apache.lucene.index</a></b>
 provides two primary classes: <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>,
 which creates and adds documents to indices; and <a href="org/apache/lucene/index/IndexReader.html">IndexReader</a>,
 which accesses the data in the index.</li>

 <li>
 <b><a href="org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a></b>
 provides data structures to represent queries (e.g., <a href="org/apache/lucene/search/TermQuery.html">TermQuery</a>
 for individual words, <a href="org/apache/lucene/search/PhraseQuery.html">PhraseQuery</a>
 for phrases, and <a href="org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>
 for boolean combinations of queries) and the abstract <a href="org/apache/lucene/search/Searcher.html">Searcher</a>
 which turns queries into <a href="org/apache/lucene/search/TopDocs.html">TopDocs</a>.
 <a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
 implements search over a single IndexReader.</li>

 <li>
 <b><a href="org/apache/lucene/queryParser/package-summary.html">org.apache.lucene.queryParser</a></b>
 uses <a href="http://javacc.dev.java.net">JavaCC</a> to implement a
 <a href="org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>.</li>

 <li>
 <b><a href="org/apache/lucene/store/package-summary.html">org.apache.lucene.store</a></b>
 defines an abstract class for storing persistent data, the <a href="org/apache/lucene/store/Directory.html">Directory</a>,
 which is a collection of named files written by an <a href="org/apache/lucene/store/IndexOutput.html">IndexOutput</a>
 and read by an <a href="org/apache/lucene/store/IndexInput.html">IndexInput</a>.&nbsp;
 Multiple implementations are provided, including <a href="org/apache/lucene/store/FSDirectory.html">FSDirectory</a>,
 which uses a file system directory to store files, and <a href="org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>,
 which implements files as memory-resident data structures.</li>

 <li>
 <b><a href="org/apache/lucene/util/package-summary.html">org.apache.lucene.util</a></b>
 contains a few handy data structures and utility classes, e.g., <a href="org/apache/lucene/util/BitVector.html">BitVector</a>
 and <a href="org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>.</li>
 </ul>
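
 <p>As a rough sketch of the two kinds of Field value described above, the following
 builds a Document with one string-valued and one Reader-valued field. It assumes the
 Field constructors of the same Lucene release as the example at the top of this page.
 The class name <tt>FieldKinds</tt>, the field names and the values are made up for
 illustration.</p>

 <pre class="prettyprint">
 import java.io.StringReader;

 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;

 public class FieldKinds {

   static Document build() {
     Document doc = new Document();

     // String value: stored so it can be read back from a hit, and indexed
     // as a single un-analyzed token so it can be matched exactly.
     doc.add(new Field("path", "rec.food.recipes/soups/clam-chowder",
         Field.Store.YES, Field.Index.NOT_ANALYZED));

     // Reader value: the text is tokenized and indexed, but not stored.
     doc.add(new Field("contents",
         new StringReader("A thick soup made with clams and potatoes.")));

     return doc;
   }

   public static void main(String[] args) {
     Document doc = build();
     System.out.println(doc.get("path"));
   }
 }</pre>

 <p>Because the Reader-valued field is not stored, its text can be searched but cannot
 be retrieved from a search result. Only the stored <tt>path</tt> field comes back.</p>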
 <p>To use Lucene, an application should:</p>
 <ol>
 <li>
 Create <a href="org/apache/lucene/document/Document.html">Document</a>s by
 adding
 <a href="org/apache/lucene/document/Field.html">Field</a>s;</li>

 <li>
 Create an <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>
 and add documents to it with <a href="org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document)">addDocument()</a>;</li>

 <li>
 Call <a href="org/apache/lucene/queryParser/QueryParser.html#parse(java.lang.String)">QueryParser.parse()</a>
 to build a query from a string; and</li>

 <li>
 Create an <a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
 and pass the query to its <a href="org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query)">search()</a>
 method.</li>
 </ol>
 <p>Some simple examples of code that does this are:</p>
 <ul>
 <li>
 <a href="http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/demo/src/java/org/apache/lucene/demo/IndexFiles.java">IndexFiles.java</a> creates an
 index for all the files contained in a directory.</li>

 <li>
 <a href="http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/demo/src/java/org/apache/lucene/demo/SearchFiles.java">SearchFiles.java</a> prompts for
 queries and searches an index.</li>
 </ul>
 <p>To demonstrate these, try something like:</p>
 <blockquote><tt>&gt; <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups</b></tt>
 <br><tt>adding rec.food.recipes/soups/abalone-chowder</tt>
 <br><tt>&nbsp; </tt>[ ... ]

 <p><tt>&gt; <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles</b></tt>
 <br><tt>Query: <b>chowder</b></tt>
 <br><tt>Searching for: chowder</tt>
 <br><tt>34 total matching documents</tt>
 <br><tt>1. rec.food.recipes/soups/spam-chowder</tt>
 <br><tt>&nbsp; </tt>[ ... thirty-four documents contain the word "chowder" ... ]

 <p><tt>Query: <b>"clam chowder" AND Manhattan</b></tt>
 <br><tt>Searching for: +"clam chowder" +manhattan</tt>
 <br><tt>2 total matching documents</tt>
 <br><tt>1. rec.food.recipes/soups/clam-chowder</tt>
 <br><tt>&nbsp; </tt>[ ... two documents contain the phrase "clam chowder"
 and the word "manhattan" ... ]
 <br>&nbsp;&nbsp;&nbsp; [ Note: "+" and "-" are canonical, but "AND", "OR"
 and "NOT" may be used. ]</blockquote>
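
 <p>The second query in the session above can also be built without the QueryParser.
 The following is a minimal sketch, assuming the mutable <tt>BooleanQuery</tt> and
 <tt>PhraseQuery</tt> classes of the Lucene release this page describes (later releases
 replaced them with builder classes). The class name <tt>ClamChowderQuery</tt> and the
 field name <tt>contents</tt> are assumptions made for illustration.</p>

 <pre class="prettyprint">
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.BooleanClause;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.PhraseQuery;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.TermQuery;

 public class ClamChowderQuery {

   // Assumed field name, for illustration only.
   static final String FIELD = "contents";

   static Query build() {
     // Phrase: the terms must appear next to each other, in this order.
     PhraseQuery phrase = new PhraseQuery();
     phrase.add(new Term(FIELD, "clam"));
     phrase.add(new Term(FIELD, "chowder"));

     // No analyzer runs here, so the term is given already lowercased.
     TermQuery word = new TermQuery(new Term(FIELD, "manhattan"));

     // Occur.MUST corresponds to "+" (or AND) in the query syntax.
     BooleanQuery query = new BooleanQuery();
     query.add(phrase, BooleanClause.Occur.MUST);
     query.add(word, BooleanClause.Occur.MUST);
     return query;
   }

   public static void main(String[] args) {
     // Passing the default field omits the "contents:" prefix from the output.
     System.out.println(build().toString(FIELD));
   }
 }</pre>

 <p>Printing the query with its default field gives the canonical "+" form, which makes
 it easy to compare a hand-built query with what the QueryParser produces for the same
 input.</p>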

 </body>
 </html>