Apache Lucene.NET is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene.NET for indexing and searching (using NUnit to check that the results are what we expect):
```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using NUnit.Framework;
using Directory = Lucene.Net.Store.Directory; // avoid a clash with System.IO.Directory

// Pin the Lucene version the analyzer, writer and parser are compatible with
const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

Analyzer analyzer = new StandardAnalyzer(AppLuceneVersion);

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
// Construct a machine-independent path for the index
//var basePath = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);
//var indexPath = Path.Combine(basePath, "index");
//Directory directory = FSDirectory.Open(indexPath);

IndexWriterConfig config = new IndexWriterConfig(AppLuceneVersion, analyzer);
using (IndexWriter iwriter = new IndexWriter(directory, config))
{
    Document doc = new Document();
    string text = "This is the text to be indexed.";
    doc.Add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.AddDocument(doc);
} // disposing the writer commits the document and releases the write lock

// Now search the index:
using DirectoryReader ireader = DirectoryReader.Open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(AppLuceneVersion, "fieldname", analyzer);
Query query = parser.Parse("text");
ScoreDoc[] hits = isearcher.Search(query, null, 1000).ScoreDocs;
Assert.AreEqual(1, hits.Length);
// Iterate through the results:
for (int i = 0; i < hits.Length; i++)
{
    Document hitDoc = isearcher.Doc(hits[i].Doc);
    Assert.AreEqual("This is the text to be indexed.", hitDoc.Get("fieldname"));
}
```
The Lucene API is divided into several packages:
- <xref:Lucene.Net.Analysis> defines an abstract Analyzer API for converting text from a System.IO.TextReader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. The Lucene.Net.Analysis.Common package provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
- <xref:Lucene.Net.Codecs> provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending on application needs.
- <xref:Lucene.Net.Documents> provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of System.IO.TextReader.
- <xref:Lucene.Net.Index> provides two primary classes: IndexWriter, which creates indexes and adds documents to them; and IndexReader, which accesses the data in the index.
- <xref:Lucene.Net.Search> provides data structures to represent queries (e.g., TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for Boolean combinations of queries) and the IndexSearcher, which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or XML.
- <xref:Lucene.Net.Store> defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, including FSDirectory, which uses a file system directory to store files, and RAMDirectory, which implements files as memory-resident data structures.
- <xref:Lucene.Net.Util> contains a few handy data structures and utility classes, e.g., OpenBitSet and PriorityQueue.
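The Tokenizer-plus-TokenFilters chain described for Lucene.Net.Analysis can be sketched as a small custom Analyzer. This is a minimal illustration, not a class shipped with Lucene.NET: it assumes Lucene.NET 4.8 with the Lucene.Net and Lucene.Net.Analysis.Common packages, and the names LowerCaseStopAnalyzer, TokenDump and the field name "body" are hypothetical. It chains a StandardTokenizer, a LowerCaseFilter and a StopFilter, then reads the resulting TokenStream through its term attribute.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

List<string> terms = TokenDump.Tokenize(new LowerCaseStopAnalyzer(), "The Quick Brown Fox");
Console.WriteLine(string.Join(" ", terms)); // quick brown fox

// Hypothetical analyzer: one Tokenizer followed by two TokenFilters.
public sealed class LowerCaseStopAnalyzer : Analyzer
{
    // Assumption: targeting the Lucene.NET 4.8 match version.
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The Tokenizer splits the raw character stream into tokens.
        Tokenizer source = new StandardTokenizer(MatchVersion, reader);
        // Each TokenFilter wraps the stream produced before it.
        TokenStream result = new LowerCaseFilter(MatchVersion, source);
        result = new StopFilter(MatchVersion, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new TokenStreamComponents(source, result);
    }
}

// Hypothetical helper that consumes a TokenStream through its attributes.
public static class TokenDump
{
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text));
        // The term attribute is updated in place on every IncrementToken() call.
        ICharTermAttribute termAttr = ts.AddAttribute<ICharTermAttribute>();
        ts.Reset();                 // must be called before the first IncrementToken()
        while (ts.IncrementToken())
        {
            terms.Add(termAttr.ToString());
        }
        ts.End();                   // finalizes end-of-stream state before disposal
        return terms;
    }
}
```

Because the LowerCaseFilter runs before the StopFilter, "The" is lowercased first and then dropped as a stop word; reversing the two filters would let capitalized stop words through.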
To use Lucene, an application should:

1. Create Documents by adding Fields;
2. Create an IndexWriter and add documents to it with AddDocument();
3. Call QueryParser.Parse() to build a query from a string; and
4. Create an IndexSearcher and pass the query to its Search() method.
Some simple examples of code which does this are:
- IndexFiles.cs creates an index for all the files contained in a directory.
- SearchFiles.cs prompts for queries and searches an index.
> [!TIP]
> These demos can be run, and their code viewed or exported, using the lucene-cli dotnet tool.

For example, to index a directory of recipe files and then search it, try something like:
```
> lucene demo index-files index rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[...]

> lucene demo search-files index
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
```
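The equivalence between the "AND" keyword and the canonical "+" prefix can be checked without building an index. The sketch below is an illustration under stated assumptions, not the demo's own code: it assumes Lucene.NET 4.8 with the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages, and the field name "contents" is hypothetical. It parses both spellings with the classic QueryParser and also assembles the same structure by hand from the TermQuery, PhraseQuery and BooleanQuery classes of Lucene.Net.Search.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48; // assumed version
const string field = "contents";                                // hypothetical field name

var analyzer = new StandardAnalyzer(AppLuceneVersion);
var parser = new QueryParser(AppLuceneVersion, field, analyzer);

// The keyword operator and the canonical +/- prefixes express the same intent.
Query fromKeywords = parser.Parse("\"clam chowder\" AND Manhattan");
Query fromPrefixes = parser.Parse("+\"clam chowder\" +manhattan");

// The same structure assembled by hand. Terms added directly bypass the
// analyzer, so they are written in lowercase here.
var phrase = new PhraseQuery();
phrase.Add(new Term(field, "clam"));
phrase.Add(new Term(field, "chowder"));
var manual = new BooleanQuery();
manual.Add(phrase, Occur.MUST);                                     // "+" = required clause
manual.Add(new TermQuery(new Term(field, "manhattan")), Occur.MUST);

// ToString(field) omits the default field name, matching the "Searching for:" format.
Console.WriteLine(fromKeywords.ToString(field)); // +"clam chowder" +manhattan
Console.WriteLine(fromPrefixes.ToString(field)); // +"clam chowder" +manhattan
Console.WriteLine(manual.ToString(field));       // +"clam chowder" +manhattan
```

Building queries by hand is useful when the structure comes from application logic rather than user input, since it avoids escaping user text for the parser's syntax.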