| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <html> |
| <head> |
| <title> |
| QueryParsers |
| </title> |
| </head> |
| <body> |
| Apache Lucene QueryParsers. |
| <p> |
| This module provides a number of query parsers: |
| <ul> |
| <li><a href="#classic">Classic</a> |
| <li><a href="#analyzing">Analyzing</a> |
| <li><a href="#complexphrase">Complex Phrase</a> |
| <li><a href="#extendable">Extendable</a> |
| <li><a href="#flexible">Flexible</a> |
| <li><a href="#surround">Surround</a> |
| <li><a href="#xml">XML</a> |
| </ul> |
| <hr/> |
| <h2><a name="classic">Classic</a></h2> |
| A simple Lucene QueryParser implemented with JavaCC. |
| <h2><a name="analyzing">Analyzing</a></h2> |
| QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQueries through the given analyzer. |
| <h2><a name="complexphrase">Complex Phrase</a></h2> |
| QueryParser that permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*". |
| <h2><a name="extendable">Extendable</a></h2> |
| Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names. |
| <h2><a name="flexible">Flexible</a></h2> |
| <p> |
| This project contains the new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization. |
| </p> |
| |
| <p> |
| It is currently divided into two main packages: |
| <ul> |
| <li>{@link org.apache.lucene.queryparser.flexible.core}: contains the query parser API classes, which should be extended by query parser implementations.</li> |
| <li>{@link org.apache.lucene.queryparser.flexible.standard}: contains the current Lucene query parser implementation, built on the new query parser API.</li> |
| </ul> |
| </p> |
| |
| <h3>Features</h3> |
| |
| <ol> |
| <li>Full support for boolean logic (not enabled)</li> |
| <li>QueryNode Trees - support for several syntaxes, |
| which can be converted into similar QueryNode trees.</li> |
| <li>QueryNode Processors - optimize, validate, and rewrite the |
| QueryNode trees</li> |
| <li>Processor Pipelines - select the processors you want |
| and build a processor pipeline to implement the features you need</li> |
| <li>Config Interfaces - allow the consumer of the query parser to implement |
| different Config Handler objects to suit their needs.</li> |
| <li>Standard Builders - convert QueryNodes into several Lucene |
| representations. The supported conversion uses Lucene 2.4 compatible logic.</li> |
| <li>QueryNode trees can be converted to a Lucene 2.4 syntax string using toQueryString</li> |
| </ol> |
| |
| <h3>Design</h3> |
| <p> |
| This new query parser was designed to have a very generic |
| architecture, so that it can easily be used for different |
| products with varying query syntaxes. This code is much more |
| flexible and extensible than the Lucene query parser in 2.4.X. |
| </p> |
| <p> |
| The goal of the new query parser is to separate the syntax and the semantics of a query. For example, 'a AND |
| b', '+a +b', and 'AND(a,b)' could be different syntaxes for the same query. |
| The parser treats the semantics of the different query components separately, e.g. |
| whether and how to tokenize/lemmatize/normalize the different terms, or |
| which Query objects to create for the terms. This makes it possible to |
| write a parser with a new syntax as quickly as possible, while reusing |
| the underlying semantics. |
| </p> |
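<p>
The separation of syntax from semantics can be illustrated with a minimal, self-contained sketch. It does not use the Lucene API: the <code>Node</code> class and the two toy parsers (<code>parseInfix</code>, <code>parsePlus</code>) are hypothetical names invented for this example, and they handle only the two trivial inputs shown. The point is only that two different surface syntaxes can yield the same tree.
</p>

```java
import java.util.ArrayList;
import java.util.List;

public class SyntaxVsSemantics {
    /** Hypothetical tree node; a stand-in for the idea of a QueryNode, not Lucene's class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        /** Renders the tree in a canonical prefix form such as AND(a,b). */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Toy parser for the infix syntax: "a AND b". */
    static Node parseInfix(String query) {
        Node and = new Node("AND");
        for (String part : query.trim().split("\\s+AND\\s+")) {
            and.children.add(new Node(part));
        }
        return and;
    }

    /** Toy parser for the required-term syntax: "+a +b". */
    static Node parsePlus(String query) {
        Node and = new Node("AND");
        for (String part : query.trim().split("\\s+")) {
            and.children.add(new Node(part.substring(1))); // drop the leading '+'
        }
        return and;
    }

    public static void main(String[] args) {
        // Both syntaxes produce a tree with the same canonical form.
        System.out.println(parseInfix("a AND b"));
        System.out.println(parsePlus("+a +b"));
    }
}
```

<p>
Because both parsers produce the same tree, anything that operates on the tree (normalization, query construction) can be shared between them.
</p>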
| <p> |
| The query parser has three layers and its core is what we call the |
| QueryNode tree. It is a tree that initially represents the syntax of the |
| original query, e.g. for 'a AND b': |
| </p> |
| <pre> |
|   AND   |
|  /   \  |
| A     B |
| </pre> |
| <p> |
| The three layers are: |
| </p> |
| <dl> |
| <dt>QueryParser</dt> |
| <dd> |
| This layer is the text parsing layer which simply transforms the |
| query text string into a {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} tree. Every text parser |
| must implement the interface {@link org.apache.lucene.queryparser.flexible.core.parser.SyntaxParser}. |
| Lucene's default implementation uses JavaCC. |
| </dd> |
| |
| <dt>QueryNodeProcessor</dt> |
| <dd>The query node processors do most of the work. They form a |
| configurable chain of processors. Each processor can walk the tree and |
| modify nodes or even the tree's structure. That makes it possible, for |
| example, to optimize the query before it is executed or to tokenize |
| terms. |
| </dd> |
| |
| <dt>QueryBuilder</dt> |
| <dd> |
| The third layer is a configurable map of builders, which maps each {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} type to the specific |
| builder that transforms the QueryNode into a Lucene Query object. |
| </dd> |
| |
| </dl> |
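<p>
The three layers described above can be sketched in plain Java. This is an illustration under stated assumptions, not the Lucene implementation: the <code>Node</code> class, the toy syntax (terms joined by <code>AND</code>), the two processors, and the use of a <code>String</code> in place of a Lucene Query object are all hypothetical simplifications chosen for this example.
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class ThreeLayerSketch {
    /** Hypothetical node: type "AND" with children, or type "TERM" with text. */
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Layer 1: the text parser turns the query string into a node tree.
    static Node parse(String query) {
        Node and = new Node("AND", null);
        for (String part : query.trim().split("\\s+AND\\s+")) {
            and.children.add(new Node("TERM", part));
        }
        return and;
    }

    // Layer 2, processor A: normalize terms by lowercasing them.
    static Node lowercaseTerms(Node node) {
        if (node.type.equals("TERM")) {
            return new Node("TERM", node.text.toLowerCase(Locale.ROOT));
        }
        Node copy = new Node(node.type, node.text);
        for (Node child : node.children) {
            copy.children.add(lowercaseTerms(child));
        }
        return copy;
    }

    // Layer 2, processor B: a small structural rewrite that replaces
    // an AND with a single child by that child.
    static Node unwrapSingleChild(Node node) {
        if (node.type.equals("AND") && node.children.size() == 1) {
            return node.children.get(0);
        }
        return node;
    }

    // Layer 3: look up the builder registered for the node's type.
    // A String stands in for a Lucene Query object here.
    static String build(Node node, Map<String, Function<Node, String>> builders) {
        return builders.get(node.type).apply(node);
    }

    /** Runs the parser, the processor chain, and the builders in order. */
    static String run(String queryText) {
        Node tree = parse(queryText);

        List<UnaryOperator<Node>> pipeline = new ArrayList<>();
        pipeline.add(ThreeLayerSketch::lowercaseTerms);
        pipeline.add(ThreeLayerSketch::unwrapSingleChild);
        for (UnaryOperator<Node> processor : pipeline) {
            tree = processor.apply(tree);
        }

        Map<String, Function<Node, String>> builders = new HashMap<>();
        builders.put("TERM", n -> n.text);
        builders.put("AND", n -> {
            StringBuilder sb = new StringBuilder();
            for (Node child : n.children) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append('+').append(build(child, builders));
            }
            return sb.toString();
        });
        return build(tree, builders);
    }

    public static void main(String[] args) {
        System.out.println(run("Apache AND Lucene")); // +apache +lucene
        System.out.println(run("Apache"));            // apache
    }
}
```

<p>
In this sketch, changing the syntax means replacing only <code>parse</code>, while the processor chain and the builder map stay as they are.
</p>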
| |
| <p> |
| Furthermore, the query parser uses flexible configuration objects. It also uses message classes that |
| allow resource bundles to be attached. This makes it possible to translate |
| messages, which is an important feature of a query parser. |
| </p> |
| <p> |
| This design allows different query syntaxes to be developed very quickly. |
| </p> |
| |
| <h3>StandardQueryParser and QueryParserWrapper</h3> |
| |
| <p> |
| The classic Lucene query parser is located under |
| {@link org.apache.lucene.queryparser.classic}. |
| <p> |
| To make the new query parser simpler to use, |
| the class {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} may be helpful, |
| especially for people who do not want to extend the query parser. |
| It uses the default Lucene query processors, text parser and builders, so |
| you don't need to deal with those yourself. |
| |
| {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} usage: |
| |
| <pre class="prettyprint"> |
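| import org.apache.lucene.analysis.core.WhitespaceAnalyzer; |
| import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser; |
| import org.apache.lucene.search.Query; |
| |
| StandardQueryParser qpHelper = new StandardQueryParser(); |
| // The configuration setters are exposed on StandardQueryParser itself. |
| qpHelper.setAllowLeadingWildcard(true); |
| qpHelper.setAnalyzer(new WhitespaceAnalyzer()); |
| // parse() throws QueryNodeException if the query text is invalid. |
| Query query = qpHelper.parse("apache AND lucene", "defaultField"); |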
| </pre> |
| <h2><a name="surround">Surround</a></h2> |
| <p> |
| A QueryParser that supports the Span family of queries as well as prefix and infix notation. |
| </p> |
| <h2><a name="xml">XML</a></h2> |
| A QueryParser that produces Lucene Query objects from XML streams. |
| <p> |
| </p> |
| </body> |
| </html> |