lucene/queryparser/src/java/overview.html - lucene-solr - Git at Google

 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   -->
 <html>
   <head>
     <title>
       QueryParsers
     </title>
   </head>
   <body>
   Apache Lucene QueryParsers.
   <p>
   This module provides a number of queryparsers:
   <ul>
      <li><a href="#classic">Classic</a>
      <li><a href="#analyzing">Analyzing</a>
      <li><a href="#complexphrase">Complex Phrase</a>
      <li><a href="#extendable">Extendable</a>
      <li><a href="#flexible">Flexible</a>
      <li><a href="#surround">Surround</a>
      <li><a href="#xml">XML</a>
   </ul>
   <hr/>
   <h2><a name="classic">Classic</a></h2>
   A Simple Lucene QueryParser implemented with JavaCC.
   <h2><a name="analyzing">Analyzing</a></h2>
   QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
   <h2><a name="complexphrase">Complex Phrase</a></h2>
   QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*"
   <h2><a name="extendable">Extendable</a></h2>
   Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
   <h2><a name="flexible">Flexible</a></h2>
 <p>
 This project contains the new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization.
 </p>

 <p>
 It's currently divided in 2 main packages:
 <ul>
 <li>{@link org.apache.lucene.queryparser.flexible.core}: it contains the query parser API classes, which should be extended by query parser implementations. </li>
 <li>{@link org.apache.lucene.queryparser.flexible.standard}: it contains the current Lucene query parser implementation using the new query parser API.</li>
 </ul>
 </p>

 <h3>Features</h3>

     <ol>
         <li>Full support for boolean logic (not enabled)</li>
         <li>QueryNode Trees - support for several syntaxes,
             that can be converted into similar syntax QueryNode trees.</li>
         <li>QueryNode Processors - Optimize, validate, rewrite the
             QueryNode trees</li>
     <li>Processors Pipelines - Select your favorite Processor
         and build a processor pipeline, to implement the features you need</li>
         <li>Config Interfaces - Allow the consumer of the Query Parser to implement
             a diff Config Handler Objects to suite their needs.</li>
         <li>Standard Builders - convert QueryNode's into several lucene
             representations. Supported conversion is using a 2.4 compatible logic</li>
         <li>QueryNode tree's can be converted to a lucene 2.4 syntax string, using toQueryString</li>
     </ol>

 <h3>Design</h3>
 <p>
 This new query parser was designed to have very generic
 architecture, so that it can be easily used for different
 products with varying query syntaxes. This code is much more
 flexible and extensible than the Lucene query parser in 2.4.X.
 </p>
 <p>
 The new query parser  goal is to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 It distinguishes the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. It allows to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 </p>
 <p>
 The query parser has three layers and its core is what we call the
 QueryNode tree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
 </p>
 <pre>
       AND
      /   \
     A     B
 </pre>
 <p>
 The three layers are:
 </p>
 <dl>
 <dt>QueryParser</dt>
 <dd>
 This layer is the text parsing layer which simply transforms the
 query text string into a {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} tree. Every text parser
 must implement the interface {@link org.apache.lucene.queryparser.flexible.core.parser.SyntaxParser}.
 Lucene default implementations implements it using JavaCC.
 </dd>

 <dt>QueryNodeProcessor</dt>
 <dd>The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processors can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 </dd>

 <dt>QueryBuilder</dt>
 <dd>
 The third layer is a configurable map of builders, which map {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} types to its specific
 builder that will transform the QueryNode into Lucene Query object.
 </dd>

 </dl>

 <p>
 Furthermore, the query parser uses flexible configuration objects. It also uses message classes that
 allow to attach resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 </p>
 <p>
 This design allows to develop different query syntaxes very quickly.
 </p>

 <h3>StandardQueryParser and QueryParserWrapper</h3>

 <p>
 The classic Lucene query parser is located under
 {@link org.apache.lucene.queryparser.classic}.
 <p>
 To make it simpler to use the new query parser
 the class {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} may be helpful,
 specially for people that do not want to extend the Query Parser.
 It uses the default Lucene query processors, text parser and builders, so
 you don't need to worry about dealing with those.

 {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} usage:

 <pre class="prettyprint">
       StandardQueryParser qpHelper = new StandardQueryParser();
       StandardQueryConfigHandler config =  qpHelper.getQueryConfigHandler();
       config.setAllowLeadingWildcard(true);
       config.setAnalyzer(new WhitespaceAnalyzer());
       Query query = qpHelper.parse("apache AND lucene", "defaultField");
 </pre>
 <h2><a name="surround">Surround</a></h2>
 <p>
 A QueryParser that supports the Span family of queries as well as pre and infix notation.
 </p>
 <h2><a name="xml">XML</a></h2>
 A QueryParser that produces Lucene Query objects from XML streams.
 <p>
 </p>
   </body>
 </html>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	<html>
	<head>
	<title>
	QueryParsers
	</title>
	</head>
	<body>
	Apache Lucene QueryParsers.
	<p>
	This module provides a number of queryparsers:
	<ul>
	<li><a href="#classic">Classic</a>
	<li><a href="#analyzing">Analyzing</a>
	<li><a href="#complexphrase">Complex Phrase</a>
	<li><a href="#extendable">Extendable</a>
	<li><a href="#flexible">Flexible</a>
	<li><a href="#surround">Surround</a>
	<li><a href="#xml">XML</a>
	</ul>
	<hr/>
	<h2><a name="classic">Classic</a></h2>
	A Simple Lucene QueryParser implemented with JavaCC.
	<h2><a name="analyzing">Analyzing</a></h2>
	QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
	<h2><a name="complexphrase">Complex Phrase</a></h2>
	QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*"
	<h2><a name="extendable">Extendable</a></h2>
	Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
	<h2><a name="flexible">Flexible</a></h2>
	<p>
	This project contains the new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization.
	</p>

	<p>
	It's currently divided in 2 main packages:
	<ul>
	<li>{@link org.apache.lucene.queryparser.flexible.core}: it contains the query parser API classes, which should be extended by query parser implementations. </li>
	<li>{@link org.apache.lucene.queryparser.flexible.standard}: it contains the current Lucene query parser implementation using the new query parser API.</li>
	</ul>
	</p>

	<h3>Features</h3>

	<ol>
	<li>Full support for boolean logic (not enabled)</li>
	<li>QueryNode Trees - support for several syntaxes,
	that can be converted into similar syntax QueryNode trees.</li>
	<li>QueryNode Processors - Optimize, validate, rewrite the
	QueryNode trees</li>
	<li>Processors Pipelines - Select your favorite Processor
	and build a processor pipeline, to implement the features you need</li>
	<li>Config Interfaces - Allow the consumer of the Query Parser to implement
	a diff Config Handler Objects to suite their needs.</li>
	<li>Standard Builders - convert QueryNode's into several lucene
	representations. Supported conversion is using a 2.4 compatible logic</li>
	<li>QueryNode tree's can be converted to a lucene 2.4 syntax string, using toQueryString</li>
	</ol>

	<h3>Design</h3>
	<p>
	This new query parser was designed to have very generic
	architecture, so that it can be easily used for different
	products with varying query syntaxes. This code is much more
	flexible and extensible than the Lucene query parser in 2.4.X.
	</p>
	<p>
	The new query parser goal is to separate syntax and semantics of a query. E.g. 'a AND
	b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
	It distinguishes the semantics of the different query components, e.g.
	whether and how to tokenize/lemmatize/normalize the different terms or
	which Query objects to create for the terms. It allows to
	write a parser with a new syntax, while reusing the underlying
	semantics, as quickly as possible.
	</p>
	<p>
	The query parser has three layers and its core is what we call the
	QueryNode tree. It is a tree that initially represents the syntax of the
	original query, e.g. for 'a AND b':
	</p>
	<pre>
	AND
	/ \
	A B
	</pre>
	<p>
	The three layers are:
	</p>
	<dl>
	<dt>QueryParser</dt>
	<dd>
	This layer is the text parsing layer which simply transforms the
	query text string into a {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} tree. Every text parser
	must implement the interface {@link org.apache.lucene.queryparser.flexible.core.parser.SyntaxParser}.
	Lucene default implementations implements it using JavaCC.
	</dd>

	<dt>QueryNodeProcessor</dt>
	<dd>The query node processors do most of the work. It is in fact a
	configurable chain of processors. Each processors can walk the tree and
	modify nodes or even the tree's structure. That makes it possible to
	e.g. do query optimization before the query is executed or to tokenize
	terms.
	</dd>

	<dt>QueryBuilder</dt>
	<dd>
	The third layer is a configurable map of builders, which map {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} types to its specific
	builder that will transform the QueryNode into Lucene Query object.
	</dd>

	</dl>

	<p>
	Furthermore, the query parser uses flexible configuration objects. It also uses message classes that
	allow to attach resource bundles. This makes it possible to translate
	messages, which is an important feature of a query parser.
	</p>
	<p>
	This design allows to develop different query syntaxes very quickly.
	</p>

	<h3>StandardQueryParser and QueryParserWrapper</h3>

	<p>
	The classic Lucene query parser is located under
	{@link org.apache.lucene.queryparser.classic}.
	<p>
	To make it simpler to use the new query parser
	the class {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} may be helpful,
	specially for people that do not want to extend the Query Parser.
	It uses the default Lucene query processors, text parser and builders, so
	you don't need to worry about dealing with those.

	{@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} usage:

	<pre class="prettyprint">
	StandardQueryParser qpHelper = new StandardQueryParser();
	StandardQueryConfigHandler config = qpHelper.getQueryConfigHandler();
	config.setAllowLeadingWildcard(true);
	config.setAnalyzer(new WhitespaceAnalyzer());
	Query query = qpHelper.parse("apache AND lucene", "defaultField");
	</pre>
	<h2><a name="surround">Surround</a></h2>
	<p>
	A QueryParser that supports the Span family of queries as well as pre and infix notation.
	</p>
	<h2><a name="xml">XML</a></h2>
	A QueryParser that produces Lucene Query objects from XML streams.
	<p>
	</p>
	</body>
	</html>