| <!doctype html> |
| <html> |
| <head> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <link href="/theme/css/lucene/global.css?v=0e493d7a" rel="stylesheet" type="text/css"> |
| |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <meta name="Distribution" content="Global"/> |
| <meta name="Robots" content="index,follow"/> |
| |
| <script type="text/javascript" src="/theme/javascript/lucene/prototype.js?v=0e493d7a"></script> |
| <script type="text/javascript" src="/theme/javascript/lucene/effects.js?v=0e493d7a"></script> |
| <script type="text/javascript" src="/theme/javascript/lucene/slides.js?v=0e493d7a"></script> |
| <script src="https://www.apachecon.com/event-images/snippet.js"></script> <title>Apache Lucene - Features</title> |
| <meta name="keywords" |
| content="apache, apache lucene, apache solr, solr, lucene |
| search, information retrieval, spell checking, faceting, inverted index, |
| open source"/> <meta property="og:type" content="website" /> |
| <meta property="og:url" content="https://lucene.apache.org/pylucene/features.html"/> |
| <meta property="og:title" content="Features"/> |
| <meta property="og:description" content="Warning Before calling any PyLucene API that requires the Java VM, start it by calling initVM(classpath, ...). More about this..."/> |
| <meta property="og:image" content="https://lucene.apache.org/theme/images/lucene/lucene_og_image.png?v=0e493d7a"/> |
| <meta property="og:image:secure_url" content="https://lucene.apache.org/theme/images/lucene/lucene_og_image.png?v=0e493d7a"/> |
| <link rel="shortcut icon" type="image/png" |
| href="/theme/images/lucene/lucene-favicon.png?v=0e493d7a"/><link href="/theme/css/lucene/pylucene.css?v=0e493d7a" rel="stylesheet" type="text/css"> |
| </head> |
| <body id="home"> |
| <div id="wrap"> |
| <div id="header"> |
| <div id="logo" style="float:left"> |
| <a href="/"> |
| <img border="0" src="/theme/images/lucene/lucene_logo_green_300.png?v=0e493d7a" alt="Lucene Logo"/> |
| </a> |
| </div> |
| <!-- TODO: Search disabled as it does not work, 2021-02-21 |
| <div id="search" style="float:right;zoom:1"> |
| <form id="quick-search" method="GET" action="https://sematext.com/opensee/lucene" name="searchform"> |
| <fieldset> |
| <input type="search" id="q" name="q" placeholder="Search with Apache Solr..." class="class1 class2 hint" accesskey="q"> |
| </fieldset> |
| </form> |
| </div>--> |
| <div id="nav"> |
| <ul> |
| <li><a href="/pylucene/index.html">PyLucene</a></li> |
| <li><a href="/pylucene/news.html">News</a></li> |
| <li><a href="/pylucene/jcc/index.html">JCC</a></li> |
| <li><a href="https://issues.apache.org/jira/browse/PYLUCENE">Issue Tracker</a></li> |
| <li><a href="/pylucene/mailing-lists.html">Mailing Lists</a></li> |
| <li><a class="last" href="/">Lucene TLP</a></li> |
| </ul> |
| </div> |
| |
| </div> <!-- End #header --> |
| |
| <div id="content-wrap" class="clearfix"> |
| <div id="main"> |
| <div> |
| <h1 class="title">Features</h1> |
| <h2 id="warning">Warning</h2> |
| <p>Before calling any PyLucene API that requires the Java VM, start it by |
| calling <em>initVM(classpath, ...)</em>. More about this function can be found <a href="jcc/features.html">here</a>.</p> |
| <h2 id="installing-pylucene">Installing PyLucene</h2> |
| <p>PyLucene is a Python extension built with <a href="jcc/">JCC</a>.</p> |
| <p>To build PyLucene, JCC needs to be built first. Sources for JCC are |
| included with the PyLucene sources. Instructions for building and |
| installing JCC are <a href="jcc/install.html">here</a>.</p> |
| <p>Instructions for building PyLucene are <a href="install.html">here</a>.</p> |
| <h2 id="api-documentation">API documentation</h2> |
| <p>PyLucene closely tracks Java |
| Lucene™ releases. |
| It aims to support the entire Lucene API.</p> |
| <p>PyLucene also includes a number of Lucene contrib packages: the Snowball analyzer |
| and stemmers, the highlighter package, analyzers for languages other than English, |
| regular expression queries, specialized queries such as 'more like this' and more.</p> |
| <p>This document only covers the pythonic extensions to Lucene offered |
| by PyLucene as well as some differences between the Java and Python |
| APIs. For the documentation on Java Lucene APIs, |
| see <a href="https://lucene.apache.org/java/docs/api/index.html">here</a>.</p> |
| <p>To help with debugging and to support some Lucene APIs, PyLucene also |
| exposes some Java runtime APIs.</p> |
| <h2 id="samples">Samples</h2> |
| <p>The best way to learn PyLucene is to look at the samples and tests included with |
| the PyLucene source release or on the web at:</p> |
| <ul> |
| <li><a href="https://svn.apache.org/viewvc/lucene/pylucene/trunk/samples">https://svn.apache.org/viewvc/lucene/pylucene/trunk/samples</a></li> |
| <li><a href="https://svn.apache.org/viewvc/lucene/pylucene/trunk/test3">https://svn.apache.org/viewvc/lucene/pylucene/trunk/test3</a></li> |
| </ul> |
| <h2 id="threading-support-with-attachcurrentthread">Threading support with attachCurrentThread</h2> |
| <p>Before PyLucene APIs can be used from a thread that is neither the main thread nor |
| a thread created by the Java runtime, the <em>attachCurrentThread()</em> method must be |
| called on the <em>JCCEnv</em> object returned by the <em>initVM()</em> or <em>getVMEnv()</em> functions.</p> |
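| <p>As a minimal sketch of this rule, the helper below runs a callable in a new Python thread and attaches that thread first. The names <em>run_attached</em>, <em>get_env</em> and <em>work</em> are hypothetical and not part of PyLucene; <em>get_env</em> is assumed to be a function such as <em>lucene.getVMEnv</em> that returns the <em>JCCEnv</em> object.</p> |

```python
import threading


def run_attached(get_env, work):
    """Run work() in a new thread that is attached to the Java VM first.

    get_env is assumed to return the JCCEnv object, for example
    lucene.getVMEnv once initVM() has been called in the main thread.
    """
    outcome = {}

    def target():
        # Attach this Python-created thread before any PyLucene call.
        get_env().attachCurrentThread()
        outcome["value"] = work()

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    # None is returned if work() raised inside the thread.
    return outcome.get("value")
```

| <p>With PyLucene installed, a call might look like <em>run_attached(lucene.getVMEnv, search)</em>, where <em>search</em> is a hypothetical function of your own that uses PyLucene APIs.</p> |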
| <h2 id="exception-handling-with-lucenejavaerror">Exception handling with lucene.JavaError</h2> |
| <p>Java exceptions are caught at the language barrier and reported to Python by raising |
| a JavaError instance whose args tuple contains the actual Java Exception instance.</p> |
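| <p>The following sketch shows one way to retrieve the wrapped Java exception. The helper <em>call_java</em> is hypothetical; it assumes that <em>java_error_type</em> is <em>lucene.JavaError</em> and that the Java exception is the first element of the args tuple.</p> |

```python
def call_java(func, java_error_type):
    """Call func() and return a (result, java_exception) pair.

    java_error_type is assumed to be lucene.JavaError. The wrapped Java
    exception is assumed to be the first element of the args tuple.
    """
    try:
        return func(), None
    except java_error_type as error:
        # Hand the Java exception object back to the caller.
        return None, error.args[0]
```

| <p>Passing the exception type as a parameter keeps the helper importable without a running Java VM; in real code it would typically be called as <em>call_java(some_call, lucene.JavaError)</em>.</p> |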
| <h2 id="handling-java-arrays">Handling Java arrays</h2> |
| <p>Java arrays are returned to Python in a <em>JArray</em> wrapper instance that |
| implements the Python sequence protocol. It is possible to change array elements |
| but not to change the array size.</p> |
| <p>A few Lucene APIs take array arguments and expect values to be returned in them. |
| To call such an API and be able to retrieve the array values after the call, a |
| Java array needs to be instantiated first.<br/> For example, accessing termDocs:</p> |
| <div class="highlight"><pre><span></span><code><span class="err">termDocs = reader.termDocs(Term("isbn", isbn))<br/></span> |
| <span class="err">docs = JArray('int')(1) # allocate an int[1] array<br/></span> |
| <span class="err">freq = JArray('int')(1) # allocate an int[1] array<br/></span> |
| <span class="err">if termDocs.read(docs, freq) == 1:<br/></span> |
| <span class="err">&nbsp;&nbsp;bits.set(docs[0]) # access the array's first element<br/></span> |
| </code></pre></div> |
| |
| <p>In addition to <em>int</em>, the <em>JArray</em> function accepts <em>object</em>, <em>string</em>, |
| <em>bool</em>, <em>byte</em>, <em>char</em>, <em>double</em>, <em>float</em>, <em>long</em> and <em>short</em> to create an array |
| of the corresponding type. The <em>JArray('object')</em> constructor takes a second |
| argument denoting the class of the object elements. This argument is optional and |
| defaults to Object.</p> |
| <p>To convert a char array to a Python string use a <em>''.join(array)</em> construct.</p> |
| <p>Instead of an integer denoting the size of the desired Java array, a sequence of |
| objects of the expected element type may be passed in to the array constructor.<br/> |
| For example:</p> |
| <div class="highlight"><pre><span></span><code><span class="err"># creating a Java array of double from the [1.5, 2.5] list<br/></span> |
| <span class="err">JArray('double')([1.5, 2.5])<br/></span> |
| </code></pre></div> |
| |
| <p>All methods that expect an array also accept a sequence of Python objects of the |
| expected element type. If no values are expected from the array arguments after |
| the call, it is hence not necessary to instantiate a Java array to make such calls.</p> |
| <p>See <a href="jcc/features.html">JCC</a> for more information about handling arrays.</p> |
| <h2 id="differences-between-the-java-lucene-and-pylucene-apis">Differences between the Java Lucene and PyLucene APIs</h2> |
| <ul> |
| <li> |
| <p>The PyLucene API exposes all Java Lucene classes in a flat namespace in the |
| PyLucene module. For example, the Java import statement |
| <code>import org.apache.lucene.index.IndexReader;</code> corresponds to the Python import |
| statement <code>from lucene import IndexReader</code></p> |
| </li> |
| <li> |
| <p>Downcasting is a common operation in Java but not a concept in Python. Because |
| wrapper objects implement exactly the APIs of the declared type of the |
| wrapped object, all classes implement two class methods, instance_ and |
| cast_, that respectively verify and cast an instance.</p> |
| </li> |
| </ul> |
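| <p>A minimal sketch of how the two class methods can be combined follows. The helper <em>downcast</em> is hypothetical, and it assumes that <em>instance_</em> returns a true value when the wrapped object is of the given class.</p> |

```python
def downcast(cls, obj):
    """Return obj wrapped as cls, or None if obj is not an instance of cls.

    cls is assumed to be a PyLucene wrapper class providing the
    instance_ and cast_ class methods described above.
    """
    if cls.instance_(obj):
        return cls.cast_(obj)
    return None
```

| <p>Checking with <em>instance_</em> first avoids attempting a cast that cannot succeed.</p> |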
| <h2 id="phythonic-extensions-to-the-java-lucene-apis">Pythonic extensions to the Java Lucene APIs</h2> |
| <p>Java is a very verbose language. Python, on the other hand, offers many |
| syntactically attractive constructs for iteration, property access, etc. As |
| the Java Lucene samples from the <em>Lucene in Action</em> book were ported to Python, |
| PyLucene received a number of pythonic extensions listed here:</p> |
| <ul> |
| <li>Iterating search hits is a very common operation. Hits instances are iterable |
| in Python. Two values are returned for each iteration, the zero-based number of |
| the document in the Hits instance and the document instance itself.<br/> |
| The Java loop:</li> |
| </ul> |
| <div class="highlight"><pre><span></span><code><span class="err">for (int i = 0; i &lt; hits.length(); i++) {<br/></span> |
| <span class="err">&nbsp;&nbsp;Document doc = hits.doc(i);<br/></span> |
| <span class="err">&nbsp;&nbsp;System.out.println(hits.score(i) + " : " + doc.get("title"));<br/></span> |
| <span class="err">}<br/></span> |
| </code></pre></div> |
| |
| <p>can be written in Python:</p> |
| <div class="highlight"><pre><span></span><code><span class="err">for hit in hits:<br/></span> |
| <span class="err">&nbsp;&nbsp;hit = Hit.cast_(hit)<br/></span> |
| <span class="err">&nbsp;&nbsp;print hit.getScore(), ':', hit.getDocument()['title']<br/></span> |
| </code></pre></div> |
| |
| <p>If hits.iterator()'s next() method were declared to return <em>Hit</em> instead of |
| <em>Object</em>, the above cast_() call would be unnecessary.<br/> The same Java |
| loop can also be written:</p> |
| <div class="highlight"><pre><span></span><code><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">xrange</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">hits</span><span class="p">)):</span> |
| &nbsp;&nbsp;<span class="k">print</span><span class="w"> </span><span class="n">hits</span><span class="p">.</span><span class="n">score</span><span class="p">(</span><span class="n">i</span><span class="p">),</span><span class="w"> </span><span class="s1">':'</span><span class="p">,</span><span class="w"> </span><span class="n">hits</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="s1">'title'</span><span class="p">]</span> |
| </code></pre></div> |
| |
| <ul> |
| <li>Hits instances partially implement the Python 'sequence' protocol.<br/> |
| The Java expressions:</li> |
| </ul> |
| <div class="highlight"><pre><span></span><code><span class="err">hits.length();<br/></span> |
| <span class="err">doc = hits.doc(i);<br/></span> |
| </code></pre></div> |
| |
| <p>are better written in Python:</p> |
| <div class="highlight"><pre><span></span><code><span class="nf">len</span><span class="p">(</span><span class="n">hits</span><span class="p">)</span> |
| <span class="n">doc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hits</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> |
| </code></pre></div> |
| |
| <ul> |
| <li>Document instances have fields whose values can be accessed through the mapping |
| protocol.<br/> The Java expression:</li> |
| </ul> |
| <div class="highlight"><pre><span></span><code><span class="err">doc.get("title")</span> |
| </code></pre></div> |
| |
| <p>is better written in Python:</p> |
| <div class="highlight"><pre><span></span><code><span class="err">doc['title']</span> |
| </code></pre></div> |
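| <p>Code written against the sequence and mapping protocols does not need to know the concrete classes involved. As a minimal sketch, the hypothetical helper <em>collect_titles</em> below relies only on <em>len()</em>, integer indexing and string-key access, so a plain list of dicts can stand in for a Hits instance holding Document instances.</p> |

```python
def collect_titles(hits, field="title"):
    """Return the given field of every hit using only len() and [] access.

    hits is assumed to support len() and integer indexing, and each
    element is assumed to support lookup by field name.
    """
    return [hits[i][field] for i in range(len(hits))]
```

| <p>For example, <em>collect_titles([{'title': 'a'}, {'title': 'b'}])</em> returns <em>['a', 'b']</em>.</p> |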
| |
| <ul> |
| <li>Document instances can be iterated over for their fields.<br/> The Java loop:</li> |
| </ul> |
| <div class="highlight"><pre><span></span><code><span class="err">Enumeration fields = doc.getFields();<br/></span> |
| <span class="err">while (fields.hasMoreElements()) {<br/></span> |
| <span class="err">&nbsp;&nbsp;Field field = (Field) fields.nextElement();<br/></span> |
| <span class="err">&nbsp;&nbsp;...<br/></span> |
| <span class="err">}<br/></span> |
| </code></pre></div> |
| |
| <p>is better written in Python:</p> |
| <div class="highlight"><pre><span></span><code><span class="err">for field in doc.getFields():<br/></span> |
| <span class="err">&nbsp;&nbsp;field = Field.cast_(field)<br/></span> |
| <span class="err">&nbsp;&nbsp;...<br/></span> |
| </code></pre></div> |
| |
| <p>Once JCC heeds Java 1.5 type parameters and once Java Lucene makes use of them, |
| such casting should become unnecessary.</p> |
| <h2 id="extending-java-lucene-classes-from-python">Extending Java Lucene classes from Python</h2> |
| <p>Many areas of the Lucene API expect the programmer to provide their own implementation |
| or specialization of a feature where the default is inappropriate. For example, |
| text analyzers and tokenizers are an area where many parameters and environmental |
| or cultural factors call for customization.</p> |
| <p>PyLucene enables this by providing the Java extension points listed below, which serve |
| as proxies for Java to call back into the Python implementations of these customizations.</p> |
| <p>These extension points are simple Java classes that JCC generates the native C++ |
| implementations for. It is easy to add more such extension classes into the |
| 'java' directory of the PyLucene source tree.</p> |
| <p>To learn more about this topic, please refer to the JCC <a href="jcc/features.html">documentation</a>.</p> |
| <p>Please refer to the classes in the 'java' tree for currently available extension |
| points. Examples of uses of these extension points are to be found in PyLucene's |
| unit tests.</p> |
| </div> |
| </div> |
| <div id="sidebar"> |
| <div class="button-wrapper" style="margin-top: 40px;"> |
| <div class="button-green"> |
| <a href="https://www.apache.org/dyn/closer.lua/lucene/pylucene/">Download</a> |
| <div class="flap top">Click to begin</div> |
| <div class="flap bottom">of Apache PyLucene</div> |
| </div> |
| </div> |
| |
| <h1 id="documentation">Documentation<a class="headerlink" href="#documentation" title="Permanent link">¶</a></h1> |
| <ul> |
| <li><a href="https://www.apache.org/licenses/">License</a></li> |
| <li><a href="/pylucene/features.html">Features</a></li> |
| <li><a href="/pylucene/install.html">Install</a></li> |
| </ul> |
| |
| <h1 id="events">Events<a class="headerlink" href="#events" title="Permanent link">¶</a></h1> |
| <ul> |
| <a class="acevent" data-format="square" data-mode="light" data-width="160" data-style="border: 1px solid lightgrey"></a> |
| </ul> |
| |
| <h1 id="asf-links">ASF links<a class="headerlink" href="#asf-links" title="Permanent link">¶</a></h1> |
| <ul> |
| <li><a href="https://www.apache.org">Apache Software Foundation</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li> |
| <li><a href="https://www.apache.org/security/">Security</a></li> |
| </ul> |
| |
| <h1 id="related-projects">Related Projects<a class="headerlink" href="#related-projects" title="Permanent link">¶</a></h1> |
| <ul> |
| <li><a href="https://solr.apache.org">Apache Solr</a></li> |
| <li><a href="http://hadoop.apache.org">Apache Hadoop</a></li> |
| <li><a href="http://manifoldcf.apache.org/">Apache ManifoldCF</a></li> |
| <li><a href="http://lucenenet.apache.org/">Apache Lucene.Net</a></li> |
| <li><a href="http://mahout.apache.org">Apache Mahout</a></li> |
| <li><a href="http://nutch.apache.org">Apache Nutch</a></li> |
| <li><a href="http://opennlp.apache.org/">Apache OpenNLP</a></li> |
| <li><a href="http://tika.apache.org">Apache Tika</a></li> |
| <li><a href="http://zookeeper.apache.org">Apache Zookeeper</a></li> |
| </ul> </div> |
| </div> <!-- End #content-wrap --> |
| |
| <div id="footer"> |
| <div class="copyright"> |
| <p> |
| Copyright © 2011-2024 The Apache Software Foundation, Licensed under |
| the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. <a href="/privacy.html">Privacy Policy</a> <br/> |
| Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Apache Lucene, Apache Solr and their |
| respective logos are trademarks of the Apache Software Foundation. Please see the <a href="https://www.apache.org/foundation/marks/">Apache Trademark Policy</a> |
| for more information. |
| </p> |
| </div> |
| </div> </div> <!-- End #wrap --> |
| </body> |
| </html> |