blob: 703b6fb965084f19443d8cfdb35efd2999251e8c [file] [log] [blame]
TextFilters allow Jackrabbit to extract text from binary
properties for indexing purposes.
This project contains TextFilter implementations for the
following binary formats:
1. MsExcel
2. MsPowerPoint
3. MsWord
4. Pdf
How to register in jackrabbit?
Build the jar file and place it in the Jackrabbit
classpath together with the dependencies of these text
filters.
Configure them in the SearchIndex element of the workspace.xml
Sample:
...
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index" />
<param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query..RTFTextFilter" />
</SearchIndex>
...
For further information, see the javadocs for:
org.apache.jackrabbit.core.query.TextFilter