| = The Term Vector Component |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| The TermVectorComponent is a search component designed to return additional information about documents matching your search. |
| |
| For each document in the response, the TermVectorCcomponent can return the term vector, the term frequency, inverse document frequency, position, and offset information. |
| |
| == Term Vector Component Configuration |
| |
| The TermVectorComponent is not enabled implicitly in Solr - it must be explicitly configured in your `solrconfig.xml` file. The examples on this page show how it is configured in Solr's "```techproducts```" example: |
| |
| [source,bash] |
| ---- |
| bin/solr -e techproducts |
| ---- |
| |
| To enable the this component, you need to configure it using a `searchComponent` element: |
| |
| [source,xml] |
| ---- |
| <searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/> |
| ---- |
| |
| A request handler must then be configured to use this component name. In the `techproducts` example, the component is associated with a special request handler named `/tvrh`, that enables term vectors by default using the `tv=true` parameter; but you can associate it with any request handler: |
| |
| [source,xml] |
| ---- |
| <requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler"> |
| <lst name="defaults"> |
| <bool name="tv">true</bool> |
| </lst> |
| <arr name="last-components"> |
| <str>tvComponent</str> |
| </arr> |
| </requestHandler> |
| ---- |
| |
| Once your handler is defined, you may use in conjunction with any schema (that has a `uniqueKeyField)` to fetch term vectors for fields configured with the `termVector` attribute, such as in the `techproducts` sample schema. For example: |
| |
| [source,xml] |
| ---- |
| <field name="includes" |
| type="text_general" |
| indexed="true" |
| stored="true" |
| multiValued="true" |
| termVectors="true" |
| termPositions="true" |
| termOffsets="true" /> |
| ---- |
| |
| == Invoking the Term Vector Component |
| |
| The example below shows an invocation of this component using the above configuration: |
| |
| [source,text] |
| http://localhost:8983/solr/techproducts/tvrh?q=*:*&start=0&rows=10&fl=id,includes&wt=xml |
| |
| [source,xml] |
| ---- |
| ... |
| <lst name="termVectors"> |
| <lst name="GB18030TEST"> |
| <str name="uniqueKey">GB18030TEST</str> |
| </lst> |
| <lst name="EN7800GTX/2DHTV/256M"> |
| <str name="uniqueKey">EN7800GTX/2DHTV/256M</str> |
| </lst> |
| <lst name="100-435805"> |
| <str name="uniqueKey">100-435805</str> |
| </lst> |
| <lst name="3007WFP"> |
| <str name="uniqueKey">3007WFP</str> |
| <lst name="includes"> |
| <lst name="cable"/> |
| <lst name="usb"/> |
| </lst> |
| </lst> |
| <lst name="SOLR1000"> |
| <str name="uniqueKey">SOLR1000</str> |
| </lst> |
| <lst name="0579B002"> |
| <str name="uniqueKey">0579B002</str> |
| </lst> |
| <lst name="UTF8TEST"> |
| <str name="uniqueKey">UTF8TEST</str> |
| </lst> |
| <lst name="9885A004"> |
| <str name="uniqueKey">9885A004</str> |
| <lst name="includes"> |
| <lst name="32mb"/> |
| <lst name="av"/> |
| <lst name="battery"/> |
| <lst name="cable"/> |
| <lst name="card"/> |
| <lst name="sd"/> |
| <lst name="usb"/> |
| </lst> |
| </lst> |
| <lst name="adata"> |
| <str name="uniqueKey">adata</str> |
| </lst> |
| <lst name="apple"> |
| <str name="uniqueKey">apple</str> |
| </lst> |
| </lst> |
| ---- |
| |
| === Term Vector Request Parameters |
| |
| The example below shows some of the available request parameters for this component: |
| |
| [source,bash] |
| http://localhost:8983/solr/techproducts/tvrh?q=includes:[* TO *]&rows=10&indent=true&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true&tv.payloads=true&tv.fl=includes |
| |
| `tv`:: |
| If `true`, the Term Vector Component will run. |
| |
| `tv.docIds`:: |
| For a given comma-separated list of Lucene document IDs (*not* the Solr Unique Key), term vectors will be returned. |
| |
| `tv.fl`:: |
| For a given comma-separated list of fields, term vectors will be returned. If not specified, the `fl` parameter is used. |
| |
| `tv.all`:: |
| If `true`, all the boolean parameters listed below (`tv.df`, `tv.offsets`, `tv.positions`, `tv.payloads`, `tv.tf` and `tv.tf_idf`) will be enabled. |
| |
| `tv.df`:: |
| If `true`, returns the Document Frequency (DF) of the term in the collection. This can be computationally expensive. |
| |
| `tv.offsets`:: |
| If `true`, returns offset information for each term in the document. |
| |
| `tv.positions`:: |
| If `true`, returns position information. |
| |
| `tv.payloads`:: |
| If `true`, returns payload information. |
| |
| `tv.tf`:: |
| If `true`, returns document term frequency info for each term in the document. |
| |
| `tv.tf_idf`:: |
| If `true`, calculates TF / DF (i.e.,: TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure. |
| + |
| This parameter requires both `tv.tf` and `tv.df` to be "true". This can be computationally expensive. (The results are not shown in example output) |
| |
| To see an example of TermVector component output, see the Wiki page: http://wiki.apache.org/solr/TermVectorComponentExampleOptions |
| |
| For schema requirements, see also the section <<field-properties-by-use-case.adoc#field-properties-by-use-case, Field Properties by Use Case>>. |
| |
| == SolrJ and the Term Vector Component |
| |
| Neither the `SolrQuery` class nor the `QueryResponse` class offer specific method calls to set Term Vector Component parameters or get the "termVectors" output. However, there is a patch for it: https://issues.apache.org/jira/browse/SOLR-949[SOLR-949]. |