blob: 20dfad6ab34c4b304ce0773f05ae6a4108dcb395 [file] [log] [blame]
= Other Schema Elements
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
This section describes several other important elements of `schema.xml` not covered in earlier sections.
== Unique Key
The `uniqueKey` element specifies which field is a unique identifier for documents. Although `uniqueKey` is not required, it is nearly always warranted by your application design. For example, `uniqueKey` should be used if you will ever update a document in the index.
You can define the unique key field by naming it:
[source,xml]
----
<uniqueKey>id</uniqueKey>
----
Schema defaults and `copyFields` cannot be used to populate the `uniqueKey` field. The `fieldType` of `uniqueKey` must not be analyzed and must not be any of the `*PointField` types. You can use `UUIDUpdateProcessorFactory` to have `uniqueKey` values generated automatically.
Further, the operation will fail if the `uniqueKey` field is used, but is multivalued (or inherits the multivalue-ness from the `fieldtype`). However, `uniqueKey` will continue to work, as long as the field is properly used.
== Similarity
Similarity is a Lucene class used to score a document in searching.
Each collection has one "global" Similarity, and by default Solr uses an implicit {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`] which allows individual field types to be configured with a "per-type" specific Similarity and implicitly uses `BM25Similarity` for any field type which does not have an explicit Similarity.
This default behavior can be overridden by declaring a top level `<similarity/>` element in your `schema.xml`, outside of any single field type. This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as in this example showing `BM25Similarity`:
[source,xml]
----
<similarity class="solr.BM25SimilarityFactory"/>
----
or by referencing a `SimilarityFactory` implementation, which may take optional initialization parameters:
[source,xml]
----
<similarity class="solr.DFRSimilarityFactory">
<str name="basicModel">P</str>
<str name="afterEffect">L</str>
<str name="normalization">H2</str>
<float name="c">7</float>
</similarity>
----
In most cases, specifying global level similarity like this will cause an error if your `schema.xml` also includes field type specific `<similarity/>` declarations. One key exception to this is that you may explicitly declare a {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`] and specify what that default behavior will be for all field types that do not declare an explicit Similarity using the name of field type (specified by `defaultSimFromFieldType`) that _is_ configured with a specific similarity:
[source,xml]
----
<similarity class="solr.SchemaSimilarityFactory">
<str name="defaultSimFromFieldType">text_dfr</str>
</similarity>
<fieldType name="text_dfr" class="solr.TextField">
<analyzer ... />
<similarity class="solr.DFRSimilarityFactory">
<str name="basicModel">I(F)</str>
<str name="afterEffect">B</str>
<str name="normalization">H3</str>
<float name="mu">900</float>
</similarity>
</fieldType>
<fieldType name="text_ib" class="solr.TextField">
<analyzer ... />
<similarity class="solr.IBSimilarityFactory">
<str name="distribution">SPL</str>
<str name="lambda">DF</str>
<str name="normalization">H2</str>
</similarity>
</fieldType>
<fieldType name="text_other" class="solr.TextField">
<analyzer ... />
</fieldType>
----
In the example above `IBSimilarityFactory` (using the Information-Based model) will be used for any fields of type `text_ib`, while `DFRSimilarityFactory` (divergence from random) will be used for any fields of type `text_dfr`, as well as any fields using a type that does not explicitly specify a `<similarity/>`.
If `SchemaSimilarityFactory` is explicitly declared without configuring a `defaultSimFromFieldType`, then `BM25Similarity` is implicitly used as the default for `luceneMatchVersion >= 8.0.0` and otherwise the deprecated `LegacyBM25Similarity` (which will be removed in 9.x) is used to mimic the same BM25 formula that was the default in those versions.
In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used such as the `SweetSpotSimilarityFactory`, `ClassicSimilarityFactory` etc. For details, see the Solr Javadocs for the {solr-javadocs}/solr-core/org/apache/solr/schema/SimilarityFactory.html[similarity factories].