| = Other Schema Elements |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| This section describes several other important elements of `schema.xml` not covered in earlier sections. |
| |
| == Unique Key |
| |
| The `uniqueKey` element specifies which field is a unique identifier for documents. Although `uniqueKey` is not required, nearly every application design calls for one. For example, you need a `uniqueKey` if you will ever update a document in the index. |
| |
| You can define the unique key field by naming it: |
| |
| [source,xml] |
| ---- |
| <uniqueKey>id</uniqueKey> |
| ---- |
| |
| Schema defaults and `copyFields` cannot be used to populate the `uniqueKey` field. The `fieldType` of `uniqueKey` must not be analyzed and must not be any of the `*PointField` types. You can use `UUIDUpdateProcessorFactory` to have `uniqueKey` values generated automatically. |
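| |
| As a minimal sketch, the following `solrconfig.xml` fragment defines an update request processor chain that fills the `id` field with a generated UUID when an incoming document has no value for it. The chain name `uuid` is a hypothetical choice, and the sketch assumes `id` is the `uniqueKey` field. Requests must select the chain (for example, with the `update.chain` request parameter), or the chain must be marked as the default. |
| |
| [source,xml] |
| ---- |
| <!-- Hypothetical chain name "uuid"; assumes "id" is the uniqueKey field --> |
| <updateRequestProcessorChain name="uuid"> |
|   <!-- Adds a random UUID to "id" only when the document has no value there --> |
|   <processor class="solr.UUIDUpdateProcessorFactory"> |
|     <str name="fieldName">id</str> |
|   </processor> |
|   <processor class="solr.LogUpdateProcessorFactory"/> |
|   <!-- RunUpdateProcessorFactory performs the indexing and must come last --> |
|   <processor class="solr.RunUpdateProcessorFactory"/> |
| </updateRequestProcessorChain> |
| ---- |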
| |
| Further, the operation will fail if the `uniqueKey` field is multivalued, whether it is declared that way directly or inherits `multiValued` from its `fieldType`. However, `uniqueKey` will continue to work as long as the field is used properly, that is, with a single value per document. |
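| |
| For illustration, a `uniqueKey` field that satisfies these rules might be declared as follows. This is a sketch: the names `id` and `string` follow common convention rather than any requirement. `solr.StrField` is used because it is neither analyzed nor a `*PointField` type, and `multiValued="false"` makes the single-valued requirement explicit. |
| |
| [source,xml] |
| ---- |
| <!-- Non-analyzed, non-point field type suitable for a uniqueKey --> |
| <fieldType name="string" class="solr.StrField" sortMissingLast="true"/> |
| |
| <!-- Explicitly single-valued, so the uniqueKey constraint is met --> |
| <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/> |
| |
| <uniqueKey>id</uniqueKey> |
| ---- |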
| |
| |
| == Similarity |
| |
| Similarity is a Lucene class used to score documents at search time. |
| |
| Each collection has one "global" Similarity. By default, Solr uses an implicit {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`], which allows individual field types to be configured with a "per-type" Similarity and implicitly uses `BM25Similarity` for any field type that does not declare an explicit Similarity. |
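| |
| Under this default, a per-type Similarity is declared inside the field type itself. The sketch below is illustrative. The field type name `text_bm25_tuned` is hypothetical, the analyzer is elided as in the other examples on this page, and the `k1` and `b` values are the usual BM25 defaults written out explicitly. |
| |
| [source,xml] |
| ---- |
| <!-- Hypothetical field type with its own per-type Similarity --> |
| <fieldType name="text_bm25_tuned" class="solr.TextField"> |
|   <analyzer ... /> |
|   <similarity class="solr.BM25SimilarityFactory"> |
|     <!-- Controls term-frequency saturation --> |
|     <float name="k1">1.2</float> |
|     <!-- Controls the strength of document-length normalization --> |
|     <float name="b">0.75</float> |
|   </similarity> |
| </fieldType> |
| ---- |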
| |
| This default behavior can be overridden by declaring a top-level `<similarity/>` element in your `schema.xml`, outside of any single field type. This declaration can either refer directly to the name of a class with a no-argument constructor, as in this example, which uses `BM25SimilarityFactory` with its default settings: |
| |
| [source,xml] |
| ---- |
| <similarity class="solr.BM25SimilarityFactory"/> |
| ---- |
| |
| or reference a `SimilarityFactory` implementation that accepts optional initialization parameters: |
| |
| [source,xml] |
| ---- |
| <similarity class="solr.DFRSimilarityFactory"> |
|   <str name="basicModel">G</str> |
|   <str name="afterEffect">L</str> |
|   <str name="normalization">H2</str> |
|   <float name="c">7</float> |
| </similarity> |
| ---- |
| |
| In most cases, declaring a global similarity like this will cause an error if your `schema.xml` also includes field-type-specific `<similarity/>` declarations. The key exception is that you may explicitly declare a {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`] and use its `defaultSimFromFieldType` parameter to name a field type that _is_ configured with a specific Similarity. That Similarity then becomes the default for all field types that do not declare one explicitly: |
| |
| [source,xml] |
| ---- |
| <similarity class="solr.SchemaSimilarityFactory"> |
|   <str name="defaultSimFromFieldType">text_dfr</str> |
| </similarity> |
| <fieldType name="text_dfr" class="solr.TextField"> |
|   <analyzer ... /> |
|   <similarity class="solr.DFRSimilarityFactory"> |
|     <str name="basicModel">I(F)</str> |
|     <str name="afterEffect">B</str> |
|     <str name="normalization">H3</str> |
|     <float name="mu">900</float> |
|   </similarity> |
| </fieldType> |
| <fieldType name="text_ib" class="solr.TextField"> |
|   <analyzer ... /> |
|   <similarity class="solr.IBSimilarityFactory"> |
|     <str name="distribution">SPL</str> |
|     <str name="lambda">DF</str> |
|     <str name="normalization">H2</str> |
|   </similarity> |
| </fieldType> |
| <fieldType name="text_other" class="solr.TextField"> |
|   <analyzer ... /> |
| </fieldType> |
| ---- |
| |
| In the example above, `IBSimilarityFactory` (using the Information-Based model) will be used for any fields of type `text_ib`, while `DFRSimilarityFactory` (Divergence from Randomness) will be used for any fields of type `text_dfr`, as well as for any fields whose type, such as `text_other`, does not explicitly specify a `<similarity/>`. |
| |
| If `SchemaSimilarityFactory` is explicitly declared without configuring a `defaultSimFromFieldType`, then `BM25Similarity` is implicitly used as the default when `luceneMatchVersion >= 8.0.0`. For earlier `luceneMatchVersion` values, the deprecated `LegacyBM25Similarity` (which will be removed in 9.x) is used instead, reproducing the BM25 formula that was the default in those versions. |
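| |
| A minimal sketch of that case follows. The schema declares `SchemaSimilarityFactory` with no parameters, and the `luceneMatchVersion` set in `solrconfig.xml` determines which BM25 implementation becomes the default. The value `8.0.0` is an assumed example. |
| |
| [source,xml] |
| ---- |
| <!-- schema.xml: explicit factory, no defaultSimFromFieldType --> |
| <similarity class="solr.SchemaSimilarityFactory"/> |
| |
| <!-- solrconfig.xml: 8.0.0 or later selects BM25Similarity as the default --> |
| <luceneMatchVersion>8.0.0</luceneMatchVersion> |
| ---- |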
| |
| In addition to the factories mentioned on this page, there are several other similarity implementations that can be used, such as `SweetSpotSimilarityFactory` and `ClassicSimilarityFactory`. For details, see the Solr Javadocs for the {solr-javadocs}/solr-core/org/apache/solr/schema/SimilarityFactory.html[similarity factories]. |
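| |
| For instance, a single field type could opt into classic TF-IDF scoring with `ClassicSimilarityFactory` while other types keep the default. This is a sketch. The type name `text_tfidf` is hypothetical, the analyzer is elided as elsewhere on this page, and `discountOverlaps` is shown at its default of `true`. |
| |
| [source,xml] |
| ---- |
| <!-- Hypothetical field type scored with classic TF-IDF --> |
| <fieldType name="text_tfidf" class="solr.TextField"> |
|   <analyzer ... /> |
|   <similarity class="solr.ClassicSimilarityFactory"> |
|     <!-- Ignore overlapping tokens (position increment 0) in length norms --> |
|     <bool name="discountOverlaps">true</bool> |
|   </similarity> |
| </fieldType> |
| ---- |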