= UpdateHandlers in SolrConfig
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
The settings in this section are configured in the `<updateHandler>` element in `solrconfig.xml` and may affect the performance of index updates. These settings affect how updates are done internally. `<updateHandler>` configurations do not affect the higher-level configuration of <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#,RequestHandlers>> that process client update requests.
[source,xml]
----
<updateHandler class="solr.DirectUpdateHandler2">
...
</updateHandler>
----
== Commits
Data sent to Solr is not searchable until it has been _committed_ to the index. Commits can be slow, and in some cases they should be done in isolation from other possible commit requests to avoid overwriting data, so it's preferable to control when data is committed. Several options are available to control the timing of commits.
=== commit and softCommit
In Solr, a `commit` is an action that asks Solr to "commit" pending changes to the Lucene index files. By default, commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk). When a client includes a `commit=true` parameter with an update request, all index segments affected by the adds and deletes in that update are written to disk as soon as the index updates are completed.
If an additional flag `softCommit=true` is specified, then Solr performs a 'soft commit', meaning that Solr will commit your changes to the Lucene data structures quickly but not guarantee that the Lucene index files are written to stable storage. This is an implementation of Near Real Time storage, a feature that boosts document visibility, since you don't have to wait for background merges and storage (to ZooKeeper, if using <<solrcloud.adoc#,SolrCloud>>) to finish before moving on to something else. A full commit means that, if a server crashes, Solr will know exactly where your data was stored; a soft commit means that the data is stored, but the location information isn't yet stored. The tradeoff is that a soft commit gives you faster visibility, because it's not waiting for background merges to finish, at the cost of that durability guarantee.
For more information about Near Real Time operations, see <<near-real-time-searching.adoc#,Near Real Time Searching>>.
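Besides the `commit=true` and `softCommit=true` request parameters, a commit can be requested with an XML update message. The following is a minimal sketch, assuming the message is posted to a collection's `/update` handler with a `text/xml` content type; treat the attribute usage as illustrative and confirm it against your Solr version:

[source,xml]
----
<!-- Body of a POST to the collection's /update handler (assumed endpoint). -->
<!-- softCommit="true" asks for a soft commit; a bare <commit/> asks for a hard commit. -->
<commit softCommit="true"/>
----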
=== autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to `autoCommit` is to use `commitWithin`, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
`maxDocs`::
The number of updates that have occurred since the last commit.
`maxTime`::
The number of milliseconds since the oldest uncommitted update.
`maxSize`::
The maximum size of the transaction log (tlog) on disk, after which a hard commit is triggered. This is useful when the size of documents is unknown and the intention is to restrict the size of the transaction log to reasonable size. Valid values can be bytes (default with no suffix), kilobytes (if defined with a `k` suffix, as in `25k`), megabytes (`m`) or gigabytes (`g`).
`openSearcher`::
Whether to open a new searcher when performing a commit. If this is `false`, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible. The default is `true`.
If any of the `maxDocs`, `maxTime`, or `maxSize` limits are reached, Solr automatically performs a commit operation. If the `autoCommit` tag is missing, then only explicit commits will update the index. The decision whether to use auto-commit or not depends on the needs of your application.
Determining the best auto-commit settings is a tradeoff between performance and accuracy. Settings that cause frequent updates will improve the accuracy of searches because new content will be searchable more quickly, but performance may suffer because of the frequent updates. Less frequent updates may improve performance but it will take longer for updates to show up in queries.
[source,xml]
----
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>30000</maxTime>
<maxSize>512m</maxSize>
<openSearcher>false</openSearcher>
</autoCommit>
----
You can also specify 'soft' autoCommits in the same way that you can specify 'soft' commits, except that instead of using `autoCommit` you set the `autoSoftCommit` tag.
[source,xml]
----
<autoSoftCommit>
<maxTime>60000</maxTime>
</autoSoftCommit>
----
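The two tags can be combined. The sketch below shows one possible arrangement, in which hard commits flush changes to stable storage without opening a searcher and soft commits control when changes become visible. The interval values are illustrative assumptions, not recommendations:

[source,xml]
----
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to stable storage every 15 seconds, without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make recent changes visible to searches every 2 seconds -->
  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
</updateHandler>
----

With settings along these lines, search visibility is governed by `autoSoftCommit`, while `autoCommit` limits how much uncommitted data accumulates between flushes to disk.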
=== commitWithin
The `commitWithin` settings allow forcing document commits to happen in a defined time period. This is used most frequently with <<near-real-time-searching.adoc#,Near Real Time Searching>>, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to follower servers in a leader/follower environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:
[source,xml]
----
<commitWithin>
<softCommit>false</softCommit>
</commitWithin>
----
With this configuration, when you call `commitWithin` as part of your update message, it will automatically perform a hard commit every time.
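For illustration, an update message that uses `commitWithin` might look like the following sketch in Solr's XML update format. The 10-second window is an arbitrary value, and the field names `id` and `title` are assumptions that would need to match your schema:

[source,xml]
----
<!-- Ask Solr to commit this document within 10000 ms of receiving it -->
<add commitWithin="10000">
  <doc>
    <!-- Field names below are hypothetical; use fields defined in your schema -->
    <field name="id">doc-1</field>
    <field name="title">An example document</field>
  </doc>
</add>
----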
== Event Listeners
The UpdateHandler section is also where update-related event listeners can be configured. These can be triggered to occur after any commit (`event="postCommit"`) or only after optimize commands (`event="postOptimize"`).
Users can write custom update event listener classes in Solr plugins. As of Solr 7.1,
`RunExecutableListener` was removed for security reasons.
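A listener is declared with a `<listener>` element inside `<updateHandler>`. The following is a minimal sketch that assumes a custom plugin class, here given the hypothetical name `com.example.CommitAuditListener`, which implements Solr's `SolrEventListener` interface and is available on the core's classpath:

[source,xml]
----
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- com.example.CommitAuditListener is a hypothetical class, not part of Solr -->
  <listener event="postCommit" class="com.example.CommitAuditListener"/>
  <listener event="postOptimize" class="com.example.CommitAuditListener"/>
</updateHandler>
----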
== Transaction Log
As described in the section <<realtime-get.adoc#,RealTime Get>>, a transaction log is required for that feature. It is configured in the `updateHandler` section of `solrconfig.xml`.
Realtime Get currently relies on the update log feature, which is enabled by default and is configured in `solrconfig.xml` in a section like:
[source,xml]
----
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
----
Three additional expert-level configuration settings affect indexing performance and how far a replica can fall behind on updates before it must enter full recovery (see the section on <<solrcloud-recoveries-and-write-tolerance.adoc#,write side fault tolerance>> for more information):
`numRecordsToKeep`::
The number of update records to keep per log. The default is `100`.
`maxNumLogsToKeep`::
The maximum number of logs to keep. The default is `10`.
`numVersionBuckets`::
The number of buckets used to keep track of max version values when checking for re-ordered updates. Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing. This requires `(8 bytes (long) * numVersionBuckets)` of heap space per Solr core. The default is `65536`.
An example, to be included under `<config><updateHandler>` in `solrconfig.xml`, employing the above advanced settings:
[source,xml]
----
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">500</int>
<int name="maxNumLogsToKeep">20</int>
<int name="numVersionBuckets">65536</int>
</updateLog>
----
== Other Options
In some cases complex updates (such as spatial/shape) may take a very long time to complete. In the default
configuration other updates that fall into the same internal version bucket will wait indefinitely.
These outstanding requests may pile up and lead to thread exhaustion and eventually to
OutOfMemory errors.
The option `versionBucketLockTimeoutMs` in the `updateHandler` section helps to prevent that by
specifying a limited timeout for such extremely long-running update requests. If this limit
is reached, the update will fail, but it won't block all other updates forever. See SOLR-12833 for more details.
There's a memory cost associated with this setting. Values greater than the default 0 (meaning unlimited timeout)
cause Solr to use a different internal implementation of the version bucket, which increases memory consumption
from ~1.5MB to ~6.8MB per Solr core.
An example of specifying this option under the `<config>` section of `solrconfig.xml`:
[source,xml]
----
<updateHandler class="solr.DirectUpdateHandler2">
...
<int name="versionBucketLockTimeoutMs">10000</int>
</updateHandler>
----