Solr Cross DC is a simple cross-data-center fail-over solution for Apache Solr. It has three key components: the CrossDC Producer, the CrossDC Consumer, and Apache Kafka. The Producer is a Solr UpdateProcessor plugin that forwards updates from the primary data center, while the Consumer is an update request consumer application that receives updates in the backup data center. Kafka is the distributed queue that connects the two.
Solr Cross DC is designed to provide a simple and reliable way to replicate Solr updates across multiple data centers. It is particularly useful for organizations that need to ensure high availability and disaster recovery for their Solr clusters.
The CrossDC Producer intercepts updates when the node acts as the leader and puts those updates onto the distributed queue. The CrossDC Consumer polls the distributed queue and forwards the updates it receives to the configured Solr cluster in the backup data center.
To use Solr Cross DC, follow these steps:
Configure the sharedLib directory in solr.xml (e.g., sharedLib=lib) and place the CrossDC Producer plug-in jar file into the specified folder.
```xml
<solr>
  <str name="sharedLib">${solr.sharedLib:}</str>
</solr>
```
Configure the new UpdateProcessor in solrconfig.xml.
```xml
<updateRequestProcessorChain name="mirrorUpdateChain" default="true">
  <processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
    <str name="bootstrapServers">${bootstrapServers:}</str>
    <str name="topicName">${topicName:}</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```
Optionally, add an external version constraint UpdateProcessor to the update chain in solrconfig.xml to allow user-provided update versions. See https://solr.apache.org/guide/8_11/update-request-processors.html#general-use-updateprocessorfactories and https://solr.apache.org/docs/8_1_1/solr-core/org/apache/solr/update/processor/DocBasedVersionConstraintsProcessor.html
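As a sketch, the mirror chain with a document-based version constraint might look like the following. The `my_version_l` field name is an illustrative assumption; see the linked documentation for the factory's full set of options:

```xml
<updateRequestProcessorChain name="mirrorUpdateChain" default="true">
  <!-- Reject updates whose user-supplied version is older than the indexed one.
       The versionField name below is an example; substitute your own field. -->
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <str name="versionField">my_version_l</str>
  </processor>
  <processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
    <str name="bootstrapServers">${bootstrapServers:}</str>
    <str name="topicName">${topicName:}</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```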
Start or restart the Solr cluster(s).
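For example, assuming the properties are supplied as system properties at startup (the hostnames and topic name below are placeholders):

```shell
bin/solr start -cloud -Dsolr.sharedLib=lib \
  -DbootstrapServers=kafka1:9092,kafka2:9092 \
  -DtopicName=solr-crossdc
```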
There are two required configuration properties for the Producer:

- `bootstrapServers`: list of servers used to connect to the Kafka cluster
- `topicName`: Kafka topic that the Solr updates will be pushed to

The required configuration properties for the Consumer are:

- `bootstrapServers`: list of servers used to connect to the Kafka cluster
- `topicName`: Kafka topic that the Solr updates will be pushed to; for the Consumer this can be a comma-separated list if you would like to consume multiple topics
- `zkConnectString`: ZooKeeper connection string used by Solr to connect to its ZooKeeper cluster in the backup data center

The following additional configuration properties should be specified either for both the Producer and the Consumer or in the shared ZooKeeper central configuration properties file:
- `batchSizeBytes`: maximum batch size in bytes for the queue
- `bufferMemoryBytes`: total memory allocated by the Producer for buffering
- `lingerMs`: amount of time that the Producer will wait to add to a batch
- `requestTimeout`: request timeout for the Producer

You can manage the configuration centrally in Solr's ZooKeeper cluster by placing a properties file called `crossdc.properties` in the root Solr ZooKeeper znode, e.g. `/solr/crossdc.properties`. Both the `bootstrapServers` and `topicName` properties can be put in this file. For the CrossDC Consumer application, you would then only have to set `zkConnectString` for the local Solr cluster.
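For example, a minimal central configuration might look like the following (the broker addresses and topic name are placeholders), uploaded with Solr's ZooKeeper tooling:

```
# crossdc.properties
bootstrapServers=kafka1:9092,kafka2:9092
topicName=solr-crossdc
```

```shell
# Copy the file to the root Solr znode; the connection string is a placeholder
bin/solr zk cp file:crossdc.properties zk:/crossdc.properties -z zk1:2181/solr
```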
The processor also supports an `enabled` attribute; setting it to false turns the processor into a no-op in the chain.
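For example, assuming `enabled` is read as a boolean init parameter of the processor factory (the `enableCrossDC` property name is an illustrative assumption):

```xml
<processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
  <!-- Set to false to make the processor a no-op in the chain -->
  <bool name="enabled">${enableCrossDC:true}</bool>
  <str name="bootstrapServers">${bootstrapServers:}</str>
  <str name="topicName">${topicName:}</str>
</processor>
```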
A `cluster.sh` script is located in the root of the CrossDC repository. This script is a helpful developer tool for manual testing: it downloads Solr and Kafka and then configures both for Cross DC.