solr/solr-ref-guide/src/setting-up-an-external-zookeeper-ensemble.adoc - lucene-solr - Git at Google

 = Setting Up an External ZooKeeper Ensemble
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 Although Solr comes bundled with http://zookeeper.apache.org[Apache ZooKeeper], you are strongly encouraged to use an external ZooKeeper setup in production.

 While using Solr's embedded ZooKeeper instance is fine for getting started, you shouldn't use this in production because it does not provide any failover: if the Solr instance that hosts ZooKeeper shuts down, ZooKeeper is also shut down.
 Any shards or Solr instances that rely on it will not be able to communicate with it or each other.

 The solution to this problem is to set up an external ZooKeeper _ensemble_, which is a number of servers running ZooKeeper that communicate with each other to coordinate the activities of the cluster.

 == How Many ZooKeeper Nodes?

 The first question to answer is the number of ZooKeeper nodes you will run in your ensemble.

 When planning how many ZooKeeper nodes to configure, keep in mind that the main principle for a ZooKeeper ensemble is maintaining a majority of servers to serve requests. This majority is called a _quorum_.

 [quote,ZooKeeper Administrator's Guide,http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html]
 ____
 "For a ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. *To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines*.
 ____

 To properly maintain a quorum, it's highly recommended to have an odd number of ZooKeeper servers in your ensemble, so a majority is maintained.

 To explain why, think about this scenario: If you have two ZooKeeper nodes and one goes down, this means only 50% of available servers are available. Since this is not a majority, ZooKeeper will no longer serve requests.

 However, if you have three ZooKeeper nodes and one goes down, you have 66% of your servers available and ZooKeeper will continue normally while you repair the one down node. If you have 5 nodes, you could continue operating with two down nodes if necessary.

 It's not generally recommended to go above 5 nodes. While it may seem that more nodes provide greater fault-tolerance and availability, in practice it becomes less efficient because of the amount of inter-node coordination that occurs. Unless you have a truly massive Solr cluster (on the scale of 1,000s of nodes), try to stay to 3 as a general rule, or maybe 5 if you have a larger cluster.

 More information on ZooKeeper clusters is available from the ZooKeeper documentation at http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html#sc_zkMulitServerSetup.

 == Download Apache ZooKeeper

 The first step in setting up Apache ZooKeeper is, of course, to download the software. It's available from http://zookeeper.apache.org/releases.html.

 Solr currently uses Apache ZooKeeper v{ivy-zookeeper-version}.

 [WARNING]
 ====
 When using an external ZooKeeper ensemble, you will need need to keep your local installation up-to-date with the latest version distributed with Solr. Since it is a stand-alone application in this scenario, it does not get upgraded as part of a standard Solr upgrade.
 ====

 == ZooKeeper Installation

 Installation consists of extracting the files into a specific target directory where you'd like to have ZooKeeper store its internal data. The actual directory itself doesn't matter, as long as you know where it is.

 The command to unpack the ZooKeeper package is:

 [source,bash,subs="attributes"]
 tar xvf zookeeper-{ivy-zookeeper-version}.tar.gz

 This location is the `<ZOOKEEPER_HOME>` for ZooKeeper on this server.

 Installing and unpacking ZooKeeper must be repeated on each server where ZooKeeper will be run.

 == Configuration for a ZooKeeper Ensemble

 After installation, we'll first take a look at the basic configuration for ZooKeeper, then specific parameters for configuring each node to be part of an ensemble.

 === Initial Configuration

 To configure your ZooKeeper instance, create a file named `<ZOOKEEPER_HOME>/conf/zoo.cfg`. A sample configuration file is included in your ZooKeeper installation, as `conf/zoo_sample.cfg`. You can edit and rename that file instead of creating it new if you prefer.

 The file should have the following information to start:

 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 4lw.commands.whitelist=mntr,conf,ruok
 ----

 The parameters are as follows:

 `tickTime`:: Part of what ZooKeeper does is determine which servers are up and running at any given time, and the minimum session time out is defined as two "ticks". The `tickTime` parameter specifies in milliseconds how long each tick should be.

 `dataDir`:: This is the directory in which ZooKeeper will store data about the cluster. This directory must be empty before starting ZooKeeper for the first time.

 `clientPort`:: This is the port on which Solr will access ZooKeeper.

 `4lw.commands.whitelist`:: This allows the Solr admin UI to query ZooKeeper. Optionally use "*" to enable all "4 letter words", the three listed will enable the admin UI.

 These are the basic parameters that need to be in use on each ZooKeeper node, so this file must be copied to or created on each node.

 Next we'll customize this configuration to work within an ensemble.

 === Ensemble Configuration

 To complete configuration for an ensemble, we need to set additional parameters so each node knows who it is in the ensemble and where every other node is.

 Each of the examples below assume you are installing ZooKeeper on different servers with different hostnames.

 Once complete, your `zoo.cfg` file might look like this:

 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 4lw.commands.whitelist=mntr,conf,ruok

 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888

 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----

 We've added these parameters to the three we had already:

 `initLimit`:: Amount of time, in ticks, to allow followers to connect and sync to a leader. In this case, you have 5 ticks, each of which is 2000 milliseconds long, so the server will wait as long as 10 seconds to connect and sync with the leader.

 `syncLimit`:: Amount of time, in ticks, to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.

 `server._X_`:: These are the server IDs (the `_X_` part), hostnames (or IP addresses) and ports for all servers in the ensemble. The IDs differentiate each node of the ensemble, and allow each node to know where each of the other node is located. The ports can be any ports you choose; ZooKeeper's default ports are `2888:3888`.
 +
 Since we've assigned server IDs to specific hosts/ports, we must also define which server in the list this node is. We do this with a `myid` file stored in the data directory (defined by the `dataDir` parameter). The contents of the `myid` file is only the server ID.
 +
 In the case of the configuration example above, you would create the file `/var/lib/zookeeper/1/myid` with the content "1" (without quotes), as in this example:
 +
 [source,bash]
 1

 `autopurge.snapRetainCount`:: The number of snapshots and corresponding transaction logs to retain when purging old snapshots and transaction logs.
 +
 ZooKeeper automatically keeps a transaction log and writes to it as changes are made. A snapshot of the current state is taken periodically, and this snapshot supersedes transaction logs older than the snapshot. However, ZooKeeper never cleans up either the old snapshots or the old transaction logs; over time they will silently fill available disk space on each server.
 +
 To avoid this, set the `autopurge.snapRetainCount` and `autopurge.purgeInterval` parameters to enable an automatic clean up (purge) to occur at regular intervals. The `autopurge.snapRetainCount` parameter will keep the set number of snapshots and transaction logs when a clean up occurs. This parameter can be configured higher than `3`, but cannot be set lower than 3.

 `autopurge.purgeInterval`:: The time in hours between purge tasks. The default for this parameter is `0`, so must be set to `1` or higher to enable automatic clean up of snapshots and transaction logs. Setting it as high as `24`, for once a day, is acceptable if preferred.

 We'll repeat this configuration on each node.

 On the second node, update `<ZOOKEEPER_HOME>/conf/zoo.cfg` file so it matches the content on node 1 (particularly the server hosts and ports):

 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 4lw.commands.whitelist=mntr,conf,ruok

 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888

 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----

 On the second node, create a `myid` file with the contents "2", and put it in the `/var/lib/zookeeper` directory.

 [source,bash]
 2

 On the third node, update `<ZOOKEEPER_HOME>/conf/zoo.cfg` file so it matches the content on nodes 1 and 2 (particularly the server hosts and ports):

 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 4lw.commands.whitelist=mntr,conf,ruok

 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888

 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----

 And create the `myid` file in the `/var/lib/zookeeper` directory:

 [source,bash]
 3

 Repeat this for servers 4 and 5 if you are creating a 5-node ensemble (a rare case).


 === ZooKeeper Environment Configuration

 To ease troubleshooting in case of problems with the ensemble later, it's recommended to run ZooKeeper with logging enabled and with proper JVM garbage collection (GC) settings.

 . Create a file named `zookeeper-env.sh` and put it in the `<ZOOKEEPER_HOME>/conf` directory (the same place you put `zoo.cfg`). This file will need to exist on each server of the ensemble.

 . Add the following settings to the file:
 +
 [source,properties]
 ----
 ZOO_LOG_DIR="/path/for/log/files"
 ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

 SERVER_JVMFLAGS="-Xms2048m -Xmx2048m -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M"
 ----
 +
 The property `ZOO_LOG_DIR` defines the location on the server where ZooKeeper will print its logs. `ZOO_LOG4J_PROP` sets the logging level and log appenders.
 +
 With `SERVER_JVMFLAGS`, we've defined several parameters for garbage collection and logging GC-related events. One of the system parameters is `-Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log`, which will put the garbage collection logs in the same directory we've defined for ZooKeeper logs, in a file named `zookeeper_gc.log`.

 . Review the default settings in `<ZOOKEEPER_HOME>/conf/log4j.properties`, especially the `log4j.appender.ROLLINGFILE.MaxFileSize` parameter. This sets the size at which log files will be rolled over, and by default it is 10MB.

 . Copy `zookeeper-env.sh` and any changes to `log4j.properties` to each server in the ensemble.

 NOTE: The above instructions are for Linux servers only. The default `zkServer.sh` script includes support for a `zookeeper-env.sh` file but the Windows version of the script, `zkServer.cmd`, does not. To make the same configuration on a Windows server, the changes would need to be made directly in the `zkServer.cmd`.

 At this point, you are ready to start your ZooKeeper ensemble.

 === More Information about ZooKeeper

 ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of Solr's documentation. For more information, see the  http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}[ZooKeeper documentation].

 == Starting and Stopping ZooKeeper

 === Start ZooKeeper

 To start the ensemble, use the `<ZOOKEEPER_HOME>/bin/zkServer.sh` or `zkServer.cmd` script, as with this command:

 .Linux OS
 [source,bash]
 zkServer.sh start

 .Windows OS
 [source,text]
 zkServer.cmd start

 This command needs to be run on each server that will run ZooKeeper.

 TIP: You should see the ZooKeeper logs in the directory where you defined to store them. However, immediately after startup, you may not see the `zookeeper_gc.log` yet, as it likely will not appear until garbage collection has happened the first time.

 === Shut Down ZooKeeper

 To shut down ZooKeeper, use the same `zkServer.sh` or `zkServer.cmd` script on each server with the "stop" command:

 .Linux OS
 [source,bash]
 zkServer.sh stop

 .Windows OS
 [source,text]
 zkServer.cmd stop

 == Solr Configuration

 When starting Solr, you must provide an address for ZooKeeper or Solr won't know how to use it. This can be done in two ways: by defining the _connect string_, a list of servers where ZooKeeper is running, at every startup on every node of the Solr cluster, or by editing Solr's include file as a permanent system parameter. Both approaches are described below.

 When referring to the location of ZooKeeper within Solr, it's best to use the addresses of all the servers in the ensemble. If one happens to be down, Solr will automatically be able to send its request to another server in the list.

 ZooKeeper version 3.5 and later supports dynamic reconfiguration of server addresses and roles. But note that Solr will only be able to talk to the servers listed in the static ZooKeeper connect string.

 === Using a chroot

 If your ensemble is or will be shared among other systems besides Solr, you should consider defining application-specific _znodes_, or a hierarchical namespace that will only include Solr's files.

 Once you create a znode for each application, you add it's name, also called a _chroot_, to the end of your connect string whenever you tell Solr where to access ZooKeeper.

 Creating a chroot is done with a `bin/solr` command:

 [source,text]
 bin/solr zk mkroot /solr -z zk1:2181,zk2:2181,zk3:2181

 See the section <<solr-control-script-reference.adoc#create-a-znode-supports-chroot,Create a znode>> for more examples of this command.

 Once the znode is created, it behaves in a similar way to a directory on a filesystem: the data stored by Solr in ZooKeeper is nested beneath the main data directory and won't be mixed with data from another system or process that uses the same ZooKeeper ensemble.

 Alternatively, Solr can create the znode automatically as part of the first Solr instance's startup, read below for an example.

 === Using the -z Parameter with bin/solr

 Pointing Solr at the ZooKeeper ensemble you've created is a simple matter of using the `-z` parameter when using the `bin/solr` script.

 For example, to point the Solr instance to the ZooKeeper you've started on port 2181 on three servers with chroot `/solr` (see <<Using a chroot>> above), this is what you'd need to do:

 [source,bash]
 ----
 bin/solr start -e cloud -z zk1:2181,zk2:2181,zk3:2181/solr
 ----

 If the znode does not exist, you can set the ZK_CREATE_CHROOT environment variable to true to create it automatically on start-up:

 [source,bash]
 ----
 ZK_CREATE_CHROOT=true bin/solr start -e cloud -z zk1:2181,zk2:2181,zk3:2181/solr
 ----

 === Updating Solr Include Files

 If you update Solr include files (`solr.in.sh` or `solr.in.cmd`), which overrides defaults used with `bin/solr`, you will not have to use the `-z` parameter with `bin/solr` commands.


 [.dynamic-tabs]
 --
 [example.tab-pane#linux1]
 ====
 [.tab-label]*Linux: solr.in.sh*

 The section to look for will be commented out:

 [source,properties]
 ----
 # Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 # e.g. host1:2181,host2:2181/chroot
 # Leave empty if not using SolrCloud
 #ZK_HOST=""
 ----

 Remove the comment marks at the start of the line and enter the ZooKeeper connect string:

 [source,properties]
 ----
 # Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 # e.g. host1:2181,host2:2181/chroot
 # Leave empty if not using SolrCloud
 ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
 ----
 ====

 [example.tab-pane#zkwindows]
 ====
 [.tab-label]*Windows: solr.in.cmd*

 The section to look for will be commented out:

 [source,bat]
 ----
 REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 REM e.g. host1:2181,host2:2181/chroot
 REM Leave empty if not using SolrCloud
 REM set ZK_HOST=
 ----

 Remove the comment marks at the start of the line and enter the ZooKeeper connect string:

 [source,bat]
 ----
 REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 REM e.g. host1:2181,host2:2181/chroot
 REM Leave empty if not using SolrCloud
 set ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr
 ----
 ====
 --

 Now you will not have to enter the connection string when starting Solr.

 == Increasing the File Size Limit

 ZooKeeper is designed to hold small files, on the order of kilobytes.  By default, ZooKeeper's file size limit is 1MB.  Attempting to write or read files larger than this will cause errors.

 Some Solr features, e.g., text analysis synonyms, LTR, and OpenNLP named entity recognition, require configuration resources that can be larger than the default limit.  ZooKeeper can be configured, via Java system property https://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html#Unsafe+Options[`jute.maxbuffer`], to increase this limit.  Note that this configuration, which is required both for ZooKeeper server(s) and for all clients that connect to the server(s), must be the same everywhere it is specified.

 === Configuring jute.maxbuffer on ZooKeeper Nodes

 `jute.maxbuffer` must be configured on each external ZooKeeper node.  This can be achieved in any of the following ways; note though that only the first option works on Windows:

 . In `<ZOOKEEPER_HOME>/conf/zoo.cfg`, e.g., to increase the file size limit to one byte less than 10MB, add this line:
 +
 [source,properties]
 jute.maxbuffer=0x9fffff
 . In `<ZOOKEEPER_HOME>/conf/zookeeper-env.sh`, e.g., to increase the file size limit to 50MiB, add this line:
 +
 [source,properties]
 JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=50000000"
 . In `<ZOOKEEPER_HOME>/bin/zkServer.sh`, add a `JVMFLAGS` environment variable assignment near the top of the script, e.g., to increase the file size limit to 5MiB:
 +
 [source,properties]
 JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=5000000"

 === Configuring jute.maxbuffer for ZooKeeper Clients

 The `bin/solr` script invokes Java programs that act as ZooKeeper clients. When you use Solr's bundled ZooKeeper server instead of setting up an external ZooKeeper ensemble, the configuration described below will also configure the ZooKeeper server.

 Add the setting to the `SOLR_OPTS` environment variable in Solr's include file (`bin/solr.in.sh` or `solr.in.cmd`):

 [.dynamic-tabs]
 --
 [example.tab-pane#linux2]
 ====
 [.tab-label]*Linux: solr.in.sh*

 The section to look for will start:

 [source,properties]
 ----
 # Anything you add to the SOLR_OPTS variable will be included in the java
 # start command line as-is, in ADDITION to other options. If you specify the
 # -a option on start script, those options will be appended as well. Examples:
 ----

 Add the following line to increase the file size limit to 2MB:

 [source,properties]
 SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=0x200000"
 ====

 [example.tab-pane#zkwindows2]
 ====
 [.tab-label]*Windows: solr.in.cmd*

 The section to look for will start:

 [source,bat]
 ----
 REM Anything you add to the SOLR_OPTS variable will be included in the java
 REM start command line as-is, in ADDITION to other options. If you specify the
 REM -a option on start script, those options will be appended as well. Examples:
 ----

 Add the following line to increase the file size limit to 2MB:

 [source,bat]
 ----
 set SOLR_OPTS=%SOLR_OPTS% -Djute.maxbuffer=0x200000
 ----
 ====
 --

 == Securing the ZooKeeper Connection

 You may also want to secure the communication between ZooKeeper and Solr.

 To setup ACL protection of znodes, see the section <<zookeeper-access-control.adoc#,ZooKeeper Access Control>>.