| = Setting Up an External ZooKeeper Ensemble |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| Although Solr comes bundled with http://zookeeper.apache.org[Apache ZooKeeper], you should consider yourself discouraged from using this internal ZooKeeper in production. |
| |
| Shutting down a redundant Solr instance will also shut down its ZooKeeper server, which might not be quite so redundant. Because a ZooKeeper ensemble must have a quorum of more than half its servers running at any given time, this can be a problem. |
| |
| The solution to this problem is to set up an external ZooKeeper ensemble. Fortunately, while this process can seem intimidating due to the number of powerful options, setting up a simple ensemble is actually quite straightforward, as described below. |
| |
| .How Many ZooKeepers? |
| [quote,ZooKeeper Administrator's Guide,http://zookeeper.apache.org/doc/r3.4.11/zookeeperAdmin.html] |
| ____ |
| "For a ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. *To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines*. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. |
| |
| For this reason, ZooKeeper deployments are usually made up of an odd number of machines." |
| ____ |
| |
| When planning how many ZooKeeper nodes to configure, keep in mind that the main principle for a ZooKeeper ensemble is maintaining a majority of servers to serve requests. This majority is also called a _quorum_. |
| |
| It is generally recommended to have an odd number of ZooKeeper servers in your ensemble, so a majority is maintained. |
| |
| For example, if you only have two ZooKeeper nodes and one goes down, 50% of available servers is not a majority, so ZooKeeper will no longer serve requests. However, if you have three ZooKeeper nodes and one goes down, you have 66% of available servers available, and ZooKeeper will continue normally while you repair the one down node. If you have 5 nodes, you could continue operating with two down nodes if necessary. |
| |
| More information on ZooKeeper clusters is available from the ZooKeeper documentation at http://zookeeper.apache.org/doc/r3.4.11/zookeeperAdmin.html#sc_zkMulitServerSetup. |
| |
| == Download Apache ZooKeeper |
| |
| The first step in setting up Apache ZooKeeper is, of course, to download the software. It's available from http://zookeeper.apache.org/releases.html. |
| |
| [IMPORTANT] |
| ==== |
| When using stand-alone ZooKeeper, you need to take care to keep your version of ZooKeeper updated with the latest version distributed with Solr. Since you are using it as a stand-alone application, it does not get upgraded when you upgrade Solr. |
| |
| Solr currently uses Apache ZooKeeper v3.4.11. |
| ==== |
| |
| == Setting Up a Single ZooKeeper |
| |
| === Create the Instance |
| Creating the instance is a simple matter of extracting the files into a specific target directory. The actual directory itself doesn't matter, as long as you know where it is, and where you'd like to have ZooKeeper store its internal data. |
| |
| === Configure the Instance |
| The next step is to configure your ZooKeeper instance. To do that, create the following file: `<ZOOKEEPER_HOME>/conf/zoo.cfg`. To this file, add the following information: |
| |
| [source,bash] |
| ---- |
| tickTime=2000 |
| dataDir=/var/lib/zookeeper |
| clientPort=2181 |
| ---- |
| |
| The parameters are as follows: |
| |
| `tickTime`:: Part of what ZooKeeper does is to determine which servers are up and running at any given time, and the minimum session time out is defined as two "ticks". The `tickTime` parameter specifies, in miliseconds, how long each tick should be. |
| |
| `dataDir`:: This is the directory in which ZooKeeper will store data about the cluster. This directory should start out empty. |
| |
| `clientPort`:: This is the port on which Solr will access ZooKeeper. |
| |
| Once this file is in place, you're ready to start the ZooKeeper instance. |
| |
| === Run the Instance |
| |
| To run the instance, you can simply use the `ZOOKEEPER_HOME/bin/zkServer.sh` script provided, as with this command: `zkServer.sh start` |
| |
| Again, ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of this tutorial. For more information, see the ZooKeeper http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html[Getting Started] page. For this example, however, the defaults are fine. |
| |
| === Point Solr at the Instance |
| |
| Pointing Solr at the ZooKeeper instance you've created is a simple matter of using the `-z` parameter when using the bin/solr script. For example, in order to point the Solr instance to the ZooKeeper you've started on port 2181, this is what you'd need to do: |
| |
| Starting `cloud` example with ZooKeeper already running at port 2181 (with all other defaults): |
| |
| [source,bash] |
| ---- |
| bin/solr start -e cloud -z localhost:2181 -noprompt |
| ---- |
| |
| Add a node pointing to an existing ZooKeeper at port 2181: |
| |
| [source,bash] |
| ---- |
| bin/solr start -cloud -s <path to solr home for new node> -p 8987 -z localhost:2181 |
| ---- |
| |
| NOTE: When you are not using an example to start solr, make sure you upload the configuration set to ZooKeeper before creating the collection. |
| |
| === Shut Down ZooKeeper |
| |
| To shut down ZooKeeper, use the zkServer script with the "stop" command: `zkServer.sh stop`. |
| |
| == Setting up a ZooKeeper Ensemble |
| |
| With an external ZooKeeper ensemble, you need to set things up just a little more carefully as compared to the Getting Started example. |
| |
| The difference is that rather than simply starting up the servers, you need to configure them to know about and talk to each other first. So your original `zoo.cfg` file might look like this: |
| |
| [source,bash] |
| ---- |
| dataDir=/var/lib/zookeeperdata/1 |
| clientPort=2181 |
| initLimit=5 |
| syncLimit=2 |
| server.1=localhost:2888:3888 |
| server.2=localhost:2889:3889 |
| server.3=localhost:2890:3890 |
| ---- |
| |
| Here you see three new parameters: |
| |
| initLimit:: Amount of time, in ticks, to allow followers to connect and sync to a leader. In this case, you have 5 ticks, each of which is 2000 milliseconds long, so the server will wait as long as 10 seconds to connect and sync with the leader. |
| |
| syncLimit:: Amount of time, in ticks, to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped. |
| |
| server.X:: These are the IDs and locations of all servers in the ensemble, the ports on which they communicate with each other. The server ID must additionally stored in the `<dataDir>/myid` file and be located in the `dataDir` of each ZooKeeper instance. The ID identifies each server, so in the case of this first instance, you would create the file `/var/lib/zookeeperdata/1/myid` with the content "1". |
| |
| Now, whereas with Solr you need to create entirely new directories to run multiple instances, all you need for a new ZooKeeper instance, even if it's on the same machine for testing purposes, is a new configuration file. To complete the example you'll create two more configuration files. |
| |
| The `<ZOOKEEPER_HOME>/conf/zoo2.cfg` file should have the content: |
| |
| [source,bash] |
| ---- |
| tickTime=2000 |
| dataDir=/var/lib/zookeeperdata/2 |
| clientPort=2182 |
| initLimit=5 |
| syncLimit=2 |
| server.1=localhost:2888:3888 |
| server.2=localhost:2889:3889 |
| server.3=localhost:2890:3890 |
| ---- |
| |
| You'll also need to create `<ZOOKEEPER_HOME>/conf/zoo3.cfg`: |
| |
| [source,bash] |
| ---- |
| tickTime=2000 |
| dataDir=/var/lib/zookeeperdata/3 |
| clientPort=2183 |
| initLimit=5 |
| syncLimit=2 |
| server.1=localhost:2888:3888 |
| server.2=localhost:2889:3889 |
| server.3=localhost:2890:3890 |
| ---- |
| |
| Finally, create your `myid` files in each of the `dataDir` directories so that each server knows which instance it is. The id in the `myid` file on each machine must match the "server.X" definition. So, the ZooKeeper instance (or machine) named "server.1" in the above example, must have a `myid` file containing the value "1". The `myid` file can be any integer between 1 and 255, and must match the server IDs assigned in the `zoo.cfg` file. |
| |
| To start the servers, you can simply explicitly reference the configuration files: |
| |
| [source,bash] |
| ---- |
| cd <ZOOKEEPER_HOME> |
| bin/zkServer.sh start zoo.cfg |
| bin/zkServer.sh start zoo2.cfg |
| bin/zkServer.sh start zoo3.cfg |
| ---- |
| |
| Once these servers are running, you can reference them from Solr just as you did before: |
| |
| [source,bash] |
| ---- |
| bin/solr start -e cloud -z localhost:2181,localhost:2182,localhost:2183 -noprompt |
| ---- |
| |
| == Securing the ZooKeeper Connection |
| |
| You may also want to secure the communication between ZooKeeper and Solr. |
| |
| To setup ACL protection of znodes, see <<zookeeper-access-control.adoc#zookeeper-access-control,ZooKeeper Access Control>>. |
| |
| For more information on getting the most power from your ZooKeeper installation, check out the http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html[ZooKeeper Administrator's Guide]. |