| = SolrCloud on AWS EC2 |
| :experimental: |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| This guide is a tutorial on how to set up a multi-node SolrCloud cluster on https://aws.amazon.com/ec2[Amazon Web Services (AWS) EC2] instances for early development and design. |
| |
| This tutorial is not meant for production systems. |
| For one, it uses Solr's embedded ZooKeeper instance, and for production you should have at least 3 ZooKeeper nodes in an ensemble. |
| There are additional steps you should take for a production installation; refer to xref:deployment-guide:taking-solr-to-production.adoc[] for how to deploy Solr in production. |
| |
| In this guide we are going to: |
| |
| . Launch multiple AWS EC2 instances |
| * Create new _Security Group_ |
| * Configure instances and launch |
. Install, configure and start Solr on the newly launched EC2 instances
| * Install system prerequisites: Java 11 or later |
* Download the latest version of Solr
| * Start the Solr nodes in SolrCloud mode |
| . Create a collection, index documents, and query the system |
* Create a collection with multiple shards and replicas
* Index documents to the newly created collection
* Verify the documents' presence by querying the collection
| |
| == Before You Start |
| To use this guide, you must have the following: |
| |
| * An https://aws.amazon.com[AWS] account. |
* Familiarity with setting up a single-node SolrCloud on a local machine.
| Refer to the xref:solr-tutorial.adoc[] if you have never used Solr before. |
| |
| == Launch EC2 instances |
| |
| === Create new Security Group |
| |
| . Navigate to the https://console.aws.amazon.com/ec2/v2/home[AWS EC2 console] and to the region of your choice. |
| . Configure an http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html[AWS security group] which will limit access to the installation and allow our launched EC2 instances to talk to each other without restrictions. |
| .. From the EC2 Dashboard, click btn:[Security Groups] from the left-hand menu, under "Network & Security". |
| .. Click btn:[Create Security Group] under the _Security Groups_ section. |
| Give your security group a descriptive name. |
| .. You can select one of the existing https://aws.amazon.com/vpc[VPCs] or create a new one. |
| .. We need two ports open for our cloud here: |
| ... Solr port. |
| In this example we will use Solr's default port 8983. |
... ZooKeeper port.
We'll use Solr's embedded ZooKeeper, so we'll use the default port 9983 (see <<Deploying with External ZooKeeper>> to configure an external ZooKeeper).
| .. Click btn:[Inbound] to set inbound network rules, then select btn:[Add Rule]. |
| Select "Custom TCP" as the type. |
Enter `8983` for the "Port Range" and choose "My IP" for the Source, then enter your public IP.
| Create a second rule with the same type and source, but enter `9983` for the port. |
| + |
| This will limit access to your current machine. |
| If you want wider access to the instance in order to collaborate with others, you can specify that, but make sure you only allow as much access as needed. |
| A Solr instance should never be exposed to general Internet traffic. |
| .. Add another rule for SSH access. |
| Choose "SSH" as the type, and again "My IP" for the source and again enter your public IP. |
| You need SSH access on all instances to install and configure Solr. |
.. Review the details; your group configuration should look like this:
| + |
| image::tutorial-aws/aws-security-create.png[image,width=600,height=400] |
| .. Click btn:[Create] when finished. |
.. Next, modify the rules so that instances in the group can talk to all other instances in the same group.
This rule could not be added while creating the group, so edit the group now to add it.
| ... Select the newly created group in the Security Group overview table. |
| Under the "Inbound" tab, click btn:[Edit]. |
| ... Click btn:[Add rule]. |
| Choose `All TCP` from the pulldown list for the type, and enter `0-65535` for the port range. |
For the source, specify the name of the current security group, `solr-sample`.
.. Review the details; your group configuration should now look like this:
| + |
| image::tutorial-aws/aws-security-edit.png[image,width=600,height=400] |
| .. Click btn:[Save] when finished. |
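
The console steps above can also be scripted.
The AWS CLI sketch below is a rough equivalent, assuming the CLI is installed and configured; the VPC ID, security group ID, and source IP are placeholders you would replace with your own values.

[,console]
----
# create the security group in your VPC
$ aws ec2 create-security-group --group-name solr-sample \
    --description "SolrCloud tutorial" --vpc-id vpc-xxxxxxxx

# allow Solr, embedded ZooKeeper, and SSH access from your public IP only
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port 8983 --cidr 203.0.113.10/32
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port 9983 --cidr 203.0.113.10/32
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port 22 --cidr 203.0.113.10/32

# allow all TCP traffic between members of the group itself
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --ip-permissions 'IpProtocol=tcp,FromPort=0,ToPort=65535,UserIdGroupPairs=[{GroupId=sg-xxxxxxxx}]'
----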
| |
| === Configure Instances and Launch |
| |
| Once the security group is in place, you can choose btn:[Instances] from the left-hand navigation menu. |
| |
Under Instances, click the btn:[Launch Instance] button and follow the wizard steps:
| |
| . Choose your Amazon Machine Image (AMI): |
| Choose *Amazon Linux AMI, SSD Volume Type* as the AMI. |
There are both commercial and community-based AMIs available, e.g., Amazon Linux AMI (HVM), SSD Volume Type, but this AMI is well suited to our purposes.
| Click btn:[Select] next to the image you choose. |
. The next screen asks you to choose the instance type; *t2.medium* is sufficient.
| Choose it from the list, then click btn:[Configure Instance Details]. |
| . Configure the instance. |
| Enter *2* in the "Number of instances" field. |
| Make sure the setting for "Auto-assign Public IP" is "Enabled". |
| . When finished, click btn:[Add Storage]. |
| The default of *8 GB* for size and *General Purpose SSD* for the volume type is sufficient for running this quick start. |
| Optionally select "Delete on termination" if you know you won't need the data stored in Solr indexes after you terminate the instances. |
| . When finished, click btn:[Add Tags]. |
| You do not have to add any tags for this quick start, but you can add them if you want. |
| . Click btn:[Configure Security Group]. |
| Choose *Select an existing security group* and select the security group you created earlier: `solr-sample`. |
| You should see the expected inbound rules at the bottom of the page. |
| . Click btn:[Review]. |
| . If everything looks correct, click btn:[Launch]. |
. Select an existing “private key file” or create a new one and download it to your local machine so you will be able to log in to the instances via SSH.
| + |
| image::tutorial-aws/aws-key.png[image,width=600,height=400] |
| . On the instances list, you can watch the states change. |
| You cannot use the instances until they become *“running”*. |
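
If you prefer to script the launch as well, a minimal AWS CLI sketch is shown below; the AMI ID, key pair name, and security group ID are placeholders for the values you chose in the wizard steps above.

[,console]
----
# launch two t2.medium instances with a public IP and the solr-sample security group
$ aws ec2 run-instances --image-id ami-xxxxxxxx --count 2 \
    --instance-type t2.medium --key-name aws-key \
    --security-group-ids sg-xxxxxxxx --associate-public-ip-address
----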
| |
| |
| == Install, Configure and Start |
| |
. Locate the Public DNS record for each instance by selecting it from the list of instances, and log on to each machine one by one.
| + |
Using SSH, if your AWS identity key file is `aws-key.pem` and the AMI uses `ec2-user` as the login user, do the following on each AWS instance:
| + |
| [,console] |
| ---- |
| $ ssh-add aws-key.pem |
| $ ssh -A ec2-user@<instance-public-dns> |
| ---- |
| + |
| . While logged in to each of the AWS EC2 instances, configure Java 11 and download Solr: |
| + |
| [,console] |
| ---- |
| # check if the AWS instance already has java installed |
| $ java -version |
| |
| # install JDK 11 |
| $ sudo yum install java-11 |
| |
| # configure JDK 11 as the default |
| $ sudo /usr/sbin/alternatives --config java |
| |
| # verify that the default java version is now 11 |
| $ java -version |
| ---- |
| + |
| [,console,subs="attributes"] |
| ---- |
| # download desired version of Solr |
| $ wget https://archive.apache.org/dist/solr/solr/{solr-full-version}/solr-{solr-full-version}.tgz |
| |
| # untar the archive |
| $ tar -zxvf solr-{solr-full-version}.tgz |
| |
| # configure SOLR_HOME env variable |
| $ export SOLR_HOME=$PWD/solr-{solr-full-version} |
| |
| # also add the env variable to .bashrc |
| $ vim ~/.bashrc |
| export SOLR_HOME=/home/ec2-user/solr-{solr-full-version} |
| ---- |
| |
| . Resolve the Public DNS to simpler hostnames. |
| + |
Let’s assume the public DNS hostnames and IPv4 addresses of the EC2 instances are as follows:
| |
| * ec2-101-1-2-3.us-east-2.compute.amazonaws.com: 101.1.2.3 (public), 172.16.2.3 (private) |
| * ec2-101-4-5-6.us-east-2.compute.amazonaws.com: 101.4.5.6 (public), 172.16.5.6 (private) |
| + |
| Edit `/etc/hosts` on each of the instances, and add the following entries: |
| + |
| [,console] |
| ---- |
| $ sudo vim /etc/hosts |
| 172.16.2.3 solr-node-1 |
| 172.16.5.6 solr-node-2 |
| ---- |
| |
. Configure Solr on the running EC2 instances.
| + |
In this case, one of the machines will host the embedded ZooKeeper along with a Solr node, say, `ec2-101-1-2-3.us-east-2.compute.amazonaws.com` (aka `solr-node-1`).
| + |
| See <<Deploying with External ZooKeeper>> for configuring external ZooKeeper. |
| + |
On both machines, edit the `solr.in.sh` script and configure the environment variables that allow Solr and the embedded ZooKeeper to listen on all network interfaces, not just on 127.0.0.1:
| + |
| [,console] |
| ---- |
| $ cd $SOLR_HOME |
| |
| # uncomment and edit the two variables |
$ vim bin/solr.in.sh
| SOLR_JETTY_HOST="0.0.0.0" |
| SOLR_ZK_EMBEDDED_HOST="0.0.0.0" |
| ---- |
| + |
| See xref:deployment-guide:securing-solr.adoc#network-configuration[Network Configuration] for more details. |
| + |
On the first node, `ec2-101-1-2-3.us-east-2.compute.amazonaws.com` (`solr-node-1`):
| + |
| [,console] |
| ---- |
| $ cd $SOLR_HOME |
| |
| # start Solr node on 8983 and ZooKeeper will start on 9983 (8983+1000) |
| $ bin/solr start -c -p 8983 -h solr-node-1 |
| ---- |
| + |
On the other node, `ec2-101-4-5-6.us-east-2.compute.amazonaws.com` (`solr-node-2`):
| + |
| [,console] |
| ---- |
| $ cd $SOLR_HOME |
| |
| # start Solr node on 8983 and connect to ZooKeeper running on first node |
| $ bin/solr start -c -p 8983 -h solr-node-2 -z solr-node-1:9983 |
| ---- |
| |
| . Inspect and Verify. |
| + |
Inspect the state of the Solr nodes from a browser on your local machine.
| Go to: |
| + |
| [source,bash] |
| ---- |
| http://ec2-101-1-2-3.us-east-2.compute.amazonaws.com:8983/solr |
| |
| http://ec2-101-4-5-6.us-east-2.compute.amazonaws.com:8983/solr |
| ---- |
| + |
You should be able to see the Solr Admin UI dashboard for both nodes.
| |
| == Create Collection, Index and Query |
| |
| You can refer to the xref:solr-tutorial.adoc[] for an extensive walkthrough on creating collections with multiple shards and replicas, indexing data via different methods and querying documents accordingly. |
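
As a minimal end-to-end sketch, you can exercise the cluster directly from `solr-node-1` (or from your local machine, using the public DNS name instead).
The collection name `test` and the `title_t` field below are illustrative only; `title_t` relies on the `*_t` dynamic field in the default configset.

[,console]
----
# create a collection with 2 shards and a replication factor of 2
$ curl "http://solr-node-1:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2"

# index two sample documents
$ curl -X POST -H 'Content-Type: application/json' \
    'http://solr-node-1:8983/solr/test/update?commit=true' \
    -d '[{"id":"1","title_t":"hello solrcloud"},{"id":"2","title_t":"solr on aws ec2"}]'

# query the collection; both documents should be returned
$ curl 'http://solr-node-1:8983/solr/test/select?q=*:*'
----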
| |
| == Deploying with External ZooKeeper |
| |
If you want to configure an external ZooKeeper ensemble to avoid using the embedded single-instance ZooKeeper that runs in the same JVM as the Solr node, you need to make a few tweaks to the steps listed above, as follows.
| |
* When creating the security group, open port `2181`, ZooKeeper's default, instead of `9983` (or whatever port you are using for ZooKeeper).
* When configuring the number of instances to launch, launch 3 instances instead of 2.
| * When modifying the `/etc/hosts` on each machine, add a third line for the 3rd instance and give it a recognizable name: |
| + |
[,console]
----
$ sudo vim /etc/hosts
172.16.2.3 solr-node-1
172.16.5.6 solr-node-2
172.16.8.9 zookeeper-node
----
| |
* You'll need to install ZooKeeper manually, as described in the next section.
| |
| === Install ZooKeeper |
| |
| These steps will help you install and configure a single instance of ZooKeeper on AWS. |
This is not sufficient for production use, however, where a ZooKeeper ensemble of at least three nodes is recommended.
| See the section xref:deployment-guide:zookeeper-ensemble.adoc[] for information about how to change this single-instance into an ensemble. |
| |
| . Download a stable version of ZooKeeper. |
| In this example we're using ZooKeeper v{dep-version-zookeeper}. |
| On the node you're using to host ZooKeeper (`zookeeper-node`), download the package and untar it: |
| + |
| [,console,subs="attributes"] |
| ---- |
| # download stable version of ZooKeeper |
| $ wget https://archive.apache.org/dist/zookeeper/zookeeper-{dep-version-zookeeper}/apache-zookeeper-{dep-version-zookeeper}-bin.tar.gz |
| |
| # untar the archive |
| $ tar -zxvf apache-zookeeper-{dep-version-zookeeper}-bin.tar.gz |
| ---- |
| + |
| Add an environment variable for ZooKeeper's home directory (`ZOO_HOME`) to the `.bashrc` for the user who will be running the process. |
| The rest of the instructions assume you have set this variable. |
Correct the path to the ZooKeeper installation as appropriate if it does not match the example below.
| + |
| [source,bash,subs="attributes"] |
| ---- |
| $ export ZOO_HOME=$PWD/apache-zookeeper-{dep-version-zookeeper}-bin |
| |
| # put the env variable in .bashrc |
$ vim ~/.bashrc
| export ZOO_HOME=/home/ec2-user/apache-zookeeper-{dep-version-zookeeper}-bin |
| ---- |
| . Change directories to `ZOO_HOME`, and create the ZooKeeper configuration by using the template provided by ZooKeeper. |
| + |
| [,console] |
| ---- |
| $ cd $ZOO_HOME |
| |
| # create ZooKeeper config by using zoo_sample.cfg |
| $ cp conf/zoo_sample.cfg conf/zoo.cfg |
| ---- |
| . Create the ZooKeeper data directory in the filesystem, and edit the `zoo.cfg` file to uncomment the autopurge parameters and define the location of the data directory. |
| + |
| [source,bash] |
| ---- |
| # create data dir for ZooKeeper, edit zoo.cfg, uncomment autopurge parameters |
| $ mkdir data |
| $ vim conf/zoo.cfg |
| |
| # -- uncomment -- |
| autopurge.snapRetainCount=3 |
| autopurge.purgeInterval=1 |
| |
| # -- edit -- |
| dataDir=data |
| |
| # -- add -- |
| 4lw.commands.whitelist=mntr,conf,ruok |
| ---- |
| . Start ZooKeeper. |
| + |
| [,console] |
| ---- |
| $ cd $ZOO_HOME |
| |
| # start ZooKeeper, default port: 2181 |
| $ bin/zkServer.sh start |
| ---- |
| |
| . On the first node being used for Solr (`solr-node-1`), start Solr and tell it where to find ZooKeeper. |
| + |
| [,console] |
| ---- |
| $ cd $SOLR_HOME |
| |
| # start Solr node on 8983 and connect to ZooKeeper running on ZooKeeper node |
| $ bin/solr start -c -p 8983 -h solr-node-1 -z zookeeper-node:2181 |
| ---- |
| + |
| . On the second Solr node (`solr-node-2`), again start Solr and tell it where to find ZooKeeper. |
| + |
| [,console] |
| ---- |
| $ cd $SOLR_HOME |
| |
| # start Solr node on 8983 and connect to ZooKeeper running on ZooKeeper node |
| $ bin/solr start -c -p 8983 -h solr-node-2 -z zookeeper-node:2181 |
| ---- |
| |
| [TIP] |
| ==== |
| As noted earlier, a single ZooKeeper node is not sufficient for a production installation. |
| See these additional resources for more information about deploying Solr in production, which can be used once you have the EC2 instances up and running: |
| |
| * xref:deployment-guide:taking-solr-to-production.adoc[] |
| * xref:deployment-guide:zookeeper-ensemble.adoc[] |
| ==== |