contrib/terraform-testing-infrastructure/README.md

Accumulo Testing Infrastructure

Description

This Git repository contains several Terraform configurations.

  • shared_state creates Terraform state storage in either Azure or AWS, which is a prerequisite for the Terraform configurations in aws or azure.
    • shared_state/aws creates an AWS S3 Bucket and DynamoDB table that are a prerequisite for the Terraform configuration in aws.
    • shared_state/azure creates an Azure resource group and storage account that are a prerequisite for the Terraform configuration in azure.
  • aws creates the following AWS resources:
    1. Creates one or more EC2 nodes for running the different components. Currently, the configuration uses the m5.2xlarge instance type which provides 8 vCPUs, 32GB RAM, and an EBS backed root volume.
    2. Runs commands on the EC2 nodes after they are started (5 minutes according to the docs) to install software and configure them.
    3. Creates DNS A records for the EC2 nodes.
  • azure creates the following Azure resources:
    1. Creates a resource group to hold all of the created resources.
    2. Creates networking resources (vnet, subnet, network security group).
    3. Creates two or more Azure VMs (along with associated NICs and public IP addresses) for running the different components. The default configuration creates D8s v4 VMs, providing 8 vCPUs and 32GiB RAM with an Azure storage backed OS drive.
    4. Runs commands on the VMs after cloud-init provisioning is complete in order to install and configure Hadoop, Zookeeper, Accumulo, and the Accumulo Testing repository.

Prerequisites

You will need to download and install the correct Terraform CLI for your platform and put the terraform binary on your PATH. You can optionally install Terraform Docs if you want to generate documentation or an example variables file for the shared_state, aws, or azure configurations.

Shared State

The shared_state directory contains Terraform configurations for creating either an AWS S3 Bucket and DynamoDB table, or an Azure resource group, storage account, and container. These objects only need to be created once and are used for sharing the Terraform state with a team. To read more about this see remote state. The AWS shared state instructions are based on this article.

To generate the storage, run terraform init followed by terraform apply. Note that the shell working directory must be the shared_state/aws or shared_state/azure directory when you run the terraform commands for shared state creation.
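As a rough sketch of the step above, the two commands can be wrapped in a small shell function that enforces the working directory. The function name create_shared_state and the DRY_RUN switch are illustrative assumptions, not part of this repository; by default the commands are only printed so they can be reviewed first.

```shell
# Hypothetical helper: create shared state for one cloud (aws or azure).
# With DRY_RUN=1 (the default here) the terraform commands are only printed;
# set DRY_RUN=0 to actually execute them.
create_shared_state() {
  cloud="$1"
  case "$cloud" in
    aws|azure) ;;
    *) echo "usage: create_shared_state aws|azure" >&2; return 1 ;;
  esac
  for subcommand in init apply; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "(cd shared_state/${cloud} && terraform ${subcommand})"
    else
      # Run in a subshell so the caller's working directory is unchanged.
      (cd "shared_state/${cloud}" && terraform "${subcommand}") || return 1
    fi
  done
}

create_shared_state aws
```

Running the commands in a subshell keeps the caller's directory untouched while still satisfying the requirement that terraform runs from inside shared_state/aws or shared_state/azure.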

The default AWS configuration generates the S3 bucket name when terraform apply is run. This ensures that a globally unique S3 bucket name is used. It is not required to set any variables for the shared state. However, if you wish to override any variable values, this can be done by creating an aws.auto.tfvars file in the shared_state/aws directory. For example:

cd shared_state/aws
cat > aws.auto.tfvars << EOF
bucket_force_destroy = true
EOF

Assuming the bucket variable is not overridden, the generated S3 bucket name will appear in the terraform apply output, like the following example:

Outputs:

bucket_name = "terraform-20220209131315353700000001"

This value should be supplied to terraform init in the aws directory as described below. Using the example above, the init command for the aws directory would be:

terraform init -backend-config=bucket=terraform-20220209131315353700000001
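To avoid copying the bucket name by hand, it can be pulled out of saved terraform apply output. The sketch below assumes the output line has exactly the form shown above; extract_bucket_name is a hypothetical helper name, and the command is printed rather than executed.

```shell
# Hypothetical helper: read `terraform apply` output text on stdin and print
# the value of the bucket_name output.
# Assumes the line format shown above: bucket_name = "<name>"
extract_bucket_name() {
  sed -n 's/^bucket_name = "\(.*\)"$/\1/p'
}

# Demonstrate with the sample output from this README.
bucket=$(printf '%s\n' 'Outputs:' '' \
  'bucket_name = "terraform-20220209131315353700000001"' | extract_bucket_name)

# Print the init command for the aws directory instead of running it.
echo "terraform init -backend-config=bucket=${bucket}"
```

On recent Terraform versions, running terraform output -raw bucket_name from the shared_state/aws directory after the apply should print the same value without any text parsing.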

If you change any of the backend storage configuration parameters from their defaults, you will need to override them when you initialize terraform for the aws or azure configuration below. For example, if you change the region where the S3 bucket is deployed from us-east-1 to us-west-2, then you would need to run terraform init in the aws directory (not the shared_state initialization, but the main aws directory initialization) with:

terraform init -backend-config=region=us-west-2

The following backend configuration can be overridden with -backend-config=<name>=<value> options to terraform init. This avoids the need to modify the backend sections in aws/main.tf or azure/main.tf.

For AWS:

  • -backend-config=bucket=<bucket_name>: Override the S3 bucket name
  • -backend-config=key=<key_name>: Override the key in the S3 bucket
  • -backend-config=region=<region>: Override AWS region
  • -backend-config=dynamodb_table=<dynamodb_table_name>: Override the DynamoDB table name

For Azure:

  • -backend-config=resource_group_name=<resource_group_name>: Override the resource group where the storage account is located
  • -backend-config=storage_account_name=<storage_account_name>: Override the name of the Azure storage account holding Terraform state
  • -backend-config=container_name=<container_name>: Override the name of the container within the storage account that is holding Terraform state
  • -backend-config=key=<blob_name>: Override the name of the blob within the container that will be used to hold Terraform state
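When several backend values are overridden at once, the flags can be assembled from name=value pairs. The following is a minimal sketch; backend_config_flags is a hypothetical helper, and the bucket and region values are simply the examples used earlier in this section.

```shell
# Hypothetical helper: turn name=value pairs into -backend-config flags.
backend_config_flags() {
  flags=""
  for pair in "$@"; do
    # Separate flags with a single space, without a leading space.
    flags="${flags:+$flags }-backend-config=${pair}"
  done
  printf '%s\n' "$flags"
}

# Print the resulting command so it can be reviewed before it is run.
echo "terraform init $(backend_config_flags \
  bucket=terraform-20220209131315353700000001 region=us-west-2)"
```

The helper does not validate option names, so a misspelled name would only be reported by terraform init itself.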

Test Cluster

The aws and azure directories contain Terraform configurations for creating an Accumulo cluster on AWS or Azure respectively. The aws and azure directories contain the following Terraform configuration items:

  • main.tf - The Terraform configuration file
  • variables.tf - The declaration and default values for Terraform variables

These configurations both use shared Terraform modules and configuration files that can be found in the following directories/files:

  • modules/ - This contains several shared Terraform modules that are used by the aws and azure Terraform configurations
    • cloud-init-config - contains templates to generate a Cloud Init configuration to configure AWS instances or Azure VMs with necessary Linux packages, user accounts, etc.
    • config-files - contains template configuration files for various components of the cluster (e.g., HDFS, Accumulo, Grafana, etc.) as well as helper scripts to install the software components that cannot be installed via cloud-init.
    • upload-software - if pre-built binaries for downloaded software components (Hadoop, Accumulo, Zookeeper, Maven) are included, this module uploads them to the cluster
    • configure-nodes - this module is responsible for executing scripts on the cluster to install and configure software, initialize the HDFS filesystem and Accumulo cluster, and start them.
  • conf/ - a non-git tracked directory that contains rendered template files with variables replaced by selected runtime configuration. These files are uploaded to the cluster.

AWS Variables

The table below lists the variables and their default values that are used in the aws configuration.

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| accumulo_branch_name | The name of the branch to build and install | string | "main" | no |
| accumulo_dir | The Accumulo directory on each EC2 node | string | "/data/accumulo" | no |
| accumulo_instance_name | The accumulo instance name. | string | "accumulo-testing" | no |
| accumulo_repo | URL of the Accumulo git repo | string | "https://github.com/apache/accumulo.git" | no |
| accumulo_root_password | The password for the accumulo root user. A randomly generated password will be used if none is specified here. | string | null | no |
| accumulo_testing_branch_name | The name of the branch to build and install | string | "main" | no |
| accumulo_testing_repo | URL of the Accumulo Testing git repo | string | "https://github.com/apache/accumulo-testing.git" | no |
| accumulo_version | The branch of Accumulo to download and install | string | "2.1.0-SNAPSHOT" | no |
| ami_name_pattern | The pattern of the name of the AMI to use | any | n/a | yes |
| ami_owner | The id of the AMI owner | any | n/a | yes |
| authorized_ssh_key_files | List of SSH public key files for the developers that will log into the cluster | list(string) | [] | no |
| authorized_ssh_keys | List of SSH keys for the developers that will log into the cluster | list(string) | n/a | yes |
| cloudinit_merge_type | Describes the merge behavior for overlapping config blocks in cloud-init. | string | null | no |
| create_route53_records | Indicates whether or not route53 records will be created | bool | false | no |
| hadoop_dir | The Hadoop directory on each EC2 node | string | "/data/hadoop" | no |
| hadoop_version | The version of Hadoop to download and install | string | "3.3.4" | no |
| instance_count | The number of EC2 instances to create | string | "2" | no |
| instance_type | The type of EC2 instances to create | string | "m5.2xlarge" | no |
| local_sources_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | string | "" | no |
| maven_version | The version of Maven to download and install | string | "3.8.8" | no |
| optional_cloudinit_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit_merge_type to handle merging with the default script as you need. | string | null | no |
| private_network | Indicates whether or not the user is on a private network and access to hosts should be through the private IP addresses rather than public ones. | bool | false | no |
| root_volume_gb | The size, in GB, of the EC2 instance root volume | string | "300" | no |
| route53_zone | The name of the Route53 zone in which to create DNS addresses | any | n/a | yes |
| security_group | The Security Group to use when creating AWS objects | any | n/a | yes |
| software_root | The full directory root where software will be installed | string | "/opt/accumulo-testing" | no |
| us_east_1b_subnet | The AWS subnet id for the us-east-1b subnet | any | n/a | yes |
| us_east_1e_subnet | The AWS subnet id for the us-east-1e subnet | any | n/a | yes |
| zookeeper_dir | The ZooKeeper directory on each EC2 node | string | "/data/zookeeper" | no |
| zookeeper_version | The version of ZooKeeper to download and install | string | "3.8.0" | no |

The following outputs are returned by the aws Terraform configuration.

| Name | Description |
| --- | --- |
| accumulo_root_password | The supplied, or automatically generated Accumulo root user password. |
| manager_ip | The IP address of the manager instance. |
| worker_ips | The IP addresses of the worker instances. |

Azure Variables

The table below lists the variables and their default values that are used in the azure configuration.

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| accumulo_branch_name | The name of the branch to build and install | string | "main" | no |
| accumulo_dir | The Accumulo directory on each node | string | "/data/accumulo" | no |
| accumulo_instance_name | The accumulo instance name. | string | "accumulo-testing" | no |
| accumulo_repo | URL of the Accumulo git repo | string | "https://github.com/apache/accumulo.git" | no |
| accumulo_root_password | The password for the accumulo root user. A randomly generated password will be used if none is specified here. | string | null | no |
| accumulo_testing_branch_name | The name of the branch to build and install | string | "main" | no |
| accumulo_testing_repo | URL of the Accumulo Testing git repo | string | "https://github.com/apache/accumulo-testing.git" | no |
| accumulo_version | The branch of Accumulo to download and install | string | "2.1.0-SNAPSHOT" | no |
| admin_username | The username of the admin user, that can be authenticated with the first public ssh key. | string | "azureuser" | no |
| authorized_ssh_key_files | List of SSH public key files for the developers that will log into the cluster | list(string) | [] | no |
| authorized_ssh_keys | List of SSH keys for the developers that will log into the cluster | list(string) | n/a | yes |
| cloudinit_merge_type | Describes the merge behavior for overlapping config blocks in cloud-init. | string | null | no |
| create_resource_group | Indicates whether or not resource_group_name should be created or is an existing resource group. | bool | true | no |
| hadoop_dir | The Hadoop directory on each node | string | "/data/hadoop" | no |
| hadoop_version | The version of Hadoop to download and install | string | "3.3.4" | no |
| local_sources_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | string | "" | no |
| location | The Azure region where resources are to be created. If an existing resource group is specified, this value is ignored and the resource group's location is used. | string | n/a | yes |
| managed_disk_configuration | Optional managed disk configuration. If supplied, the managed disks on each VM will be combined into an LVM volume mounted at the named mount point. | object({ mount_point = string, disk_count = number, storage_account_type = string, disk_size_gb = number }) | null | no |
| maven_version | The version of Maven to download and install | string | "3.8.8" | no |
| network_address_space | The network address space to use for the virtual network. | list(string) | ["10.0.0.0/16"] | no |
| optional_cloudinit_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit_merge_type to handle merging with the default script as you need. | string | null | no |
| os_disk_caching | The type of caching to use for the OS disk. Possible values are None, ReadOnly, and ReadWrite. | string | "ReadOnly" | no |
| os_disk_size_gb | The size, in GB, of the OS disk | number | 300 | no |
| os_disk_type | The disk type to use for OS disks. Possible values are Standard_LRS, StandardSSD_LRS, and Premium_LRS. | string | "Standard_LRS" | no |
| resource_group_name | The name of the resource group to create or reuse. If not specified, the name is generated based on resource_name_prefix. | string | "" | no |
| resource_name_prefix | A prefix applied to all resource names created by this template. | string | "accumulo-testing" | no |
| software_root | The full directory root where software will be installed | string | "/opt/accumulo-testing" | no |
| subnet_address_prefixes | The subnet address prefixes to use for the accumulo testing subnet. | list(string) | ["10.0.2.0/24"] | no |
| vm_image | n/a | object({ publisher = string, offer = string, sku = string, version = string }) | { "offer": "0001-com-ubuntu-server-focal", "publisher": "Canonical", "sku": "20_04-lts-gen2", "version": "latest" } | no |
| vm_sku | The SKU of Azure VMs to create | string | "Standard_D8s_v4" | no |
| worker_count | The number of worker VMs to create | number | 1 | no |
| zookeeper_dir | The ZooKeeper directory on each node | string | "/data/zookeeper" | no |
| zookeeper_version | The version of ZooKeeper to download and install | string | "3.8.0" | no |

The following outputs are returned by the azure Terraform configuration.

| Name | Description |
| --- | --- |
| accumulo_root_password | The user-supplied or automatically generated Accumulo root user password. |
| manager_ip | The public IP address of the manager VM. |
| worker_ips | The public IP addresses of the worker VMs. |

Configuration

When using either the aws or azure configuration, you will need to supply values for required variables that have no default value. There are several ways to do this. If you installed Terraform Docs, it can generate a variables file for you, which you can then edit to configure values as desired:

CLOUD=<enter either aws or azure>
cd $CLOUD
terraform-docs tfvars hcl . > ${CLOUD}.auto.tfvars
# If you prefer JSON over HCL, then the command would be
# terraform-docs tfvars json . > ${CLOUD}.auto.tfvars.json

Note that these generated variable files include every variable, with each defaulted variable set to its default value. Alternatively, you can refer to the tables above and add only the variables that are required (and have no default) or whose default you wish to change. Below is an example JSON file containing configuration for aws. This content can be customized and placed in the aws directory in a file whose name ends with .auto.tfvars.json. Any variable files whose name ends in .auto.tfvars or .auto.tfvars.json are automatically included when terraform commands are executed.

{
  "security_group": "sg-ABCDEF001",
  "route53_zone": "some.domain.com",
  "us_east_1b_subnet": "subnet-ABCDEF123",
  "us_east_1e_subnet": "subnet-ABCDEF124",
  "ami_owner": "000000000001",
  "ami_name_pattern": "MY_AMI_*",
  "authorized_ssh_keys": [
    "ssh-rsa dev_key_1",
    "ssh-rsa dev_key_2"
  ]
}
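Before running terraform, it can help to confirm that a JSON variables file like the one above names every required aws variable. The sketch below is a crude text match, not a JSON parser, and missing_required_vars is a hypothetical helper; the variable list is taken from the "Required" column of the aws table above.

```shell
# Hypothetical helper: print each required aws variable that does not appear
# as a key in the given .auto.tfvars.json file. This is a plain text match
# on "name": and does not validate the JSON itself.
missing_required_vars() {
  file="$1"
  for name in ami_name_pattern ami_owner authorized_ssh_keys route53_zone \
              security_group us_east_1b_subnet us_east_1e_subnet; do
    grep -q "\"${name}\"[[:space:]]*:" "$file" || echo "$name"
  done
}

# Example usage (the file name is an assumption):
#   missing_required_vars aws/my.auto.tfvars.json
```

An empty result means all required names were found; terraform itself remains the authority on whether the values are valid.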

Cloud-Init Customization

The cloud-init template can be found in cloud-init.tftpl. If you need to customize this configuration, one method is to use the Terraform variable optional_cloudinit_config to supply your own additional configuration. For example, some CentOS 7 images are out of date, and will need software packages to be updated before the rest of the software download/install will work. This can be accomplished by adding the following to your .auto.tfvars file:

optional_cloudinit_config = <<-EOT
  package_upgrade: true
EOT

You can add any other cloud-init configuration that you wish here. One factor to consider here is the cloud-init merging behavior with sections in the default template. The merging behavior can be controlled by setting the cloudinit_merge_type variable to your desired merge algorithm. The default is set to dict(recurse_array,no_replace)+list(append) which will attempt to keep all lists from the default configuration, rather than new ones overwriting them.

Another factor to consider is the size of the generated cloud-init template. Cloud providers place a limit on the size of this file. AWS limits this content to 16KB, before Base64 encoding, and Azure limits it to 64KB after Base64 encoding.
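The limits above can be checked against a rendered cloud-init file before deploying. This is a minimal sketch that assumes 1KB means 1024 bytes and that you have the rendered file available locally; check_cloudinit_size is a hypothetical helper, and the encoded size is computed arithmetically rather than by running a Base64 tool.

```shell
# Limits quoted above, assuming 1 KB = 1024 bytes.
AWS_LIMIT=16384     # raw bytes, before Base64 encoding
AZURE_LIMIT=65536   # bytes after Base64 encoding

# Hypothetical helper: report whether the file at $1 fits each limit.
check_cloudinit_size() {
  raw=$(( $(wc -c < "$1") ))
  # Base64 emits 4 output bytes for every 3 input bytes, rounded up.
  encoded=$(( (raw + 2) / 3 * 4 ))
  if [ "$raw" -le "$AWS_LIMIT" ]; then
    echo "aws: ok (${raw} bytes)"
  else
    echo "aws: too large (${raw} bytes)"
  fi
  if [ "$encoded" -le "$AZURE_LIMIT" ]; then
    echo "azure: ok (${encoded} bytes encoded)"
  else
    echo "azure: too large (${encoded} bytes encoded)"
  fi
}
```

Because Base64 expands content by about a third, the Azure limit corresponds to roughly 48KB of raw content under these assumptions.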

AWS Resources

This Terraform configuration creates:

  1. ${instance_count} EC2 nodes of ${instance_type} with the latest AMI matching ${ami_name_pattern} from the ${ami_owner}. Each EC2 node will have a ${root_volume_gb}GB root volume. The EFS filesystem is NFS mounted to each node at ${software_root}.
  2. DNS entries in Route53 for each EC2 node.

Software Layout

This Terraform configuration:

  1. Downloads, if necessary, the Apache Maven ${maven_version} binary tarball to ${software_root}/sources, then untars it to ${software_root}/apache-maven/apache-maven-${maven_version}
  2. Downloads, if necessary, the Apache ZooKeeper ${zookeeper_version} binary tarball to ${software_root}/sources, then untars it to ${software_root}/zookeeper/apache-zookeeper-${zookeeper_version}-bin
  3. Downloads, if necessary, the Apache Hadoop ${hadoop_version} binary tarball to ${software_root}/sources, then untars it to ${software_root}/hadoop/hadoop-${hadoop_version}
  4. Clones, if necessary, the Apache Accumulo Git repo from ${accumulo_repo} into ${software_root}/sources/accumulo-repo. It switches to the ${accumulo_branch_name} branch and builds the software using Maven, then untars the binary tarball to ${software_root}/accumulo/accumulo-${accumulo_version}
  5. Downloads the OpenTelemetry Java Agent jar file and copies it to ${software_root}/accumulo/accumulo-${accumulo_version}/lib/opentelemetry-javaagent-1.32.0.jar
  6. Copies the Accumulo test jar to ${software_root}/accumulo/accumulo-${accumulo_version}/lib so that org.apache.accumulo.test.metrics.TestStatsDRegistryFactory is on the classpath
  7. Downloads the Micrometer StatsD Registry jar file and copies it to ${software_root}/accumulo/accumulo-${accumulo_version}/lib/micrometer-registry-statsd-1.12.1.jar
  8. Clones, if necessary, the Apache Accumulo Testing Git repo from ${accumulo_testing_repo} into ${software_root}/sources/accumulo-testing-repo. It switches to the ${accumulo_testing_branch_name} branch and builds the software using Maven.
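The install locations listed above follow directly from the variable values. As an illustration, the sketch below prints the resulting directories using the default values from the variable tables; install_dirs is a hypothetical helper name.

```shell
# Default values taken from the variable tables above.
software_root=/opt/accumulo-testing
maven_version=3.8.8
zookeeper_version=3.8.0
hadoop_version=3.3.4
accumulo_version=2.1.0-SNAPSHOT

# Hypothetical helper: print the install directory of each component.
install_dirs() {
  echo "${software_root}/apache-maven/apache-maven-${maven_version}"
  echo "${software_root}/zookeeper/apache-zookeeper-${zookeeper_version}-bin"
  echo "${software_root}/hadoop/hadoop-${hadoop_version}"
  echo "${software_root}/accumulo/accumulo-${accumulo_version}"
}

install_dirs
```

On a cluster node, the same list could be used to confirm that each directory exists, for example by looping over the output and testing each path with [ -d ... ].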

Supplying your own software

If you want to supply your own Apache Maven, Apache ZooKeeper, Apache Hadoop, Apache Accumulo, or Apache Accumulo Testing binary tar files, then you can put them into a directory on your local machine and set the ${local_sources_dir} variable to the full path to the directory. These files will be uploaded to ${software_root}/sources and the installation script will use them instead of downloading them. If the version of the supplied binary tarball is different than the default version, then you will also need to override that property. Supplying your own binary tarballs does speed up the deployment. However, if you provide the Apache Accumulo binary tarball, then it will be harder to update the software on the cluster.

NOTE: If you supply your own binary tarball of Accumulo, then you will need to copy the accumulo-test-${accumulo_version}.jar file to the lib directory manually as it's not part of the binary tarball.

Updating Apache Accumulo on the cluster

If you did not provide a binary tarball, then you can update the software running on the cluster by doing the following and then restarting Accumulo:

cd ${software_root}/sources/accumulo-repo
git pull
mvn clean package -DskipTests -DskipITs
# Backup the Accumulo configs
mkdir -p ~/accumulo-config-backup
cp ${software_root}/accumulo/accumulo-${accumulo_version}/conf/* ~/accumulo-config-backup/.
# Lay down the updated Accumulo distribution
tar zxf assemble/target/accumulo-${accumulo_version}-bin.tar.gz -C ${software_root}/accumulo
# Restore the Accumulo configs
cp ~/accumulo-config-backup/* ${software_root}/accumulo/accumulo-${accumulo_version}/conf/.
# Sync the Accumulo changes with the worker nodes
pdsh -R exec -g worker rsync -az ${software_root}/accumulo/ %h:${software_root}/accumulo/

Updating Apache Accumulo Testing on the cluster

If you did not provide a binary tarball, then you can update the software running on the cluster by doing the following:

cd ${software_root}/sources/accumulo-testing-repo
git pull
mvn clean
./bin/build

Accumulo Testing builds a shaded jar. The build script above determines the versions of ZooKeeper and Accumulo on your system and places those in the shaded jar. The build script will not rebuild an existing shaded jar, so mvn clean must be run before ./bin/build. If you are using an unreleased version of Accumulo, you must ensure its jars are in the local Maven repository before rebuilding Accumulo Testing.

Deployment Overview

The first node that is created is called the manager; the others are worker nodes. The following components will run on the manager node:

  • Apache ZooKeeper
  • Apache Hadoop NameNode
  • Apache Hadoop Yarn ResourceManager
  • Apache Accumulo Manager
  • Apache Accumulo Monitor
  • Apache Accumulo GarbageCollector
  • Apache Accumulo CompactionCoordinator
  • Docker
  • Jaeger Tracing Docker Container
  • Telegraf/InfluxDB/Grafana Docker Container

The following components will run on the worker nodes:

  • Apache Hadoop DataNode
  • Apache Hadoop Yarn NodeManager
  • Apache Accumulo TabletServer
  • Apache Accumulo Compactor(s)
  • Apache Accumulo Scan Server(s)

Logs

The logs for each service (zookeeper, hadoop, accumulo) are located in their respective local directory on each node (/data/${service}/logs unless you changed the properties).

DNS entries

The aws Terraform configuration creates DNS entries of the following form:

<node_name>-<branch_name>-<workspace_name>.${route53_zone}

For example:

  • manager-main-default.${route53_zone}
  • worker#-main-default.${route53_zone} (where # is 0, 1, 2, ...)
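The naming scheme can be expressed as a one-line shell function. This is only a sketch of the pattern described above; node_fqdn is a hypothetical helper, and some.domain.com is the placeholder zone from the example variables file.

```shell
# Hypothetical helper: build a DNS name from its four parts.
# $1 = node name, $2 = branch name, $3 = Terraform workspace, $4 = Route53 zone
node_fqdn() {
  echo "$1-$2-$3.$4"
}

node_fqdn manager main default some.domain.com   # prints manager-main-default.some.domain.com
node_fqdn worker0 main default some.domain.com   # prints worker0-main-default.some.domain.com
```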

The azure configuration does not currently create public DNS entries for the nodes, and it is recommended that the public IP addresses be used instead.

Instructions

  1. Change to either the aws or azure directory in your shell. This must be the current directory when you run the following terraform commands.
  2. Once you have created a .auto.tfvars file, or set the properties some other way, run terraform init. If you have changed the shared_state backend configuration from the defaults, you can override the values here. For example, the following configuration updates the resource_group_name and storage_account_name for the azurerm backend:
    terraform init -backend-config=resource_group_name=my-tfstate-resource-group -backend-config=storage_account_name=mystorageaccountname
    
    Once values are supplied to terraform init, they are stored in the local state and it is not necessary to supply these overrides to the terraform apply or terraform destroy commands.
  3. Ensure that the private key associated with the first public SSH key listed for the value of either authorized_ssh_keys or authorized_ssh_key_files in your .auto.tfvars file is loaded into your SSH agent. During resource creation, Terraform will connect to the newly created VMs using SSH in order to copy files and configure the VMs to run Accumulo. If the appropriate private key is not available to your SSH agent, then the connection will fail and resource creation will eventually fail.
  4. Run terraform apply to create the AWS/Azure resources.
  5. Run terraform destroy to tear down the AWS/Azure resources.
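The SSH agent requirement in step 3 can be checked before running terraform apply. The sketch below compares the key material in a public key file against an agent listing in the format printed by ssh-add -L; key_in_agent_listing is a hypothetical helper, and the key file path in the usage comment is an assumption.

```shell
# Hypothetical helper: succeed if the key material in public key file $1
# appears in the `ssh-add -L` style listing supplied on stdin.
key_in_agent_listing() {
  # The second field of a public key line is the Base64 key material.
  key_material=$(awk '{print $2; exit}' "$1")
  # An empty pattern would match every line, so treat it as a failure.
  [ -n "$key_material" ] || return 1
  grep -qF -- "$key_material"
}

# Example usage:
#   ssh-add -L | key_in_agent_listing ~/.ssh/id_rsa.pub \
#     || echo "key not loaded; run ssh-add first" >&2
```

Matching on the key material rather than the whole line avoids false negatives when the comment field differs between the file and the agent.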

NOTE: If you are working with aws and get an Access Denied error, try setting AWS short-term access keys in your environment.

Accessing Web Pages

For an aws cluster, you can access the software configuration/management web pages here:

  • Hadoop NameNode: http://manager-main-default.${route53_zone}:9870
  • Yarn ResourceManager: http://manager-main-default.${route53_zone}:8088
  • Hadoop DataNode: http://worker#-main-default.${route53_zone}:9864
  • Yarn NodeManager: http://worker#-main-default.${route53_zone}:8042
  • Accumulo Monitor: http://manager-main-default.${route53_zone}:9995
  • Jaeger Tracing UI: http://manager-main-default.${route53_zone}:16686
  • Grafana: http://manager-main-default.${route53_zone}:3003

The azure cluster creates a network security group that limits public access to port 22 (SSH). Therefore, to access configuration/management web pages, you should create a SOCKS proxy and use a browser plugin such as FoxyProxy Standard to point the browser to the SOCKS proxy. Create the proxy with

ssh -C2qTnNf -D 9876 hadoop@<manager-public-ip-address>

Configure FoxyProxy (or your browser directly) to connect to the proxy on localhost port 9876 (change the port specified in the -D option above to use a different proxy port). If you configure FoxyProxy with a SOCKS 5 proxy to match the URL regex patterns https?://manager:.* and https?://worker[0-9]+:.*, then you can leave FoxyProxy set to “Use proxies based on their pre-defined patterns and priorities” and access the web pages through the proxy while other web pages will not use the proxy.
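To see which URLs the two patterns above would route through the proxy, they can be tried against sample URLs with grep. This is a sketch only: it assumes FoxyProxy evaluates these patterns similarly to POSIX extended regular expressions, and uses_proxy and the sample URLs are illustrative.

```shell
# Hypothetical helper: succeed if URL $1 matches either pattern from above.
uses_proxy() {
  printf '%s\n' "$1" | grep -Eq \
    -e 'https?://manager:.*' \
    -e 'https?://worker[0-9]+:.*'
}

for url in http://manager:9995/ http://worker0:8042/ https://example.com/; do
  if uses_proxy "$url"; then
    echo "proxy:  $url"
  else
    echo "direct: $url"
  fi
done
```

Both patterns require a colon after the host name, so a URL such as http://manager/ without an explicit port would not be sent through the proxy.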

Accessing the cluster nodes

The cloud-init configuration applied to each AWS instance or Azure VM creates a hadoop user. Any public SSH keys specified in the Terraform configuration variable authorized_ssh_keys (or public key file named in authorized_ssh_key_files) will be included in the cloud-init template as an authorized key for the hadoop user.

If you wish to use your default ssh key, typically stored in ~/.ssh/id_rsa.pub, you would add the following to your HCL .auto.tfvars file:

authorized_ssh_key_files = [ "~/.ssh/id_rsa.pub" ]

Then, when the cluster is created, you can log in to a node with ssh hadoop@<node-public-ip-address>.

SSH'ing to other nodes

The /etc/hosts file on each node has been updated with the names (manager, worker0, worker1, etc.) and IP addresses of the nodes. pdsh has been installed and /etc/genders has been configured. You should be able to ssh to any node as the hadoop user without a password. Likewise, you should be able to use pdsh to run commands on groups of nodes as the hadoop user. The pdsh genders group manager specifies the manager node, and the worker group specifies all worker nodes.

Shutdown / Startup Instructions

Once the cluster is created, you can simply stop or start the nodes from the AWS console or Azure portal. Terraform is just for creating, updating, or destroying the resources. ZooKeeper and Hadoop are set up to use systemd service files, but Accumulo is not. You could log into the manager node and run accumulo-cluster stop before stopping the nodes. Or, you could just shut them down and force Accumulo to recover (which might be good for testing). When restarting the nodes from the AWS console or Azure portal, ZooKeeper and Hadoop should start on their own. For Accumulo, you should only need to run accumulo-cluster start on the manager node.