Automated Deployment Guide

This guide provides detailed instructions for automated deployment of Apache Ambari and Hadoop ecosystem components using both Docker and bare metal/VM approaches.

Overview

The automated deployment script supports two deployment modes:

  1. Docker Deployment: Quick one-click deployment of a Hadoop cluster for testing, development, or demo purposes.

    • Fastest way to get a working cluster
    • Minimal configuration required
    • Perfect for development and testing
    • Automatic container orchestration
    • Easy cleanup and redeployment
  2. Bare Metal/VM Deployment: Production-ready automated deployment that supports both one-click installation and advanced customization options, such as adjusting cluster topology, component placement, external database integration, and data directory configuration.

Prerequisites

For Bare Metal/VM Deployment

  1. YUM Repository Configuration

    • Ensure all machines have properly configured YUM repositories (base and dev repositories must be available)
    • Install Ansible dependencies by running:
    sh deploy_py/shell/utils/setup-env-centos.sh false
    
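    To confirm the repositories are visible before continuing, a quick check on each node with standard yum tooling:

    # List enabled repositories; the base and dev repos should both appear
    yum repolist enabled
    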
  2. SSH Configuration (Rocky 9)

    • Modify sshd configuration on all nodes:
    vi /etc/ssh/sshd_config
    # Change these settings:
    PasswordAuthentication yes  # Make sure there's only one instance of this setting
    PermitRootLogin yes        # Ensure this exists and is uncommented
    
    # Restart sshd
    systemctl restart sshd.service
    
    # Test SSH connectivity
    ssh root@<hostname>  # Verify password login works
    
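    To double-check the effective sshd settings after the restart, OpenSSH's test mode prints the resolved configuration:

    # Print the values sshd will actually use (keys are shown in lowercase)
    sshd -T | grep -Ei 'passwordauthentication|permitrootlogin'
    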
  3. Firewall and SELinux Configuration

    For RHEL 7, Rocky Linux 8/9:

    # Disable firewall
    systemctl stop firewalld
    systemctl disable firewalld
    systemctl status firewalld
    
    # Disable SELinux temporarily
    setenforce 0
    
    # Disable SELinux permanently (requires reboot)
    sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
    

    Verify status:

    # Check firewall status
    systemctl status firewalld
    
    # Check SELinux status
    getenforce
    # or
    sestatus
    
  4. Hostname Configuration

    • Ensure all machines have unique and properly formatted hostnames
    • Edit /etc/hostname and reboot if changes are made:
    vi /etc/hostname
    # Edit hostname
    reboot
    
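    On systemd-based systems such as Rocky Linux, hostnamectl is an alternative that applies the change in one step (a reboot is still the safest way to make sure every service picks up the new name):

    # Set the hostname directly; "server0" is an example name
    hostnamectl set-hostname server0
    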

For Docker Deployment

Docker deployment uses Apache Bigtop puppet containers. This is the quickest way to get a working cluster up and running.

  1. Update base images with dependencies (one-time setup):

    cd deploy_py/shell/utils/
    # For Rocky 8
    chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky8
    
    # For Rocky 9
    chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky9
    
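    Once the script finishes, the rebuilt images should appear in the local image list (the exact repository/tag names come from the script, so check its output):

    # The updated Bigtop base images should now be listed
    docker images
    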
  2. Quick Start Configuration

    For Docker deployment, you only need to modify a few essential parameters in base_conf.yml:

    # Local package directory path (where you placed the downloaded packages)
    repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs"
    
    # Components you want to install
    components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"]
    
    # Docker resource allocation
    docker_options:
      instance_num: 4  # Number of containers
      memory_limit: "16g"  # Memory per container
      components_port_map:
        AMBARI_SERVER: 8083  # Port for accessing Ambari UI
    

    Other settings can be left at their defaults for initial testing and customized later as needed.

  3. Deploy

    source setup_pypath.sh
    python3 deploy_py/main.py -docker-deploy
    

    That's it! The script will automatically:

    • Create and configure containers
    • Set up networking
    • Install selected components
    • Configure the cluster

    You can monitor the deployment progress at http://localhost:8083 (or whatever port you configured for AMBARI_SERVER).
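
    If you prefer the command line, you can also confirm the server is responding through the Ambari REST API (a sketch; substitute your mapped port and the default_password value from base_conf.yml):

    # Returns cluster metadata as JSON once Ambari Server is ready
    curl -s -u admin:B767610qa4Z http://localhost:8083/api/v1/clusters
    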

Installation Package Setup

  1. Download Ambari and Hadoop ecosystem packages from: https://ambari.apache.org/docs/3.0.0/quick-start/download

  2. Place all packages in a fixed directory (e.g., /data1/ambari/)

  3. Ensure directory permissions:

    chmod 755 /data1/ambari
    chmod 755 /data1
    
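    Verify the permissions took effect:

    ls -ld /data1 /data1/ambari
    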

Deployment Configuration

  1. Initialize Configuration

    cd bigdata-deploy
    cp conf/base_conf.yml.template conf/base_conf.yml
    
  2. Configuration Complexity

    • For Docker Deployment: Requires only minimal configuration:

      • repo_pkgs_dir: Location of installation packages
      • components_to_install: Components you want to install
      • docker_options: Resource limits and port mappings
      • Other settings can use defaults for testing
    • For Bare Metal/VM Deployment: Requires more detailed configuration:

      • Host information
      • Network settings
      • Storage configuration
      • Security settings
      • etc.
  3. Configure base_conf.yml

    Below is a comprehensive explanation of all configuration parameters:

    # Default password for all services (used for Ambari Web UI, database access, etc.)
    default_password: 'B767610qa4Z'
    
    # Data directories for Hadoop components
    # Multiple directories can be specified for HDFS DataNode storage
    # Example: ["/data/sdv1", "/data/sdv2"]
    # Ensure all nodes have these directories available
    data_dirs: ["/data/sdv1"]
    
    # Repository configuration
    # Two options are available (keep only one repos entry):
    # 1. Use existing repository:
    repos:
      - {"name": "ambari_repo", "url": "http://server0:8881/repository/yum/udh3/"}
    
    # 2. Use local package directory (script will create repo automatically):
    repos:
      - {"name": "ambari_repo", "url": "file:///data1/apache/ambari-3.0_pkgs"}
    
    # Local package directory path (used when creating local repo)
    repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs"
    
    # Host configuration (not needed for Docker deployment)
    # Format: IP_ADDRESS HOSTNAME PASSWORD
    # Can use ansible-style expressions for multiple hosts
    hosts:
    # Single host entry:
    - 192.168.56.10 vm1 B767610qa4Z
    # Multiple hosts using range:
    - 10.1.1.1[0-4] server[0-4] password
    
    # Deployment user (must have sudo privileges)
    # Recommended to use root, otherwise ensure user has sudo access
    user: root
    
    # Ambari Stack version
    stack_version: '3.3.0'
    
    # Components to install
    # Available components:
    # - Basic cluster: ["ambari", "hdfs", "zookeeper", "yarn"]
    # - Full stack: ["hbase","hdfs","yarn","hive","zookeeper","kafka","spark",
    #               "flink","ranger","infra_solr","ambari","ambari_metrics",
    #               "kerberos","alluxio"]
    components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"]
    
    # Cluster name (avoid special characters)
    cluster_name: 'cluster'
    
    # HDFS HA name configuration
    hdfs_ha_name: 'ambari-cluster'
    
    # SSH port for ansible deployment
    # Change if using non-standard SSH port
    ansible_ssh_port: 22
    
    # Ambari Server port configuration
    ambari_server_port: 8080
    
    # Docker deployment specific options
    docker_options:
      # Number of docker containers (minimum 3)
      instance_num: 4
    
      # Memory limit per container
      memory_limit: "16g"
    
      # Port mapping for accessing services from host
      # Format: COMPONENT_NAME: HOST_PORT
      components_port_map:
        AMBARI_SERVER: 8083
        # Optional mappings:
        # NAMENODE: 50070
        # RESOURCEMANAGER: 8088
        # HBASE_MASTER: 16010
        # FLINK_HISTORYSERVER: 8082
        # RANGER_ADMIN: 6080
    
      # Container distribution configuration
      distro:
        # Options: "centos" or "ubuntu"
        name: "centos"
        # For CentOS: 8 or 9
        # For Ubuntu: 22 or 24 (package support pending)
        version: 8
    
    # Component memory configurations (in MB)
    # Adjust based on your resource availability
    # These are initial values, can be modified later in Ambari UI
    hbase_heapsize: 1024
    hadoop_heapsize: 1024
    hive_heapsize: 1024
    infra_solr_memory: 1024
    spark_daemon_memory: 1024
    zookeeper_heapsize: 1024
    yarn_heapsize: 1024
    alluxio_memory: 1024
    
  4. Configuration Notes

    • Repository Setup:

      • For production environments, it's recommended to set up a proper HTTP repository
      • For testing, the automatic local repository creation is sufficient
    • Host Configuration:

      • Ensure all hostnames are unique and properly formatted
      • The password must allow login as the specified user over SSH
      • For large clusters, use range notation to simplify configuration
    • Component Selection:

      • Start with basic components for initial testing
      • Add additional components based on your needs
      • Ensure dependencies are considered (e.g., Ranger requires Infra Solr; see the example after this list)
    • Memory Configuration:

      • Default values are conservative
      • For production, adjust based on your hardware specifications
      • Consider total memory available when configuring multiple components
    • Docker Deployment:

      • Port mapping is optional but useful for external access
      • Memory limits should account for host system resources
      • Instance number should be at least 3 for HA features
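
    For example, the Ranger dependency noted above means Infra Solr must be selected alongside it in base_conf.yml (component names as listed in the configuration reference above):

    # Ranger stores its audit data in Infra Solr, so install both together
    components_to_install: ["hdfs","yarn","zookeeper","ranger","infra_solr","ambari"]
    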

Deployment Process

For Bare Metal/VM Deployment

  1. Setup Python Environment

    source setup_pypath.sh
    
  2. Generate Deployment Configuration

    python3 deploy_py/main.py -generate-conf
    
  3. Start Deployment

    nohup python3 deploy_py/main.py -deploy &
    tail -f logs/ansible-playbook.log
    

For Docker Deployment

  1. Setup and Deploy
    source setup_pypath.sh
    python3 deploy_py/main.py -docker-deploy
    

Troubleshooting

Ambari Agent Registration Issues

If Ambari agents fail to register, verify hostname configuration:

python3 -c "import socket; print(socket.getfqdn())"

Ensure the output matches the hostname configured in:

  • Automated installation script
  • /etc/hosts file
  • Actual machine hostname
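
A quick way to cross-check the last two items with standard tools:

# The short and fully qualified hostnames; the FQDN should match socket.getfqdn()
hostname
hostname -f

# The hostname should resolve through /etc/hosts
grep -i "$(hostname)" /etc/hosts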

Monitoring Deployment

Access the Ambari Web UI at http://<AMBARI_SERVER>:8080 to monitor deployment progress.

Default credentials:

  • Username: admin
  • Password: (value of default_password in configuration)
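
The same progress information is available from the Ambari REST API (a sketch; "cluster" is the cluster_name from base_conf.yml and the port is ambari_server_port):

# List recent operations and their status
curl -s -u admin:B767610qa4Z "http://<AMBARI_SERVER>:8080/api/v1/clusters/cluster/requests?fields=Requests/request_status"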

Advanced Configuration

For advanced deployment scenarios such as:

  • Customizing cluster topology
  • Using external databases
  • Configuring custom directories
  • Enabling Ranger
  • Customizing Ambari settings

Please refer to our Advanced Deployment Guide.