This guide provides detailed instructions for automated deployment of Apache Ambari and Hadoop ecosystem components using both Docker and bare metal/VM approaches.
The automated deployment script supports two deployment modes:
- Docker Deployment: Quick one-click deployment of a Hadoop cluster for testing, development, or demo purposes.
- Bare Metal/VM Deployment: Production-ready automated deployment that supports both one-click installation and advanced customization options, such as adjusting cluster topology, component placement, external database integration, and data directory configuration.
YUM Repository Configuration
```bash
sh deploy_py/shell/utils/setup-env-centos.sh false
```
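To confirm the repositories are in place after the script runs, a quick check with standard yum tooling (the exact repo names depend on your environment):

```bash
# List all enabled repositories; the Ambari/Bigtop repos should appear here
yum repolist
```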
SSH Configuration (Rocky 9)
```bash
vi /etc/ssh/sshd_config

# Change these settings in sshd_config:
PasswordAuthentication yes   # Make sure there's only one instance of this setting
PermitRootLogin yes          # Ensure this exists and is uncommented

# Restart sshd
systemctl restart sshd.service

# Test SSH connectivity
ssh root@<hostname>          # Verify password login works
```
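If you would rather not rely on password logins for the deployment user, key-based access also works; a minimal sketch using standard OpenSSH tooling (the hostnames below are placeholders):

```bash
# Generate a key pair on the deploy node (skip if one already exists)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public key to each cluster node
for host in server0 server1 server2; do
    ssh-copy-id root@"$host"
done
```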
Firewall and SELinux Configuration
For RHEL 7, Rocky Linux 8/9:
```bash
# Disable firewall
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld

# Disable SELinux temporarily
setenforce 0

# Disable SELinux permanently (requires reboot)
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```
Verify status:
```bash
# Check firewall status
systemctl status firewalld

# Check SELinux status
getenforce
# or
sestatus
```
Hostname Configuration
```bash
vi /etc/hostname   # Edit hostname
reboot
```
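On systemd-based systems such as Rocky Linux, hostnamectl applies the change without a reboot; a short sketch (the hostname server0 is just an example). Every node should also be able to resolve the others, e.g. via /etc/hosts:

```bash
# Set the hostname immediately (persists across reboots)
hostnamectl set-hostname server0

# Ensure all cluster nodes resolve each other, e.g. add entries to /etc/hosts:
#   192.168.56.10 vm1
```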
Docker deployment uses Apache Bigtop Puppet containers. This is the quickest way to get a working cluster up and running.
Update base images with dependencies (one-time setup):
```bash
cd deploy_py/shell/utils/

# For Rocky 8
chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky8

# For Rocky 9
chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky9
```
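A quick way to confirm the updated base images exist before deploying (the exact image names and tags are produced by the script, so treat this filter as an assumption):

```bash
# List local images; the updated Bigtop base images should be present
docker images | grep -i bigtop
```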
Quick Start Configuration
For Docker deployment, you only need to modify a few essential parameters in base_conf.yml:
```yaml
# Local package directory path (where you placed the downloaded packages)
repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs"

# Components you want to install
components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"]

# Docker resource allocation
docker_options:
  instance_num: 4          # Number of containers
  memory_limit: "16g"      # Memory per container
  components_port_map:
    AMBARI_SERVER: 8083    # Port for accessing Ambari UI
```
Other settings can be left at their defaults for initial testing and customized later as needed.
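Before deploying, it is worth confirming that repo_pkgs_dir actually contains the downloaded packages (path taken from the config above):

```bash
# The directory should contain the Ambari/Hadoop packages you downloaded
ls /data1/apache/ambari-3.0_pkgs | head
```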
Deploy
```bash
source setup_pypath.sh
python3 deploy_py/main.py -docker-deploy
```
That's it! The script handles the rest of the deployment automatically.
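While the script runs, you can watch the containers come up with standard Docker tooling (container names are assigned by the script, so expect environment-specific values):

```bash
# Show running containers and their port mappings
docker ps
```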
You can monitor the deployment progress at http://localhost:8083 (or whatever port you configured for AMBARI_SERVER).
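A simple reachability check for the Ambari UI once the container is up (adjust the port if you changed the AMBARI_SERVER mapping):

```bash
# Expect an HTTP status code (e.g. 200 or a redirect) once Ambari Server is up
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8083
```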
Download Packages
Download Ambari and Hadoop ecosystem packages from: https://ambari.apache.org/docs/3.0.0/quick-start/download
Place all packages in a fixed directory (e.g., /data1/ambari/)
Ensure directory permissions:

```bash
chmod 755 /data1/ambari
chmod 755 /data1
```
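To verify the permissions (both the package directory and its parent must be world-readable so the packages can be served):

```bash
# Both directories should show drwxr-xr-x
ls -ld /data1 /data1/ambari
```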
Initialize Configuration
```bash
cd bigdata-deploy
cp conf/base_conf.yml.template conf/base_conf.yml
```
Configuration Complexity
For Docker Deployment: Only requires minimal configuration:

- repo_pkgs_dir: Location of installation packages
- components_to_install: Components you want to install
- docker_options: Resource limits and port mappings

For Bare Metal/VM Deployment: Requires the more detailed configuration described in the next section.
Configure base_conf.yml
Below is a comprehensive explanation of all configuration parameters:
```yaml
# Default password for all services (used for Ambari Web UI, database access, etc.)
default_password: 'B767610qa4Z'

# Data directories for Hadoop components
# Multiple directories can be specified for HDFS DataNode storage
# Example: ["/data/sdv1", "/data/sdv2"]
# Ensure all nodes have these directories available
data_dirs: ["/data/sdv1"]

# Repository configuration -- two options available
# (keep only one repos block uncommented):
# 1. Use an existing repository:
# repos:
#   - {"name": "ambari_repo", "url": "http://server0:8881/repository/yum/udh3/"}
# 2. Use a local package directory (the script will create the repo automatically):
repos:
  - {"name": "ambari_repo", "url": "file:///data1/apache/ambari-3.0_pkgs"}

# Local package directory path (used when creating a local repo)
repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs"

# Host configuration (not needed for Docker deployment)
# Format: IP_ADDRESS HOSTNAME PASSWORD
# Can use ansible-style expressions for multiple hosts
hosts:
  # Single host entry:
  - 192.168.56.10 vm1 B767610qa4Z
  # Multiple hosts using a range:
  - 10.1.1.1[0-4] server[0-4] password

# Deployment user (must have sudo privileges)
# Recommended to use root; otherwise ensure the user has sudo access
user: root

# Ambari Stack version
stack_version: '3.3.0'

# Components to install
# Available components:
#   Basic cluster: ["ambari", "hdfs", "zookeeper", "yarn"]
#   Full stack:    ["hbase","hdfs","yarn","hive","zookeeper","kafka","spark",
#                   "flink","ranger","infra_solr","ambari","ambari_metrics",
#                   "kerberos","alluxio"]
components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"]

# Cluster name (avoid special characters)
cluster_name: 'cluster'

# HDFS HA name configuration
hdfs_ha_name: 'ambari-cluster'

# SSH port for ansible deployment
# Change if using a non-standard SSH port
ansible_ssh_port: 22

# Ambari Server port configuration
ambari_server_port: 8080

# Docker deployment specific options
docker_options:
  # Number of docker containers (minimum 3)
  instance_num: 4
  # Memory limit per container
  memory_limit: "16g"
  # Port mapping for accessing services from the host
  # Format: COMPONENT_NAME: HOST_PORT
  components_port_map:
    AMBARI_SERVER: 8083
    # Optional mappings:
    # NAMENODE: 50070
    # RESOURCEMANAGER: 8088
    # HBASE_MASTER: 16010
    # FLINK_HISTORYSERVER: 8082
    # RANGER_ADMIN: 6080
  # Container distribution configuration
  distro:
    # Options: "centos" or "ubuntu"
    name: "centos"
    # For CentOS: 8 or 9
    # For Ubuntu: 22 or 24 (package support pending)
    version: 8

# Component memory configurations (in MB)
# Adjust based on your resource availability
# These are initial values and can be modified later in the Ambari UI
hbase_heapsize: 1024
hadoop_heapsize: 1024
hive_heapsize: 1024
infra_solr_memory: 1024
spark_daemon_memory: 1024
zookeeper_heapsize: 1024
yarn_heapsize: 1024
alluxio_memory: 1024
```
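Since a stray indent or duplicate key will break the deploy, it can help to sanity-check the file before running anything; a minimal sketch, assuming PyYAML is available in your Python environment (the deploy tooling is Python-based, but this dependency is an assumption):

```bash
# Parse the config; prints nothing on success, raises a parse error otherwise
python3 -c "import yaml; yaml.safe_load(open('conf/base_conf.yml'))"
```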
Configuration Notes
- Repository Setup: Use either an existing HTTP repository or a local package directory; with the file:// option the script creates the repository from repo_pkgs_dir automatically. Keep only one repos block active.
- Host Configuration: Not needed for Docker deployment. Use the IP_ADDRESS HOSTNAME PASSWORD format; ansible-style ranges such as server[0-4] cover multiple hosts in one entry.
- Component Selection: Start with the basic cluster set (["ambari", "hdfs", "zookeeper", "yarn"]) and add further components as needed.
- Memory Configuration: All heap/memory values are in MB; they are initial values and can be adjusted later in the Ambari UI.
- Docker Deployment: Requires at least 3 containers (instance_num); map AMBARI_SERVER to a host port to reach the web UI.
Setup Python Environment
```bash
source setup_pypath.sh
```
Generate Deployment Configuration
```bash
python3 deploy_py/main.py -generate-conf
```
Start Deployment
```bash
nohup python3 deploy_py/main.py -deploy &
tail -f logs/ansible-playbook.log
```
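If the run aborts, searching the same log for failed tasks is usually the fastest way to locate the problem (log path taken from the command above):

```bash
# Ansible marks failing tasks as "failed" or "fatal"
grep -iE 'failed|fatal' logs/ansible-playbook.log | tail
```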
For Docker deployment, run instead:

```bash
source setup_pypath.sh
python3 deploy_py/main.py -docker-deploy
```
If Ambari agents fail to register, verify hostname configuration:
```bash
python3 -c 'import socket; print(socket.getfqdn())'
```
Ensure the output matches the hostname configured in /etc/hostname and in the hosts section of base_conf.yml.
Access the Ambari Web UI at http://<AMBARI_SERVER>:8080 to monitor deployment progress.
Default credentials: user admin with the default_password value set in base_conf.yml.
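Ambari also exposes a REST API that can confirm the server is responding before the UI loads; a minimal check using Ambari's standard v1 API (substitute your host and the password configured via default_password):

```bash
# Returns JSON describing the cluster once Ambari Server is up
curl -u admin:<default_password> http://<AMBARI_SERVER>:8080/api/v1/clusters
```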
For advanced deployment scenarios, such as adjusting cluster topology, component placement, external database integration, and data directory configuration, please refer to our Advanced Deployment Guide.