| # Automated Deployment Guide |
| |
| This guide provides detailed instructions for automated deployment of Apache Ambari and Hadoop ecosystem components using both Docker and bare metal/VM approaches. |
| |
| ## Overview |
| |
| The automated deployment script supports two deployment modes: |
| |
1. **Docker Deployment**: Quick one-click deployment of a Hadoop cluster for testing, development, or demo purposes.
| - Fastest way to get a working cluster |
| - Minimal configuration required |
| - Perfect for development and testing |
| - Automatic container orchestration |
| - Easy cleanup and redeployment |
| |
| 2. **Bare Metal/VM Deployment**: Production-ready automated deployment that supports both one-click installation and advanced customization options, such as adjusting cluster topology, component placement, external database integration, and data directory configuration. |
| |
| ## Prerequisites |
| |
| ### For Bare Metal/VM Deployment |
| |
| 1. **YUM Repository Configuration** |
| - Ensure all machines have properly configured YUM repositories (base and dev repositories must be available) |
| - Install Ansible dependencies by running: |
| ```bash |
| sh deploy_py/shell/utils/setup-env-centos.sh false |
| ``` |
| |
| 2. **SSH Configuration (Rocky 9)** |
| - Modify sshd configuration on all nodes: |
| ```bash |
| vi /etc/ssh/sshd_config |
| # Change these settings: |
| PasswordAuthentication yes # Make sure there's only one instance of this setting |
| PermitRootLogin yes # Ensure this exists and is uncommented |
| |
| # Restart sshd |
| systemctl restart sshd.service |
| |
| # Test SSH connectivity |
| ssh root@<hostname> # Verify password login works |
| ``` |
| |
| 3. **Firewall and SELinux Configuration** |
| |
| For RHEL 7, Rocky Linux 8/9: |
| ```bash |
| # Disable firewall |
| systemctl stop firewalld |
| systemctl disable firewalld |
| systemctl status firewalld |
| |
| # Disable SELinux temporarily |
| setenforce 0 |
| |
| # Disable SELinux permanently (requires reboot) |
| sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config |
| ``` |
| |
| Verify status: |
| ```bash |
| # Check firewall status |
| systemctl status firewalld |
| |
| # Check SELinux status |
| getenforce |
| # or |
| sestatus |
| ``` |
| |
| 4. **Hostname Configuration** |
| - Ensure all machines have unique and properly formatted hostnames |
| - Edit /etc/hostname and reboot if changes are made: |
| ```bash |
| vi /etc/hostname |
| # Edit hostname |
| reboot |
| ``` |
| |
| ### For Docker Deployment |
| |
| Docker deployment uses Apache Bigtop puppet containers. This is the quickest way to get a working cluster up and running. |
| |
| 1. Update base images with dependencies (one-time setup): |
| ```bash |
| cd deploy_py/shell/utils/ |
| # For Rocky 8 |
| chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky8 |
| |
| # For Rocky 9 |
| chmod +x ./update_bigtop_image.sh && ./update_bigtop_image.sh rocky9 |
| ``` |
| |
| 2. **Quick Start Configuration** |
| |
| For Docker deployment, you only need to modify a few essential parameters in `base_conf.yml`: |
| |
| ```yaml |
| # Local package directory path (where you placed the downloaded packages) |
| repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs" |
| |
| # Components you want to install |
| components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"] |
| |
| # Docker resource allocation |
| docker_options: |
| instance_num: 4 # Number of containers |
| memory_limit: "16g" # Memory per container |
| components_port_map: |
| AMBARI_SERVER: 8083 # Port for accessing Ambari UI |
| ``` |
| |
Other settings can be left at their defaults for initial testing and customized later as needed.
| |
| 3. **Deploy** |
| ```bash |
| source setup_pypath.sh |
| python3 deploy_py/main.py -docker-deploy |
| ``` |
| |
| That's it! The script will automatically: |
| - Create and configure containers |
| - Set up networking |
| - Install selected components |
| - Configure the cluster |
| |
You can monitor the deployment progress at `http://localhost:8083` (or the port you configured for AMBARI_SERVER).
| |
| ## Installation Package Setup |
| |
| 1. Download Ambari and Hadoop ecosystem packages from: |
| https://ambari.apache.org/docs/3.0.0/quick-start/download |
| |
| 2. Place all packages in a fixed directory (e.g., /data1/ambari/) |
| |
| 3. Ensure directory permissions: |
| ```bash |
| chmod 755 /data1/ambari |
| chmod 755 /data1 |
| ``` |
| |
| ## Deployment Configuration |
| |
| 1. **Initialize Configuration** |
| ```bash |
| cd bigdata-deploy |
| cp conf/base_conf.yml.template conf/base_conf.yml |
| ``` |
| |
| 2. **Configuration Complexity** |
| |
| - **For Docker Deployment**: Only requires minimal configuration: |
| - `repo_pkgs_dir`: Location of installation packages |
| - `components_to_install`: Components you want to install |
| - `docker_options`: Resource limits and port mappings |
| - Other settings can use defaults for testing |
| |
| - **For Bare Metal/VM Deployment**: Requires more detailed configuration: |
| - Host information |
| - Network settings |
| - Storage configuration |
| - Security settings |
| - etc. |
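A minimal pre-flight sketch can make the Docker-case requirements concrete. The demo file and key list below are assumptions based on the parameters described above; set `CONF` to `conf/base_conf.yml` for a real run.

```bash
# Pre-flight sketch: check that the essential keys exist in base_conf.yml.
# CONF defaults to a demo file so the sketch runs standalone.
CONF="${CONF:-./base_conf.demo.yml}"
printf '%s\n' 'repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs"' \
  'components_to_install: ["hdfs","zookeeper","ambari"]' > "$CONF"
missing=0
for key in repo_pkgs_dir components_to_install docker_options; do
  grep -q "^${key}:" "$CONF" || { echo "missing: ${key}"; missing=1; }
done
if [ "$missing" -eq 0 ]; then echo "config OK"; else echo "fix the keys above"; fi
```

Run against the demo file this reports `missing: docker_options`, since the demo deliberately omits that key.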
| |
| 3. **Configure base_conf.yml** |
| |
| Below is a comprehensive explanation of all configuration parameters: |
| |
| ```yaml |
| # Default password for all services (used for Ambari Web UI, database access, etc.) |
| default_password: 'B767610qa4Z' |
| |
| # Data directories for Hadoop components |
| # Multiple directories can be specified for HDFS DataNode storage |
| # Example: ["/data/sdv1", "/data/sdv2"] |
| # Ensure all nodes have these directories available |
| data_dirs: ["/data/sdv1"] |
| |
| # Repository configuration |
# Two options available (define the "repos" key only once):
# 1. Use an existing HTTP repository:
repos:
- {"name": "ambari_repo", "url": "http://server0:8881/repository/yum/udh3/"}

# 2. Or use a local package directory (the script creates the repo automatically):
# repos:
# - {"name": "ambari_repo", "url": "file:///data1/apache/ambari-3.0_pkgs"}
| |
| # Local package directory path (used when creating local repo) |
| repo_pkgs_dir: "/data1/apache/ambari-3.0_pkgs" |
| |
| # Host configuration (not needed for Docker deployment) |
| # Format: IP_ADDRESS HOSTNAME PASSWORD |
| # Can use ansible-style expressions for multiple hosts |
| hosts: |
| # Single host entry: |
| - 192.168.56.10 vm1 B767610qa4Z |
| # Multiple hosts using range: |
| - 10.1.1.1[0-4] server[0-4] password |
| |
| # Deployment user (must have sudo privileges) |
| # Recommended to use root, otherwise ensure user has sudo access |
| user: root |
| |
| # Ambari Stack version |
| stack_version: '3.3.0' |
| |
| # Components to install |
| # Available components: |
| # - Basic cluster: ["ambari", "hdfs", "zookeeper", "yarn"] |
| # - Full stack: ["hbase","hdfs","yarn","hive","zookeeper","kafka","spark", |
| # "flink","ranger","infra_solr","ambari","ambari_metrics", |
| # "kerberos","alluxio"] |
| components_to_install: ["hbase","hdfs","yarn","hive","zookeeper","ambari"] |
| |
| # Cluster name (avoid special characters) |
| cluster_name: 'cluster' |
| |
| # HDFS HA name configuration |
| hdfs_ha_name: 'ambari-cluster' |
| |
| # SSH port for ansible deployment |
| # Change if using non-standard SSH port |
| ansible_ssh_port: 22 |
| |
| # Ambari Server port configuration |
| ambari_server_port: 8080 |
| |
| # Docker deployment specific options |
| docker_options: |
| # Number of docker containers (minimum 3) |
| instance_num: 4 |
| |
| # Memory limit per container |
| memory_limit: "16g" |
| |
| # Port mapping for accessing services from host |
| # Format: COMPONENT_NAME: HOST_PORT |
| components_port_map: |
| AMBARI_SERVER: 8083 |
| # Optional mappings: |
| # NAMENODE: 50070 |
| # RESOURCEMANAGER: 8088 |
| # HBASE_MASTER: 16010 |
| # FLINK_HISTORYSERVER: 8082 |
| # RANGER_ADMIN: 6080 |
| |
| # Container distribution configuration |
| distro: |
| # Options: "centos" or "ubuntu" |
| name: "centos" |
| # For CentOS: 8 or 9 |
| # For Ubuntu: 22 or 24 (package support pending) |
| version: 8 |
| |
| # Component memory configurations (in MB) |
| # Adjust based on your resource availability |
| # These are initial values, can be modified later in Ambari UI |
| hbase_heapsize: 1024 |
| hadoop_heapsize: 1024 |
| hive_heapsize: 1024 |
| infra_solr_memory: 1024 |
| spark_daemon_memory: 1024 |
| zookeeper_heapsize: 1024 |
| yarn_heapsize: 1024 |
| alluxio_memory: 1024 |
| ``` |
| |
4. **Configuration Notes**
| |
| - **Repository Setup**: |
| - For production environments, it's recommended to set up a proper HTTP repository |
| - For testing, the automatic local repository creation is sufficient |
| |
| - **Host Configuration**: |
| - Ensure all hostnames are unique and properly formatted |
- The configured password must be valid for the specified user
| - For large clusters, use range notation to simplify configuration |
| |
| - **Component Selection**: |
| - Start with basic components for initial testing |
| - Add additional components based on your needs |
| - Ensure dependencies are considered (e.g., Ranger requires Infra Solr) |
| |
| - **Memory Configuration**: |
| - Default values are conservative |
| - For production, adjust based on your hardware specifications |
| - Consider total memory available when configuring multiple components |
| |
| - **Docker Deployment**: |
| - Port mapping is optional but useful for external access |
| - Memory limits should account for host system resources |
| - Instance number should be at least 3 for HA features |
| |
| ## Deployment Process |
| |
| ### For Bare Metal/VM Deployment |
| |
| 1. **Setup Python Environment** |
| ```bash |
| source setup_pypath.sh |
| ``` |
| |
| 2. **Generate Deployment Configuration** |
| ```bash |
| python3 deploy_py/main.py -generate-conf |
| ``` |
| |
| 3. **Start Deployment** |
| ```bash |
| nohup python3 deploy_py/main.py -deploy & |
| tail -f logs/ansible-playbook.log |
| ``` |
| |
| ### For Docker Deployment |
| |
| 1. **Setup and Deploy** |
| ```bash |
| source setup_pypath.sh |
| python3 deploy_py/main.py -docker-deploy |
| ``` |
| |
| ## Troubleshooting |
| |
| ### Ambari Agent Registration Issues |
| |
| If Ambari agents fail to register, verify hostname configuration: |
| |
```bash
# Print the FQDN as Python (and the Ambari agent) resolves it
python3 -c 'import socket; print(socket.getfqdn())'
```
| |
| Ensure the output matches the hostname configured in: |
| - Automated installation script |
| - /etc/hosts file |
| - Actual machine hostname |
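The cross-check above can be scripted. This is a sketch, not part of the deployment tooling: it assumes `python3` and `hostname` are on PATH, and the `/etc/hosts` lookup is informational only.

```bash
# Sketch: cross-check the hostname sources listed above in one pass.
fqdn=$(python3 -c 'import socket; print(socket.getfqdn())')
short=$(hostname)
if [ "$fqdn" = "$short" ]; then
  echo "hostname consistent: $fqdn"
else
  echo "mismatch: getfqdn=$fqdn hostname=$short"
fi
grep -n "$short" /etc/hosts || echo "note: $short not found in /etc/hosts"
```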
| |
| ## Monitoring Deployment |
| |
| Access the Ambari Web UI at `http://<AMBARI_SERVER>:8080` to monitor deployment progress. |
| |
| Default credentials: |
| - Username: admin |
| - Password: (value of default_password in configuration) |
| |
| ## Advanced Configuration |
| |
| For advanced deployment scenarios such as: |
| - Customizing cluster topology |
| - Using external databases |
| - Configuring custom directories |
| - Enabling Ranger |
| - Customizing Ambari settings |
| |
| Please refer to our [**Advanced Deployment Guide**](advanced-deployment.md). |