This guide provides detailed instructions for advanced deployment scenarios and customizations of Apache Ambari and Hadoop ecosystem components.
When using non-Docker deployment with advanced configurations:
1. After modifying `conf/base_conf.yml`, always regenerate the advanced configuration:

```bash
source setup_pypath.sh
python3 deploy_py/main.py -generate-conf
```
2. Modify the generated `conf/conf.yml` according to your needs.
3. Start the deployment:

```bash
nohup python3 deploy_py/main.py -deploy &
tail -f logs/ansible-playbook.log
```
Important Note: Running `python3 deploy_py/main.py -generate-conf` again will overwrite your customized `conf/conf.yml`.
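Because regeneration is destructive, it can help to snapshot your customizations first. Below is a minimal sketch, assuming the default `conf/conf.yml` path and that you run it from the repository root, that backs the file up with a timestamp before you rerun `-generate-conf`:

```python
#!/usr/bin/env python3
"""Back up conf/conf.yml before regenerating it (illustrative sketch)."""
import shutil
from datetime import datetime
from pathlib import Path

conf = Path("conf/conf.yml")  # assumed default location of the generated config
if conf.exists():
    # Timestamped copy next to the original, e.g. conf.yml.20240101-120000.bak
    backup = conf.with_name(f"conf.yml.{datetime.now():%Y%m%d-%H%M%S}.bak")
    shutil.copy2(conf, backup)
    print(f"Backed up {conf} to {backup}")
else:
    print(f"{conf} not found; nothing to back up")
```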
The cluster topology is primarily controlled by two configuration sections in `conf.yml`: `host_groups` and `group_services`.
```yaml
host_groups:
  group0: [server0]
  group1: [server1]
  group2: [server2, server3]
```
Each group contains a set of hostnames that will be treated as a unit for service deployment.
```yaml
group_services:
  group0: [AMBARI_SERVER, NAMENODE, ZKFC, ...]
  group1: [NAMENODE, ZKFC, JOURNALNODE, ...]
  group2: [ZOOKEEPER_SERVER, JOURNALNODE, ...]
```
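The two sections must stay consistent: every group referenced in `group_services` needs a matching entry in `host_groups`. Below is a sketch of such a consistency check, assuming `conf/conf.yml` is the generated file and PyYAML is available:

```python
#!/usr/bin/env python3
"""Cross-check host_groups and group_services in conf.yml (illustrative)."""
import yaml  # PyYAML, assumed to be installed

with open("conf/conf.yml") as f:  # assumed path of the generated config
    conf = yaml.safe_load(f)

host_groups = conf.get("host_groups", {})
group_services = conf.get("group_services", {})

# Every service group must map to a defined host group, and vice versa.
undefined = set(group_services) - set(host_groups)
unused = set(host_groups) - set(group_services)

if undefined:
    print(f"Groups with services but no hosts: {sorted(undefined)}")
if unused:
    print(f"Host groups with no services assigned: {sorted(unused)}")
if not undefined and not unused:
    print("host_groups and group_services are consistent")
```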
For the full set of available components, refer to the `group_services` entries in the complete configuration example at the end of this guide.
To use external databases, you need to:
1. Create the necessary users and databases manually:
```sql
-- PostgreSQL example
CREATE USER hive WITH PASSWORD 'hive';
CREATE DATABASE hive OWNER hive;
GRANT ALL PRIVILEGES ON DATABASE hive TO hive;

CREATE USER ranger WITH PASSWORD 'ranger';
CREATE DATABASE ranger OWNER ranger;
GRANT ALL PRIVILEGES ON DATABASE ranger TO ranger;
```
2. Configure the database settings in `conf.yml`:
```yaml
database: 'postgres'
database_options:
  external_hostname: 'your-db-host'
  hive_db_name: 'hive'
  hive_db_username: 'hive'
  hive_db_password: 'your-password'
  rangeradmin_db_name: 'ranger'
  rangeradmin_db_username: 'ranger'
  rangeradmin_db_password: 'your-password'
```
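Before starting the deployment, it is worth confirming that the external database is reachable with the credentials you configured. A minimal sketch using `psycopg2` (an assumption; any PostgreSQL client works, and the hostname and passwords below are placeholders):

```python
#!/usr/bin/env python3
"""Smoke-test connectivity to the external PostgreSQL databases (illustrative)."""
import psycopg2  # assumed installed: pip install psycopg2-binary

# Placeholders: match these to your database_options values.
for db, user, password in [("hive", "hive", "your-password"),
                           ("ranger", "ranger", "your-password")]:
    conn = psycopg2.connect(host="your-db-host", port=5432,
                            dbname=db, user=user, password=password)
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(db, "->", cur.fetchone()[0])
    conn.close()
```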
You can customize various directory locations:
data_dirs: ["/data/sdv1"] # Base data directory # HDFS directories hdfs_dfs_namenode_name_dir: "{{ hadoop_base_dir }}/hdfs/namenode" hdfs_dfs_datanode_data_dir: "{% for dr in data_dirs %}{{ dr }}/hadoop/hdfs/data{% if not loop.last %},{% endif %}{% endfor %}" # YARN directories yarn_nodemanager_local_dirs: "{{ hadoop_base_dir }}/yarn/local" yarn_nodemanager_log_dirs: "{{ hadoop_base_dir }}/yarn/log" # Other service directories zookeeper_data_dir: "{{ hadoop_base_dir }}/zookeeper" kafka_log_base_dir: "{% for dr in data_dirs %}{{ dr }}/kafka-logs{% if not loop.last %},{% endif %}{% endfor %}"
To enable Ranger and its plugins:
```yaml
ranger_options:
  enable_plugins: yes

ranger_security_options:
  ranger_admin_password: "your-password"
  ranger_keyadmin_password: "your-password"
  kms_master_key_password: "your-password"
```
Modify Ambari-specific settings:
```yaml
ambari_options:
  ambari_agent_run_user: 'ambari'
  ambari_server_run_user: 'ambari'
  ambari_admin_user: 'admin'
  ambari_admin_password: 'your-password'
  config_recommendation_strategy: 'ALWAYS_APPLY'
```
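Once deployment finishes, you can confirm the admin credentials against Ambari's REST API. A hedged sketch using `requests` (assumed installed; the hostname is a placeholder, and the port should match your `ambari_server_port` setting):

```python
#!/usr/bin/env python3
"""Verify Ambari admin credentials via the REST API (illustrative)."""
import requests  # assumed installed: pip install requests

# Placeholders: match your ambari_options and ambari_server_port values.
resp = requests.get(
    "http://your-ambari-host:8083/api/v1/clusters",
    auth=("admin", "your-password"),
    timeout=10,
)
resp.raise_for_status()  # a 403 here usually means a wrong password
for cluster in resp.json().get("items", []):
    print("Cluster:", cluster["Clusters"]["cluster_name"])
```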
When planning a production deployment, also review these topics:

- High availability setup
- Resource planning
- Security considerations
Below is a complete configuration example with detailed explanations for each section:
```yaml
############################
## Host Groups & Services ##
############################

# Define machine groups and their members
host_groups:
  group0: [e977d8bea74d.bigtop.apache.org]
  group1: [b76d76c80f15.bigtop.apache.org]
  group2: [8874239dee4b.bigtop.apache.org]

# Define which services run on which groups
group_services:
  group0: [AMBARI_SERVER, NAMENODE, ZKFC, JOURNALNODE, RESOURCEMANAGER,
           ZOOKEEPER_SERVER, HBASE_MASTER, HIVE_METASTORE, SPARK_THRIFTSERVER,
           FLINK_HISTORYSERVER, HISTORYSERVER, RANGER_TAGSYNC, RANGER_USERSYNC]
  group1: [NAMENODE, ZKFC, JOURNALNODE, RESOURCEMANAGER, ZOOKEEPER_SERVER,
           HBASE_MASTER, DATANODE, NODEMANAGER, APP_TIMELINE_SERVER,
           RANGER_ADMIN, METRICS_GRAFANA, SPARK_JOBHISTORYSERVER, INFRA_SOLR]
  group2: [ZOOKEEPER_SERVER, JOURNALNODE, DATANODE, NODEMANAGER,
           TIMELINE_READER, YARN_REGISTRY_DNS, METRICS_COLLECTOR,
           HBASE_REGIONSERVER, HIVE_SERVER, WEBHCAT_SERVER, INFRA_SOLR]

############################
## Basic Configuration    ##
############################

# Default password for all services
default_password: B767610qa4Z

# Data directories for all components
# Can specify multiple directories for HDFS DataNode storage
data_dirs: [/data/sdv1]

# Repository configuration
# Option 1: Use existing repository
repos: null
# Option 2: Use local package directory
repo_pkgs_dir: /data1/apache/ambari-3.0_pkgs

# Stack version for Ambari
stack_version: 3.3.0

# Cluster naming
cluster_name: cluster
hdfs_ha_name: ambari-cluster

# Network configuration
ansible_ssh_port: 22
ambari_server_port: 8083
http_repo_port: 8881

############################
## Docker Configuration   ##
############################

docker_options:
  # Minimum 3 instances required for HA
  instance_num: 3
  # Memory limit per container
  memory_limit: 8g
  # Enable local repository
  enable_local_repo: true
  # Port mappings for accessing services
  components_port_map: {AMBARI_SERVER: 8083}
  # Container distribution settings
  distro: {name: centos, version: 8}
  # Components to install in Docker environment
  components: [hbase, hdfs, yarn, hive, zookeeper, ambari, spark, flink,
               ranger, infra_solr, ambari_metrics]
  default_password: B767610qa4Z

############################
## Memory Configuration   ##
############################

# Component memory settings (in MB)
hbase_heapsize: 1024
hadoop_heapsize: 1024
hive_heapsize: 1024
infra_solr_memory: 1024
spark_daemon_memory: 1024
zookeeper_heapsize: 1024
yarn_heapsize: 1024
alluxio_memory: 1024

############################
## Repository Settings    ##
############################

skip_cluster_clear: true
local_repo_ipaddress: 172.30.0.3
create_http_repo_for_local_pkgs: false

############################
## Deployment Control     ##
############################

deploy_ambari_only: false
prepare_nodes_only: false
backup_old_repo: no
should_deploy_ambari_mpack: false

############################
## Database Configuration ##
############################

# Database type selection
database: 'postgres'  # Options: 'postgres', 'mysql'
postgres_port: 5432
mysql_port: 3306

# Database options for all components
database_options:
  # External database configuration
  repo_url: ''
  external_hostname: ''  # Empty for local database installation

  # Ambari database
  ambari_db_name: 'ambari'
  ambari_db_username: 'ambari'
  ambari_db_password: '{{ default_password }}'

  # Hive database
  hive_db_name: 'hive'
  hive_db_username: 'hive'
  hive_db_password: '{{ default_password }}'

  # Ranger databases
  rangeradmin_db_name: 'ranger'
  rangeradmin_db_username: 'ranger'
  rangeradmin_db_password: '{{ default_password }}'
  rangerkms_db_name: 'rangerkms'
  rangerkms_db_username: 'rangerkms'
  rangerkms_db_password: '{{ default_password }}'

  # Other component databases
  dolphin_db_name: 'dolphinscheduler'
  dolphin_db_username: 'dolphin'
  dolphin_db_password: '{{ default_password }}'
  superset_db_name: 'superset'
  superset_db_username: 'superset'
  superset_db_password: '{{ default_password }}'
  cloudbeaver_db_name: 'cloudbeaver'
  cloudbeaver_db_username: 'cloudbeaver'
  cloudbeaver_db_password: '{{ default_password }}'
  nightingale_db_name: 'nightingale'
  nightingale_db_username: 'n9e'
  nightingale_db_password: '{{ default_password }}'

############################
## Security Configuration ##
############################

# Security type
security: 'none'  # Options: 'none', 'mit-kdc'

# Kerberos security options
security_options:
  external_hostname: ''  # Empty for local KDC installation
  external_hostip: ''    # For /etc/hosts DNS lookup
  realm: 'MY-REALM.COM'
  admin_principal: 'admin/admin'  # Kerberos admin principal
  admin_password: "{{ default_password }}"
  kdc_master_key: "{{ default_password }}"  # Only for 'mit-kdc'
  http_authentication: yes  # Enable HTTP authentication
  manage_krb5_conf: yes     # Set to no for FreeIPA/IdM

############################
## Ambari Configuration   ##
############################

ambari_options:
  # Run users
  ambari_agent_run_user: 'ambari'
  ambari_server_run_user: 'ambari'
  # Admin user settings
  ambari_admin_user: 'admin'
  ambari_admin_password: '{{ default_password }}'
  ambari_admin_default_password: 'admin'
  # Configuration strategy
  config_recommendation_strategy: 'ALWAYS_APPLY'
  # Options: 'NEVER_APPLY', 'ONLY_STACK_DEFAULTS_APPLY',
  #          'ALWAYS_APPLY', 'ALWAYS_APPLY_DONT_OVERRIDE_CUSTOM_VALUES'

############################
## Ranger Configuration   ##
############################

# Ranger plugin options
ranger_options:
  enable_plugins: no  # Enable plugins for installed services

# Ranger security settings
ranger_security_options:
  ranger_admin_password: "{{ default_password }}"     # Password for admin users
  ranger_keyadmin_password: "{{ default_password }}"  # Password for keyadmin (HDP3 only)
  kms_master_key_password: "{{ default_password }}"   # Master key encryption password

############################
## General Configuration  ##
############################

# System settings
external_dns: yes      # Use existing DNS or update /etc/hosts
disable_firewall: yes  # Disable local firewall service
timezone: Asia/Shanghai

# NTP configuration
external_ntp_server_hostname: ''  # Empty for local NTP server

# Additional settings
packages_need_install: []
registry_dns_bind_port: "54"
blueprint_name: 'blueprint'  # Blueprint name in Ambari
wait: true                   # Wait for cluster installation
wait_timeout: 60
accept_gpl: yes              # Accept GPL licenses

############################
## Directory Configuration##
############################

# Base directories
base_log_dir: "/var/log"
base_tmp_dir: "/tmp"

# Service data directories
kafka_log_base_dir: "{% for dr in data_dirs %}{{ dr }}/kafka-logs{% if not loop.last %},{% endif %}{% endfor %}"
ams_base_dir: "/var/lib"
ranger_audit_hdfs_filespool_base_dir: "{{ base_log_dir }}"
ranger_audit_solr_filespool_base_dir: "{{ base_log_dir }}"

# HDFS directories
hdfs_dfs_namenode_checkpoint_dir: "{{ hadoop_base_dir }}/hdfs/namesecondary"
hdfs_dfs_namenode_name_dir: "{{ hadoop_base_dir }}/hdfs/namenode"
hdfs_dfs_journalnode_edits_dir: "{{ hadoop_base_dir }}/hdfs/journalnode"
hdfs_dfs_datanode_data_dir: "{% for dr in data_dirs %}{{ dr }}/hadoop/hdfs/data{% if not loop.last %},{% endif %}{% endfor %}"

# YARN directories
yarn_nodemanager_local_dirs: "{{ hadoop_base_dir }}/yarn/local"
yarn_nodemanager_log_dirs: "{{ hadoop_base_dir }}/yarn/log"
yarn_timeline_leveldb_dir: "{{ hadoop_base_dir }}/yarn/timeline"

# Other service directories
zookeeper_data_dir: "{{ hadoop_base_dir }}/zookeeper"
infra_solr_datadir: "{{ hadoop_base_dir }}/ambari-infra-solr/data"
heap_dump_location: "{{ base_tmp_dir }}"
hive_downloaded_resources_dir: "{{ base_tmp_dir }}/hive/${hive.session.id}_resources"

# Temporary directories
ansible_tmp_dir: /tmp/ansible
```
Key settings to remember:

- `database_options`: the section to configure when using external databases
- `data_dirs`: the primary configuration for all data storage