| # Advanced Deployment Configuration |
| |
| This guide provides detailed instructions for advanced deployment scenarios and customizations of Apache Ambari and Hadoop ecosystem components. |
| |
| ## Configuration Workflow |
| |
| When using non-Docker deployment with advanced configurations: |
| |
| 1. After modifying `conf/base_conf.yml`, always regenerate the advanced configuration: |
| ```bash |
| source setup_pypath.sh |
| python3 deploy_py/main.py -generate-conf |
| ``` |
| |
| 2. Modify the generated `conf/conf.yml` according to your needs |
| |
| 3. Start deployment: |
| ```bash |
| nohup python3 deploy_py/main.py -deploy & |
| tail -f logs/ansible-playbook.log |
| ``` |
| |
| **Important Note**: Running `python3 deploy_py/main.py -generate-conf` again will overwrite your customized `conf/conf.yml`. |
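
A simple safeguard is to keep a copy of the customized file before regenerating; a minimal sketch (the backup name is arbitrary):

```bash
# Keep a copy of the customized configuration before regenerating
cp conf/conf.yml conf/conf.yml.bak

# After regenerating, compare the files and re-apply your customizations
diff conf/conf.yml.bak conf/conf.yml
```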
| |
| ## Advanced Configuration Scenarios |
| |
| ### 1. Customizing Cluster Topology |
| |
| The cluster topology is primarily controlled by two configuration sections in `conf.yml`: `host_groups` and `group_services`. |
| |
| #### Host Groups Configuration |
| ```yaml |
| host_groups: |
| group0: [server0] |
| group1: [server1] |
  group2: [server2, server3]
| ``` |
| Each group contains a set of hostnames that will be treated as a unit for service deployment. |
| |
| #### Service Groups Configuration |
| ```yaml |
| group_services: |
| group0: [AMBARI_SERVER, NAMENODE, ZKFC, ...] |
| group1: [NAMENODE, ZKFC, JOURNALNODE, ...] |
| group2: [ZOOKEEPER_SERVER, JOURNALNODE, ...] |
| ``` |
| |
| ### 2. Component Reference Guide |
| |
The most commonly deployed components and their functions are listed below; a worked HA topology sketch follows the lists:
| |
| #### HDFS Components |
| - **NAMENODE** (HA capable) |
| - Core component managing filesystem metadata |
| - Deploy 2 instances for HA mode |
| - **DATANODE** |
| - Stores actual data blocks |
| - **JOURNALNODE** |
| - Required for HA: minimum 3 instances (odd number) |
| - **ZKFC** |
| - Must be co-located with NAMENODE |
| - Required for HA: 2 instances |
| |
| #### YARN Components |
| - **RESOURCEMANAGER** (HA capable) |
| - Central resource management |
| - Deploy 2 instances for HA mode |
| - **NODEMANAGER** |
| - Manages resources on individual nodes |
| - **HISTORYSERVER** |
| - Stores job history |
| - **APP_TIMELINE_SERVER** |
| - Application timeline tracking |
| |
| #### HBase Components |
| - **HBASE_MASTER** (HA capable) |
| - Cluster management and coordination |
| - Deploy 2 instances for HA mode |
| - **HBASE_REGIONSERVER** |
| - Data storage and management |
| |
| #### Hive Components |
| - **HIVE_METASTORE** |
| - Metadata storage |
| - **HIVE_SERVER** (HA capable) |
| - Query processing |
| - **WEBHCAT_SERVER** |
| - REST interface |
| |
| #### Other Core Components |
| - **ZOOKEEPER_SERVER** |
  - Deploy an odd number of instances
| - **KAFKA_BROKER** |
| - Message broker |
| - **SPARK_JOBHISTORYSERVER** |
| - Spark job history |
| - **FLINK_HISTORYSERVER** |
| - Flink job history |
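
Putting the HA rules above together, a minimal three-node HA layout might look like the sketch below (hostnames are placeholders; the complete reference later in this guide shows a fuller assignment):

```yaml
host_groups:
  group0: [node0]
  group1: [node1]
  group2: [node2]

group_services:
  # Two NAMENODEs, each co-located with a ZKFC
  group0: [NAMENODE, ZKFC, JOURNALNODE, ZOOKEEPER_SERVER, RESOURCEMANAGER]
  group1: [NAMENODE, ZKFC, JOURNALNODE, ZOOKEEPER_SERVER, RESOURCEMANAGER]
  # A third JOURNALNODE and ZOOKEEPER_SERVER keep both quorums at an odd size
  group2: [JOURNALNODE, ZOOKEEPER_SERVER, DATANODE, NODEMANAGER]
```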
| |
| ### 3. Using External Databases |
| |
| To use external databases, you need to: |
| |
1. Create the necessary users and databases manually:
| ```sql |
| -- PostgreSQL example |
| CREATE USER hive WITH PASSWORD 'hive'; |
| CREATE DATABASE hive OWNER hive; |
| GRANT ALL PRIVILEGES ON DATABASE hive TO hive; |
| |
| CREATE USER ranger WITH PASSWORD 'ranger'; |
| CREATE DATABASE ranger OWNER ranger; |
| GRANT ALL PRIVILEGES ON DATABASE ranger TO ranger; |
| ``` |
| |
| 2. Configure database settings in `conf.yml`: |
| ```yaml |
| database: 'postgres' |
| database_options: |
| external_hostname: 'your-db-host' |
| hive_db_name: 'hive' |
| hive_db_username: 'hive' |
| hive_db_password: 'your-password' |
| rangeradmin_db_name: 'ranger' |
| rangeradmin_db_username: 'ranger' |
| rangeradmin_db_password: 'your-password' |
| ``` |
| |
| ### 4. Customizing Directory Locations |
| |
You can customize various directory locations; the values support Jinja2 templating and are rendered during deployment:
| |
| ```yaml |
data_dirs: ["/data/sdv1"] # Data directories (multiple entries supported)
| |
| # HDFS directories |
| hdfs_dfs_namenode_name_dir: "{{ hadoop_base_dir }}/hdfs/namenode" |
| hdfs_dfs_datanode_data_dir: "{% for dr in data_dirs %}{{ dr }}/hadoop/hdfs/data{% if not loop.last %},{% endif %}{% endfor %}" |
| |
| # YARN directories |
| yarn_nodemanager_local_dirs: "{{ hadoop_base_dir }}/yarn/local" |
| yarn_nodemanager_log_dirs: "{{ hadoop_base_dir }}/yarn/log" |
| |
| # Other service directories |
| zookeeper_data_dir: "{{ hadoop_base_dir }}/zookeeper" |
| kafka_log_base_dir: "{% for dr in data_dirs %}{{ dr }}/kafka-logs{% if not loop.last %},{% endif %}{% endfor %}" |
| ``` |
| |
| ### 5. Enabling Ranger |
| |
| To enable Ranger and its plugins: |
| |
| ```yaml |
| ranger_options: |
| enable_plugins: yes |
| |
| ranger_security_options: |
| ranger_admin_password: "your-password" |
| ranger_keyadmin_password: "your-password" |
| kms_master_key_password: "your-password" |
| ``` |
| |
| ### 6. Customizing Ambari Configuration |
| |
| Modify Ambari-specific settings: |
| |
| ```yaml |
| ambari_options: |
| ambari_agent_run_user: 'ambari' |
| ambari_server_run_user: 'ambari' |
| ambari_admin_user: 'admin' |
| ambari_admin_password: 'your-password' |
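  # ALWAYS_APPLY lets Ambari apply stack-advisor recommendations, which may
  # override custom values; the full option list is in the complete reference below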
| config_recommendation_strategy: 'ALWAYS_APPLY' |
| ``` |
| |
| ## Best Practices |
| |
| 1. **High Availability Setup** |
| - Deploy NAMENODE, RESOURCEMANAGER, and HBASE_MASTER in pairs for HA |
   - Use an odd number of JOURNALNODE and ZOOKEEPER_SERVER instances
| - Co-locate ZKFC with NAMENODE |
| |
| 2. **Resource Planning** |
| - Distribute memory-intensive services across nodes |
| - Consider network topology when placing services |
| - Plan for future scaling |
| |
| 3. **Security Considerations** |
| - Use external databases for production |
| - Enable Ranger for centralized security |
| - Configure proper authentication mechanisms |
| |
| ## Complete Configuration Reference |
| |
| Below is a complete configuration example with detailed explanations for each section: |
| |
| ```yaml |
| ############################ |
| ## Host Groups & Services ## |
| ############################ |
| # Define machine groups and their members |
| host_groups: |
| group0: [e977d8bea74d.bigtop.apache.org] |
| group1: [b76d76c80f15.bigtop.apache.org] |
| group2: [8874239dee4b.bigtop.apache.org] |
| |
| # Define which services run on which groups |
| group_services: |
| group0: [AMBARI_SERVER, NAMENODE, ZKFC, JOURNALNODE, RESOURCEMANAGER, ZOOKEEPER_SERVER, |
| HBASE_MASTER, HIVE_METASTORE, SPARK_THRIFTSERVER, FLINK_HISTORYSERVER, HISTORYSERVER, |
| RANGER_TAGSYNC, RANGER_USERSYNC] |
| group1: [NAMENODE, ZKFC, JOURNALNODE, RESOURCEMANAGER, ZOOKEEPER_SERVER, HBASE_MASTER, |
| DATANODE, NODEMANAGER, APP_TIMELINE_SERVER, RANGER_ADMIN, METRICS_GRAFANA, SPARK_JOBHISTORYSERVER, |
| INFRA_SOLR] |
| group2: [ZOOKEEPER_SERVER, JOURNALNODE, DATANODE, NODEMANAGER, TIMELINE_READER, |
| YARN_REGISTRY_DNS, METRICS_COLLECTOR, HBASE_REGIONSERVER, HIVE_SERVER, WEBHCAT_SERVER, |
| INFRA_SOLR] |
| |
| ############################ |
| ## Basic Configuration ## |
| ############################ |
| # Default password for all services |
| default_password: B767610qa4Z |
| |
| # Data directories for all components |
| # Can specify multiple directories for HDFS DataNode storage |
| data_dirs: [/data/sdv1] |
| |
| # Repository configuration |
| # Option 1: Use existing repository |
| repos: null |
| # Option 2: Use local package directory |
| repo_pkgs_dir: /data1/apache/ambari-3.0_pkgs |
| |
| # Stack version for Ambari |
| stack_version: 3.3.0 |
| |
| # Cluster naming |
| cluster_name: cluster |
| hdfs_ha_name: ambari-cluster |
| |
| # Network configuration |
| ansible_ssh_port: 22 |
| ambari_server_port: 8083 |
| http_repo_port: 8881 |
| |
| ############################ |
| ## Docker Configuration ## |
| ############################ |
| docker_options: |
| # Minimum 3 instances required for HA |
| instance_num: 3 |
| # Memory limit per container |
| memory_limit: 8g |
| # Enable local repository |
| enable_local_repo: true |
| # Port mappings for accessing services |
| components_port_map: {AMBARI_SERVER: 8083} |
| # Container distribution settings |
| distro: {name: centos, version: 8} |
| # Components to install in Docker environment |
| components: [hbase, hdfs, yarn, hive, zookeeper, ambari, spark, flink, ranger, infra_solr, |
| ambari_metrics] |
| default_password: B767610qa4Z |
| |
| ############################ |
| ## Memory Configuration ## |
| ############################ |
| # Component memory settings (in MB) |
| hbase_heapsize: 1024 |
| hadoop_heapsize: 1024 |
| hive_heapsize: 1024 |
| infra_solr_memory: 1024 |
| spark_daemon_memory: 1024 |
| zookeeper_heapsize: 1024 |
| yarn_heapsize: 1024 |
| alluxio_memory: 1024 |
| |
| ############################ |
| ## Repository Settings ## |
| ############################ |
| skip_cluster_clear: true |
| local_repo_ipaddress: 172.30.0.3 |
| create_http_repo_for_local_pkgs: false |
| |
| ############################ |
| ## Deployment Control ## |
| ############################ |
| deploy_ambari_only: false |
| prepare_nodes_only: false |
| backup_old_repo: no |
| should_deploy_ambari_mpack: false |
| |
| ############################ |
| ## Database Configuration ## |
| ############################ |
| # Database type selection |
| database: 'postgres' # Options: 'postgres', 'mysql' |
| postgres_port: 5432 |
| mysql_port: 3306 |
| |
| # Database options for all components |
| database_options: |
| # External database configuration |
| repo_url: '' |
| external_hostname: '' # Empty for local database installation |
| |
| # Ambari database |
| ambari_db_name: 'ambari' |
| ambari_db_username: 'ambari' |
| ambari_db_password: '{{ default_password }}' |
| |
| # Hive database |
| hive_db_name: 'hive' |
| hive_db_username: 'hive' |
| hive_db_password: '{{ default_password }}' |
| |
| # Ranger databases |
| rangeradmin_db_name: 'ranger' |
| rangeradmin_db_username: 'ranger' |
| rangeradmin_db_password: '{{ default_password }}' |
| rangerkms_db_name: 'rangerkms' |
| rangerkms_db_username: 'rangerkms' |
| rangerkms_db_password: '{{ default_password }}' |
| |
| # Other component databases |
| dolphin_db_name: 'dolphinscheduler' |
| dolphin_db_username: 'dolphin' |
| dolphin_db_password: '{{ default_password }}' |
| |
| superset_db_name: 'superset' |
| superset_db_username: 'superset' |
| superset_db_password: '{{ default_password }}' |
| |
| cloudbeaver_db_name: 'cloudbeaver' |
| cloudbeaver_db_username: 'cloudbeaver' |
| cloudbeaver_db_password: '{{ default_password }}' |
| |
| nightingale_db_name: 'nightingale' |
| nightingale_db_username: 'n9e' |
| nightingale_db_password: '{{ default_password }}' |
| |
| ############################ |
| ## Security Configuration ## |
| ############################ |
| # Security type |
| security: 'none' # Options: 'none', 'mit-kdc' |
| |
| # Kerberos security options |
| security_options: |
| external_hostname: '' # Empty for local KDC installation |
| external_hostip: '' # For /etc/hosts DNS lookup |
| realm: 'MY-REALM.COM' |
| admin_principal: 'admin/admin' # Kerberos admin principal |
| admin_password: "{{ default_password }}" |
| kdc_master_key: "{{ default_password }}" # Only for 'mit-kdc' |
| http_authentication: yes # Enable HTTP authentication |
| manage_krb5_conf: yes # Set to no for FreeIPA/IdM |
| |
| ############################ |
| ## Ambari Configuration ## |
| ############################ |
| ambari_options: |
| # Run users |
| ambari_agent_run_user: 'ambari' |
| ambari_server_run_user: 'ambari' |
| # Admin user settings |
| ambari_admin_user: 'admin' |
| ambari_admin_password: '{{ default_password }}' |
| ambari_admin_default_password: 'admin' |
| # Configuration strategy |
| config_recommendation_strategy: 'ALWAYS_APPLY' # Options: 'NEVER_APPLY', 'ONLY_STACK_DEFAULTS_APPLY', |
| # 'ALWAYS_APPLY', 'ALWAYS_APPLY_DONT_OVERRIDE_CUSTOM_VALUES' |
| |
| ############################ |
| ## Ranger Configuration ## |
| ############################ |
| # Ranger plugin options |
| ranger_options: |
| enable_plugins: no # Enable plugins for installed services |
| |
| # Ranger security settings |
| ranger_security_options: |
| ranger_admin_password: "{{ default_password }}" # Password for admin users |
| ranger_keyadmin_password: "{{ default_password }}" # Password for keyadmin (HDP3 only) |
| kms_master_key_password: "{{ default_password }}" # Master key encryption password |
| |
| ############################ |
| ## General Configuration ## |
| ############################ |
| # System settings |
| external_dns: yes # Use existing DNS or update /etc/hosts |
| disable_firewall: yes # Disable local firewall service |
| timezone: Asia/Shanghai |
| |
| # NTP configuration |
| external_ntp_server_hostname: '' # Empty for local NTP server |
| |
| # Additional settings |
| packages_need_install: [] |
| registry_dns_bind_port: "54" |
| blueprint_name: 'blueprint' # Blueprint name in Ambari |
| wait: true # Wait for cluster installation |
| wait_timeout: 60 |
| accept_gpl: yes # Accept GPL licenses |
| |
| ############################ |
## Directory Configuration ##
| ############################ |
| # Base directories |
| base_log_dir: "/var/log" |
| base_tmp_dir: "/tmp" |
| |
| # Service data directories |
| kafka_log_base_dir: "{% for dr in data_dirs %}{{ dr }}/kafka-logs{% if not loop.last %},{% endif %}{% endfor %}" |
| ams_base_dir: "/var/lib" |
| ranger_audit_hdfs_filespool_base_dir: "{{ base_log_dir }}" |
| ranger_audit_solr_filespool_base_dir: "{{ base_log_dir }}" |
| |
| # HDFS directories |
| hdfs_dfs_namenode_checkpoint_dir: "{{ hadoop_base_dir }}/hdfs/namesecondary" |
| hdfs_dfs_namenode_name_dir: "{{ hadoop_base_dir }}/hdfs/namenode" |
| hdfs_dfs_journalnode_edits_dir: "{{ hadoop_base_dir }}/hdfs/journalnode" |
| hdfs_dfs_datanode_data_dir: "{% for dr in data_dirs %}{{ dr }}/hadoop/hdfs/data{% if not loop.last %},{% endif %}{% endfor %}" |
| |
| # YARN directories |
| yarn_nodemanager_local_dirs: "{{ hadoop_base_dir }}/yarn/local" |
| yarn_nodemanager_log_dirs: "{{ hadoop_base_dir }}/yarn/log" |
| yarn_timeline_leveldb_dir: "{{ hadoop_base_dir }}/yarn/timeline" |
| |
| # Other service directories |
| zookeeper_data_dir: "{{ hadoop_base_dir }}/zookeeper" |
| infra_solr_datadir: "{{ hadoop_base_dir }}/ambari-infra-solr/data" |
| heap_dump_location: "{{ base_tmp_dir }}" |
| hive_downloaded_resources_dir: "{{ base_tmp_dir }}/hive/${hive.session.id}_resources" |
| |
| # Temporary directories |
| ansible_tmp_dir: /tmp/ansible |
| ``` |
| |
| ## Configuration Notes |
| |
| ### Host Groups and Services |
| |
- Several services can be configured for high availability (HA):
| - **NAMENODE**: Deploy 2 instances for HA |
| - **RESOURCEMANAGER**: Deploy 2 instances for HA |
| - **HBASE_MASTER**: Deploy 2 instances for HA |
| - **HIVE_SERVER**: Deploy multiple instances for HA |
  - **ZOOKEEPER_SERVER**: Deploy an odd number of instances
| |
| ### Database Configuration |
| |
| When using external databases: |
| 1. Create users and databases manually before deployment |
2. Ensure the database is accessible from all nodes (a quick check is sketched below)
| 3. Configure connection details in `database_options` |
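
For step 2, a reachability check can be run from any cluster node; a minimal sketch, assuming the PostgreSQL client is installed and the example `hive` credentials from earlier:

```bash
# Prints connection details if the database is reachable
PGPASSWORD='your-password' psql -h your-db-host -p 5432 -U hive -d hive -c '\conninfo'
```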
| |
| ### Directory Configuration |
| |
| - `data_dirs`: Primary configuration for all data storage |
  - The first directory is used for logs and default paths
| - All directories are used for HDFS DataNode storage |
| - Multiple directories improve I/O performance |
| |
| ### Security Configuration |
| |
- Kerberos setup requires proper DNS resolution (a minimal KDC sketch follows this list)
| - Ranger plugins can be enabled for all compatible services |
| - HTTP authentication can be enabled for web UIs |
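
A minimal sketch of enabling Kerberos with a locally installed MIT KDC, using the keys from the complete reference (the realm is a placeholder):

```yaml
security: 'mit-kdc'
security_options:
  external_hostname: ''        # empty installs a local KDC
  realm: 'MY-REALM.COM'
  admin_principal: 'admin/admin'
  admin_password: "{{ default_password }}"
  kdc_master_key: "{{ default_password }}"
  http_authentication: yes
  manage_krb5_conf: yes
```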
| |
| ### Memory Configuration |
| |
| - All memory settings are in MB |
| - Configure based on available hardware |
- Consider service co-location when setting values; a tuning sketch follows
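
For example, heap sizes can be raised on larger nodes using the keys from the complete reference (values are illustrative):

```yaml
# Illustrative values for nodes with more RAM (all in MB)
hbase_heapsize: 4096
hadoop_heapsize: 4096
zookeeper_heapsize: 2048
```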