---
title: Hive Engine
sidebar_position: 2
---

This article mainly introduces the installation, usage and configuration of the `Hive` engine plugin in `Linkis`.

## 1. Preliminary work
### 1.1 Environment configuration before engine use

If you want to use the `hive` engine on your server, make sure the following environment variables are set correctly and are visible to the engine startup user.

It is strongly recommended that you check these environment variables for the executing user before running `hive` tasks.

| Environment variable name | Environment variable content | Remarks |
|-----------------|----------------|------|
| JAVA_HOME | JDK installation path | Required |
| HADOOP_HOME | Hadoop installation path | Required |
| HADOOP_CONF_DIR | Hadoop configuration path | Required |
| HIVE_CONF_DIR | Hive configuration path | Required |
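
As a quick sanity check, the variables above can be verified from the shell before launching any task. A minimal sketch (run it as the engine startup user; `check_env` is an illustrative helper, not part of Linkis):

```shell
# check_env VAR...: print each variable's value, or flag it if it is unset
check_env() {
    for var in "$@"; do
        if [ -z "$(printenv "$var")" ]; then
            echo "$var is NOT set"
        else
            echo "$var=$(printenv "$var")"
        fi
    done
}

# the variables required by the hive engine
check_env JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HIVE_CONF_DIR
```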

### 1.2 Environment verification
```
# connect to hive
bin/hive

# test command
show databases;

# If the connection succeeds and database information is printed normally, the environment is configured correctly
hive (default)> show databases;
OK
databases_name
default
```

## 2. Engine plugin installation [default engine](./overview.md)

The binary installation package released by `linkis` includes the `Hive` engine plugin by default, and users do not need to install it additionally.

Both `hive1.x` and `hive2.x` versions of `Hive` are supported, and `hive on MapReduce` is supported by default. `Linkis` is also compatible with `Hive on Tez`; switching to it requires the following steps:
- Copy the Tez-related dependencies to `{LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/lib` (note that it is the `dist` directory, not the `plugin` directory). Alternatively, you can modify the pom of the hive engineconn module to add the Tez dependencies and recompile.
- Edit `{LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf/linkis-engineconn.properties` and set `linkis.hive.engine.type=tez`
- Restart the manager service: `sh linkis-daemon.sh restart linkis-cg-manager`
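
The properties edit in the second step can also be scripted. A minimal sketch that sets the key idempotently, using a throwaway directory in place of the real `{LINKIS_HOME}` path (substitute the real `conf` directory in practice):

```shell
# stand-in for {LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf
CONF_DIR="$(mktemp -d)"
PROPS="$CONF_DIR/linkis-engineconn.properties"
touch "$PROPS"

# set linkis.hive.engine.type=tez, replacing any existing value of the key
grep -v '^linkis.hive.engine.type=' "$PROPS" > "$PROPS.tmp" || true
echo 'linkis.hive.engine.type=tez' >> "$PROPS.tmp"
mv "$PROPS.tmp" "$PROPS"

cat "$PROPS"
```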

The `hive` version supported by default is `3.1.3`. If you want to change the `hive` version, find the `linkis-engineplugin-hive` module, modify the `<hive.version>` tag, and then compile this module separately.
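
For reference, the change is a one-line edit of that tag before recompiling. The snippet below is a hypothetical excerpt, not the full file; the exact location of `<hive.version>` in your source tree may differ:

```xml
<!-- hypothetical excerpt: pom of the linkis-engineplugin-hive module -->
<properties>
    <!-- change this value to the hive version you need, then recompile the module -->
    <hive.version>3.1.3</hive.version>
</properties>
```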

[EngineConnPlugin engine plugin installation](../deployment/install-engineconn.md)

## 3. Engine usage

### 3.1 Submitting tasks via `Linkis-cli`

```shell
sh ./bin/linkis-cli -engineType hive-3.1.3 \
-codeType hql -code "show databases" \
-submitUser hadoop -proxyUser hadoop
```

For more `Linkis-Cli` command parameters, refer to: [`Linkis-Cli` usage](../user-guide/linkiscli-manual.md)

### 3.2 Submitting tasks through the Linkis SDK

`Linkis` provides `Java` and `Scala` `SDK`s to submit tasks to the `Linkis` server. For details, please refer to the [JAVA SDK Manual](../user-guide/sdk-manual.md).
For a `Hive` task, you only need to modify the `EngineConnType` and `CodeType` parameters in the `Demo`:

```java
Map<String, Object> labels = new HashMap<String, Object>();
labels.put(LabelKeyConstant.ENGINE_TYPE_KEY, "hive-3.1.3"); // required engineType Label
labels.put(LabelKeyConstant.USER_CREATOR_TYPE_KEY, "hadoop-IDE"); // required execute user and creator
labels.put(LabelKeyConstant.CODE_TYPE_KEY, "hql"); // required codeType
```

## 4. Engine configuration instructions

### 4.1 Default Configuration Description
| Configuration | Default | Required | Description |
| ------------------------ | ------------------- | ---| ------------------------------------------- |
| wds.linkis.rm.instance | 10 | No | Maximum engine concurrency |
| wds.linkis.engineconn.java.driver.memory | 1g | No | Engine initialization memory size |
| wds.linkis.engineconn.max.free.time | 1h | No | Engine idle exit time |

### 4.2 Queue resource configuration
The `MapReduce` tasks of `hive` need to use `yarn` resources, so a queue needs to be set.

![yarn](./images/hive-config.png)

### 4.3 Configuration modification
If the default parameters do not meet your needs, some basic parameters can be configured in the following ways

#### 4.3.1 Management Console Configuration

![hive](./images/hive-config.png)

Note: After modifying the configuration under the `IDE` tag, you need to specify `-creator IDE` for it to take effect (other tags are similar), such as:

```shell
sh ./bin/linkis-cli -creator IDE \
-engineType hive-3.1.3 -codeType hql \
-code "show databases" \
-submitUser hadoop -proxyUser hadoop
```

#### 4.3.2 Task interface configuration
When submitting a task through the task interface, configure it via the parameter `params.configuration.runtime`

Example of http request parameters:
```json
{
    "executionContent": {"code": "show databases;", "runType": "sql"},
    "params": {
        "variable": {},
        "configuration": {
            "runtime": {
                "wds.linkis.rm.instance": "10"
            }
        }
    },
    "labels": {
        "engineType": "hive-3.1.3",
        "userCreator": "hadoop-IDE"
    }
}
```
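
The same request can be scripted. The sketch below builds the payload and shows a hypothetical `curl` call; the gateway address, port, and the entrance submit path are assumptions to verify against your deployment and Linkis version:

```shell
# illustrative gateway address; replace with your Linkis gateway (an assumption, not a fixed value)
GATEWAY="http://127.0.0.1:9001"

# the task payload from the example above, as a single-line JSON string
payload='{"executionContent":{"code":"show databases;","runType":"sql"},"params":{"variable":{},"configuration":{"runtime":{"wds.linkis.rm.instance":"10"}}},"labels":{"engineType":"hive-3.1.3","userCreator":"hadoop-IDE"}}'

# hypothetical submit call (uncomment once a gateway is reachable and you are authenticated):
# curl -s -X POST "$GATEWAY/api/rest_j/v1/entrance/submit" \
#      -H "Content-Type: application/json" -d "$payload"
echo "$payload"
```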

### 4.4 Engine related data table

`Linkis` manages engines through engine tags; the data tables involved are as follows.

```
linkis_ps_configuration_config_key: keys and default values of the configuration parameters of the engine
linkis_cg_manager_label: engine labels, such as: hive-3.1.3
linkis_ps_configuration_category: the directory association of the engine
linkis_ps_configuration_config_value: the configuration that the engine needs to display
linkis_ps_configuration_key_engine_relation: the relationship between configuration items and the engine
```

The initial data related to the engine in the table is as follows

```sql
-- set variable
SET @HIVE_LABEL="hive-3.1.3";
SET @HIVE_ALL=CONCAT('*-*,',@HIVE_LABEL);
SET @HIVE_IDE=CONCAT('*-IDE,',@HIVE_LABEL);

-- engine label
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_ALL, 'OPTIONAL', 2, now(), now());
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_IDE, 'OPTIONAL', 2, now(), now());

select @label_id := id from linkis_cg_manager_label where `label_value` = @HIVE_IDE;
insert into linkis_ps_configuration_category (`label_id`, `level`) VALUES (@label_id, 2);

-- configuration key
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.rm.instance', 'range: 1-20, unit: piece', 'hive engine maximum concurrent number', '10', 'NumInterval', '[1,20]', '0', '0', '1', 'Queue resource', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.java.driver.memory', 'Value range: 1-10, unit: G', 'hive engine initialization memory size', '1g', 'Regex', '^([1-9]|10)(G|g)$', '0', '0', '1', 'hive engine settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('hive.client.java.opts', 'hive client process parameters', 'jvm parameters when the hive engine starts', '', 'None', NULL, '1', '1', '1', 'hive engine settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('mapred.reduce.tasks', 'Range: -1-10000, unit: number', 'reduce number', '-1', 'NumInterval', '[-1,10000]', '0', '1', '1', 'hive resource settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.max.free.time', 'Value range: 3m,15m,30m,1h,2h', 'Engine idle exit time', '1h', 'OFT', '[\"1h\",\"2h\",\"30m\",\"15m\",\"3m\"]', '0', '0', '1', 'hive engine settings', 'hive');

-- key engine relation
insert into `linkis_ps_configuration_key_engine_relation` (`config_key_id`, `engine_type_label_id`)
(select config.id as `config_key_id`, label.id AS `engine_type_label_id` FROM linkis_ps_configuration_config_key config
INNER JOIN linkis_cg_manager_label label ON config.engine_conn_type = 'hive' and label_value = @HIVE_ALL);

-- engine default configuration
insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_value`, `config_label_id`)
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @HIVE_ALL);
```

## 5. Hive log display modification
By default, the log interface does not display the `application_id` or the number of completed `task`s; users can enable this output as needed.
The code blocks that need to be modified in the `log4j2-engineconn.xml/log4j2.xml` configuration file of the engine are as follows
1. Add the following under the `appenders` component
```xml
<Send name="SendPackage" >
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
</Send>
```
2. Add the following under the `root` component
```xml
<appender-ref ref="SendPackage"/>
```
3. Add the following under the `loggers` component
```xml
<logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
    <appender-ref ref="SendPackage"/>
</logger>
```
After making the above modifications, the log will include `task` progress information, displayed in the following style
```
2022-04-08 11:06:50.228 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Status: Running (Executing on YARN cluster with App id application_1631114297082_432445)
2022-04-08 11:06:50.248 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: -/- Reducer 2: 0/1
2022-04-08 11:06:52.417 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0/1 Reducer 2: 0/1
2022-04-08 11:06:55.060 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0(+1)/1 Reducer 2: 0/1
2022-04-08 11:06:57.495 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 0(+1)/1
2022-04-08 11:06:57.899 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 1/1
```

An example of a complete `xml` configuration file is as follows:
```xml
<!--
  ~ Copyright 2019 WeBank
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<configuration status="error" monitorInterval="30">
    <appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <ThresholdFilter level="INFO" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Console>

        <Send name="Send" >
            <Filters>
                <ThresholdFilter level="WARN" onMatch="ACCEPT" onMismatch="DENY" />
            </Filters>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>

        <Send name="SendPackage" >
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>

        <Console name="stderr" target="SYSTEM_ERR">
            <ThresholdFilter level="ERROR" onMatch="ACCEPT" onMismatch="DENY" />
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </Console>
    </appenders>

    <loggers>
        <root level="INFO">
            <appender-ref ref="stderr"/>
            <appender-ref ref="Console"/>
            <appender-ref ref="Send"/>
            <appender-ref ref="SendPackage"/>
        </root>
        <logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
            <appender-ref ref="SendPackage"/>
        </logger>
        <logger name="org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter" level="error" additivity="true">
            <appender-ref ref="stderr"/>
        </logger>
        <logger name="com.netflix.discovery" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.yarn" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.linkis.server.security" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hive.ql.exec.mr.ExecDriver" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hdfs.KeyProviderCache" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.spark_project.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.eclipse.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.reflections.Reflections" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>

        <logger name="org.apache.hadoop.ipc.Client" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>

    </loggers>
</configuration>
```