---
title: Sqoop Engine
sidebar_position: 9
---

This article mainly introduces the installation, usage and configuration of the Sqoop engine plugin in Linkis.

## 1. Preliminary work

### 1.1 Environment Installation

The Sqoop engine mainly depends on the Hadoop basic environment. If you need to deploy the Sqoop engine on a node, you must deploy the Hadoop client environment, and [download](https://archive.apache.org/dist/sqoop/) and install the Sqoop client.

### 1.2 Environment verification

Before executing a Sqoop task through Linkis, first run a test task with native Sqoop on the node to check whether the node environment is normal.

```bash
# Verify whether the sqoop environment is available
# Reference example: import the HDFS file data /user/hive/warehouse/hadoop/test_linkis_sqoop into the mysql table test_sqoop
sqoop export \
--connect jdbc:mysql://10.10.10.10/test \
--username test \
--password test123 \
--table test_sqoop \
--columns user_id,user_code,user_name,email,status \
--export-dir /user/hive/warehouse/hadoop/test_linkis_sqoop \
--update-mode allowinsert \
--verbose ;
```

The following environment variables must be available on the node that runs the Sqoop engine:

| Environment variable name | Environment variable content | Remarks |
|---------------------------|------------------------------|--------------|
| JAVA_HOME | JDK installation path | Required |
| HADOOP_HOME | Hadoop installation path | Required |
| HADOOP_CONF_DIR | Hadoop configuration path | Required |
| SQOOP_HOME | Sqoop installation path | Required |
| SQOOP_CONF_DIR | Sqoop configuration path | Not required |
| HCAT_HOME | HCAT configuration path | Not required |
| HBASE_HOME | HBASE configuration path | Not required |
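These variables are typically exported in the deployment user's profile (for example `~/.bashrc`) on the node. A minimal sketch with placeholder installation paths (adjust them to the actual installations in your environment):

```bash
# Placeholder paths; point them at the actual JDK, Hadoop and Sqoop installations on the node
export JAVA_HOME=/opt/jdk1.8.0_301
export HADOOP_HOME=/opt/hadoop-2.7.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SQOOP_HOME=/opt/sqoop-1.4.6
export SQOOP_CONF_DIR=${SQOOP_HOME}/conf
export PATH=${SQOOP_HOME}/bin:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:${PATH}
```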

The following Linkis system parameters are related to the Sqoop engine:

| Linkis system parameter | Description | Remarks |
|--------------------------|-------------|---------|
| wds.linkis.hadoop.site.xml | Sets the location of the Hadoop parameter files loaded by Sqoop | Generally no separate configuration is required; the default value is "core-site.xml;hdfs-site.xml;yarn-site.xml;mapred-site.xml" |
| sqoop.fetch.status.interval | Sets the interval for obtaining the Sqoop execution status | Generally no separate configuration is required; the default value is 5s |
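If you do need to change these defaults, they are ordinary Linkis configuration items. A minimal sketch, assuming your deployment reads the override from `${LINKIS_HOME}/conf/linkis.properties` (the exact file and whether engine-side overrides are picked up from it depend on your installation):

```bash
# Assumption: this deployment picks up the override from linkis.properties after a service restart
echo "sqoop.fetch.status.interval=10s" >> ${LINKIS_HOME}/conf/linkis.properties
```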

## 2. Engine plugin deployment

### 2.1 Engine plugin preparation (choose one) [non-default engine]

Method 1: Download the engine plugin package directly

Linkis Engine Plugin Download

Method 2: Compile the engine plugin separately (requires a Maven environment)

```bash
# compile
cd ${linkis_code_dir}/linkis-engineconn-plugins/sqoop/
mvn clean install
# The compiled engine plugin package is located in the following directory
${linkis_code_dir}/linkis-engineconn-plugins/sqoop/target/out/
```

For more details, refer to the EngineConnPlugin engine plugin installation documentation.

### 2.2 Upload and load engine plugins

Upload the engine plugin package from 2.1 to the engine plugin directory on the server:

```bash
${LINKIS_HOME}/lib/linkis-engineconn-plugins
```

The directory structure after uploading is as follows

```text
linkis-engineconn-plugins/
├── sqoop
│   ├── dist
│   │   └── 1.4.6
│   │       ├── conf
│   │       └── lib
│   └── plugin
│       └── 1.4.6
```
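A minimal sketch of placing the package on the server, assuming the artifact produced in 2.1 is a zip named `sqoop.zip` (the actual file name may differ in your build):

```bash
# Copy the compiled plugin package to the engine plugin directory and unpack it
cd ${LINKIS_HOME}/lib/linkis-engineconn-plugins
unzip sqoop.zip   # should result in the dist/ and plugin/ layout shown above
```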

### 2.3 Engine refresh

#### 2.3.1 Restart and refresh

Refresh the engine by restarting the `linkis-cg-linkismanager` service:

```bash
cd ${LINKIS_HOME}/sbin
sh linkis-daemon.sh restart cg-linkismanager
```

#### 2.3.2 Check if the engine is refreshed successfully

You can check whether the `last_update_time` column of the `linkis_cg_engine_conn_plugin_bml_resources` table in the database has been updated to the time when the refresh was triggered.

```sql
# Log in to the `linkis` database
select * from linkis_cg_engine_conn_plugin_bml_resources;
```
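For a more targeted check you can query only the Sqoop engine record. A hedged example using the mysql client (host, user and database name are placeholders; column names are assumed from the table definition):

```bash
# The last_update_time of the sqoop rows should match the refresh time
mysql -h 127.0.0.1 -u linkis -p -D linkis \
  -e "select engine_conn_type, version, last_update_time from linkis_cg_engine_conn_plugin_bml_resources where engine_conn_type = 'sqoop';"
```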

## 3 Sqoop engine usage

### 3.1 Submitting tasks via Linkis-cli

#### 3.1.1 hdfs file export to mysql

```bash
sh linkis-cli-sqoop export \
-D mapreduce.job.queuename=ide \
--connect jdbc:mysql://10.10.10.10:9600/testdb \
--username password@123 \
--password password@123 \
--table test_sqoop_01_copy \
--columns user_id,user_code,user_name,email,status \
--export-dir /user/hive/warehouse/hadoop/test_linkis_sqoop_2 \
--update-mode allowinsert --verbose ;
```

#### 3.1.2 mysql data import to hive library

Import from `mysql` into the `hive` table `linkis_test_ind.test_import_sqoop_1`. Since the table `test_import_sqoop_1` does not exist, the parameter `--create-hive-table` needs to be added:

```bash
sh linkis-cli-sqoop import -D mapreduce.job.queuename=dws \
--connect jdbc:mysql://10.10.10.10:3306/casion_test \
--username hadoop \
--password password@123 \
--table test_sqoop_01 \
--columns user_id,user_code,user_name,email,status \
--fields-terminated-by ',' \
--hive-import --create-hive-table \
--hive-database casionxia_ind \
--hive-table test_import_sqoop_1 \
--hive-drop-import-delims \
--delete-target-dir \
--input-null-non-string '\\N' \
--input-null-string '\\N' \
--verbose ;
```


Import from `mysql` into the `hive` table `linkis_test_ind.test_import_sqoop_1`. Since the table `test_import_sqoop_1` already exists, remove the parameter `--create-hive-table`:

```bash
sh linkis-cli-sqoop import -D mapreduce.job.queuename=dws \
--connect jdbc:mysql://10.10.10.10:9600/testdb \
--username testdb \
--password password@123 \
--table test_sqoop_01 \
--columns user_id,user_code,user_name,email,status \
--fields-terminated-by ',' \
--hive-import \
--hive-database linkis_test_ind \
--hive-table test_import_sqoop_1 \
--hive-overwrite \
--hive-drop-import-delims \
--delete-target-dir \
--input-null-non-string '\\N' \
--input-null-string '\\N' \
--verbose ;
```

### 3.2 Submit tasks through OnceEngineConn

OnceEngineConn is used by calling the createEngineConn interface of LinkisManager through LinkisManagerClient and sending the code to the created Sqoop engine, which then starts executing it. This mode can be called by other systems, such as Exchangis. The usage of the client is also very simple: first create a Maven project, or introduce the following dependency into your project:

```xml
<dependency>
    <groupId>org.apache.linkis</groupId>
    <artifactId>linkis-computation-client</artifactId>
    <version>${linkis.version}</version>
</dependency>
```

Test case:


```scala
import java.util.concurrent.TimeUnit
import java.util

import org.apache.linkis.common.utils.Utils
import org.apache.linkis.computation.client.LinkisJobBuilder
import org.apache.linkis.computation.client.once.simple.{SimpleOnceJob, SimpleOnceJobBuilder, SubmittableSimpleOnceJob}
import org.apache.linkis.computation.client.operator.impl.{EngineConnLogOperator, EngineConnMetricsOperator, EngineConnProgressOperator}
import org.apache.linkis.computation.client.utils.LabelKeyUtils

import scala.collection.JavaConverters._

object SqoopOnceJobTest extends App {
  // Point the client at the Linkis gateway address
  LinkisJobBuilder.setDefaultServerUrl("http://127.0.0.1:9001")
  val logPath = "C:\\Users\\resources\\log4j.properties"
  System.setProperty("log4j.configurationFile", logPath)
  val startUpMap = new util.HashMap[String, Any]
  startUpMap.put("wds.linkis.engineconn.java.driver.memory", "1g")
  // Build a once job labeled for the sqoop-1.4.6 engine
  val builder = SimpleOnceJob.builder().setCreateService("Linkis-Client")
    .addLabel(LabelKeyUtils.ENGINE_TYPE_LABEL_KEY, "sqoop-1.4.6")
    .addLabel(LabelKeyUtils.USER_CREATOR_LABEL_KEY, "Client")
    .addLabel(LabelKeyUtils.ENGINE_CONN_MODE_LABEL_KEY, "once")
    .setStartupParams(startUpMap)
    .setMaxSubmitTime(30000)
    .addExecuteUser("freeuser")
  val onceJob = importJob(builder)
  val time = System.currentTimeMillis()
  onceJob.submit()
  println(onceJob.getId)
  val logOperator = onceJob.getOperator(EngineConnLogOperator.OPERATOR_NAME).asInstanceOf[EngineConnLogOperator]
  println(onceJob.getECMServiceInstance)
  logOperator.setFromLine(0)
  logOperator.setECMServiceInstance(onceJob.getECMServiceInstance)
  logOperator.setEngineConnType("sqoop")
  logOperator.setIgnoreKeywords("[main],[SpringContextShutdownHook]")
  var progressOperator = onceJob.getOperator(EngineConnProgressOperator.OPERATOR_NAME).asInstanceOf[EngineConnProgressOperator]
  var metricOperator = onceJob.getOperator(EngineConnMetricsOperator.OPERATOR_NAME).asInstanceOf[EngineConnMetricsOperator]
  var end = false
  var rowBefore = 1
  // Poll logs, metrics and progress until the job completes
  while (!end || rowBefore > 0) {
    if (onceJob.isCompleted) {
      end = true
      metricOperator = null
    }
    logOperator.setPageSize(100)
    Utils.tryQuietly {
      val logs = logOperator.apply()
      logs.logs.asScala.foreach(log => {
        println(log)
      })
      rowBefore = logs.logs.size
    }
    Thread.sleep(3000)
    Option(metricOperator).foreach(operator => {
      if (!onceJob.isCompleted) {
        println(s"Metric monitoring: ${operator.apply()}")
        println(s"Progress: ${progressOperator.apply()}")
      }
    })
  }
  onceJob.isCompleted
  onceJob.waitForCompleted()
  println(onceJob.getStatus)
  println(TimeUnit.SECONDS.convert(System.currentTimeMillis() - time, TimeUnit.MILLISECONDS) + "s")
  System.exit(0)

  def importJob(jobBuilder: SimpleOnceJobBuilder): SubmittableSimpleOnceJob = {
    jobBuilder
      .addJobContent("sqoop.env.mapreduce.job.queuename", "queue_10")
      .addJobContent("sqoop.mode", "import")
      .addJobContent("sqoop.args.connect", "jdbc:mysql://127.0.0.1:3306/exchangis")
      .addJobContent("sqoop.args.username", "free")
      .addJobContent("sqoop.args.password", "testpwd")
      .addJobContent("sqoop.args.query", "select id as order_number, sno as time from " +
        "exchangis where sno =1 and $CONDITIONS")
      .addJobContent("sqoop.args.hcatalog.database", "freedb")
      .addJobContent("sqoop.args.hcatalog.table", "zy_test")
      .addJobContent("sqoop.args.hcatalog.partition.keys", "month")
      .addJobContent("sqoop.args.hcatalog.partition.values", "3")
      .addJobContent("sqoop.args.num.mappers", "1")
      .build()
  }

  def exportJob(jobBuilder: SimpleOnceJobBuilder): SubmittableSimpleOnceJob = {
    jobBuilder
      .addJobContent("sqoop.env.mapreduce.job.queuename", "queue1")
      .addJobContent("sqoop.mode", "import")
      .addJobContent("sqoop.args.connect", "jdbc:mysql://127.0.0.1:3306/exchangis")
      .addJobContent("sqoop.args.query", "select id as order, sno as great_time from " +
        "exchangis_table where sno =1 and $CONDITIONS")
      .addJobContent("sqoop.args.hcatalog.database", "hadoop")
      .addJobContent("sqoop.args.hcatalog.table", "partition_33")
      .addJobContent("sqoop.args.hcatalog.partition.keys", "month")
      .addJobContent("sqoop.args.hcatalog.partition.values", "4")
      .addJobContent("sqoop.args.num.mappers", "1")
      .build()
  }
}
```

## 4 Engine configuration instructions

### 4.1 Default Configuration Description

| parameter | key | description |
|-----------|-----|-------------|
| | sqoop.mode | import/export/… |
| -Dmapreduce.job.queuename | sqoop.env.mapreduce.job.queuename | |
| --connect <jdbc-uri> | sqoop.args.connect | Specify JDBC connect string |
| --connection-manager <class-name> | sqoop.args.connection.manager | Specify connection manager class name |
| --connection-param-file <properties-file> | sqoop.args.connection.param.file | Specify connection parameters file |
| --driver <class-name> | sqoop.args.driver | Manually specify JDBC driver class to use |
| --hadoop-home <hdir> | sqoop.args.hadoop.home | Override $HADOOP_MAPRED_HOME_ARG |
| --hadoop-mapred-home <dir> | sqoop.args.hadoop.mapred.home | Override $HADOOP_MAPRED_HOME_ARG |
| --help | sqoop.args.help | Print usage instructions |
| -P | | Read password from console |
| --password <password> | sqoop.args.password | Set authentication password |
| --password-alias <password-alias> | sqoop.args.password.alias | Credential provider password alias |
| --password-file <password-file> | sqoop.args.password.file | Set authentication password file path |
| --relaxed-isolation | sqoop.args.relaxed.isolation | Use read-uncommitted isolation for imports |
| --skip-dist-cache | sqoop.args.skip.dist.cache | Skip copying jars to distributed cache |
| --username <username> | sqoop.args.username | Set authentication username |
| --verbose | sqoop.args.verbose | Print more information while working |
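To make the mapping concrete, here is a purely illustrative sketch of how the native Sqoop options used earlier in this article translate into job-content keys when submitting through OnceEngineConn (the keys come from the tables in this section; the values are the sample values used in the earlier examples):

```bash
# Native Sqoop option                                          -> Linkis job-content key/value
#   export (the Sqoop tool being run)                          -> sqoop.mode=export
#   -D mapreduce.job.queuename=ide                             -> sqoop.env.mapreduce.job.queuename=ide
#   --connect jdbc:mysql://10.10.10.10/test                    -> sqoop.args.connect=jdbc:mysql://10.10.10.10/test
#   --username test                                            -> sqoop.args.username=test
#   --table test_sqoop                                         -> sqoop.args.table=test_sqoop
#   --export-dir /user/hive/warehouse/hadoop/test_linkis_sqoop -> sqoop.args.export.dir=/user/hive/warehouse/hadoop/test_linkis_sqoop
#   --update-mode allowinsert                                  -> sqoop.args.update.mode=allowinsert
```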

### 4.2 Import and export parameters

| parameter | key | description |
|-----------|-----|-------------|
| --batch | sqoop.args.batch | Indicates underlying statements to be executed in batch mode |
| --call <arg> | sqoop.args.call | Populate the table using this stored procedure (one call per row) |
| --clear-staging-table | sqoop.args.clear.staging.table | Indicates that any data in staging table can be deleted |
| --columns <col,col,col...> | sqoop.args.columns | Columns to export to table |
| --direct | sqoop.args.direct | Use direct export fast path |
| --export-dir <dir> | sqoop.args.export.dir | HDFS source path for the export |
| -m,--num-mappers <n> | sqoop.args.num.mappers | Use 'n' map tasks to export in parallel |
| --mapreduce-job-name <name> | sqoop.args.mapreduce.job.name | Set name for generated mapreduce job |
| --staging-table <table-name> | sqoop.args.staging.table | Intermediate staging table |
| --table <table-name> | sqoop.args.table | Table to populate |
| --update-key <key> | sqoop.args.update.key | Update records by specified key column |
| --update-mode <mode> | sqoop.args.update.mode | Specifies how updates are performed when new rows are found with non-matching keys in database |
| --validate | sqoop.args.validate | Validate the copy using the configured validator |
| --validation-failurehandler <validation-failurehandler> | sqoop.args.validation.failurehandler | Fully qualified class name for ValidationFailureHandler |
| --validation-threshold <validation-threshold> | sqoop.args.validation.threshold | Fully qualified class name for ValidationThreshold |
| --validator <validator> | sqoop.args.validator | Fully qualified class name for the Validator |

### 4.3 Import control parameters

| parameter | key | description |
|-----------|-----|-------------|
| --append | sqoop.args.append | Imports data in append mode |
| --as-avrodatafile | sqoop.args.as.avrodatafile | Imports data to Avro data files |
| --as-parquetfile | sqoop.args.as.parquetfile | Imports data to Parquet files |
| --as-sequencefile | sqoop.args.as.sequencefile | Imports data to SequenceFiles |
| --as-textfile | sqoop.args.as.textfile | Imports data as plain text (default) |
| --autoreset-to-one-mapper | sqoop.args.autoreset.to.one.mapper | Reset the number of mappers to one mapper if no split key available |
| --boundary-query <statement> | sqoop.args.boundary.query | Set boundary query for retrieving max and min value of the primary key |
| --case-insensitive | sqoop.args.case.insensitive | Database is case insensitive; the split where condition is transformed to lower case |
| --columns <col,col,col...> | sqoop.args.columns | Columns to import from table |
| --compression-codec <codec> | sqoop.args.compression.codec | Compression codec to use for import |
| --delete-target-dir | sqoop.args.delete.target.dir | Imports data in delete mode |
| --direct | sqoop.args.direct | Use direct import fast path |
| --direct-split-size <n> | sqoop.args.direct.split.size | Split the input stream every 'n' bytes when importing in direct mode |
| -e,--query <statement> | sqoop.args.query | Import results of SQL 'statement' |
| --fetch-size <n> | sqoop.args.fetch.size | Set number 'n' of rows to fetch from the database when more rows are needed |
| --inline-lob-limit <n> | sqoop.args.inline.lob.limit | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | sqoop.args.num.mappers | Use 'n' map tasks to import in parallel |
| --mapreduce-job-name <name> | sqoop.args.mapreduce.job.name | Set name for generated mapreduce job |
| --merge-key <column> | sqoop.args.merge.key | Key column to use to join results |
| --split-by <column-name> | sqoop.args.split.by | Column of the table used to split work units |
| --table <table-name> | sqoop.args.table | Table to read |
| --target-dir <dir> | sqoop.args.target.dir | HDFS plain table destination |
| --validate | sqoop.args.validate | Validate the copy using the configured validator |
| --validation-failurehandler <validation-failurehandler> | sqoop.args.validation.failurehandler | Fully qualified class name for ValidationFailureHandler |
| --validation-threshold <validation-threshold> | sqoop.args.validation.threshold | Fully qualified class name for ValidationThreshold |
| --validator <validator> | sqoop.args.validator | Fully qualified class name for the Validator |
| --warehouse-dir <dir> | sqoop.args.warehouse.dir | HDFS parent for table destination |
| --where <where clause> | sqoop.args.where | WHERE clause to use during import |
| -z,--compress | sqoop.args.compress | Enable compression |

### 4.4 Incremental import parameters

| parameter | key | description |
|-----------|-----|-------------|
| --check-column <column> | sqoop.args.check.column | Source column to check for incremental change |
| --incremental <import-type> | sqoop.args.incremental | Define an incremental import of type 'append' or 'lastmodified' |
| --last-value <value> | sqoop.args.last.value | Last imported value in the incremental check column |

### 4.5 Output line formatting parameters

| parameter | key | description |
|-----------|-----|-------------|
| --enclosed-by <char> | sqoop.args.enclosed.by | Sets a required field enclosing character |
| --escaped-by <char> | sqoop.args.escaped.by | Sets the escape character |
| --fields-terminated-by <char> | sqoop.args.fields.terminated.by | Sets the field separator character |
| --lines-terminated-by <char> | sqoop.args.lines.terminated.by | Sets the end-of-line character |
| --mysql-delimiters | sqoop.args.mysql.delimiters | Uses MySQL's default delimiter set: fields `,`, lines `\n`, escaped-by `\`, optionally-enclosed-by `'` |
| --optionally-enclosed-by <char> | sqoop.args.optionally.enclosed.by | Sets a field enclosing character |

### 4.6 Input parsing parameters

| parameter | key | description |
|-----------|-----|-------------|
| --input-enclosed-by <char> | sqoop.args.input.enclosed.by | Sets a required field enclosure |
| --input-escaped-by <char> | sqoop.args.input.escaped.by | Sets the input escape character |
| --input-fields-terminated-by <char> | sqoop.args.input.fields.terminated.by | Sets the input field separator |
| --input-lines-terminated-by <char> | sqoop.args.input.lines.terminated.by | Sets the input end-of-line char |
| --input-optionally-enclosed-by <char> | sqoop.args.input.optionally.enclosed.by | Sets a field enclosing character |

### 4.7 Hive parameters

| parameter | key | description |
|-----------|-----|-------------|
| --create-hive-table | sqoop.args.create.hive.table | Fail if the target hive table exists |
| --hive-database <database-name> | sqoop.args.hive.database | Sets the database name to use when importing to hive |
| --hive-delims-replacement <arg> | sqoop.args.hive.delims.replacement | Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string |
| --hive-drop-import-delims | sqoop.args.hive.drop.import.delims | Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields |
| --hive-home <dir> | sqoop.args.hive.home | Override $HIVE_HOME |
| --hive-import | sqoop.args.hive.import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | sqoop.args.hive.overwrite | Overwrite existing data in the Hive table |
| --hive-partition-key <partition-key> | sqoop.args.hive.partition.key | Sets the partition key to use when importing to hive |
| --hive-partition-value <partition-value> | sqoop.args.hive.partition.value | Sets the partition value to use when importing to hive |
| --hive-table <table-name> | sqoop.args.hive.table | Sets the table name to use when importing to hive |
| --map-column-hive <arg> | sqoop.args.map.column.hive | Override mapping for specific column to hive types |

### 4.8 HBase parameters

| parameter | key | description |
|-----------|-----|-------------|
| --column-family <family> | sqoop.args.column.family | Sets the target column family for the import |
| --hbase-bulkload | sqoop.args.hbase.bulkload | Enables HBase bulk loading |
| --hbase-create-table | sqoop.args.hbase.create.table | If specified, create missing HBase tables |
| --hbase-row-key <col> | sqoop.args.hbase.row.key | Specifies which input column to use as the row key |
| --hbase-table <table> | sqoop.args.hbase.table | Import to <table> in HBase |

### 4.9 HCatalog parameters

| parameter | key | description |
|-----------|-----|-------------|
| --hcatalog-database <arg> | sqoop.args.hcatalog.database | HCatalog database name |
| --hcatalog-home <hdir> | sqoop.args.hcatalog.home | Override $HCAT_HOME |
| --hcatalog-partition-keys <partition-key> | sqoop.args.hcatalog.partition.keys | Sets the partition keys to use when importing to hive |
| --hcatalog-partition-values <partition-value> | sqoop.args.hcatalog.partition.values | Sets the partition values to use when importing to hive |
| --hcatalog-table <arg> | sqoop.args.hcatalog.table | HCatalog table name |
| --hive-home <dir> | sqoop.args.hive.home | Override $HIVE_HOME |
| --hive-partition-key <partition-key> | sqoop.args.hive.partition.key | Sets the partition key to use when importing to hive |
| --hive-partition-value <partition-value> | sqoop.args.hive.partition.value | Sets the partition value to use when importing to hive |
| --map-column-hive <arg> | sqoop.args.map.column.hive | Override mapping for specific column to hive types |

HCatalog import specific options:

| parameter | key | description |
|-----------|-----|-------------|
| --create-hcatalog-table | sqoop.args.create.hcatalog.table | Create HCatalog before import |
| --hcatalog-storage-stanza <arg> | sqoop.args.hcatalog.storage.stanza | HCatalog storage stanza for table creation |

### 4.10 Accumulo parameters

| parameter | key | description |
|-----------|-----|-------------|
| --accumulo-batch-size <size> | sqoop.args.accumulo.batch.size | Batch size in bytes |
| --accumulo-column-family <family> | sqoop.args.accumulo.column.family | Sets the target column family for the import |
| --accumulo-create-table | sqoop.args.accumulo.create.table | If specified, create missing Accumulo tables |
| --accumulo-instance <instance> | sqoop.args.accumulo.instance | Accumulo instance name |
| --accumulo-max-latency <latency> | sqoop.args.accumulo.max.latency | Max write latency in milliseconds |
| --accumulo-password <password> | sqoop.args.accumulo.password | Accumulo password |
| --accumulo-row-key <col> | sqoop.args.accumulo.row.key | Specifies which input column to use as the row key |
| --accumulo-table <table> | sqoop.args.accumulo.table | Import to <table> in Accumulo |
| --accumulo-user <user> | sqoop.args.accumulo.user | Accumulo user name |
| --accumulo-visibility <vis> | sqoop.args.accumulo.visibility | Visibility token to be applied to all rows imported |
| --accumulo-zookeepers <zookeepers> | sqoop.args.accumulo.zookeepers | Comma-separated list of zookeepers (host:port) |

### 4.11 Code Generation Parameters

| parameter | key | description |
|-----------|-----|-------------|
| --bindir <dir> | sqoop.args.bindir | Output directory for compiled objects |
| --class-name <name> | sqoop.args.class.name | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
| --input-null-non-string <null-str> | sqoop.args.input.null.non.string | Input null non-string representation |
| --input-null-string <null-str> | sqoop.args.input.null.string | Input null string representation |
| --jar-file <file> | sqoop.args.jar.file | Disable code generation; use specified jar |
| --map-column-java <arg> | sqoop.args.map.column.java | Override mapping for specific columns to java types |
| --null-non-string <null-str> | sqoop.args.null.non.string | Null non-string representation |
| --null-string <null-str> | sqoop.args.null.string | Null string representation |
| --outdir <dir> | sqoop.args.outdir | Output directory for generated code |
| --package-name <name> | sqoop.args.package.name | Put auto-generated classes in this package |

### 4.12 Generic Hadoop command line arguments

These must precede any tool-specific arguments. The generic options supported are:

| parameter | key | description |
|-----------|-----|-------------|
| -conf <configuration file> | sqoop.args.conf | Specify an application configuration file |
| -D <property=value> | sqoop.args.D | Use value for given property |
| -fs <local\|namenode:port> | sqoop.args.fs | Specify a namenode |
| -jt <local\|resourcemanager:port> | sqoop.args.jt | Specify a ResourceManager |
| -files <comma separated list of files> | sqoop.args.files | Specify comma separated files to be copied to the map reduce cluster |
| -libjars <comma separated list of jars> | sqoop.args.libjars | Specify comma separated jar files to include in the classpath |
| -archives <comma separated list of archives> | sqoop.args.archives | Specify comma separated archives to be unarchived on the compute machines |