| --- |
| { |
| "title": "CREATE-RESOURCE", |
| "language": "en" |
| } |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| ## CREATE-RESOURCE |
| |
| ### Name |
| |
| CREATE RESOURCE |
| |
| ### Description |
| |
This statement is used to create a resource. Only the root or admin user can create resources. Currently supported resource types are Spark, ODBC, S3, JDBC, HDFS, HMS, and ES, as shown in the examples below.
In the future, other external resources may be added to Doris, such as Spark/GPU for queries, HDFS/S3 for external storage, and MapReduce for ETL.
| |
Syntax:
| |
| ```sql |
| CREATE [EXTERNAL] RESOURCE "resource_name" |
| PROPERTIES ("key"="value", ...); |
| ``` |
| |
Notes:
| |
- The resource type must be specified in PROPERTIES via `"type" = "[spark|odbc_catalog|s3|jdbc|hdfs|hms|es]"`.
- The remaining PROPERTIES vary by resource type; see the examples below for details.
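
After a resource is created, it can be checked with the companion `SHOW RESOURCES` statement. A minimal sketch, where `resource_name` is a placeholder:

```sql
-- List a resource by name to confirm it exists and to inspect its properties;
-- "resource_name" is a placeholder for the name used at creation time.
SHOW RESOURCES WHERE NAME = "resource_name";
```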
| |
| ### Example |
| |
| 1. Create a Spark resource named spark0 in yarn cluster mode. |
| |
| ```sql |
| CREATE EXTERNAL RESOURCE "spark0" |
| PROPERTIES |
| ( |
| "type" = "spark", |
| "spark.master" = "yarn", |
| "spark.submit.deployMode" = "cluster", |
| "spark.jars" = "xxx.jar,yyy.jar", |
| "spark.files" = "/tmp/aaa,/tmp/bbb", |
| "spark.executor.memory" = "1g", |
| "spark.yarn.queue" = "queue0", |
| "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999", |
| "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000", |
| "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris", |
| "broker" = "broker0", |
| "broker.username" = "user0", |
| "broker.password" = "password0" |
| ); |
| ``` |
| |
| Spark related parameters are as follows: |
| - spark.master: Required, currently supports yarn, spark://host:port. |
| - spark.submit.deployMode: The deployment mode of the Spark program, required, supports both cluster and client. |
| - spark.hadoop.yarn.resourcemanager.address: Required when master is yarn. |
| - spark.hadoop.fs.defaultFS: Required when master is yarn. |
| - Other parameters are optional, refer to [here](http://spark.apache.org/docs/latest/configuration.html) |
| |
| |
| |
`working_dir` and `broker` need to be specified when Spark is used for ETL. They are described as follows:
| |
| - working_dir: The directory used by the ETL. Required when spark is used as an ETL resource. For example: hdfs://host:port/tmp/doris. |
- broker: The broker name. Required when Spark is used as an ETL resource. The broker needs to be registered in advance with the `ALTER SYSTEM ADD BROKER` command (see the sketch after this list).
- broker.property_key: The authentication information that the broker needs when reading the intermediate files generated by ETL.
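
A minimal sketch of registering such a broker beforehand, assuming a broker process is already deployed; the host and port below are placeholders:

```sql
-- Register a broker named broker0 so it can be referenced by the "broker"
-- property of the Spark resource; "broker_host:8000" is a placeholder address.
ALTER SYSTEM ADD BROKER broker0 "broker_host:8000";
```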
| |
| 2. Create an ODBC resource |
| |
| ```sql |
| CREATE EXTERNAL RESOURCE `oracle_odbc` |
| PROPERTIES ( |
| "type" = "odbc_catalog", |
| "host" = "192.168.0.1", |
| "port" = "8086", |
| "user" = "test", |
| "password" = "test", |
| "database" = "test", |
| "odbc_type" = "oracle", |
| "driver" = "Oracle 19 ODBC driver" |
| ); |
| ``` |
| |
| The relevant parameters of ODBC are as follows: |
- host: IP address of the external database
- driver: The driver name used by the ODBC external table, which must be the same as the driver name in be/conf/odbcinst.ini.
- odbc_type: the type of the external database, currently supports oracle, mysql, postgresql
- user: username of the external database
- password: password of the corresponding user
- charset: connection charset
- Custom parameters supported by a particular ODBC driver can also be set; see the documentation of the corresponding ODBC driver
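
The resource can then be referenced when creating an ODBC external table. A minimal sketch, in which the table name and columns are hypothetical:

```sql
CREATE EXTERNAL TABLE `example_oracle_table` (
    `id` INT,
    `name` VARCHAR(64)
) ENGINE = ODBC
PROPERTIES (
    "odbc_catalog_resource" = "oracle_odbc", -- the resource created above
    "database" = "test",
    "table" = "example_table"                -- hypothetical table name
);
```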
| |
3. Create an S3 resource
| |
| ```sql |
| CREATE RESOURCE "remote_s3" |
| PROPERTIES |
| ( |
| "type" = "s3", |
| "s3.endpoint" = "bj.s3.com", |
| "s3.region" = "bj", |
| "s3.access_key" = "bbb", |
| "s3.secret_key" = "aaaa", |
    -- the following are optional
| "s3.connection.maximum" = "50", |
| "s3.connection.request.timeout" = "3000", |
| "s3.connection.timeout" = "1000" |
| ); |
| ``` |
| |
If the S3 resource is used for [cold-hot separation](../../../../../docs/advanced/cold_hot_separation.md), the following additional fields are required.
| ```sql |
| CREATE RESOURCE "remote_s3" |
| PROPERTIES |
| ( |
| "type" = "s3", |
| "s3.endpoint" = "bj.s3.com", |
| "s3.region" = "bj", |
| "s3.access_key" = "bbb", |
| "s3.secret_key" = "aaaa", |
| -- required by cooldown |
| "s3.root.path" = "/path/to/root", |
| "s3.bucket" = "test-bucket" |
| ); |
| ``` |
| |
| S3 related parameters are as follows: |
| - Required parameters |
| - `s3.endpoint`: s3 endpoint |
    - `s3.region`: s3 region
| - `s3.root.path`: s3 root directory |
| - `s3.access_key`: s3 access key |
| - `s3.secret_key`: s3 secret key |
    - `s3.bucket`: s3 bucket
- Optional parameters
| - `s3.connection.maximum`: the maximum number of s3 connections, the default is 50 |
| - `s3.connection.request.timeout`: s3 request timeout, in milliseconds, the default is 3000 |
| - `s3.connection.timeout`: s3 connection timeout, in milliseconds, the default is 1000 |
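
For cold-hot separation, the resource created above is typically referenced by a storage policy. A minimal sketch, assuming the Doris version in use supports `CREATE STORAGE POLICY`; the policy name and TTL value are placeholders:

```sql
-- Data older than the TTL is cooled down to the S3 resource "remote_s3";
-- "example_policy" and the 1-day TTL are placeholder values.
CREATE STORAGE POLICY example_policy
PROPERTIES (
    "storage_resource" = "remote_s3",
    "cooldown_ttl" = "1d"
);
```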
| |
4. Create a JDBC resource
| |
| ```sql |
| CREATE RESOURCE mysql_resource PROPERTIES ( |
| "type"="jdbc", |
| "user"="root", |
| "password"="123456", |
| "jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false", |
| "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar", |
| "driver_class" = "com.mysql.cj.jdbc.Driver" |
| ); |
| ``` |
| |
| JDBC related parameters are as follows: |
- user: The username used to connect to the database
- password: The password used to connect to the database
| - jdbc_url: The identifier used to connect to the specified database |
| - driver_url: The url of JDBC driver package |
| - driver_class: The class of JDBC driver |
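
The JDBC resource can then be referenced when creating a JDBC external table. A minimal sketch, in which the table name and columns are hypothetical:

```sql
CREATE EXTERNAL TABLE `example_mysql_table` (
    `id` INT,
    `name` VARCHAR(128)
) ENGINE = JDBC
PROPERTIES (
    "resource" = "mysql_resource",   -- the resource created above
    "table" = "example_table",       -- hypothetical table name
    "table_type" = "mysql"
);
```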
| |
5. Create an HDFS resource
| |
| ```sql |
| CREATE RESOURCE hdfs_resource PROPERTIES ( |
| "type"="hdfs", |
| "hadoop.username"="user", |
| "root_path"="your_path", |
| "dfs.nameservices" = "my_ha", |
| "dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2", |
| "dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port", |
| "dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port", |
| "dfs.client.failover.proxy.provider.my_ha" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" |
| ); |
| ``` |
| |
| HDFS related parameters are as follows: |
| - fs.defaultFS: namenode address and port |
| - hadoop.username: hdfs username |
- dfs.nameservices: if Hadoop HA is enabled, set the nameservice name; see hdfs-site.xml
- dfs.ha.namenodes.[nameservice ID]: unique identifiers for each NameNode in the nameservice; see hdfs-site.xml
- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: the fully-qualified RPC address for each NameNode to listen on; see hdfs-site.xml
- dfs.client.failover.proxy.provider.[nameservice ID]: the Java class that HDFS clients use to contact the Active NameNode, usually org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
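
Depending on the Doris version, an HDFS resource can also be referenced by a storage policy, in the same way as the S3 resource above. A hypothetical sketch; the policy name and TTL are placeholders:

```sql
-- Cool down data older than the TTL to the HDFS resource "hdfs_resource";
-- "example_hdfs_policy" and the TTL are placeholder values.
CREATE STORAGE POLICY example_hdfs_policy
PROPERTIES (
    "storage_resource" = "hdfs_resource",
    "cooldown_ttl" = "1d"
);
```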
| |
6. Create an HMS resource
| |
An HMS resource is used to create an [HMS catalog](../../../../lakehouse/multi-catalog/multi-catalog.md).
| ```sql |
| CREATE RESOURCE hms_resource PROPERTIES ( |
| 'type'='hms', |
| 'hive.metastore.uris' = 'thrift://127.0.0.1:7004', |
| 'dfs.nameservices'='HANN', |
| 'dfs.ha.namenodes.HANN'='nn1,nn2', |
| 'dfs.namenode.rpc-address.HANN.nn1'='nn1_host:rpc_port', |
| 'dfs.namenode.rpc-address.HANN.nn2'='nn2_host:rpc_port', |
| 'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' |
| ); |
| ``` |
| |
| HMS related parameters are as follows: |
- hive.metastore.uris: Hive metastore server address
Optional:
- dfs.*: If the Hive data is on HDFS, the HDFS resource parameters described above should be added, or hive-site.xml should be copied into fe/conf.
- s3.*: If the Hive data is on S3, the S3 resource parameters should be added. If [Aliyun Data Lake Formation](https://www.aliyun.com/product/bigdata/dlf) is used, copy hive-site.xml into fe/conf.
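
The multi-catalog documentation linked above covers catalog creation in detail. A minimal sketch of referencing the resource, assuming the `CREATE CATALOG ... WITH RESOURCE` form is available in the Doris version in use; the catalog name is a placeholder:

```sql
-- Create a Hive catalog backed by the HMS resource defined above;
-- "hive_catalog" is a placeholder catalog name.
CREATE CATALOG hive_catalog WITH RESOURCE hms_resource;
```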
| |
7. Create an ES resource
| |
| ```sql |
| CREATE RESOURCE es_resource PROPERTIES ( |
| "type"="es", |
| "hosts"="http://127.0.0.1:29200", |
| "nodes_discovery"="false", |
| "enable_keyword_sniff"="true" |
| ); |
| ``` |
| |
| ES related parameters are as follows: |
- hosts: ES connection address; one or more nodes may be given, and a load-balancer address is also accepted
- user: username for ES
- password: password for the user
- enable_docvalue_scan: whether to read the values of queried fields from ES/Lucene columnar storage (doc values); the default is true
- enable_keyword_sniff: whether to sniff text fields in ES and query them by their keyword sub-field; the default is true. If false, queries match against the analyzed (tokenized) content
- nodes_discovery: whether to enable ES node discovery; the default is true. In an isolated network, set this parameter to false so that only the specified nodes are connected
- http_ssl_enabled: whether the ES cluster enables HTTPS access; the current FE/BE implementation trusts all certificates
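
As with the HMS resource, an ES resource may be referenced when creating an ES catalog, again assuming the `CREATE CATALOG ... WITH RESOURCE` form is available in the Doris version in use. A minimal sketch with a placeholder catalog name:

```sql
-- Create an ES catalog backed by the ES resource defined above;
-- "es_catalog" is a placeholder catalog name.
CREATE CATALOG es_catalog WITH RESOURCE es_resource;
```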
| |
| ### Keywords |
| |
| CREATE, RESOURCE |
| |
| ### Best Practice |