This tool tests Doris external tables on object storage provided by cloud vendors.
Supported storage systems: HDFS, Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS
Supported data lake table formats: Iceberg
Example command lines:
sh tools/emr_storage_regression/emr_tools.sh --profile default_emr_env.sh
Or
sh tools/emr_storage_regression/emr_tools.sh --case CASE --endpoint ENDPOINT --region REGION --service SERVICE --ak AK --sk SK --host HOST --user USER --port PORT
The usage of each option is described below.
When the --case option is set to ping, the script checks Doris's connectivity on EMR:
--endpoint: object storage endpoint.
--region: object storage region.
--ak: object storage access key.
--sk: object storage secret key.
--host: Doris MySQL client IP.
--user: Doris MySQL client username.
--port: Doris MySQL client port.
--service: EMR cloud vendor: ali (Alibaba), hw (Huawei), tx (Tencent).
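The options above are plain "--name value" pairs. As a rough illustration of how such pairs can be read in a POSIX shell, here is a minimal sketch. It is not the code of emr_tools.sh: the function name parse_args and the variable names (which simply mirror the placeholders in the usage line) are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from emr_tools.sh.
# Reads "--name value" pairs into shell variables named after the
# placeholders in the usage line (assumed names).
parse_args() {
    while [ "$#" -gt 0 ]; do
        # Every option in this sketch expects exactly one value.
        if [ "$#" -lt 2 ]; then
            echo "missing value for $1" >&2
            return 1
        fi
        case "$1" in
            --case)     CASE=$2 ;;
            --endpoint) ENDPOINT=$2 ;;
            --region)   REGION=$2 ;;
            --service)  SERVICE=$2 ;;
            --ak)       AK=$2 ;;
            --sk)       SK=$2 ;;
            --host)     HOST=$2 ;;
            --user)     USER=$2 ;;
            --port)     PORT=$2 ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
    # Only the three vendor codes listed above are accepted.
    case "${SERVICE:-}" in
        ali|hw|tx|"") ;;
        *) echo "unsupported service: $SERVICE" >&2; return 1 ;;
    esac
}
```

Calling parse_args "$@" at the top of a wrapper script would populate the variables, or return a non-zero status on an unknown option or vendor code.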
Modify the environment variables in default_emr_env.sh. The script runs source default_emr_env.sh to make them take effect.
If environment variables are configured, you can run the test script directly with the following command:
sh emr_tools.sh --profile default_emr_env.sh
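For orientation, a profile of this kind might look like the sketch below. The variable names are assumptions that mirror the option placeholders in the usage line, and the values are the Alibaba Cloud example values from this document; check the default_emr_env.sh shipped with the tool for the real names.

```shell
# Hypothetical profile contents; variable names are assumed, not confirmed.
# Values are the Alibaba Cloud example values used in this document.
export CASE=ping
export ENDPOINT=oss-cn-beijing-internal.aliyuncs.com
export REGION=cn-beijing
export SERVICE=ali
export AK=ak          # placeholder access key
export SK=sk          # placeholder secret key
export HOST=127.0.0.1
export USER=root      # note: this overrides the login shell's USER variable
export PORT=9030
```

Keeping real access keys out of version control is advisable if the profile is edited in place.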
The connectivity test executes the SQL in ping.sql.
sh emr_tools.sh --profile default_emr_env.sh
Or
Set --service to ali to test connectivity on Alibaba Cloud:
sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
Alibaba Cloud EMR also supports testing connectivity for both Doris with DLF metadata and Doris on OSS-HDFS storage.
The DLF metadata connectivity test must be run on an EMR cluster that uses DLF as its metadata store. The default value of DLF_ENDPOINT is datalake-vpc.cn-beijing.aliyuncs.com, configured in ping_test/ping_poc.sh.
To test OSS-HDFS storage connectivity, enable the HDFS service on the OSS storage and configure it. The default value of JINDO_ENDPOINT is cn-beijing.oss-dls.aliyuncs.com, configured in ping_test/ping_poc.sh.
sh emr_tools.sh --profile default_emr_env.sh
Or
Set --service to tx to test connectivity on Tencent Cloud:
sh emr_tools.sh --case ping --endpoint cos.ap-beijing.myqcloud.com --region ap-beijing --service tx --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
sh emr_tools.sh --profile default_emr_env.sh
Or
Set --service to hw to test connectivity on Huawei Cloud:
sh emr_tools.sh --case ping --endpoint obs.cn-north-4.myhuaweicloud.com --region cn-north-4 --service hw --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
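The three vendor commands above differ only in the --service, --endpoint and --region values. The sketch below collects the example values from this document in one place and prints, without executing, the ping command for each vendor. The helper name ping_command is hypothetical, and the ak/sk values are placeholders to replace with real credentials.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the ping command for a vendor,
# using the example endpoints and regions given in this document.
ping_command() {
    service=$1
    case "$service" in
        ali) endpoint=oss-cn-beijing-internal.aliyuncs.com; region=cn-beijing ;;
        tx)  endpoint=cos.ap-beijing.myqcloud.com;          region=ap-beijing ;;
        hw)  endpoint=obs.cn-north-4.myhuaweicloud.com;     region=cn-north-4 ;;
        *)   echo "unsupported service: $service" >&2; return 1 ;;
    esac
    echo "sh emr_tools.sh --case ping --endpoint $endpoint --region $region --service $service --ak ak --sk sk --host 127.0.0.1 --user root --port 9030"
}

# Print the command for every supported vendor.
for vendor in ali tx hw; do
    ping_command "$vendor"
done
```

Printing the command first makes it easy to review the endpoint and region before running it against a real cluster.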
When the --case option is set to data_set, the script tests the query performance of Doris external tables:
--test: test data set: ssb, ssb_flat, tpch, clickbench or all. Default: all.
--service: EMR cloud vendor: ali (Alibaba), hw (Huawei), tx (Tencent).
--host: Doris MySQL client IP.
--user: Doris MySQL client username.
--port: Doris MySQL client port.
Modify the corresponding environment variables in default_emr_env.sh. The script runs source default_emr_env.sh to make them take effect.
If environment variables are configured, you can run the test script directly with the following command:
sh emr_tools.sh --profile default_emr_env.sh
To run the standard test set with the emr_tools.sh script, set the BUCKET variable to your object storage bucket, then prepare the data in advance and place it under that bucket. The script generates table creation statements based on the bucket.
The emr_tools.sh script currently supports Iceberg, Parquet and ORC data for ssb, ssb_flat, tpch and clickbench.
BUCKET variable
hdfs:///benchmark-hdfs
analyze.sql in the Doris Catalog
run_standard_set.sh
Without --test, all data sets are run for the vendor specified by --service:
sh emr_tools.sh --case data_set --service ali --host 127.0.0.1 --user root --port 9030 > log
The --test option can be set to one of ssb, ssb_flat, tpch and clickbench:
sh emr_tools.sh --case data_set --test ssb --service ali --host 127.0.0.1 --user root --port 9030 > log
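When each data set should be run separately, for example to keep one log per data set, the per-set commands can be generated in a loop. The sketch below only prints the commands. The helper name data_set_commands and the log file naming scheme are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints one data_set command per standard data set,
# each redirecting to its own log file (naming scheme is an assumption).
data_set_commands() {
    service=$1
    for test_set in ssb ssb_flat tpch clickbench; do
        echo "sh emr_tools.sh --case data_set --test $test_set --service $service --host 127.0.0.1 --user root --port 9030 > ${test_set}_${service}.log"
    done
}

data_set_commands ali
```

Separate logs make it simpler to compare the results of one data set across vendors.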