commit | 37c51754299ca8b6ebd0cd83d3a9c1c86cb26c4d | [log] [tgz] |
---|---|---|
author | gnehil <adamlee489@gmail.com> | Wed Apr 10 15:32:51 2024 +0800 |
committer | GitHub <noreply@github.com> | Wed Apr 10 15:32:51 2024 +0800 |
tree | d97752c2434655994e00aeb00f12455895f2c19b | |
parent | 5f676bb8f43b8cd5be03caa6b3b73244fdd37e5e [diff] |
[feature] support variant type (#197)
For more information about compilation and usage, see the Spark Doris Connector documentation.
Before building, copy customer_env.sh.tpl to customer_env.sh and configure it for your environment.
git clone git@github.com:apache/doris-spark-connector.git
cd doris-spark-connector/spark-doris-connector
./build.sh
$ docker pull apache/doris:build-env-ldb-toolchain-latest
The build produces a JAR with a name like: spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar
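The JAR name appears to encode the Spark version, the Scala version, and the connector version. The sketch below splits a file name into those three parts. The pattern `spark-doris-connector-<spark>_<scala>-<connector>.jar` is an assumption inferred from the example name above, and `parse_connector_jar` and `ConnectorJar` are hypothetical helpers, not part of the connector.

```python
import re
from typing import NamedTuple


class ConnectorJar(NamedTuple):
    spark_version: str
    scala_version: str
    connector_version: str


# Assumed naming pattern, inferred from the example JAR name:
# spark-doris-connector-<spark>_<scala>-<connector>.jar
_JAR_PATTERN = re.compile(
    r"^spark-doris-connector-"
    r"(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-(?P<connector>.+)\.jar$"
)


def parse_connector_jar(filename: str) -> ConnectorJar:
    """Split a connector JAR file name into its version components."""
    match = _JAR_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized connector JAR name: {filename}")
    return ConnectorJar(match["spark"], match["scala"], match["connector"])


if __name__ == "__main__":
    print(parse_connector_jar("spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar"))
```

Checking these components against the installed Spark and Scala versions can help catch a mismatched build before the JAR is deployed.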
Download Spark from https://spark.apache.org/downloads.html. If you are in China, the Tencent mirror is a good alternative: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/
# download
wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
# decompress
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
vim /etc/profile
# add the following two lines to /etc/profile
export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
# reload the profile
source /etc/profile
cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars
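After copying, it can be useful to confirm that the connector JAR actually landed in `$SPARK_HOME/jars`. The following is a minimal sketch of such a check. `find_connector_jars` is a hypothetical helper, and the glob pattern assumes the JAR keeps the `spark-doris-connector-` prefix shown above.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_connector_jars(spark_home: Optional[str] = None) -> List[Path]:
    """Return connector JARs found under <spark_home>/jars, sorted by name."""
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        raise RuntimeError("SPARK_HOME is not set")
    jars_dir = Path(home) / "jars"
    if not jars_dir.is_dir():
        return []
    # Assumption: the JAR keeps the default "spark-doris-connector-" prefix.
    return sorted(jars_dir.glob("spark-doris-connector-*.jar"))


if __name__ == "__main__":
    # Falls back to the placeholder path from the steps above.
    home = os.environ.get("SPARK_HOME", "/your_path/spark-3.1.2-bin-hadoop3.2")
    jars = find_connector_jars(home)
    print(jars if jars else "no connector JAR found")
```

An empty result usually means the `cp` target was wrong or `SPARK_HOME` points at a different Spark installation.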
Create the Doris database and table.
create database mongo_doris;
use mongo_doris;
CREATE TABLE data_sync_test_simple
(
    _id VARCHAR(32) DEFAULT '',
    id VARCHAR(32) DEFAULT '',
    user_name VARCHAR(32) DEFAULT '',
    member_list VARCHAR(32) DEFAULT ''
)
DUPLICATE KEY(_id)
DISTRIBUTED BY HASH(_id) BUCKETS 10
PROPERTIES("replication_num" = "1");
INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');
import org.apache.doris.spark._

// Read the table created above as an RDD.
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("mongo_doris.data_sync_test_simple"),
  cfg = Some(Map(
    "doris.fenodes" -> "127.0.0.1:8030",
    "doris.request.auth.user" -> "root",
    "doris.request.auth.password" -> ""
  ))
)

dorisSparkRDD.collect()
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
Link: https://github.com/apache/doris/discussions/9486
# Parentheses let the method chain span several lines in Python.
dorisSparkDF = (
    spark.read.format("doris")
    .option("doris.table.identifier", "mongo_doris.data_sync_test_simple")
    .option("doris.fenodes", "127.0.0.1:8030")
    .option("user", "root")
    .option("password", "")
    .load()
)
# show the first 5 rows
dorisSparkDF.show(5)
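The read options above can be collected into a dictionary so that the table identifier is validated before Spark is involved. This is only a sketch: `doris_read_options` is a hypothetical helper, and the option keys are the ones used in the PySpark example above.

```python
from typing import Dict


def doris_read_options(
    table_identifier: str, fenodes: str, user: str, password: str = ""
) -> Dict[str, str]:
    """Build the option map for a Doris read, checking the `db.table` form."""
    database, sep, table = table_identifier.partition(".")
    if not (database and sep and table):
        raise ValueError(
            f"table identifier must look like 'db.table': {table_identifier!r}"
        )
    return {
        "doris.table.identifier": table_identifier,
        "doris.fenodes": fenodes,
        "user": user,
        "password": password,
    }


if __name__ == "__main__":
    opts = doris_read_options(
        "mongo_doris.data_sync_test_simple", "127.0.0.1:8030", "root"
    )
    print(opts)
    # With an active SparkSession named `spark` and the connector on the
    # classpath, the map could be passed in one call:
    #   spark.read.format("doris").options(**opts).load()
```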
Doris Type | Spark Type |
---|---|
BOOLEAN | BooleanType |
TINYINT | ByteType |
SMALLINT | ShortType |
INT | IntegerType |
BIGINT | LongType |
LARGEINT | StringType |
FLOAT | FloatType |
DOUBLE | DoubleType |
DECIMAL(M,D) | DecimalType(M,D) |
DATE | DateType |
DATETIME | TimestampType |
CHAR(L) | StringType |
VARCHAR(L) | StringType |
STRING | StringType |
ARRAY | ARRAY |
MAP | MAP |
STRUCT | STRUCT |
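The mapping table can be expressed as a small lookup, which is handy when checking which Spark type a Doris column is expected to arrive as. The sketch below mirrors the table as written. `spark_type_for` is a hypothetical helper that works on type names only, it does not call the connector, and it does not handle nested type syntax such as `ARRAY<INT>`.

```python
import re

# Mirrors the Doris-to-Spark mapping table above, keyed by base type name.
DORIS_TO_SPARK = {
    "BOOLEAN": "BooleanType",
    "TINYINT": "ByteType",
    "SMALLINT": "ShortType",
    "INT": "IntegerType",
    "BIGINT": "LongType",
    "LARGEINT": "StringType",
    "FLOAT": "FloatType",
    "DOUBLE": "DoubleType",
    "DECIMAL": "DecimalType",
    "DATE": "DateType",
    "DATETIME": "TimestampType",
    "CHAR": "StringType",
    "VARCHAR": "StringType",
    "STRING": "StringType",
    "ARRAY": "ARRAY",
    "MAP": "MAP",
    "STRUCT": "STRUCT",
}


def spark_type_for(doris_type: str) -> str:
    """Look up the Spark type name for a Doris type such as 'VARCHAR(32)'."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(?:\((.*)\))?\s*", doris_type)
    if match is None:
        raise ValueError(f"cannot parse Doris type: {doris_type!r}")
    base, params = match.group(1).upper(), match.group(2)
    if base not in DORIS_TO_SPARK:
        raise KeyError(f"no mapping listed for Doris type: {base}")
    if base == "DECIMAL" and params:
        # Precision and scale carry over: DECIMAL(M,D) -> DecimalType(M,D).
        return f"DecimalType({params.replace(' ', '')})"
    return DORIS_TO_SPARK[base]


if __name__ == "__main__":
    for name in ("VARCHAR(32)", "DECIMAL(10, 2)", "largeint"):
        print(name, "->", spark_type_for(name))
```

Note that LARGEINT maps to a string type, so values from such columns need an explicit cast before numeric operations in Spark.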
If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.
Contact us through the following mailing list.
Name | Scope | |||
---|---|---|---|---|
dev@doris.apache.org | Development-related discussions | Subscribe | Unsubscribe | Archives |