Another implementation of the Apache Uniffle shuffle server: a single binary with no extra dependencies.
Software: Uniffle 0.8.0 / Hadoop 3.2.2 / Spark 3.1.2
Hardware: 96 cores, 512 GB memory, 4 × 1 TB SSD, and 8 GB/s network bandwidth per machine
Hadoop YARN cluster: 1 ResourceManager + 40 NodeManagers, each machine with 4 × 4 TB HDD
Uniffle cluster: 1 Coordinator + 1 Shuffle Server, each machine with 4 × 1 TB NVMe SSD
Spark conf:
```
spark.executor.instances 400
spark.executor.cores 1
spark.executor.memory 2g
spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
spark.rss.storage.type MEMORY_LOCALFILE
```
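For context, here is a hedged sketch of how these options might be passed to `spark-submit`. The application class, jar, and coordinator address are placeholders rather than part of the original benchmark setup, and the Uniffle Spark client jar is assumed to be on the driver and executor classpath:
```
# Hypothetical spark-submit wiring for the conf above; class/jar names and
# the coordinator address are placeholders, not from this benchmark.
spark-submit \
  --class org.example.YourApp \
  --conf spark.executor.instances=400 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
  --conf spark.rss.storage.type=MEMORY_LOCALFILE \
  --conf spark.rss.coordinator.quorum={coordinator-host}:19999 \
  your-app.jar
```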
Uniffle gRPC-based server conf (JVM XMX=30g, JDK 11 + G1):
```
rss.server.buffer.capacity 10g
rss.server.read.buffer.capacity 10g
rss.server.flush.thread.alive 10
rss.server.flush.threadPool.size 50
rss.server.high.watermark.write 80
rss.server.low.watermark.write 70
```
Rust-based shuffle server conf:
```
store_type = "MEMORY_LOCALFILE"

[memory_store]
capacity = "10G"

[localfile_store]
data_paths = ["/data1/uniffle", "/data2/uniffle", "/data3/uniffle", "/data4/uniffle"]
healthy_check_min_disks = 0

# With a 10G memory store, spilling to local files starts once usage
# reaches 8G (0.8) and stops when it falls back to 7G (0.7).
[hybrid_store]
memory_spill_high_watermark = 0.8
memory_spill_low_watermark = 0.7
```
| type / buffer capacity | 250G shuffle (compressed) | comment |
|---|---|---|
| vanilla Spark ESS | 5.0min (2.2m/2.8m) | ESS spreads shuffle across 400 nodes while Uniffle uses a single server, yet the RSS speed is still close |
| vanilla Uniffle (gRPC-based) / 10g | 5.3min (2.3m/3.0m) | 1.9 GB/s |
| vanilla Uniffle (gRPC-based) / 300g | 5.6min (3.7m/1.9m) | GC occurs frequently / 2.5 GB/s |
| vanilla Uniffle (Netty-based) / 10g | / | read failed; 2.5 GB/s (write is better due to zero copy) |
| vanilla Uniffle (Netty-based) / 300g | / | app hangs |
| Rust-based shuffle server / 10g | 4.6min (2.2m/2.4m) | 2.4 GB/s |
| Rust-based shuffle server / 300g | 4.0min (1.5m/2.5m) | 3.5 GB/s |
Compared with the gRPC-based server, the Rust-based server has a smaller memory footprint and more stable performance, while the Netty-based server is still not stable enough for production environments. In the future, the Rust-based server will use io_uring to further improve write performance.
Build:
```
cargo build --release --features hdfs,jemalloc
```
Uniffle-x currently treats all compiler warnings as errors, with some dead-code warnings excluded. If you are developing and really want to ignore the warnings for now, you can use `cargo --config 'build.rustflags=["-W", "warnings"]' build` to restore the default behavior. However, before submitting your PR, you should fix all warnings.
config.toml as follows:
```
store_type = "MEMORY_LOCALFILE"
grpc_port = 21100
coordinator_quorum = ["xxxxxxx1", "xxxxxxx2"]
tags = ["uniffle-worker"]

[memory_store]
capacity = "100G"

[localfile_store]
data_paths = ["/data1/uniffle", "/data2/uniffle"]
healthy_check_min_disks = 0

[hdfs_store]
max_concurrency = 10

[hybrid_store]
memory_spill_high_watermark = 0.8
memory_spill_low_watermark = 0.2
memory_single_buffer_max_spill_size = "256M"

[metrics]
http_port = 19998
push_gateway_endpoint = "http://xxxxxxxxxxxxxx/pushgateway"
```
Start the worker:
```
WORKER_IP={ip} RUST_LOG=info WORKER_CONFIG_PATH=./config.toml ./uniffle-worker
```
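Once the worker is up, the port from the `[metrics]` section can double as a quick liveness check; this assumes a Prometheus-style `/metrics` path, which is not spelled out above:
```
# Liveness check against the configured http_port (assumed /metrics path)
curl http://{worker-ip}:19998/metrics
```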
Thanks to the hdfs-native crate, there is no need to set up JAVA_HOME or related Hadoop dependencies. If the HDFS store is enabled, the Spark client must set `spark.rss.client.remote.storage.useLocalConfAsDefault=true`.
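A minimal way to persist that flag on the client side, assuming a standard `$SPARK_HOME/conf/spark-defaults.conf` layout (the path is an assumption, not from this project):
```
# Persist the required client flag; $SPARK_HOME/conf is an assumed layout
echo "spark.rss.client.remote.storage.useLocalConfAsDefault true" \
  >> "$SPARK_HOME/conf/spark-defaults.conf"
```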
Build with HDFS support:
```
cargo build --features hdfs --release
```
```
# configure the Kerberos and Hadoop conf env
HADOOP_CONF_DIR=/etc/hadoop/conf KRB5_CONFIG=/etc/krb5.conf KRB5CCNAME=/tmp/krb5cc_2002 LOG=info ./uniffle-worker
```
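Before starting, it can help to confirm that the ticket cache named by `KRB5CCNAME` is valid; `klist` is the standard MIT Kerberos tool for that:
```
# Verify the Kerberos ticket cache referenced above exists and is unexpired
klist -c /tmp/krb5cc_2002
```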
```
cargo build
```
`trace` must be enabled when starting the worker:
```
WORKER_IP={ip} RUST_LOG=trace ./uniffle-worker -c ./config.toml
```
Then attach the console:
```
tokio-console http://{uniffle-worker-host}:21002
```
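If the console CLI is not installed yet, it is distributed as a regular crate:
```
cargo install tokio-console
```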
Memory profiling (jemalloc):
```
cargo build --release --features memory-prof
```
```
_RJEM_MALLOC_CONF=prof:true,prof_prefix:jeprof.out ./uniffle-worker
```
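The resulting dumps can be inspected with jemalloc's `jeprof` tool; the dump file name below is illustrative, since jemalloc appends pid/sequence suffixes to the configured prefix:
```
# Render a heap-profile dump to SVG with jemalloc's jeprof
# (the exact jeprof.out.* file name varies per run)
jeprof --svg ./uniffle-worker jeprof.out.0.f.heap > heap.svg
```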