tools/client-simulation-yarn/README.md

Uniffle Client Simulation On Yarn - Usage Guide

Currently, we have evaluated the performance of the flush operation using the Uniffle server's flush event recording and flush benchmark features. These let us measure the server's maximum capacity for handling flush requests for small blocks (e.g., 1 KiB) and its write throughput limit for large blocks (e.g., 1 MiB).

However, performance bottlenecks may also exist between the point where the server receives a request and the actual flush. To expose them, we need a simulated client that continuously sends data to the server.

Parameter Description

| Parameter Name | Default Value | Description |
|---|---|---|
| uniffle.client.sim.serverId | None | ID of the target Uniffle server |
| uniffle.client.sim.container.num | 3 | Number of containers to start in the Yarn application, i.e., the number of concurrent client processes |
| uniffle.client.sim.threadCount | 1 | Number of concurrent threads in each container process. The actual number of working threads per process is threadCount + 1. |
| uniffle.client.sim.queueName | default | Yarn resource queue name |
| uniffle.client.sim.jarPath.list | None | Comma-separated HDFS paths of additional JARs or other resources to download to the local directory of the AM and task containers (e.g., the HDFS path of the RSS shaded JAR) |
| uniffle.client.sim.tmp.hdfs.path | None | Writable HDFS path used to upload temporary application resources |
| uniffle.client.sim.shuffleCount | 1 | Number of shuffles in a single sendShuffleData request |
| uniffle.client.sim.partitionCount | 1 | Number of partitions per shuffle in a single sendShuffleData request |
| uniffle.client.sim.blockCount | 1 | Number of blocks per partition in a single sendShuffleData request |
| uniffle.client.sim.blockSize | 1024 | Size (in bytes) of each block in a single sendShuffleData request |
| uniffle.client.sim.am.vCores | 8 | Number of virtual cores requested for the Application Master (AM) |
| uniffle.client.sim.am.memory | 4096 | Memory (in MB) requested for the AM |
| uniffle.client.sim.container.vCores | 2 | Number of virtual cores requested for each task container |
| uniffle.client.sim.container.memory | 2048 | Memory (in MB) requested for each task container |
| uniffle.client.sim.am.jvm.opts | None | Additional JVM options for the AM, e.g., for debugging when the execution result is abnormal |
| uniffle.client.sim.container.jvm.opts | None | Additional JVM options for task containers, e.g., for debugging when the execution result is abnormal |
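The shuffle, partition, and block parameters multiply together, so a small change can produce a very large request. As a back-of-the-envelope sketch, the hypothetical Java class below (not part of the simulator) computes the number of blocks and the raw block payload carried by one sendShuffleData request, plus an upper bound on in-flight requests. It assumes blockSize is in bytes, ignores request headers and block metadata, and assumes each of the threadCount threads in every container has at most one request in flight.

```java
import java.util.Locale;

public class SimLoadEstimate {
    // shuffleCount shuffles x partitionCount partitions x blockCount blocks per request.
    static long blocksPerRequest(int shuffleCount, int partitionCount, int blockCount) {
        return (long) shuffleCount * partitionCount * blockCount;
    }

    // Raw block payload per request. Assumption: blockSize is in bytes;
    // request headers and block metadata are not included.
    static long payloadBytesPerRequest(int shuffleCount, int partitionCount,
                                       int blockCount, int blockSize) {
        return blocksPerRequest(shuffleCount, partitionCount, blockCount) * blockSize;
    }

    // Assumption: each of the threadCount threads in each container has at most
    // one request in flight; the additional thread noted in the table is not counted.
    static long maxInFlightRequests(int containerNum, int threadCount) {
        return (long) containerNum * threadCount;
    }

    public static void main(String[] args) {
        // Parameter values taken from the Running Example command.
        long blocks = blocksPerRequest(5, 50, 100);
        long bytes = payloadBytesPerRequest(5, 50, 100, 10240);
        System.out.printf(Locale.ROOT, "blocks per request: %d%n", blocks);
        System.out.printf(Locale.ROOT, "payload per request: %d bytes (%.1f MiB)%n",
                bytes, bytes / (1024.0 * 1024.0));
        System.out.printf(Locale.ROOT, "max in-flight requests: %d%n",
                maxInFlightRequests(1000, 10));
    }
}
```

With the Running Example values, each request carries 25,000 blocks, i.e., 256,000,000 bytes (about 244 MiB) of block payload, and under the stated assumption up to 10,000 requests can be in flight at once.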

Running Example

  1. Change to the Hadoop directory, from which the simulator is submitted to Yarn:
```shell
cd $HADOOP_HOME
```
  2. Run the example command:
```shell
bin/yarn jar rss-client-simulation-yarn-0.11.0-SNAPSHOT.jar \
-Duniffle.client.sim.serverId=<UNIFFLE_SERVER_ID> \
-Duniffle.client.sim.container.num=1000 \
-Duniffle.client.sim.queueName=<YOUR_QUEUE_NAME> \
-Duniffle.client.sim.jarPath.list=hdfs://ns1/tmp/rss-client-spark3-shaded.jar \
-Duniffle.client.sim.tmp.hdfs.path=hdfs://ns1/user/xx/tmp/uniffle-client-sim/ \
-Duniffle.client.sim.shuffleCount=5 \
-Duniffle.client.sim.partitionCount=50 \
-Duniffle.client.sim.blockCount=100 \
-Duniffle.client.sim.blockSize=10240 \
-Duniffle.client.sim.threadCount=10
```
  3. Example output:
```
24/12/30 15:03:47 INFO simulator.UniffleClientSimOnYarnClient: appId: application_1729845342052_5295913
...
Application killed: application_1729845342052_5295913
Application status: KILLED
```
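When sweeping many parameter combinations, it can help to script the run. The hypothetical Java helper below (not part of the simulator) assembles the example command line from a map of parameters and extracts the application ID and final status from the client output. It assumes the log lines keep the format shown in the example output. The extracted ID can then be passed to standard Yarn commands such as `yarn logs -applicationId <appId>` (which requires log aggregation to be enabled) to inspect container logs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimRunHelper {
    // Assumed output formats, based on the example output above.
    static final Pattern APP_ID = Pattern.compile("appId: (application_\\d+_\\d+)");
    static final Pattern STATUS = Pattern.compile("Application status: (\\w+)");

    // Builds the argv: bin/yarn jar <jar> -Dkey=value ... (in map iteration order).
    static List<String> buildCommand(String jar, Map<String, String> props) {
        List<String> cmd = new ArrayList<>(List.of("bin/yarn", "jar", jar));
        for (Map.Entry<String, String> e : props.entrySet()) {
            cmd.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    // Returns capture group 1 of the first output line matching the pattern.
    static Optional<String> find(List<String> outputLines, Pattern p) {
        for (String line : outputLines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the -D options in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("uniffle.client.sim.serverId", "<UNIFFLE_SERVER_ID>");
        props.put("uniffle.client.sim.container.num", "1000");
        props.put("uniffle.client.sim.blockSize", "10240");
        List<String> cmd = buildCommand("rss-client-simulation-yarn-0.11.0-SNAPSHOT.jar", props);
        System.out.println(String.join(" ", cmd));
        // To launch it for real (needs a configured Yarn client), something like:
        // new ProcessBuilder(cmd).directory(new java.io.File(System.getenv("HADOOP_HOME")))
        //     .redirectErrorStream(true).start();

        // Sample lines copied from the example output.
        List<String> output = List.of(
                "24/12/30 15:03:47 INFO simulator.UniffleClientSimOnYarnClient: appId: application_1729845342052_5295913",
                "Application killed: application_1729845342052_5295913",
                "Application status: KILLED");
        System.out.println("appId: " + find(output, APP_ID).orElse("<not found>"));
        System.out.println("status: " + find(output, STATUS).orElse("<not found>"));
    }
}
```

Merging stderr into stdout (`redirectErrorStream(true)`) is a precaution, since it is not stated here which stream the client logs to.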

This guide describes the simulator's parameters and walks through an example of running the Uniffle client simulator on Yarn. Adjust the parameters to match the load you want to generate.