This is the cluster benchmark module for Paimon. Inspired by Nexmark.
```yaml
jobmanager.memory.process.size: 4096m
taskmanager.memory.process.size: 4096m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 16
execution.checkpointing.interval: 3min
state.backend: rocksdb
state.backend.incremental: true
```

With this Flink configuration, you'll need 16 task manager instances in total, 8 on each EMR worker (a default parallelism of 16 with one slot per task manager requires 16 task managers, so a two-worker cluster hosts 8 on each).
This benchmark requires the `FLINK_HOME` environment variable. Set `FLINK_HOME` to your Flink directory.

To run the benchmark:

- Build the benchmark with `mvn clean package -DskipTests -pl paimon-benchmark/paimon-cluster-benchmark/ -am`.
- Copy `paimon-benchmark/paimon-cluster-benchmark/target/paimon-benchmark-bin/paimon-benchmark` to the master node of your EMR cluster.
- Modify `paimon-benchmark/conf/benchmark.yaml` on the master node. You must change these config options (see the sketch after this list):
  - `benchmark.metric.reporter.host` and `flink.rest.address`: set these to the address of the master node of your EMR cluster.
  - `benchmark.sink.path`: the path to which queries insert records. This should point to a non-existing path. Contents of this path will be removed before each test.
- Copy `paimon-benchmark` to every worker node of your EMR cluster.
- Run `paimon-benchmark/bin/setup_cluster.sh` on the master node. This activates the CPU metrics collectors on the worker nodes. Note that if you restart your Flink cluster, you must also restart the CPU metrics collectors. To stop them, run `paimon-benchmark/bin/shutdown_cluster.sh` on the master node.
- Run `paimon-benchmark/bin/run_benchmark.sh <query> <sink>` to run `<query>` for `<sink>`. Currently `<query>` can be `q1` or `all`, and `<sink>` can only be `paimon`.
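A minimal sketch of the edited `benchmark.yaml`; the key names come from the list above, while the address and the sink path are hypothetical values you must replace with your own:

```yaml
# Hypothetical values; substitute your own EMR master address and sink path.
benchmark.metric.reporter.host: 172.31.0.10        # address of the EMR master node
flink.rest.address: 172.31.0.10                    # same master node address
benchmark.sink.path: hdfs:///tmp/paimon-benchmark  # non-existing path, wiped before each test
```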
Available queries:

| # | Description |
|---|---|
| q1 | Tests inserting and updating random primary keys with a normal record size (150 bytes per record). Mimics updating the uv and pv of items on an e-commerce website. |
Results of each query consist of several aspects, including the CPU metrics gathered by the collectors on the worker nodes.
To add a new query, add it to `paimon-benchmark/queries` as a SQL script, then modify `paimon-benchmark/queries/queries.yaml` to set the properties of this test. Supported properties are:

- `sql`: An array of SQL scripts. These SQL scripts run in the given order; each script stops only after it has produced enough rows.
- `row-num`: The number of rows produced from the source for each SQL script.

Note that each query must contain a `-- __SINK_DDL_BEGIN__` and a `-- __SINK_DDL_END__` marker around its sink DDL so that this DDL can be used for both write and read tests. See existing queries for detail.
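As an illustration, here is a minimal sketch of what a `queries.yaml` entry might look like; the query name `q2`, the schema, and the source table `my_source` are hypothetical, and the exact layout should be checked against the existing `q1` entry:

```yaml
# Hypothetical entry; compare with the existing q1 definition before copying.
q2:
  sql:
    - |
      -- __SINK_DDL_BEGIN__
      CREATE TABLE IF NOT EXISTS ${SINK_NAME} (
        item_id BIGINT,
        pv BIGINT,
        uv BIGINT,
        PRIMARY KEY (item_id) NOT ENFORCED
      ) WITH (
        ${DDL_TEMPLATE}
      );
      -- __SINK_DDL_END__
      INSERT INTO ${SINK_NAME}
      SELECT item_id, COUNT(*), COUNT(DISTINCT user_id)
      FROM my_source GROUP BY item_id;
  row-num: 100000000
```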
To add a new sink, just add it to `paimon-benchmark/sinks` as a yaml file, which supports the following properties:

- `before`: A SQL script that runs before each test. This property is useful if your sink needs initialization (for example, if your sink needs a special catalog).
- `sink-name`: The name of the sink table. This value replaces the `${SINK_NAME}` placeholder in each query.
- `sink-properties`: The `WITH` options of the sink table. This value replaces the `${DDL_TEMPLATE}` placeholder in each query.

See existing sinks for detail.
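For instance, a sink yaml file might look like the following minimal sketch; the catalog DDL, table name, and property values are placeholders rather than the actual `paimon` sink definition:

```yaml
# Hypothetical sink definition; see the existing paimon sink for real values.
before: |
  CREATE CATALOG benchmark_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs:///tmp/paimon-warehouse'
  );
  USE CATALOG benchmark_catalog;
sink-name: test_sink
sink-properties: |
  'bucket' = '16',
  'changelog-producer' = 'input'
```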