
How to run a Python Nexmark benchmark

Batch Mode

Batch mode reads its events from files. Generate them first by running the Java Nexmark suite with --generateEventFilePathPrefix, then point the Python benchmark at the generated files with --input.

Direct Runner

./gradlew :sdks:java:testing:nexmark:run \
    -Pnexmark.runner=":runners:direct-java" \
    -Pnexmark.args="--query=0 --runner=DirectRunner --numEvents=100000 --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true --generateEventFilePathPrefix=/tmp/eventfile"

./gradlew :sdks:python:apache_beam:testing:benchmarks:nexmark:run \
    -Pnexmark.args="--query=0 --num_events=100000 --runner=DirectRunner --input=/tmp/eventfile\*"
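Before starting the Python run, it can be worth confirming that the Java suite actually wrote event files under the prefix. The following is a minimal sketch, assuming the generated files are regular files whose paths begin with the value passed to --generateEventFilePathPrefix; find_event_files is a hypothetical helper, not part of the Nexmark code.

```python
import glob
import os


def find_event_files(prefix):
    """Return sorted paths of regular files whose path starts with `prefix`.

    Hypothetical helper: mirrors the `--input=<prefix>*` glob used above.
    """
    # glob.escape keeps any special characters in the prefix itself literal.
    pattern = glob.escape(prefix) + "*"
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


if __name__ == "__main__":
    files = find_event_files("/tmp/eventfile")
    if files:
        print(f"found {len(files)} event file(s)")
    else:
        print("no event files found; run the Java suite first")
```

This only covers the local /tmp case used with the Direct Runner; files written to a gs:// location would need a different check.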

Dataflow Runner

RUN_DATA=$(uuidgen)

./gradlew :sdks:java:testing:nexmark:run \
    -Pnexmark.runner=":runners:direct-java" \
    -Pnexmark.args="--query=0 --runner=DirectRunner --numEvents=100000 --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true --generateEventFilePathPrefix=gs://temp-storage-for-perf-tests/nexmark/eventfile/$RUN_DATA"

./gradlew :sdks:python:apache_beam:testing:benchmarks:nexmark:run \
    -Pnexmark.args="--query=0 --num_events=1000000 --runner=DataflowRunner --project=apache-beam-testing --region=us-central1 --temp_location=gs://temp-storage-for-perf-tests/nexmark/PythonQuery0/ --staging_location=gs://temp-storage-for-perf-tests/nexmark/PythonQuery0/ --input=gs://temp-storage-for-perf-tests/nexmark/eventfile/$RUN_DATA\*"
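The two commands above are linked only through RUN_DATA: the Java run writes under a unique prefix, and the Python run reads everything matching that prefix followed by a wildcard. A minimal sketch of that relationship, assuming the bucket path shown above; event_file_paths is a hypothetical helper used only for illustration.

```python
import uuid

# Bucket path copied from the commands above.
EVENT_FILE_BASE = "gs://temp-storage-for-perf-tests/nexmark/eventfile"


def event_file_paths(run_data=None):
    """Return (write_prefix, read_pattern) for one benchmark run."""
    # A fresh UUID plays the role of RUN_DATA=$(uuidgen).
    run_data = run_data or str(uuid.uuid4())
    write_prefix = f"{EVENT_FILE_BASE}/{run_data}"
    read_pattern = write_prefix + "*"
    return write_prefix, read_pattern


if __name__ == "__main__":
    prefix, pattern = event_file_paths()
    print("--generateEventFilePathPrefix=" + prefix)
    print("--input=" + pattern)
```

Using a fresh identifier per run keeps the Python job from picking up event files left behind by an earlier run.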

Streaming Mode

First, generate events and publish them to Pub/Sub using the Java Nexmark suite. For example:

./gradlew :sdks:java:testing:nexmark:run \
    -Pnexmark.runner=":runners:google-cloud-dataflow-java" \
    -Pnexmark.args=" --runner=DataflowRunner --suite=SMOKE --streamTimeout=60 --query=0 --streaming=true --project=apache-beam-testing --region=YOUR_REGION --workerMachineType=n1-highmem-8 --gcpTempLocation=YOUR_TEMP_LOCATION --stagingLocation=YOUR_STAGING_LOCATION --sourceType=PUBSUB --pubSubMode=PUBLISH_ONLY --pubsubTopic=YOUR_TOPIC_NAME --resourceNameMode=VERBATIM --manageResources=false --monitorJobs=false --numEventGenerators=64 --numWorkers=16 --maxNumWorkers=16 --firstEventRate=50000 --nextEventRate=50000 --isRateLimited=true --avgPersonByteSize=500 --avgAuctionByteSize=500 --avgBidByteSize=500 --probDelayedEvent=0.000001 --occasionalDelaySec=60 --numEvents=3000000 --experiments=enable_custom_pubsub_sink --pubsubMessageSerializationMethod=TO_STRING"
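To get a rough idea of how long the publisher will be emitting events, the event count can be divided by the event rate. This is a back-of-the-envelope sketch, assuming that --firstEventRate and --nextEventRate are expressed in events per second and that --isRateLimited=true holds the publisher to that rate; both are assumptions about the Java suite rather than statements from this README, and publish_duration_seconds is a hypothetical helper.

```python
def publish_duration_seconds(num_events, events_per_second):
    """Estimate publishing time at a constant event rate."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return num_events / events_per_second


if __name__ == "__main__":
    # Values taken from --numEvents and --firstEventRate/--nextEventRate above.
    print(publish_duration_seconds(3_000_000, 50_000))  # 60.0
```

Job startup and worker scaling are not included, so the wall-clock time of the whole Gradle run will be longer than this estimate.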

Direct Runner

python nexmark_launcher.py --query 5 --num_events 3000000 --streaming --runner DirectRunner --topic_name YOUR_TOPIC_NAME --subscription_name YOUR_SUB_NAME --project YOUR_PROJECT_NAME --region YOUR_REGION

Dataflow Runner

python nexmark_launcher.py --query 5 --num_events 3000000 --streaming --runner DataflowRunner --num_workers 16 --machine_type n1-highmem-8 --topic_name YOUR_TOPIC_NAME --subscription_name YOUR_SUB_NAME --project YOUR_PROJECT_NAME --region YOUR_REGION --temp_location YOUR_TEMP_LOCATION --staging_location YOUR_STAGING_LOCATION --sdk_location YOUR_SDK_LOCATION
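The Direct Runner and Dataflow Runner invocations share most of their flags, so when scripting several runs it may help to build the command from one place. A minimal sketch, assuming only the flags shown above; launcher_command is a hypothetical helper, not part of nexmark_launcher.py, and the YOUR_* values are the same placeholders as in the commands.

```python
import shlex


def launcher_command(query, num_events, runner, topic, subscription,
                     project, region, extra=()):
    """Build the argv list for a streaming nexmark_launcher.py run."""
    argv = [
        "python", "nexmark_launcher.py",
        "--query", str(query),
        "--num_events", str(num_events),
        "--streaming",
        "--runner", runner,
        "--topic_name", topic,
        "--subscription_name", subscription,
        "--project", project,
        "--region", region,
    ]
    # Runner-specific flags (for example the Dataflow ones) are appended last.
    argv.extend(extra)
    return argv


if __name__ == "__main__":
    direct = launcher_command(
        5, 3000000, "DirectRunner", "YOUR_TOPIC_NAME", "YOUR_SUB_NAME",
        "YOUR_PROJECT_NAME", "YOUR_REGION")
    print(shlex.join(direct))

    dataflow = launcher_command(
        5, 3000000, "DataflowRunner", "YOUR_TOPIC_NAME", "YOUR_SUB_NAME",
        "YOUR_PROJECT_NAME", "YOUR_REGION",
        extra=["--num_workers", "16", "--machine_type", "n1-highmem-8",
               "--temp_location", "YOUR_TEMP_LOCATION",
               "--staging_location", "YOUR_STAGING_LOCATION",
               "--sdk_location", "YOUR_SDK_LOCATION"])
    print(shlex.join(dataflow))
```

The returned list can be passed directly to subprocess.run once the placeholders are replaced with real values; shlex.join (Python 3.8+) is used here only to print a copy-pasteable command line.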