
 ---
 title: Running Distributed Load
 parent: Optimizing Benchmarks
 grand_parent: User Guide
 nav_order: 30
 ---

 # Running Distributed Load

 By default, Apache Solr Orbit generates all load from the machine where you run the `solr-orbit` command. For large clusters or high-throughput benchmarks, a single machine may become the bottleneck before Solr does. Distributed load generation lets you spread the workload across multiple machines.

 ## Architecture

 In distributed mode, one machine acts as the **coordinator** and one or more additional machines act as **workers**. The coordinator:

 - Parses and distributes the workload
 - Sends task assignments to each worker
 - Collects and aggregates metrics from all workers

 Each worker:

 - Executes its assigned portion of the load against the Solr cluster
 - Sends metrics back to the coordinator

 The coordinator and workers communicate over the network. All machines must be able to reach each other and the Solr cluster.

 ## Prerequisites

 - The same version of Apache Solr Orbit must be installed on all coordinator and worker machines
 - All machines must have network access to the Solr cluster
 - Workers must have the workload data files available at the same path as the coordinator (or accessible via a shared filesystem)
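
One way to confirm that a data file is identical on every machine is to compare a checksum of it. The sketch below is not part of Solr Orbit: `corpus_fingerprint` is a hypothetical helper and the default path is a placeholder. It relies only on the POSIX `cksum` utility. Run it on the coordinator and on each worker and compare the printed values.

```bash
#!/usr/bin/env bash
# Illustrative helper, not an Orbit command.

# Print "<crc> <bytes>" for a file, or fail if it is missing or unreadable.
corpus_fingerprint() {
  local path=$1
  if [ ! -r "$path" ]; then
    echo "missing or unreadable: $path" >&2
    return 1
  fi
  # Reading from stdin keeps the file name out of cksum's output,
  # so identical content yields identical lines on every machine.
  cksum < "$path" | awk '{ print $1, $2 }'
}

# Placeholder path: substitute the real location of your workload data file.
DATA_FILE="${1:-/path/to/workload/documents.json}"
corpus_fingerprint "$DATA_FILE" || echo "place the data file at $DATA_FILE and rerun"
```

If the two values printed on any pair of machines differ, the files differ in content or size and should be re-synced before the benchmark starts.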

 ## Configuration

 ### Start the benchmark daemon on each worker

 On each worker machine, start the benchmark daemon:

 ```bash
 solr-orbitd start --node-ip WORKER_IP --coordinator-ip COORDINATOR_IP
 ```

 Replace `WORKER_IP` with the IP address of that worker machine and `COORDINATOR_IP` with the IP address of the machine that will run `solr-orbit run`.

 ### Run the benchmark from the coordinator

 On the coordinator machine, pass the worker IPs via `--worker-ips`:

 ```bash
 solr-orbit run \
   --workload nyc_taxis \
   --pipeline benchmark-only \
   --target-hosts solr1:8983,solr2:8983,solr3:8983 \
   --worker-ips 192.168.1.10,192.168.1.11,192.168.1.12
 ```

 The coordinator automatically divides the corpus and task schedule across the specified workers.

 ## How load is divided

 The corpus is partitioned by line ranges in the NDJSON data files, and each worker receives a non-overlapping slice of documents to index. For query tasks, each worker runs the full query schedule with the specified `clients` count, so the total number of concurrent query clients is `clients × number_of_workers` and the aggregate query load scales with the number of workers.
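
The arithmetic can be sketched as follows. This is only an illustration of non-overlapping line ranges and the client multiplier, assuming an even split with the remainder given to the first workers. The exact boundaries Orbit chooses are not documented here, and `line_range` and `effective_clients` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Illustrative only: Orbit computes its own partitions internally.

# Print the 1-based inclusive line range for one worker.
# Args: total_lines, num_workers, worker_index (0-based).
line_range() {
  local total=$1 workers=$2 index=$3
  local base=$(( total / workers ))
  local extra=$(( total % workers ))
  # The first `extra` workers each take one additional line.
  local start=$(( index * base + (index < extra ? index : extra) + 1 ))
  local count=$(( base + (index < extra ? 1 : 0) ))
  echo "$start $(( start + count - 1 ))"
}

# Total concurrent query clients when every worker runs the full schedule.
# Args: clients_per_worker, num_workers.
effective_clients() {
  echo $(( $1 * $2 ))
}

for i in 0 1 2; do
  echo "worker-$(( i + 1 )): lines $(line_range 10 3 "$i")"
done
echo "effective clients: $(effective_clients 8 3)"
```

With a 10-line corpus and 3 workers, this prints the ranges `1 4`, `5 7`, and `8 10`, and 8 clients per worker gives 24 concurrent clients in total.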

 ## Example: 3-worker setup

 ```bash
 # On worker-1 (192.168.1.10):
 solr-orbitd start --node-ip 192.168.1.10 --coordinator-ip 192.168.1.1

 # On worker-2 (192.168.1.11):
 solr-orbitd start --node-ip 192.168.1.11 --coordinator-ip 192.168.1.1

 # On worker-3 (192.168.1.12):
 solr-orbitd start --node-ip 192.168.1.12 --coordinator-ip 192.168.1.1

 # On the coordinator (192.168.1.1):
 solr-orbit run \
   --workload nyc_taxis \
   --pipeline benchmark-only \
   --target-hosts solr1:8983,solr2:8983 \
   --worker-ips 192.168.1.10,192.168.1.11,192.168.1.12 \
   --user-tag "workers:3"
 ```
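
For larger worker pools, typing each command by hand gets error-prone. The sketch below only prints the commands from the example above for a given list of IPs and executes nothing. The function names are hypothetical, and it uses only the flags already shown in this guide.

```bash
#!/usr/bin/env bash
# Dry-run helper: prints commands, does not run them.
COORDINATOR_IP="192.168.1.1"
WORKER_IPS=(192.168.1.10 192.168.1.11 192.168.1.12)

# Join all arguments with commas, the format --worker-ips expects.
join_by_comma() {
  local IFS=,
  echo "$*"
}

# Print the daemon start command to run on each worker.
print_worker_commands() {
  local ip
  for ip in "${WORKER_IPS[@]}"; do
    echo "solr-orbitd start --node-ip $ip --coordinator-ip $COORDINATOR_IP"
  done
}

# Print the command to run on the coordinator, tagged with the worker count.
print_coordinator_command() {
  echo "solr-orbit run --workload nyc_taxis --pipeline benchmark-only" \
       "--target-hosts solr1:8983,solr2:8983" \
       "--worker-ips $(join_by_comma "${WORKER_IPS[@]}")" \
       "--user-tag \"workers:${#WORKER_IPS[@]}\""
}

print_worker_commands
print_coordinator_command
```

Deriving both the `--worker-ips` value and the `--user-tag` count from the same array keeps the tag accurate when workers are added or removed.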

 ## Stopping workers

 When the benchmark is complete, stop the daemon on each worker, passing that worker's own IP as `--node-ip`. For example, on worker-1:

 ```bash
 solr-orbitd stop --node-ip 192.168.1.10 --coordinator-ip 192.168.1.1
 ```

 Check daemon status on any worker:

 ```bash
 solr-orbitd status
 ```

 See the [solr-orbitd reference](../../reference/commands/benchmarkd.html) for full daemon documentation.

 ## When to use distributed load

 Consider distributed load generation when:

 - A single machine cannot sustain the target query rate (CPU or network saturated on the load generator, not on Solr)
 - You want to simulate many independent clients connecting from different source IPs
 - Your indexing throughput is limited by the speed of reading and sending documents from the coordinator