# Topologies for measuring Storm performance
This module includes topologies designed for measuring Storm performance.
## Overview
There are two basic modes for running these topologies:
- **Cluster mode:** Submits the topology to a Storm cluster. This mode is useful for benchmarking: it calculates throughput and latency numbers every minute and prints them to the console.
- **In-process mode:** Uses LocalCluster to run the topology. This mode helps identify bottlenecks with profilers such as JProfiler from within an IDE. This mode does not print metrics.
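As a rough illustration of the per-interval numbers that cluster mode reports, the sketch below derives throughput from two samples of a cumulative acked-tuple counter and combines per-executor average latencies into a count-weighted mean. `ThroughputSketch` and its methods are hypothetical names. This is not the storm-perf reporting code, and the assumption that the metrics come from cumulative counters is for illustration only.

```java
/**
 * Hypothetical sketch (not storm-perf code): per-interval metrics computed
 * from cumulative counters sampled at the start and end of an interval.
 */
final class ThroughputSketch {
    private ThroughputSketch() {}

    /** Tuples per second between two samples of a cumulative acked-tuple counter. */
    static double throughputPerSec(long ackedBefore, long ackedAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return (ackedAfter - ackedBefore) * 1000.0 / intervalMillis;
    }

    /**
     * Combines per-executor average latencies into a single mean, weighting each
     * executor's average by the number of tuples it processed in the interval.
     */
    static double weightedMeanLatencyMs(double[] avgLatencyMs, long[] counts) {
        if (avgLatencyMs.length != counts.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double weightedSum = 0.0;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            weightedSum += avgLatencyMs[i] * counts[i];
            total += counts[i];
        }
        // No tuples processed in this interval: report zero rather than NaN.
        return total == 0 ? 0.0 : weightedSum / total;
    }
}
```

For a one-minute interval in which the counter rises from 0 to 6,000,000, `throughputPerSec(0, 6_000_000, 60_000)` gives 100,000 tuples per second. Weighting by tuple count keeps lightly loaded executors from skewing the combined latency.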
In both modes, a shutdown hook is set up to terminate the topology when the program that submitted it is terminated.
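A hook of this kind can be registered with the JDK's `Runtime.addShutdownHook`. The sketch below assumes a caller-supplied `killTopology` action (for example, a call that kills the submitted topology); `ShutdownHookSketch` and `registerCleanup` are hypothetical names, not part of storm-perf.

```java
/** Hypothetical sketch (not storm-perf code) of a terminate-on-exit hook. */
final class ShutdownHookSketch {
    private ShutdownHookSketch() {}

    /**
     * Registers a JVM shutdown hook that runs killTopology when the submitting
     * program exits (normal exit, System.exit, or SIGINT/SIGTERM such as Ctrl-C).
     * Returns the hook thread so the caller can deregister it if needed.
     */
    static Thread registerCleanup(String topologyName, Runnable killTopology) {
        Thread hook = new Thread(() -> {
            System.err.println("Terminating topology " + topologyName);
            killTopology.run();
        }, "kill-" + topologyName);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks do not run when the JVM is killed with SIGKILL (`kill -9`); in cluster mode the topology would then have to be stopped manually, e.g. with `storm kill`.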
The bundled topologies fall into two categories:
- Topologies that measure purely the internal functioning of Storm. These topologies do not interact with external systems such as Kafka or HDFS.
- Topologies that measure the speed of I/O with external systems such as Kafka and HDFS.
Topologies that measure internal performance can be run in either in-process or cluster mode.
Topologies that measure I/O with external systems are designed to run in cluster mode only.
## Topologies List
1. **ConstSpoutOnlyTopo:** Helps measure how fast a spout can emit. This topology consists of a single spout that is not connected to any bolts. Supports cluster mode only.
2. **ConstSpoutNullBoltTopo:** Helps measure how fast a spout can send data to a bolt. The spout emits a stream of constant values to a DevNull bolt, which discards the incoming tuples. Supports cluster mode only.
3. **ConstSpoutIdBoltNullBoltTopo:** Helps measure the speed of messaging between spouts and bolts. The spout emits a stream of constant values to an ID (identity) bolt, which clones each tuple and emits it downstream to a DevNull bolt. Supports cluster mode only.
4. **FileReadWordCountTopo:** Measures the speed of word counting. The spout loads a file into memory and emits its lines in an infinite loop. Supports cluster mode only.
5. **HdfsSpoutNullBoltTopo:** Measures the speed at which HdfsSpout can read from HDFS. Supports cluster mode only.
6. **StrGenSpoutHdfsBoltTopo:** Measures the speed at which HdfsBolt can write to HDFS. Supports cluster mode only.
7. **KafkaClientHdfsTopo:** Measures how fast Storm can read from Kafka and write to HDFS, using the storm-kafka-client spout. Supports cluster mode only.
8. **KafkaClientSpoutNullBoltTopo:** Measures the speed at which the storm-kafka-client KafkaSpout can read from Kafka. Supports cluster mode only.
## How to run
### In-process mode:
This mode is intended for running the topology quickly and easily from within an IDE and does not expect any command-line arguments.
Running the topology's main() method without any arguments starts it in-process. The topology runs indefinitely until the program is terminated.
### Cluster mode:
When the topology is run with one or more command-line arguments, it is submitted to the cluster.
The first argument specifies how long, in seconds, the topology should run. Typically, the second argument is the path to a YAML config
file containing topology configuration settings. The conf/ directory in this module contains sample config files
named after the corresponding topologies.
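The argument convention described above can be summarized in a short sketch. It is hypothetical: `RunArgs` and its fields are illustrative names, and each bundled topology's own `main()` may interpret its arguments differently (for example, by requiring the config file).

```java
import java.util.Optional;

/** Hypothetical sketch (not storm-perf code) of mapping arguments to a run mode. */
final class RunArgs {
    final boolean inProcess;            // true when no arguments were given
    final int durationSec;              // run time in cluster mode; -1 in-process
    final Optional<String> configPath;  // optional YAML topology config file

    private RunArgs(boolean inProcess, int durationSec, Optional<String> configPath) {
        this.inProcess = inProcess;
        this.durationSec = durationSec;
        this.configPath = configPath;
    }

    static RunArgs parse(String[] args) {
        if (args.length == 0) {
            // No arguments: run in-process (LocalCluster) until the program is terminated.
            return new RunArgs(true, -1, Optional.empty());
        }
        // First argument: how long the topology should run on the cluster.
        int durationSec = Integer.parseInt(args[0]);
        if (durationSec <= 0) {
            throw new IllegalArgumentException("duration must be positive: " + durationSec);
        }
        // Second argument, if present: path to the YAML config file.
        Optional<String> conf = args.length > 1 ? Optional.of(args[1]) : Optional.empty();
        return new RunArgs(false, durationSec, conf);
    }
}
```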
These topologies can be run using the standard `storm jar` command, for example:
```
bin/storm jar /path/storm-perf-1.1.0-jar-with-dependencies.jar org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo 200 conf/ConstSpoutIdBoltNullBoltTopo.yaml
```