$ stop-ingest.sh
The following scripts start and stop query walkers that randomly walk the graph created by the ingesters. Each walker produces detailed statistics on query/scan times.
$ start-walkers.sh
$ stop-walkers.sh
The following scripts start and stop batch walkers.
$ start-batchwalkers.sh
$ stop-batchwalkers.sh
And the following scripts start and stop scanners.
$ start-scanners.sh
$ stop-scanners.sh
In addition to placing continuous load, the following scripts start and stop a service that continually collects statistics about Accumulo and HDFS.
$ start-stats.sh
$ stop-stats.sh
Optionally, start the agitator to periodically kill the TabletServer and/or DataNode process(es) on random nodes. You can run this script as root, and it will properly start processes as the users you configured in continuous-env.sh (HDFS_USER for the DataNode and ACCUMULO_USER for Accumulo processes). If you run it as yourself and the HDFS_USER and ACCUMULO_USER values are the same as your user, the agitator will not change users. If you run the agitator as a non-privileged user that isn't the same as HDFS_USER or ACCUMULO_USER, the agitator will attempt to sudo to those users, which relies on a correct sudo configuration. Also, be sure that your HDFS_USER has password-less ssh configured.
$ start-agitator.sh
$ stop-agitator.sh
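For reference, the relevant settings live in continuous-env.sh. The variable names HDFS_USER and ACCUMULO_USER come from the text above; the values below are illustrative placeholders, not prescribed defaults.

```shell
# Illustrative fragment of continuous-env.sh (values are examples only).
HDFS_USER=hdfs          # user the agitator uses to restart DataNode processes
ACCUMULO_USER=accumulo  # user the agitator uses to restart Accumulo processes
```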
Start all three of these services and let them run for a few hours. Then run report.pl
to generate a simple HTML report containing plots and histograms showing what has transpired.
A MapReduce job that verifies all data created by continuous ingest can be run with the following command. Before running it, modify the VERIFY_* variables in continuous-env.sh if needed. Do not run ingest while running this command; doing so will cause erroneous reporting of UNDEFINED nodes, because the MapReduce job will scan a reference after it has scanned the definition.
$ run-verify.sh
Each entry inserted by continuous ingest, except for the first batch of entries, references a previously flushed entry. Since only flushed entries are referenced, they should always exist. The MapReduce job checks that all referenced entries exist; if it finds any that do not, it increments the UNDEFINED counter and emits the referenced but undefined node. The MapReduce job also produces two other counts: REFERENCED and UNREFERENCED. Both are expected to be non-zero. REFERENCED counts nodes that are defined and referenced. UNREFERENCED counts nodes that are defined but unreferenced; these are the latest nodes inserted.
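The counting logic above can be illustrated with a small sketch. This is not the actual MapReduce job; it is a toy model (the `classify` helper and the dict-of-references input are assumptions for illustration) that classifies nodes the same way the verifier's counters do.

```python
def classify(entries):
    """Toy model of the verify counters.

    entries: dict mapping a node to the previously flushed node it
    references (None for nodes in the first batch, which reference nothing).
    """
    defined = set(entries)
    # every node that some entry points at
    referenced_targets = {ref for ref in entries.values() if ref is not None}

    undefined = referenced_targets - defined      # referenced but never defined
    referenced = defined & referenced_targets     # defined and referenced
    unreferenced = defined - referenced_targets   # defined, nothing points at them yet
    return {"UNDEFINED": len(undefined),
            "REFERENCED": len(referenced),
            "UNREFERENCED": len(unreferenced)}

# Toy chain a <- b <- c, plus d referencing a node "x" that was never written.
counts = classify({"a": None, "b": "a", "c": "b", "d": "x"})
# "x" is undefined; "a" and "b" are referenced; "c" and "d" are the latest
# (unreferenced) nodes.
```

In a healthy run UNDEFINED stays at zero, while REFERENCED and UNREFERENCED are both non-zero, matching the expectations described above.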
To stress Accumulo, run the following script, which starts a MapReduce job that reads and writes to your continuous ingest table. This job writes out an entry for every entry in the table (except for ones created by the MapReduce job itself). Stop ingest before running this job, and do not run more than one instance of it concurrently against a table.
$ run-moru.sh