Oak Runnable Jar
================
This jar contains everything you need for a simple Oak installation.
The following runmodes are currently available:
* backup : Backup an existing Oak repository
* restore : Restore a backup of an Oak repository
* benchmark : Run benchmark tests against different Oak repository fixtures
* debug : Print status information about an Oak repository
* compact : Run segment compaction on a TarMK repository
* upgrade : Migrate an existing Jackrabbit 2.x repository to Oak
* server : Run the Oak Server
* console : Start an interactive console
* explore : Start a GUI browser based on Java Swing
* graph : Export the segment graph of a segment store to a file
* history : Trace the history of a node
* check : Check the FileStore for inconsistencies
* primary : Run a TarMK Cold Standby primary instance
* standby : Run a TarMK Cold Standby standby instance
* scalability : Run scalability tests against different Oak repository fixtures
* recovery : Run a _lastRev recovery on a MongoMK repository
* checkpoints : Manage checkpoints
* tika : Perform text extraction
* garbage : Identify blob garbage on a DocumentMK repository
* tarmkdiff : Show changes between revisions on TarMK
* tarmkrecovery : List candidates for head journal entries
* datastorecheck : Consistency checker for the data store
* resetclusterid : Reset the cluster id
* help : Print a list of available runmodes
Some of the features related to Jackrabbit 2.x are provided by the oak-run-jr2 jar. See
the [Oak Runnable JR2](#jr2) section for more details.
See the subsections below for more details on how to use these modes.
Backup
------
The 'backup' mode creates a backup from an existing oak repository. The most efficient
way to back up a TarMK repository is to use a file system copy of the repository folder.
The current backup implementation acts like a compaction to an external folder: on top of
copying the state, it will also try to compress it, so it will be significantly slower than
what one might expect from a simple copy backup. Incremental backups (a backup over an
existing backup) will still need to perform a full content diff and will attempt to compact
the diff. All optimisation flags used for offline compaction apply to this case as well.
The FileStore backup doesn't need access to the DataStore, but if one is configured with
the repository, the following system property needs to be set to true in order to be able
to perform the diffing: `-Doak.backup.UseFakeBlobStore=true`. To start this mode, use:
$ java -jar oak-run-*.jar backup /path/to/oak/repository /path/to/backup
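If a DataStore is configured with the repository, the same invocation with the system
property mentioned above would look like this:

    $ java -Doak.backup.UseFakeBlobStore=true -jar oak-run-*.jar backup \
        /path/to/oak/repository /path/to/backup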
Restore
-------
The 'restore' mode imports a backup of an existing oak repository. To start this mode, use:
$ java -jar oak-run-*.jar restore /path/to/oak/repository /path/to/backup
Debug
-----
The 'debug' mode allows you to obtain information about the status of the specified
store. Currently this is only supported for the TarMK. To start this mode, use:
$ java -jar oak-run-*.jar debug /path/to/oak/repository [id...]
Console
-------
The 'console' mode allows you to work with an interactive console and browse an
existing oak repository. Type ':help' within the console to get a list of all
supported commands. The console currently supports TarMK and MongoMK. To start
the console for a TarMK repository, use:
$ java -jar oak-run-*.jar console /path/to/oak/repository
To start the console for a DocumentMK/Mongo repository, use:
$ java -jar oak-run-*.jar console mongodb://host
To start the console for a DocumentMK/RDB repository, use:
$ java -jar oak-run-*.jar --rdbjdbcuser username --rdbjdbcpasswd password console jdbc:...
To start the console connecting to a repository in read-write mode, use one of:
$ java -jar oak-run-*.jar console --read-write /path/to/oak/repository
$ java -jar oak-run-*.jar console --read-write mongodb://host
$ java -jar oak-run-*.jar --rdbjdbcuser username --rdbjdbcpasswd password console --read-write jdbc:...
To specify the FDS path while connecting to a repository, use the `--fds-path` option (valid for segment and document repos):

    $ java -jar oak-run-*.jar console --fds-path /path/to/datastore /path/to/oak/repository
Console is based on [Groovy Shell](http://groovy.codehaus.org/Groovy+Shell) and hence one
can use all Groovy constructs. It also exposes the `org.apache.jackrabbit.oak.console.ConsoleSession`
instance through the `session` variable. For example, when using a SegmentNodeStore you can
dump the current segment info to a file:

    > new File("segment.txt") << session.workingNode.segment.toString()

In the above case `workingNode` captures the current `NodeState`, which in the case of
Segment/TarMK is a `SegmentNodeState`.
You can also load an external script at launch time by passing an extra argument, as
shown below:
$ java -jar oak-run-*.jar console mongodb://host ":load /path/to/script.groovy"
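A loaded script runs with the same bindings as the interactive console, so a minimal
script could, for instance, reuse the `session` variable shown above (an illustrative
sketch; the file name and content are hypothetical):

    // /path/to/script.groovy: dump the current segment info at startup
    println session.workingNode.segment.toString()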
Explore
-------
The 'explore' mode starts a desktop browser GUI based on Java Swing which allows for read-only
browsing of an existing oak repository.
$ java -jar oak-run-*.jar explore /path/to/oak/repository [skip-size-check]
Graph
-----
The 'graph' mode exports the segment graph of a file store to a text file in the
[Guess GDF format](https://gephi.github.io/users/supported-graph-formats/gdf-format/),
which is easily imported into [Gephi](https://gephi.github.io).
As the GDF format only supports integer values, while the segment time stamps are encoded
as long values, an optional 'epoch' argument can be specified. If no epoch is given on the
command line, the start of the day of the last modified date of the 'journal.log' is used.
The epoch specifies a negative offset translating all timestamps into a valid int range.
$ java -jar oak-run-*.jar graph [File] <options>
[File] -- Path to segment store (required)
Option Description
------ -----------
--epoch <Long> Epoch of the segment time stamps
(derived from journal.log if not
given)
--output <File> Output file (default: segments.gdf)
--gc Write the gc generation graph instead of the full graph
--pattern Regular expression specifying which
nodes to include (optional). Ignored
when --gc is specified.
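For example:

    $ java -jar oak-run-*.jar graph /path/to/segmentstore --output segments.gdf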
History
-------
Trace the history of a node backward through the revision history.
$ java -jar oak-run-*.jar history [File] <options>
[File] -- Path to segment store (required)
Option Description
------ -----------
--depth <Integer> Depth up to which to dump node states
(default: 0)
--journal journal file (default: journal.log)
--path Path for which to trace the history
(default: /)
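For example, the following (illustrative) invocation traces the history of `/content`,
dumping node states up to depth 2:

    $ java -jar oak-run-*.jar history /path/to/segmentstore --path /content --depth 2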
Check
-----
The 'check' mode checks the storage of the FileStore for inconsistencies.
$ java -jar oak-run-*.jar check <options>
--bin [Long] read the first n bytes from binary
properties. -1 for all bytes.
(default: 0)
--deep [Long] enable deep consistency checking. An
optional long specifies the number
of seconds between progress
notifications (default:
9223372036854775807)
--journal journal file (default: journal.log)
--path path to the segment store (required)
For example:
$ java -jar oak-run-*.jar check -p repository/segmentstore -d
Checks the files in the `repository/segmentstore` directory for inconsistencies.
It will start with the latest revision in the `journal.log` file, going back revision
by revision until a full traversal succeeds. During the traversal the current path
is output to the console every second. When done, the latest good revision is
output.
Searching for last good revision in journal.log
Checking revision b82167c3-1ceb-4404-a67f-9c542e854086:240872
Traversing /
Error while traversing /home/users/foo: Segment 476e1abd-0ea0-44a8-ac3c-3a3bd
Traversed 50048 nodes and 303846 properties
Broken revision b82167c3-1ceb-4404-a67f-9c542e854086:240872
Checking revision 84d693cb-f214-4d19-a1cd-8b5766d50fdb:250028
Checking /home/users/foo
Traversed 11612889 nodes and 27511640 properties
Found latest good revision 84d693cb-f214-4d19-a1cd-8b5766d50fdb:250028
Searched through 2 of 390523 revisions
Primary
-------
The 'primary' mode starts a TarMK Cold Standby primary instance (master) listening on a TCP/IP port for connecting slaves.
$ java -jar oak-run-*.jar primary [options] /path/to/TarMK
The following options are available:
--port 8023 - port to listen at
--admissible 127.0.0.1 - admissible client IP range or host name
--secure - use secure connections
Standby
-------
The 'standby' mode starts a TarMK Cold Standby standby instance (slave) to create or update a continuous backup from a Cold Standby primary.
$ java -jar oak-run-*.jar standby [options] /path/to/TarMK
The following options are available:
--port 8023 - port to connect to
--host 127.0.0.1 - host to connect to
--secure - use secure connections
--interval 5 - schedule the slave to run continuously, connecting every n seconds
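For example, a primary and standby pair could be started like this (the paths, host
and interval values are illustrative):

    $ java -jar oak-run-*.jar primary --port 8023 /path/to/primary/TarMK
    $ java -jar oak-run-*.jar standby --host 127.0.0.1 --port 8023 --interval 5 /path/to/standby/TarMK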
Compact
-------
The 'compact' mode runs the segment compaction operation on the provided TarMK
repository. To start this mode, use:
$ java -jar oak-run-*.jar compact [File] <options>
[File] -- Path to segment store (required)
Option Description
------ -----------
--force Force compaction and ignore non
matching segment version
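For example:

    $ java -jar oak-run-*.jar compact /path/to/segmentstore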
Checkpoints
-----------
The 'checkpoints' mode can be used to list or remove repository checkpoints.
To start this mode, use:
$ java -jar oak-run-*.jar checkpoints { /path/to/oak/repository | mongodb://host:port/database } [list|rm-all|rm-unreferenced|rm <checkpoint>]
The 'list' option (treated as a default when nothing is specified) will list all existing checkpoints.
The 'rm-all' option will wipe clean the 'checkpoints' node.
The 'rm-unreferenced' option will remove all checkpoints except the one referenced from the async indexer (/:async@async).
The 'rm <checkpoint>' option will remove a specific checkpoint from the repository.
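For example, the following removes all checkpoints except the one referenced from the
async indexer:

    $ java -jar oak-run-*.jar checkpoints /path/to/oak/repository rm-unreferenced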
<a name="tika"></a>
Tika
----
The 'tika' mode enables text extraction, report generation, and generation of the
CSV file required for text extraction.
Apache Jackrabbit Oak 1.4-SNAPSHOT
Non-option arguments:
tika [extract|report|generate]
report : Generates a summary report related to binary data
extract : Performs the text extraction
generate : Generates the csv data file based on configured NodeStore/BlobStore
Option Description
------ -----------
-?, -h, --help show help
--data-file <File> Data file in csv format containing the
binary metadata
--fds-path <File> Path of directory used by FileDataStore
--nodestore NodeStore detail
/path/to/oak/repository | mongodb:
//host:port/database
--path Path in repository under which the
binaries would be searched
--pool-size <Integer> Size of the thread pool used to
perform text extraction. Defaults to
number of cores on the system
--store-path <File> Path of directory used to store
extracted text content
--tika-config <File> Tika config file path
<a name="tika-csv"></a>
### CSV File Format
The text extraction tool reads a CSV file which contains details regarding the
binary files from which text needs to be extracted. Entries in the CSV file look
like the following:
```
43844ed22d640a114134e5a25550244e8836c00c#28705,28705,"application/octet-stream",,"/content/activities/jcr:content/folderThumbnail/jcr:content"
43844ed22d640a114134e5a25550244e8836c00c#28705,28705,"application/octet-stream",,"/content/snowboarding/jcr:content/folderThumbnail/jcr:content"
...
```
The columns are in the following order:
1. BlobId - Value of [Jackrabbit ContentIdentity](http://jackrabbit.apache.org/api/2.0/org/apache/jackrabbit/api/JackrabbitValue.html)
2. Length
3. jcr:mimeType
4. jcr:encoding
5. path of parent node
The CSV file can be generated programmatically. For Oak-based repositories
it can be generated via the `generate` command.
### Generate
The CSV file required for `extract` and `report` can be generated via the `generate`
command:
java -jar oak-run.jar tika \
--fds-path /path/to/datastore \
--nodestore /path/to/segmentstore --data-file dump.csv generate
The above command scans the NodeStore and creates the CSV file. This file can
then be passed to the `extract` command.
### Report
The tool can generate a summary report from a [csv](#tika-csv) file:
java -jar oak-run.jar tika \
--data-file /path/to/binary-stats.csv report
The report provides a summary like the following:
```
14:39:05.402 [main] INFO o.a.j.o.p.tika.TextExtractorMain - MimeType Stats
Total size : 89.3 MB
Total indexed size : 3.4 MB
Total count : 1048
Type Indexed Supported Count Size
___________________________________________________________________________________
application/epub+zip | true| true| 1 | 3.4 MB
image/png | false| true| 544 | 40.2 MB
image/jpeg | false| true| 444 | 34.0 MB
image/tiff | false| true| 11 | 6.1 MB
application/x-indesign | false| false| 1 | 3.7 MB
application/octet-stream | false| false| 39 | 1.2 MB
application/x-shockwave-flash | false| false| 4 | 372.2 kB
application/pdf | false| false| 3 | 168.3 kB
video/quicktime | false| false| 1 | 95.9 kB
```
### Extract
Extraction can be performed via the following command:
java -cp oak-run.jar:tika-app-1.8.jar \
org.apache.jackrabbit.oak.run.Main tika \
--data-file binary-stats.csv \
--store-path ./store \
--fds-path /path/to/datastore extract
You would need to provide the tika-app jar, which contains all the parsers; it can
be downloaded from [here](https://tika.apache.org/download.html).
Extraction would then be performed in multi-threaded mode, and the extracted text
would be stored under the `--store-path` directory.
Upgrade
-------
The 'upgrade' mode allows you to migrate the contents of an existing
Jackrabbit 2.x repository to Oak. To run the migration, use:
$ java -jar oak-run-*.jar upgrade [--datastore] \
/path/to/jackrabbit/repository [/path/to/jackrabbit/repository.xml] \
{ /path/to/oak/repository | mongodb://host:port/database }
The source repository is opened from the given repository directory, and
should not be concurrently accessed by any other client. Repository
configuration is read from the specified configuration file, or from
a `repository.xml` file within the repository directory if an explicit
configuration file is not given.
The target repository is specified either as a local filesystem path to
the directory of a new TarMK repository (which will be automatically created
if it doesn't already exist) or as a MongoDB client URI that specifies the
location of the MongoDB database where a new DocumentMK repository will be created.
The `--datastore` option (if present) prevents the copying of binary data
from a data store of the source repository to the target Oak repository.
Instead the binaries are copied by reference, and you need to make the
source data store available to the new Oak repository.
The content migration will automatically adjust things like node type,
privilege and user account settings that work a bit differently in Oak.
Unsupported features like same-name-siblings are migrated on a best-effort
basis, with no strict guarantees of completeness. Warnings will be logged
for any content inconsistencies that might be encountered; such content
should be manually reviewed after the migration is complete. Note that
things like search index configuration work differently in Oak than in
Jackrabbit 2.x, and will need to be manually recreated after the migration.
See the relevant documentation for more details.
Oak server mode
---------------
The Oak server mode starts a NodeStore or full Oak instance with the
standard JCR plugins and makes it available over a simple HTTP mapping
defined in the `oak-http` component. To start this mode, use:
$ java -jar oak-run-*.jar server [uri] [fixture] [options]
If no arguments are specified, the command starts an in-memory repository
and makes it available at http://localhost:8080/. Specify a `uri` and a
`fixture` argument to change the host name and port and specify a different
repository backend.
The optional fixture argument allows you to specify the repository implementation
to be used. The following fixtures are currently supported:
| Fixture | Description |
|---------------|-------------------------------------------------------|
| Jackrabbit(*) | Jackrabbit with the default embedded Derby bundle PM |
| Oak-Memory | Oak with default in-memory storage |
| Oak-MemoryNS | Oak with default in-memory NodeStore |
| Oak-Mongo | Oak with the default Mongo backend |
| Oak-Mongo-FDS | Oak with the default Mongo backend and FileDataStore |
| Oak-MongoNS | Oak with the Mongo NodeStore |
| Oak-Tar | Oak with the Tar backend (aka Segment NodeStore) |
| Oak-Tar-FDS | Oak with the Tar backend and FileDataStore |
The Jackrabbit fixture requires the [Oak Runnable JR2 jar](#jr2).
Depending on the fixture the following options are available:
--cache 100 - cache size (in MB)
--host localhost - MongoDB host
--port 27101 - MongoDB port
--db <name> - MongoDB database (default is a generated name)
--clusterIds - Cluster Ids for the Mongo setup: a comma separated list of integers
--base <file> - Tar: Path to the base file
--mmap <64bit?> - TarMK memory mapping (the default on 64 bit JVMs)
--rdbjdbcuri - JDBC URL for RDB persistence
--rdbjdbcuser - JDBC username (defaults to "")
--rdbjdbcpasswd - JDBC password (defaults to "")
--rdbjdbctableprefix - for RDB persistence: prefix for table names (defaults to "")
Examples:
$ java -jar oak-run-*.jar server
$ java -jar oak-run-*.jar server http://localhost:4503 Oak-Tar --base myOak
$ java -jar oak-run-*.jar server http://localhost:4502 Oak-Mongo --db myOak --clusterIds c1,c2,c3
See the documentation in the `oak-http` component for details about the available functionality.
Benchmark mode
--------------
The benchmark mode is used for executing various micro-benchmarks. It can
be invoked like this:
$ java -jar oak-run-*.jar benchmark [options] [testcases] [fixtures]
The following benchmark options (with default values) are currently supported:
--host localhost - MongoDB host
--port 27101 - MongoDB port
--db <name> - MongoDB database (default is a generated name)
--mongouri - MongoDB URI (takes precedence over host, port and db)
--dropDBAfterTest true - Whether to drop the MongoDB database after the test
--base target - Path to the base file (Tar setup)
--mmap <64bit?> - TarMK memory mapping (the default on 64 bit JVMs)
--cache 100 - cache size (in MB)
--wikipedia <file> - Wikipedia dump
--runAsAdmin false - Run test as admin session
--itemsToRead 1000 - Number of items to read
--report false - Whether to output intermediate results
--csvFile <file> - Optional csv file to report the benchmark results
--concurrency <levels> - Comma separated list of concurrency levels
--rdbjdbcuri - JDBC URL for RDB persistence (defaults to local file-based H2)
--rdbjdbcuser - JDBC username (defaults to "")
--rdbjdbcpasswd - JDBC password (defaults to "")
--rdbjdbctableprefix - for RDB persistence: prefix for table names (defaults to "")
These options are passed to the test cases and repository fixtures
that need them. For example the Wikipedia dump option is needed by the
WikipediaImport test case and the MongoDB address information by the
MongoMK and SegmentMK based repository fixtures. The cache setting
controls the bundle cache size in Jackrabbit, the NodeState
cache size in MongoMK, and the segment cache size in SegmentMK.
The `--concurrency` levels can be specified as a comma-separated list of values,
e.g. `--concurrency 1,4,8`, which will execute the same test with the respective
number of threads. Note that `beforeSuite()` and `afterSuite()` are executed
before and after the concurrency loop; e.g. in the example above, the execution
order is: `beforeSuite()`, 1x `runTest()`, 4x `runTest()`, 8x `runTest()`,
`afterSuite()`. Tests that create their own background threads should be executed
with `--concurrency 1`, which is the default.
You can use extra JVM options like `-Xmx` settings to better control the
benchmark environment. It's also possible to attach the JVM to a
profiler to better understand benchmark results. For example, I'm
using `-agentlib:hprof=cpu=samples,depth=100` as a basic profiling
tool, whose results can be processed with `perl analyze-hprof.pl
java.hprof.txt` to produce somewhat easier-to-read top-down and
bottom-up summaries of how the execution time is distributed across
the benchmarked codebase.
Some system properties are also used to control the benchmarks. For example:
-Dwarmup=5 - warmup time (in seconds)
-Druntime=60 - how long a single benchmark should run (in seconds)
-Dprofile=true - to collect and print profiling data
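For example, a complete invocation combining these system properties with a test
case and a fixture (both described below) could look like this:

    $ java -Dwarmup=5 -Druntime=60 -jar oak-run-*.jar benchmark ReadPropertyTest Oak-Tar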
The test case names like `ReadPropertyTest`, `SmallFileReadTest` and
`SmallFileWriteTest` indicate the specific test case being run. You can
specify one or more test cases in the benchmark command line, and
oak-run will execute each benchmark in sequence. The benchmark code is
located under `org.apache.jackrabbit.oak.benchmark` in the oak-run
component. Each test case tries to exercise some tightly scoped aspect
of the repository. You might remember many of these tests from the
Jackrabbit benchmark reports like
http://people.apache.org/~jukka/jackrabbit/report-2011-09-27/report.html
that we used to produce earlier.
Finally the benchmark runner supports the following repository fixtures:
| Fixture | Description |
|---------------|-------------------------------------------------------|
| Jackrabbit | Jackrabbit with the default embedded Derby bundle PM |
| Oak-Memory | Oak with default in-memory storage |
| Oak-MemoryNS | Oak with default in-memory NodeStore |
| Oak-Mongo | Oak with the default Mongo backend |
| Oak-Mongo-FDS | Oak with the default Mongo backend and FileDataStore |
| Oak-MongoNS | Oak with the Mongo NodeStore |
| Oak-Tar | Oak with the Tar backend (aka Segment NodeStore) |
| Oak-RDB | Oak with the DocumentMK/RDB persistence |
(Note that for Oak-RDB, the required JDBC drivers either need to be embedded
into oak-run, or be specified separately in the class path. Furthermore,
dropDBAfterTest is interpreted to drop the *tables*, not the database
itself, if and only if they have been auto-created.)
Once started, the benchmark runner will execute each listed test case
against all the listed repository fixtures. After starting up the
repository and preparing the test environment, the test case is first
executed a few times to warm up caches before measurements are
started. Then the test case is run repeatedly for one minute
and the number of milliseconds used by each execution
is recorded. Once done, the following statistics are computed and
reported:
| Column | Description |
|-------------|-------------------------------------------------------|
| C | concurrency level |
| min | minimum time (in ms) taken by a test run |
| 10% | time (in ms) within which the fastest 10% of test runs completed |
| 50% | time (in ms) taken by the median test run |
| 90% | time (in ms) within which the fastest 90% of test runs completed |
| max | maximum time (in ms) taken by a test run |
| N | total number of test runs in one minute (or more) |
The most useful of these numbers is probably the 90% figure, as it
shows the time under which the majority of test runs completed and
thus what kind of performance could reasonably be expected in a normal
usage scenario. However, the reason why all these different numbers
are reported, instead of just the 90% one, is that often seeing the
distribution of time across test runs can be helpful in identifying
things like whether a bigger cache might help.
Finally, and most importantly, like in all benchmarking, the numbers
produced by these tests should be taken with a large dose of salt.
They DO NOT directly indicate the kind of application performance you
could expect with (the current state of) Oak. Instead they are
designed to isolate implementation-level bottlenecks and to help
measure and profile the performance of specific, isolated features.
How to add a new benchmark
--------------------------
To add a new test case to this benchmark suite, you'll need to implement
the `Benchmark` interface and add an instance of the new test to the
`allBenchmarks` array in the `BenchmarkRunner` class in the
`org.apache.jackrabbit.oak.benchmark` package.
The best way to implement the `Benchmark` interface is to extend the
`AbstractTest` base class that takes care of most of the benchmarking
details. The outline of such a benchmark is:
class MyTest extends AbstractTest {
@Override
protected void beforeSuite() throws Exception {
// optional, run once before all the iterations,
// not included in the performance measurements
}
@Override
protected void beforeTest() throws Exception {
// optional, run before runTest() on each iteration,
// but not included in the performance measurements
}
@Override
protected void runTest() throws Exception {
// required, run repeatedly during the benchmark,
// and the time of each iteration is measured.
// The ideal execution time of this method is
// from a few hundred to a few thousand milliseconds.
// Use a loop if the operation you're hoping to measure
// is faster than that.
}
@Override
protected void afterTest() throws Exception {
// optional, run after runTest() on each iteration,
// but not included in the performance measurements
}
@Override
protected void afterSuite() throws Exception {
// optional, run once after all the iterations,
// not included in the performance measurements
}
}
The rough outline of how the benchmark will be run is:
test.beforeSuite();
for (...) {
test.beforeTest();
recordStartTime();
test.runTest();
recordEndTime();
test.afterTest();
}
test.afterSuite();
You can use the `loginWriter()` and `loginReader()` methods to create admin
and anonymous sessions. There's no need to log out those sessions (unless doing
so is relevant to the benchmark) as they will automatically be closed after
the benchmark is completed and the `afterSuite()` method has been called.
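As an illustration, a minimal write benchmark could look like the following sketch
(the class and node names are made up for this example; it assumes `loginWriter()`
returns a JCR `Session`, as described above):

    import javax.jcr.Node;
    import javax.jcr.Session;

    class MyWriteTest extends AbstractTest {

        private Session session;
        private Node testRoot;

        @Override
        protected void beforeSuite() throws Exception {
            session = loginWriter(); // admin session, closed by the runner
            testRoot = session.getRootNode().addNode("MyWriteTest");
            session.save();
        }

        @Override
        protected void runTest() throws Exception {
            // loop so a single iteration takes a measurable amount of time
            for (int i = 0; i < 100; i++) {
                testRoot.addNode("node" + System.nanoTime());
                session.save();
            }
        }
    }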
Similarly, you can use the `addBackgroundJob(Runnable)` method to add
background tasks that will be run concurrently while the main benchmark is
executing. The relevant background thread works like this:
while (running) {
runnable.run();
Thread.yield();
}
As you can see, the `run()` method of the background task gets invoked
repeatedly. Such threads will automatically close once all test iterations
are done, before the `afterSuite()` method is called.
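For instance, a background task could be registered (e.g. from `beforeSuite()`) with
a sketch like the following, whose body is purely illustrative:

    addBackgroundJob(new Runnable() {
        @Override
        public void run() {
            // one small unit of background work per invocation; the
            // harness loops over run() and yields between invocations
        }
    });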
Scalability mode
--------------
The scalability mode is used for executing various scalability suites to test the
performance of their associated tests. It can be invoked like this:
$ java -jar oak-run-*.jar scalability [options] [suites] [fixtures]
The following scalability options (with default values) are currently supported:
--host localhost - MongoDB host
--port 27101 - MongoDB port
--db <name> - MongoDB database (default is a generated name)
--dropDBAfterTest true - Whether to drop the MongoDB database after the test
--base target - Path to the base file (Tar setup)
--mmap <64bit?> - TarMK memory mapping (the default on 64 bit JVMs)
--cache 100 - cache size (in MB)
--csvFile <file> - Optional csv file to report the benchmark results
--rdbjdbcuri - JDBC URL for RDB persistence (defaults to local file-based H2)
--rdbjdbcuser - JDBC username (defaults to "")
--rdbjdbcpasswd - JDBC password (defaults to "")
These options are passed to the various suites and repository fixtures
that need them. For example, the MongoDB address information is needed by the
MongoMK and SegmentMK based repository fixtures. The cache setting
controls the NodeState cache size in MongoMK, and the segment cache
size in SegmentMK.
You can use extra JVM options like `-Xmx` settings to better control the
scalability suite test environment. It's also possible to attach the JVM to a
profiler to better understand benchmark results. For example, I'm
using `-agentlib:hprof=cpu=samples,depth=100` as a basic profiling
tool, whose results can be processed with `perl analyze-hprof.pl
java.hprof.txt` to produce somewhat easier-to-read top-down and
bottom-up summaries of how the execution time is distributed across
the benchmarked codebase.
The scalability suite creates the relevant repository load before starting the tests.
Each test case tries to benchmark and profile a specific aspect of the repository.
Each scalability suite is configured to run a number of related tests which require the
same base load to be available in the repository.
Either the entire suite can be executed or individual tests within the suite can be run.
If a suite name is specified like `ScalabilityBlobSearchSuite`, then all the tests
configured for the suite are executed. To execute particular tests in the suite, append
the tests to the suite name in the form `suite:test1,test2`, for example
`ScalabilityBlobSearchSuite:FormatSearcher,NodeTypeSearcher`. You can specify one or more
suites in the scalability command line, and oak-run will execute each suite in sequence.
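For example, the following invocation runs only the `FormatSearcher` test of the blob
search suite against the Oak-Tar fixture (listed below):

    $ java -jar oak-run-*.jar scalability ScalabilityBlobSearchSuite:FormatSearcher Oak-Tar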
Finally the scalability runner supports the following repository fixtures:
| Fixture | Description |
|---------------|-------------------------------------------------------|
| Oak-Memory | Oak with default in-memory storage |
| Oak-MemoryNS | Oak with default in-memory NodeStore |
| Oak-Mongo | Oak with the default Mongo backend |
| Oak-Mongo-FDS | Oak with the default Mongo backend and FileDataStore |
| Oak-MongoNS | Oak with the Mongo NodeStore |
| Oak-Tar | Oak with the Tar backend (aka Segment NodeStore) |
| Oak-RDB | Oak with the DocumentMK/RDB persistence |
(Note that for Oak-RDB, the required JDBC drivers either need to be embedded
into oak-run, or be specified separately in the class path.)
Once started, the scalability runner will execute each listed suite against all the listed
repository fixtures. After starting up the repository and preparing the test environment,
the scalability suite executes all the configured tests to warm up caches before measurements
are started. Then each configured test within the suite is run and the number of
milliseconds used by each execution is recorded. Once done, the following statistics are
computed and reported:
| Column | Description |
|-------------|-------------------------------------------------------|
| min | minimum time (in ms) taken by a test run |
| 10% | time (in ms) within which the fastest 10% of test runs completed |
| 50% | time (in ms) taken by the median test run |
| 90% | time (in ms) within which the fastest 90% of test runs completed |
| max | maximum time (in ms) taken by a test run |
| N | total number of test runs in one minute (or more) |
Also, for each test, the execution times are reported for each iteration/load configured.
| Column | Description |
|-------------|-------------------------------------------------------|
| Load | time (in ms) taken by a test run |
The latter is the more useful of these numbers, as it shows how the individual
execution times scale for each load.
How to add a new scalability suite
--------------------------
The scalability code is
located under `org.apache.jackrabbit.oak.scalability` in the oak-run
component.
To add a new scalability suite, you'll need to implement
the `ScalabilitySuite` interface and add an instance of the new suite to the
`allSuites` array in the `ScalabilityRunner` class, along with the test benchmarks,
in the `org.apache.jackrabbit.oak.scalability` package.
To implement the test benchmarks, extend the `ScalabilityBenchmark`
abstract class and implement the `execute()` method.
In addition, the methods `beforeExecute()` and `afterExecute()` can be overridden
to do processing before and after the benchmark executes.
The best way to implement the `ScalabilitySuite` interface is to extend the
`ScalabilityAbstractSuite` base class that takes care of most of the benchmarking
details. The outline of such a suite is:
class MyTestSuite extends ScalabilityAbstractSuite {
@Override
protected void beforeSuite() throws Exception {
// optional, run once before all the iterations,
// not included in the performance measurements
}
@Override
protected void beforeIteration(ExecutionContext context) throws Exception {
// optional; typically used to create additional
// load for each iteration.
// This method will be called before each test iteration begins
}
@Override
protected void executeBenchmark(ScalabilityBenchmark benchmark,
ExecutionContext context) throws Exception {
// required, executes the specified benchmark
}
@Override
protected void afterIteration() throws Exception {
// optional, executed after each iteration,
// but not included in the performance measurements
}
@Override
protected void afterSuite() throws Exception {
// optional, run once after all the iterations are complete,
// not included in the performance measurements
}
}
The rough outline of how the individual suite will be run is:
test.beforeSuite();
for (iteration...) {
test.beforeIteration();
for (benchmarks...) {
recordStartTime();
test.executeBenchmark();
recordEndTime();
}
test.afterIteration();
}
test.afterSuite();
You can specify any context information for the test benchmarks using the `ExecutionContext`
object passed as a parameter to the `beforeIteration()` and `executeBenchmark()` methods.
`ExecutionContext` exposes two methods, `getMap()` and `setMap()`, which can be used to
pass context information.
You can use the `loginWriter()` and `loginReader()` methods to create admin
and anonymous sessions. There's no need to log out those sessions (unless doing
so is relevant to the test) as they will automatically be closed after
the suite is complete and the `afterSuite()` method has been called.
Similarly, you can use the `addBackgroundJob(Runnable)` method to add
background tasks that will be run concurrently while the test benchmark is
executing. The relevant background thread works like this:
while (running) {
runnable.run();
Thread.yield();
}
As you can see, the `run()` method of the background task gets invoked
repeatedly. Such threads will automatically close once all test iterations
are done, before the `afterSuite()` method is called.
`ScalabilityAbstractSuite` defines some system properties which are used to control the
suites extending from it:
-Dincrements=10,100,1000,1000 - defines the varying loads for each test iteration
-Dprofile=true - to collect and print profiling data
-Ddebug=true - to output any intermediate results during the suite
run
Recovery Mode
-------------
The recovery mode can be used to check the consistency of `_lastRev` fields
of a MongoMK repository. It can be invoked like this:
$ java -jar oak-run-*.jar recovery [options] mongodb://host:port/database [dryRun]
The following recovery options (with default values) are currently supported:
--clusterId - MongoMK clusterId (default: 0 -> automatic)
The recovery tool will only perform the check and fix for the given clusterId.
It is therefore recommended to explicitly specify a clusterId. The tool will
fix the documents it identified, unless the `dryRun` keyword is specified.
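For example, a dry run for clusterId 1 (with an illustrative connection string)
would be:

    $ java -jar oak-run-*.jar recovery --clusterId 1 mongodb://localhost:27017/oak dryRun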
Garbage
-------
The garbage mode can be used to identify blob garbage still referenced by
documents in a MongoMK repository. It can be invoked like this:
$ java -jar oak-run-*.jar garbage [options] mongodb://host:port/database
The following garbage options (with default values) are currently supported:
--clusterId - MongoMK clusterId (default: 0 -> automatic)
The tool will scan the store for documents with blob references and print a
report with the top 100 documents with blob references considered garbage. The
rank is based on the size of the referenced blobs.
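For example (with an illustrative connection string):

    $ java -jar oak-run-*.jar garbage --clusterId 1 mongodb://localhost:27017/oak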
<a name="jr2"></a>
Oak Runnable Jar - JR 2
===============================
This jar provides Jackrabbit 2.x related features.
The following runmodes are currently available:
* upgrade : Upgrade from a Jackrabbit 2.x repository to Oak
* benchmark : Run benchmark tests against a Jackrabbit 2.x repository fixture
* server : Run the JR2 Server
Oak Mongo Shell Helpers
=======================
To simplify making sense of data created by Oak in Mongo, a JavaScript file oak-mongo.js
is provided. It includes [some useful functions][1] to navigate the data in Mongo:
$ wget https://s.apache.org/oak-mongo.js
$ mongo localhost/oak --shell oak-mongo.js
MongoDB shell version: 2.6.3
connecting to: localhost/oak
type "help" for help
> oak.countChildren('/oak:index/')
356787
> oak.getChildStats('/oak:index')
{ "count" : 356788, "size" : 127743372, "simple" : "121.83 MB" }
> oak.getChildStats('/')
{ "count" : 593191, "size" : 302005011, "simple" : "288.01 MB" }
>
For reporting any issue related to Oak, the script provides a function to collect
important stats, which can be dumped to a file:
$ mongo localhost/oak --eval "load('/path/to/oak-mongo.js');printjson(oak.systemStats());" --quiet > oak-stats.json
[1]: http://jackrabbit.apache.org/oak/docs/oak-mongo-js/oak.html
<a name="tarmkdiff"></a>
Oak TarMK Revision Diff
-----------------------
Show changes between revisions on TarMK. It uses a read-only store, so it can also be used on a running system without the need to shut down.
$ java -jar oak-run-*.jar tarmkdiff path/to/repository [--list] [--diff=R0..R1] [--incremental] [--ignore-snfes] [--output=/path/to/output/file]
The following options are available:
--list - Lists the existing revisions. Other parameters are ignored if this is provided
--diff - Revision diff interval. Ex '--diff=R0..R1'. 'HEAD' can be used to reference the latest head revision, i.e. '--diff=R0..HEAD'
--incremental - Runs diffs between each pair of subsequent revisions in the provided interval (false by default)
--ignore-snfes - Ignores SegmentNotFoundExceptions and continues running the diff (experimental) (false by default)
--path - Filter diff by given path
--output - Output file name (generated randomly if not provided)
Output sample:
rev 7583946d-1817-4716-a05c-660ee52ddce0.ff94..c238cd7d-87a0-4cca-aa14-80b75e8ab81d.fb3e
^ /oak:index
^ /oak:index/lucene
^ /oak:index/lucene/:data
- /oak:index/lucene/:data/_3729.cfs
+ /oak:index/lucene/:data/_372d.si
+ blobSize<LONG> = 1047552
+ jcr:lastModified<LONG> = 1447948037017
+ jcr:data<BINARIES>[1] = [252 bytes]
- /oak:index/lucene/:data/segments_37bv
+ /oak:index/lucene/:data/_372d.cfe
+ blobSize<LONG> = 1047552
+ jcr:lastModified<LONG> = 1447948037017
+ jcr:data<BINARIES>[1] = [224 bytes]
- /oak:index/lucene/:data/_3729.si
- /oak:index/lucene/:data/_3729.cfe
+ /oak:index/lucene/:data/_372d.cfs
+ blobSize<LONG> = 1047552
+ jcr:lastModified<LONG> = 1447948037017
+ jcr:data<BINARIES>[1] = [907 bytes]
+ /oak:index/lucene/:data/segments_37bz
+ blobSize<LONG> = 1047552
+ jcr:lastModified<LONG> = 1447948045167
+ jcr:data<BINARIES>[1] = [863 bytes]
^ /oak:index/lucene/:data/segments.gen
^ jcr:lastModified
- jcr:lastModified<LONG> = 1447947918027
+ jcr:lastModified<LONG> = 1447948045167
^ jcr:data
- jcr:data<BINARIES>[1] = [20 bytes]
+ jcr:data<BINARIES>[1] = [20 bytes]
^ /oak:index/lucene/:status
^ lastUpdated
- lastUpdated<DATE> = 2015-11-19T10:45:18.027-05:00
+ lastUpdated<DATE> = 2015-11-19T10:47:25.167-05:00
<a name="tarmkrecovery"></a>
Oak TarMK Revision Recovery
---------------------------
Lists candidates for head journal entries. Uses a read-only store, so no updates will be performed on target repository.
$ java -jar oak-run-*.jar tarmkrecovery path/to/repository [--version-v10]
The following options are available:
--version-v10 - Uses V10 version repository reading (see OAK-2527)
Oak DataStore Check
-------------------
Consistency checker for the DataStore.
It can also be used to list all the blob references in the node store and all the blob ids available in the data store.
Use the following command:
$ java -jar oak-run-*.jar datastorecheck [--id] [--ref] [--consistency] \
[--store <path>|<mongo_uri>] \
[--s3ds <s3ds_config>|--fds <fds_config>] \
[--dump <path>]
The following options are available:
--id - List all the ids in the data store
--ref - List all the blob references in the node store
--consistency - Lists all the missing blobs by doing a consistency check
At least one of the above should be specified
--store - Path to the segment store or mongo uri (Required for the --ref & --consistency options above)
--dump - Path where to dump the files (Optional). Otherwise, files will be dumped in the user tmp directory.
--s3ds - Path to the S3DataStore configuration file
--fds - Path to the FileDataStore configuration file ('path' property is mandatory)
Note:
For using the S3DataStore the following additional jars have to be downloaded:
- [commons-logging-1.1.3.jar](http://central.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar)
- [httpcore-4.4.4.jar](http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.4/httpcore-4.4.4.jar)
- [aws-java-sdk-1.10.76.jar](http://central.maven.org/maven2/com/amazonaws/aws-java-sdk/1.10.76/aws-java-sdk-1.10.76.jar)
The command to be executed for the S3DataStore:
java -classpath oak-run-*.jar:httpcore-4.4.4.jar:aws-java-sdk-osgi-1.10.76.jar:commons-logging-1.1.3.jar \
org.apache.jackrabbit.oak.run.Main \
datastorecheck --id --ref --consistency \
--store <path>|<mongo_uri> \
--s3ds <s3ds_config> \
--dump <dump_path>
The config files should be formatted according to the OSGi configuration admin
specification. For example:
cat > org.apache.jackrabbit.oak.plugins.S3DataStore.config << EOF
accessKey="XXXXXXXXX"
secretKey="YYYYYY"
s3Bucket="bucket1"
s3Region="region1"
EOF
cat > org.apache.jackrabbit.oak.plugins.FileDataStore.config << EOF
path="/data/datastore"
EOF
Reset Cluster Id
---------------
Resets the cluster id generated internally. Use the following command after stopping the server:
$ java -jar oak-run-*.jar resetclusterid \
{ /path/to/oak/repository | mongodb://host:port/database }
The cluster id will be removed and will be generated on next server start up.
License
-------
(see the top-level [LICENSE.txt](../LICENSE.txt) for full license details)
Collective work: Copyright 2012 The Apache Software Foundation.
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.