| /* |
| * Licensed to the Apache Software Foundation (ASF) under one or more |
| * contributor license agreements. See the NOTICE file distributed with |
| * this work for additional information regarding copyright ownership. |
| * The ASF licenses this file to You under the Apache License, Version 2.0 |
| * (the "License"); you may not use this file except in compliance with |
| * the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| /** |
| * Benchmarking Lucene By Tasks |
| * <p> |
| * This package provides "task based" performance benchmarking of Lucene. |
| * One can use the predefined benchmarks, or create new ones. |
| * </p> |
| * <p> |
| * Contained packages: |
| * </p> |
| * |
| * <table border=1 cellpadding=4 summary="table of benchmark packages"> |
| * <tr> |
| * <td><b>Package</b></td> |
| * <td><b>Description</b></td> |
| * </tr> |
| * <tr> |
| * <td><a href="stats/package-summary.html">stats</a></td> |
| * <td>Statistics maintained when running benchmark tasks.</td> |
| * </tr> |
| * <tr> |
| * <td><a href="tasks/package-summary.html">tasks</a></td> |
| * <td>Benchmark tasks.</td> |
| * </tr> |
| * <tr> |
| * <td><a href="feeds/package-summary.html">feeds</a></td> |
| * <td>Sources for benchmark inputs: documents and queries.</td> |
| * </tr> |
| * <tr> |
| * <td><a href="utils/package-summary.html">utils</a></td> |
| * <td>Utilities used for the benchmark, and for the reports.</td> |
| * </tr> |
| * <tr> |
| * <td><a href="programmatic/package-summary.html">programmatic</a></td> |
| * <td>Sample performance test written programmatically.</td> |
| * </tr> |
| * </table> |
| * |
| * <h2>Table Of Contents</h2> |
| * <ol> |
| * <li><a href="#concept">Benchmarking By Tasks</a></li> |
| * <li><a href="#usage">How to use</a></li> |
| * <li><a href="#algorithm">Benchmark "algorithm"</a></li> |
| * <li><a href="#tasks">Supported tasks/commands</a></li> |
| * <li><a href="#properties">Benchmark properties</a></li> |
| * <li><a href="#example">Example input algorithm and the result benchmark |
| * report.</a></li> |
| * <li><a href="#recsCounting">Results record counting clarified</a></li> |
| * </ol> |
| * <a name="concept"></a> |
| * <h2>Benchmarking By Tasks</h2> |
| * <p> |
| * Benchmark Lucene using task primitives. |
| * </p> |
| * |
| * <p> |
| * A benchmark is composed of some predefined tasks, allowing for creating an |
| * index, adding documents, |
| * optimizing, searching, generating reports, and more. A benchmark run takes an |
| * "algorithm" file |
| * that contains a description of the sequence of tasks making up the run, and some |
| * properties defining a few |
| * additional characteristics of the benchmark run. |
| * </p> |
| * |
| * <a name="usage"></a> |
| * <h2>How to use</h2> |
| * <p> |
| * Easiest way to run a benchmarks is using the predefined ant task: |
| * <ul> |
| * <li>ant run-task |
| * <br>- would run the <code>micro-standard.alg</code> "algorithm". |
| * </li> |
| * <li>ant run-task -Dtask.alg=conf/compound-penalty.alg |
| * <br>- would run the <code>compound-penalty.alg</code> "algorithm". |
| * </li> |
| * <li>ant run-task -Dtask.alg=[full-path-to-your-alg-file] |
| * <br>- would run <code>your perf test</code> "algorithm". |
| * </li> |
| * <li>java org.apache.lucene.benchmark.byTask.programmatic.Sample |
| * <br>- would run a performance test programmatically - without using an alg |
| * file. This is less readable, and less convenient, but possible. |
| * </li> |
| * </ul> |
| * |
| * <p> |
| * You may find existing tasks sufficient for defining the benchmark <i>you</i> |
| * need, otherwise, you can extend the framework to meet your needs, as explained |
| * herein. |
| * </p> |
| * |
| * <p> |
| * Each benchmark run has a DocMaker and a QueryMaker. These two should usually |
| * match, so that "meaningful" queries are used for a certain collection. |
| * Properties set at the header of the alg file define which "makers" should be |
| * used. You can also specify your own makers, extending DocMaker and implementing |
| * QueryMaker. |
| * <blockquote> |
| * <b>Note:</b> since 2.9, DocMaker is a concrete class which accepts a |
| * ContentSource. In most cases, you can use the DocMaker class to create |
| * Documents, while providing your own ContentSource implementation. For |
| * example, the current Benchmark package includes ContentSource |
| * implementations for TREC, Enwiki and Reuters collections, as well as |
| * others like LineDocSource which reads a 'line' file produced by |
| * WriteLineDocTask. |
| * </blockquote> |
| * |
| * <p> |
| * Benchmark .alg file contains the benchmark "algorithm". The syntax is described |
| * below. Within the algorithm, you can specify groups of commands, assign them |
| * names, specify commands that should be repeated, |
| * do commands in serial or in parallel, |
| * and also control the speed of "firing" the commands. |
| * </p> |
| * |
| * <p> |
| * This allows, for instance, to specify |
| * that an index should be opened for update, |
| * documents should be added to it one by one but not faster than 20 docs a minute, |
| * and, in parallel with this, |
| * some N queries should be searched against that index, |
| * again, no more than 2 queries a second. |
| * You can have the searches all share an index reader, |
| * or have them each open its own reader and close it afterwords. |
| * </p> |
| * |
| * <p> |
| * If the commands available for use in the algorithm do not meet your needs, |
| * you can add commands by adding a new task under |
| * org.apache.lucene.benchmark.byTask.tasks - |
| * you should extend the PerfTask abstract class. |
| * Make sure that your new task class name is suffixed by Task. |
| * Assume you added the class "WonderfulTask" - doing so also enables the |
| * command "Wonderful" to be used in the algorithm. |
| * </p> |
| * |
| * <p> |
| * <u>External classes</u>: It is sometimes useful to invoke the benchmark |
| * package with your external alg file that configures the use of your own |
| * doc/query maker and or html parser. You can work this out without |
| * modifying the benchmark package code, by passing your class path |
| * with the benchmark.ext.classpath property: |
| * <ul> |
| * <li>ant run-task -Dtask.alg=[full-path-to-your-alg-file] |
| * <span style="color: #FF0000">-Dbenchmark.ext.classpath=/mydir/classes |
| * </span> -Dtask.mem=512M</li> |
| * </ul> |
| * <p> |
| * <u>External tasks</u>: When writing your own tasks under a package other than |
| * <b>org.apache.lucene.benchmark.byTask.tasks</b> specify that package thru the |
| * <span style="color: #FF0000">alt.tasks.packages</span> property. |
| * |
| * <a name="algorithm"></a> |
| * <h2>Benchmark "algorithm"</h2> |
| * |
| * <p> |
| * The following is an informal description of the supported syntax. |
| * </p> |
| * |
| * <ol> |
| * <li> |
| * <b>Measuring</b>: When a command is executed, statistics for the elapsed |
| * execution time and memory consumption are collected. |
| * At any time, those statistics can be printed, using one of the |
| * available ReportTasks. |
| * </li> |
| * <li> |
| * <b>Comments</b> start with '<span style="color: #FF0066">#</span>'. |
| * </li> |
| * <li> |
| * <b>Serial</b> sequences are enclosed within '<span style="color: #FF0066">{ }</span>'. |
| * </li> |
| * <li> |
| * <b>Parallel</b> sequences are enclosed within |
| * '<span style="color: #FF0066">[ ]</span>' |
| * </li> |
| * <li> |
| * <b>Sequence naming:</b> To name a sequence, put |
| * '<span style="color: #FF0066">"name"</span>' just after |
| * '<span style="color: #FF0066">{</span>' or '<span style="color: #FF0066">[</span>'. |
| * <br>Example - <span style="color: #FF0066">{ "ManyAdds" AddDoc } : 1000000</span> - |
| * would |
| * name the sequence of 1M add docs "ManyAdds", and this name would later appear |
| * in statistic reports. |
| * If you don't specify a name for a sequence, it is given one: you can see it as |
| * the algorithm is printed just before benchmark execution starts. |
| * </li> |
| * <li> |
| * <b>Repeating</b>: |
| * To repeat sequence tasks N times, add '<span style="color: #FF0066">: N</span>' just |
| * after the |
| * sequence closing tag - '<span style="color: #FF0066">}</span>' or |
| * '<span style="color: #FF0066">]</span>' or '<span style="color: #FF0066">></span>'. |
| * <br>Example - <span style="color: #FF0066">[ AddDoc ] : 4</span> - would do 4 addDoc |
| * in parallel, spawning 4 threads at once. |
| * <br>Example - <span style="color: #FF0066">[ AddDoc AddDoc ] : 4</span> - would do |
| * 8 addDoc in parallel, spawning 8 threads at once. |
| * <br>Example - <span style="color: #FF0066">{ AddDoc } : 30</span> - would do addDoc |
| * 30 times in a row. |
| * <br>Example - <span style="color: #FF0066">{ AddDoc AddDoc } : 30</span> - would do |
| * addDoc 60 times in a row. |
| * <br><b>Exhaustive repeating</b>: use <span style="color: #FF0066">*</span> instead of |
| * a number to repeat exhaustively. |
| * This is sometimes useful, for adding as many files as a doc maker can create, |
| * without iterating over the same file again, especially when the exact |
| * number of documents is not known in advance. For instance, TREC files extracted |
| * from a zip file. Note: when using this, you must also set |
| * <span style="color: #FF0066">content.source.forever</span> to false. |
| * <br>Example - <span style="color: #FF0066">{ AddDoc } : *</span> - would add docs |
| * until the doc maker is "exhausted". |
| * </li> |
| * <li> |
| * <b>Command parameter</b>: a command can optionally take a single parameter. |
| * If the certain command does not support a parameter, or if the parameter is of |
| * the wrong type, |
| * reading the algorithm will fail with an exception and the test would not start. |
| * Currently the following tasks take optional parameters: |
| * <ul> |
| * <li><b>AddDoc</b> takes a numeric parameter, indicating the required size of |
| * added document. Note: if the DocMaker implementation used in the test |
| * does not support makeDoc(size), an exception would be thrown and the test |
| * would fail. |
| * </li> |
| * <li><b>DeleteDoc</b> takes numeric parameter, indicating the docid to be |
| * deleted. The latter is not very useful for loops, since the docid is |
| * fixed, so for deletion in loops it is better to use the |
| * <code>doc.delete.step</code> property. |
| * </li> |
| * <li><b>SetProp</b> takes a <code>name,value</code> mandatory param, |
| * ',' used as a separator. |
| * </li> |
| * <li><b>SearchTravRetTask</b> and <b>SearchTravTask</b> take a numeric |
| * parameter, indicating the required traversal size. |
| * </li> |
| * <li><b>SearchTravRetLoadFieldSelectorTask</b> takes a string |
| * parameter: a comma separated list of Fields to load. |
| * </li> |
| * <li><b>SearchTravRetHighlighterTask</b> takes a string |
| * parameter: a comma separated list of parameters to define highlighting. See that |
| * tasks javadocs for more information |
| * </li> |
| * </ul> |
| * <br>Example - <span style="color: #FF0066">AddDoc(2000)</span> - would add a document |
| * of size 2000 (~bytes). |
| * <br>See conf/task-sample.alg for how this can be used, for instance, to check |
| * which is faster, adding |
| * many smaller documents, or few larger documents. |
| * Next candidates for supporting a parameter may be the Search tasks, |
| * for controlling the query size. |
| * </li> |
| * <li> |
| * <b>Statistic recording elimination</b>: - a sequence can also end with |
| * '<span style="color: #FF0066">></span>', |
| * in which case child tasks would not store their statistics. |
| * This can be useful to avoid exploding stats data, for adding say 1M docs. |
| * <br>Example - <span style="color: #FF0066">{ "ManyAdds" AddDoc > : 1000000</span> - |
| * would add million docs, measure that total, but not save stats for each addDoc. |
| * <br>Notice that the granularity of System.currentTimeMillis() (which is used |
| * here) is system dependant, |
| * and in some systems an operation that takes 5 ms to complete may show 0 ms |
| * latency time in performance measurements. |
| * Therefore it is sometimes more accurate to look at the elapsed time of a larger |
| * sequence, as demonstrated here. |
| * </li> |
| * <li> |
| * <b>Rate</b>: |
| * To set a rate (ops/sec or ops/min) for a sequence, add |
| * '<span style="color: #FF0066">: N : R</span>' just after sequence closing tag. |
| * This would specify repetition of N with rate of R operations/sec. |
| * Use '<span style="color: #FF0066">R/sec</span>' or |
| * '<span style="color: #FF0066">R/min</span>' |
| * to explicitly specify that the rate is per second or per minute. |
| * The default is per second, |
| * <br>Example - <span style="color: #FF0066">[ AddDoc ] : 400 : 3</span> - would do 400 |
| * addDoc in parallel, starting up to 3 threads per second. |
| * <br>Example - <span style="color: #FF0066">{ AddDoc } : 100 : 200/min</span> - would |
| * do 100 addDoc serially, |
| * waiting before starting next add, if otherwise rate would exceed 200 adds/min. |
| * </li> |
| * <li> |
| * <b>Disable Counting</b>: Each task executed contributes to the records count. |
| * This count is reflected in reports under recs/s and under recsPerRun. |
| * Most tasks count 1, some count 0, and some count more. |
| * (See <a href="#recsCounting">Results record counting clarified</a> for more details.) |
| * It is possible to disable counting for a task by preceding it with <span style="color: #FF0066">-</span>. |
| * <br>Example - <span style="color: #FF0066"> -CreateIndex </span> - would count 0 while |
| * the default behavior for CreateIndex is to count 1. |
| * </li> |
| * <li> |
| * <b>Command names</b>: Each class "AnyNameTask" in the |
| * package org.apache.lucene.benchmark.byTask.tasks, |
| * that extends PerfTask, is supported as command "AnyName" that can be |
| * used in the benchmark "algorithm" description. |
| * This allows to add new commands by just adding such classes. |
| * </li> |
| * </ol> |
| * |
| * |
| * <a name="tasks"></a> |
| * <h2>Supported tasks/commands</h2> |
| * |
| * <p> |
| * Existing tasks can be divided into a few groups: |
| * regular index/search work tasks, report tasks, and control tasks. |
| * </p> |
| * |
| * <ol> |
| * |
| * <li> |
| * <b>Report tasks</b>: There are a few Report commands for generating reports. |
| * Only task runs that were completed are reported. |
| * (The 'Report tasks' themselves are not measured and not reported.) |
| * <ul> |
| * <li> |
| * <span style="color: #FF0066">RepAll</span> - all (completed) task runs. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">RepSumByName</span> - all statistics, |
| * aggregated by name. So, if AddDoc was executed 2000 times, |
| * only 1 report line would be created for it, aggregating all those |
| * 2000 statistic records. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">RepSelectByPref prefixWord</span> - all |
| * records for tasks whose name start with |
| * <span style="color: #FF0066">prefixWord</span>. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">RepSumByPref prefixWord</span> - all |
| * records for tasks whose name start with |
| * <span style="color: #FF0066">prefixWord</span>, |
| * aggregated by their full task name. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">RepSumByNameRound</span> - all statistics, |
| * aggregated by name and by <span style="color: #FF0066">Round</span>. |
| * So, if AddDoc was executed 2000 times in each of 3 |
| * <span style="color: #FF0066">rounds</span>, 3 report lines would be |
| * created for it, |
| * aggregating all those 2000 statistic records in each round. |
| * See more about rounds in the <span style="color: #FF0066">NewRound</span> |
| * command description below. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">RepSumByPrefRound prefixWord</span> - |
| * similar to <span style="color: #FF0066">RepSumByNameRound</span>, |
| * just that only tasks whose name starts with |
| * <span style="color: #FF0066">prefixWord</span> are included. |
| * </li> |
| * </ul> |
| * If needed, additional reports can be added by extending the abstract class |
| * ReportTask, and by |
| * manipulating the statistics data in Points and TaskStats. |
| * </li> |
| * |
| * <li><b>Control tasks</b>: Few of the tasks control the benchmark algorithm |
| * all over: |
| * <ul> |
| * <li> |
| * <span style="color: #FF0066">ClearStats</span> - clears the entire statistics. |
| * Further reports would only include task runs that would start after this |
| * call. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">NewRound</span> - virtually start a new round of |
| * performance test. |
| * Although this command can be placed anywhere, it mostly makes sense at |
| * the end of an outermost sequence. |
| * <br>This increments a global "round counter". All task runs that |
| * would start now would |
| * record the new, updated round counter as their round number. |
| * This would appear in reports. |
| * In particular, see <span style="color: #FF0066">RepSumByNameRound</span> above. |
| * <br>An additional effect of NewRound, is that numeric and boolean |
| * properties defined (at the head |
| * of the .alg file) as a sequence of values, e.g. <span style="color: #FF0066"> |
| * merge.factor=mrg:10:100:10:100</span> would |
| * increment (cyclic) to the next value. |
| * Note: this would also be reflected in the reports, in this case under a |
| * column that would be named "mrg". |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">ResetInputs</span> - DocMaker and the |
| * various QueryMakers |
| * would reset their counters to start. |
| * The way these Maker interfaces work, each call for makeDocument() |
| * or makeQuery() creates the next document or query |
| * that it "knows" to create. |
| * If that pool is "exhausted", the "maker" start over again. |
| * The ResetInputs command |
| * therefore allows to make the rounds comparable. |
| * It is therefore useful to invoke ResetInputs together with NewRound. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">ResetSystemErase</span> - reset all index |
| * and input data and call gc. |
| * Does NOT reset statistics. This contains ResetInputs. |
| * All writers/readers are nullified, deleted, closed. |
| * Index is erased. |
| * Directory is erased. |
| * You would have to call CreateIndex once this was called... |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">ResetSystemSoft</span> - reset all |
| * index and input data and call gc. |
| * Does NOT reset statistics. This contains ResetInputs. |
| * All writers/readers are nullified, closed. |
| * Index is NOT erased. |
| * Directory is NOT erased. |
| * This is useful for testing performance on an existing index, |
| * for instance if the construction of a large index |
| * took a very long time and now you would to test |
| * its search or update performance. |
| * </li> |
| * </ul> |
| * </li> |
| * |
| * <li> |
| * Other existing tasks are quite straightforward and would |
| * just be briefly described here. |
| * <ul> |
| * <li> |
| * <span style="color: #FF0066">CreateIndex</span> and |
| * <span style="color: #FF0066">OpenIndex</span> both leave the |
| * index open for later update operations. |
| * <span style="color: #FF0066">CloseIndex</span> would close it. |
| * <li> |
| * <span style="color: #FF0066">OpenReader</span>, similarly, would |
| * leave an index reader open for later search operations. |
| * But this have further semantics. |
| * If a Read operation is performed, and an open reader exists, |
| * it would be used. |
| * Otherwise, the read operation would open its own reader |
| * and close it when the read operation is done. |
| * This allows testing various scenarios - sharing a reader, |
| * searching with "cold" reader, with "warmed" reader, etc. |
| * The read operations affected by this are: |
| * <span style="color: #FF0066">Warm</span>, |
| * <span style="color: #FF0066">Search</span>, |
| * <span style="color: #FF0066">SearchTrav</span> (search and traverse), |
| * and <span style="color: #FF0066">SearchTravRet</span> (search |
| * and traverse and retrieve). |
| * Notice that each of the 3 search task types maintains |
| * its own queryMaker instance. |
| * <li> |
| * <span style="color: #FF0066">CommitIndex</span> and |
| * <span style="color: #FF0066">ForceMerge</span> can be used to commit |
| * changes to the index then merge the index segments. The integer |
| * parameter specifies how many segments to merge down to (default |
| * 1). |
| * <li> |
| * <span style="color: #FF0066">WriteLineDoc</span> prepares a 'line' |
| * file where each line holds a document with <i>title</i>, |
| * <i>date</i> and <i>body</i> elements, separated by [TAB]. |
| * A line file is useful if one wants to measure pure indexing |
| * performance, without the overhead of parsing the data.<br> |
| * You can use LineDocSource as a ContentSource over a 'line' |
| * file. |
| * <li> |
| * <span style="color: #FF0066">ConsumeContentSource</span> consumes |
| * a ContentSource. Useful for e.g. testing a ContentSource |
| * performance, without the overhead of preparing a Document |
| * out of it. |
| * </ul> |
| * </li> |
| * </ol> |
| * |
| * <a name="properties"></a> |
| * <h2>Benchmark properties</h2> |
| * |
| * <p> |
| * Properties are read from the header of the .alg file, and |
| * define several parameters of the performance test. |
| * As mentioned above for the <span style="color: #FF0066">NewRound</span> task, |
| * numeric and boolean properties that are defined as a sequence |
| * of values, e.g. <span style="color: #FF0066">merge.factor=mrg:10:100:10:100</span> |
| * would increment (cyclic) to the next value, |
| * when NewRound is called, and would also |
| * appear as a named column in the reports (column |
| * name would be "mrg" in this example). |
| * </p> |
| * |
| * <p> |
| * Some of the currently defined properties are: |
| * </p> |
| * |
| * <ol> |
| * <li> |
| * <span style="color: #FF0066">analyzer</span> - full |
| * class name for the analyzer to use. |
| * Same analyzer would be used in the entire test. |
| * </li> |
| * |
| * <li> |
| * <span style="color: #FF0066">directory</span> - valid values are |
| * This tells which directory to use for the performance test. |
| * </li> |
| * |
| * <li> |
| * <b>Index work parameters</b>: |
| * Multi int/boolean values would be iterated with calls to NewRound. |
| * There would be also added as columns in the reports, first string in the |
| * sequence is the column name. |
| * (Make sure it is no shorter than any value in the sequence). |
| * <ul> |
| * <li><span style="color: #FF0066">max.buffered</span> |
| * <br>Example: max.buffered=buf:10:10:100:100 - |
| * this would define using maxBufferedDocs of 10 in iterations 0 and 1, |
| * and 100 in iterations 2 and 3. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">merge.factor</span> - which |
| * merge factor to use. |
| * </li> |
| * <li> |
| * <span style="color: #FF0066">compound</span> - whether the index is |
| * using the compound format or not. Valid values are "true" and "false". |
| * </li> |
| * </ul> |
| * </ol> |
| * |
| * <p> |
| * Here is a list of currently defined properties: |
| * </p> |
| * <ol> |
| * |
| * <li><b>Root directory for data and indexes:</b> |
| * <ul><li>work.dir (default is System property "benchmark.work.dir" or "work".) |
| * </li></ul> |
| * </li> |
| * |
| * <li><b>Docs and queries creation:</b> |
| * <ul><li>analyzer |
| * </li><li>doc.maker |
| * </li><li>content.source.forever |
| * </li><li>html.parser |
| * </li><li>doc.stored |
| * </li><li>doc.tokenized |
| * </li><li>doc.term.vector |
| * </li><li>doc.term.vector.positions |
| * </li><li>doc.term.vector.offsets |
| * </li><li>doc.store.body.bytes |
| * </li><li>docs.dir |
| * </li><li>query.maker |
| * </li><li>file.query.maker.file |
| * </li><li>file.query.maker.default.field |
| * </li><li>search.num.hits |
| * </li></ul> |
| * </li> |
| * |
| * <li><b>Logging</b>: |
| * <ul><li>log.step |
| * </li><li>log.step.[class name]Task ie log.step.DeleteDoc (e.g. log.step.Wonderful for the WonderfulTask example above). |
| * </li><li>log.queries |
| * </li><li>task.max.depth.log |
| * </li></ul> |
| * </li> |
| * |
| * <li><b>Index writing</b>: |
| * <ul><li>compound |
| * </li><li>merge.factor |
| * </li><li>max.buffered |
| * </li><li>directory |
| * </li><li>ram.flush.mb |
| * </li><li>codec.postingsFormat (eg Direct) Note: no codec should be specified through default.codec |
| * </li></ul> |
| * </li> |
| * |
| * <li><b>Doc deletion</b>: |
| * <ul><li>doc.delete.step |
| * </li></ul> |
| * </li> |
| * |
| * <li><b>Spatial</b>: Numerous; see spatial.alg |
| * </li> |
| * |
| * <li><b>Task alternative packages</b>: |
| * <ul><li>alt.tasks.packages |
| * - comma separated list of additional packages where tasks classes will be looked for |
| * when not found in the default package (that of PerfTask). If the same task class |
| * appears in more than one package, the package indicated first in this list will be used. |
| * </li></ul> |
| * </li> |
| * |
| * </ol> |
| * |
| * <p> |
| * For sample use of these properties see the *.alg files under conf. |
| * </p> |
| * |
| * <a name="example"></a> |
| * <h2>Example input algorithm and the result benchmark report</h2> |
| * <p> |
| * The following example is in conf/sample.alg: |
| * <pre> |
| * <span style="color: #003333"># -------------------------------------------------------- |
| * # |
| * # Sample: what is the effect of doc size on indexing time? |
| * # |
| * # There are two parts in this test: |
| * # - PopulateShort adds 2N documents of length L |
| * # - PopulateLong adds N documents of length 2L |
| * # Which one would be faster? |
| * # The comparison is done twice. |
| * # |
| * # -------------------------------------------------------- |
| * </span> |
| * <span style="color: #990066"># ------------------------------------------------------------------------------------- |
| * # multi val params are iterated by NewRound's, added to reports, start with column name. |
| * merge.factor=mrg:10:20 |
| * max.buffered=buf:100:1000 |
| * compound=true |
| * |
| * analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer |
| * directory=FSDirectory |
| * |
| * doc.stored=true |
| * doc.tokenized=true |
| * doc.term.vector=false |
| * doc.add.log.step=500 |
| * |
| * docs.dir=reuters-out |
| * |
| * doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker |
| * |
| * query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker |
| * |
| * # task at this depth or less would print when they start |
| * task.max.depth.log=2 |
| * |
| * log.queries=false |
| * # -------------------------------------------------------------------------------------</span> |
| * <span style="color: #3300FF">{ |
| * |
| * { "PopulateShort" |
| * CreateIndex |
| * { AddDoc(4000) > : 20000 |
| * Optimize |
| * CloseIndex |
| * > |
| * |
| * ResetSystemErase |
| * |
| * { "PopulateLong" |
| * CreateIndex |
| * { AddDoc(8000) > : 10000 |
| * Optimize |
| * CloseIndex |
| * > |
| * |
| * ResetSystemErase |
| * |
| * NewRound |
| * |
| * } : 2 |
| * |
| * RepSumByName |
| * RepSelectByPref Populate |
| * </span> |
| * </pre> |
| * |
| * <p> |
| * The command line for running this sample: |
| * <br><code>ant run-task -Dtask.alg=conf/sample.alg</code> |
| * </p> |
| * |
| * <p> |
| * The output report from running this test contains the following: |
| * <pre> |
| * Operation round mrg buf runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem |
| * PopulateShort 0 10 100 1 20003 119.6 167.26 12,959,120 14,241,792 |
| * PopulateLong - - 0 10 100 - - 1 - - 10003 - - - 74.3 - - 134.57 - 17,085,208 - 20,635,648 |
| * PopulateShort 1 20 1000 1 20003 143.5 139.39 63,982,040 94,756,864 |
| * PopulateLong - - 1 20 1000 - - 1 - - 10003 - - - 77.0 - - 129.92 - 87,309,608 - 100,831,232 |
| * </pre> |
| * |
| * <a name="recsCounting"></a> |
| * <h2>Results record counting clarified</h2> |
| * <p> |
| * Two columns in the results table indicate records counts: records-per-run and |
| * records-per-second. What does it mean? |
| * </p><p> |
| * Almost every task gets 1 in this count just for being executed. |
| * Task sequences aggregate the counts of their child tasks, |
| * plus their own count of 1. |
| * So, a task sequence containing 5 other task sequences, each running a single |
| * other task 10 times, would have a count of 1 + 5 * (1 + 10) = 56. |
| * </p><p> |
| * The traverse and retrieve tasks "count" more: a traverse task |
| * would add 1 for each traversed result (hit), and a retrieve task would |
| * additionally add 1 for each retrieved doc. So, regular Search would |
| * count 1, SearchTrav that traverses 10 hits would count 11, and a |
| * SearchTravRet task that retrieves (and traverses) 10, would count 21. |
| * </p><p> |
| * Confusing? this might help: always examine the <code>elapsedSec</code> column, |
| * and always compare "apples to apples", .i.e. it is interesting to check how the |
| * <code>rec/s</code> changed for the same task (or sequence) between two |
| * different runs, but it is not very useful to know how the <code>rec/s</code> |
| * differs between <code>Search</code> and <code>SearchTrav</code> tasks. For |
| * the latter, <code>elapsedSec</code> would bring more insight. |
| * </p> |
| */ |
| package org.apache.lucene.benchmark.byTask; |