| Lucene Benchmark Contrib Change Log |
| |
| The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways. |
| |
| $Id:$ |
| |
| 8/4/2009 |
| LUCENE-1770: Add EnwikiQueryMaker (Mark Miller) |
| |
| 8/04/2009 |
| LUCENE-1773: Add FastVectorHighlighter tasks. This change is a |
| non-backwards compatible change in how subclasses of ReadTask define |
| a highlighter. The methods doHighlight, isMergeContiguousFragments, |
| maxNumFragments and getHighlighter are no longer used and have been |
| mark deprecated and package protected private so there's a compile |
| time error. Instead, the new getBenchmarkHighlighter method should |
| return an appropriate highlighter for the task. The configuration of |
| the highlighter tasks (maxFrags, mergeContiguous, etc.) is now |
| accepted as params to the task. (Koji Sekiguchi via Mike McCandless) |
| |
| 8/03/2009 |
| LUCENE-1778: Add support for log.step setting per task type. Perviously, if |
| you included a log.step line in the .alg file, it had been applied to all |
| tasks. Now, you can include a log.step.AddDoc, or log.step.DeleteDoc (for |
| example) to control logging for just these tasks. If you want to ommit logging |
| for any other task, include log.step=-1. The syntax is "log.step." together |
| with the Task's 'short' name (i.e., without the 'Task' part). |
| (Shai Erera via Mark Miller) |
| |
| 7/24/2009 |
| LUCENE-1595: Deprecate LineDocMaker and EnwikiDocMaker in favor of |
| using DocMaker directly, with content.source = LineDocSource or |
| EnwikiContentSource. NOTE: with this change, the "id" field from |
| the Wikipedia XML export is now indexed as the "docname" field |
| (previously it was indexed as "docid"). Additionaly, the |
| SearchWithSort task now accepts all types that SortField can accept |
| and no longer falls back to SortField.AUTO, which has been |
| deprecated. (Mike McCandless) |
| |
| 7/20/2009 |
| LUCENE-1755: Fix WriteLineDocTask to output a document if it contains either |
| a title or body (or both). (Shai Erera via Mark Miller) |
| |
| 7/14/2009 |
| LUCENE-1725: Fix the example Sort algorithm - auto is now deprecated and no longer works |
| with Benchmark. Benchmark will now throw an exception if you specify sort fields without |
| a type. The example sort algorithm is now typed. (Mark Miller) |
| |
| 7/6/2009 |
| LUCENE-1730: Fix TrecContentSource to use ISO-8859-1 when reading the TREC files, |
| unless a different encoding is specified. Additionally, ContentSource now supports |
| a content.source.encoding parameter in the configuration file. |
| (Shai Erera via Mark Miller) |
| |
| 6/26/2009 |
| LUCENE-1716: Added the following support: |
| doc.tokenized.norms: specifies whether to store norms |
| doc.body.tokenized.norms: special attribute for the body field |
| doc.index.props: specifies whether DocMaker should index the properties set on |
| DocData |
| writer.info.stream: specifies the info stream to set on IndexWriter (supported |
| values are: SystemOut, SystemErr and a file name). (Shai Erera via Mike McCandless) |
| |
| 6/23/09 |
| LUCENE-1714: WriteLineDocTask incorrectly normalized text, by replacing only |
| occurrences of "\t" with a space. It now replaces "\r\n" in addition to that, |
| so that LineDocMaker won't fail. (Shai Erera via Michael McCandless) |
| |
| 6/17/09 |
| LUCENE-1595: This issue breaks previous external algorithms. DocMaker has been |
| replaced with a concrete class which accepts a ContentSource for iterating over |
| a content source's documents. Most of the old DocMakers were changed to a |
| ContentSource implementation, and DocMaker is now a default document creation impl |
| that provides an easy way for reusing fields. When [doc.maker] is not defined in |
| an algorithm, the new DocMaker is the default. If you have .alg files which |
| specify a DocMaker (like ReutersDocMaker), you should change the [doc.maker] line to: |
| [content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource] |
| |
| i.e. |
| doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker |
| becomes |
| content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource |
| |
| doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker |
| becomes |
| content.source=org.apache.lucene.benchmark.byTask.feeds.SingleDocSource |
| |
| Also, PerfTask now logs a message in tearDown() rather than each Task doing its |
| own logging. A new setting called [log.step] is consulted to determine how often |
| to log. [doc.add.log.step] is no longer a valid setting. For easy migration of |
| current .alg files, rename [doc.add.log.step] to [log.step] and [doc.delete.log.step] |
| to [delete.log.step]. |
| |
| Additionally, [doc.maker.forever] should be changed to [content.source.forever]. |
| (Shai Erera via Mark Miller) |
| |
| 6/12/09 |
| LUCENE-1539: Added DeleteByPercentTask which enables deleting a |
| percentage of documents and searching on them. Changed CommitIndex |
| to optionally accept a label (recorded as userData=<label> in the |
| commit point). Added FlushReaderTask, and modified OpenReaderTask |
| to also optionally take a label referencing a commit point to open. |
| Also changed default autoCommit (when IndexWriter is opened) to |
| true. (Jason Rutherglen via Mike McCandless) |
| |
| 12/20/08 |
| LUCENE-1495: Allow task sequence to run for specfied number of seconds by adding ": 2.7s" (for example). |
| |
| 12/16/08 |
| LUCENE-1493: Stop using deprecated Hits API for searching; add new |
| param search.num.hits to set top N docs to collect. |
| |
| 12/16/08 |
| LUCENE-1492: Added optional readOnly param (default true) to OpenReader task. |
| |
| 9/9/08 |
| LUCENE-1243: Added new sorting benchmark capabilities. Also Reopen and commit tasks. (Mark Miller via Grant Ingersoll) |
| |
| 5/10/08 |
| LUCENE-1090: remove relative paths assumptions from benchmark code. |
| Only build.xml was modified: work-dir definition must remain so |
| benchmark tests can run from both trunk-home and benchmark-home. |
| |
| 3/9/08 |
| LUCENE-1209: Fixed DocMaker settings by round. Prior to this fix, DocMaker settings of |
| first round were used in all rounds. (E.g. term vectors.) |
| (Mark Miller via Doron Cohen) |
| |
| 1/30/08 |
| LUCENE-1156: Fixed redirect problem in EnwikiDocMaker. Refactored ExtractWikipedia to use EnwikiDocMaker. Added property to EnwikiDocMaker to allow |
| for skipping image only documents. |
| |
| 1/24/2008 |
| LUCENE-1136: add ability to not count sub-task doLogic increment |
| |
| 1/23/2008 |
| LUCENE-1129: ReadTask properly uses the traversalSize value |
| LUCENE-1128: Added support for benchmarking the highlighter |
| |
| 01/20/08 |
| LUCENE-1139: various fixes |
| - add merge.scheduler, merge.policy config properties |
| - refactor Open/CreateIndexTask to share setting config on IndexWriter |
| - added doc.reuse.fields=true|false for LineDocMaker |
| - OptimizeTask now takes int param to call optimize(int maxNumSegments) |
| - CloseIndexTask now takes bool param to call close(false) (abort running merges) |
| |
| |
| 01/03/08 |
| LUCENE-1116: quality package improvements: |
| - add MRR computation; |
| - allow control of max #queries to run; |
| - verify log & report are flushed. |
| - add TREC query reader for the 1MQ track. |
| |
| 12/31/07 |
| LUCENE-1102: EnwikiDocMaker now indexes the docid field, so results might not be comparable with results prior to this change, although |
| it is doubted that this one small field makes much difference. |
| |
| 12/13/07 |
| LUCENE-1086: DocMakers setup for the "docs.dir" property |
| fixed to properly handle absolute paths. (Shai Erera via Doron Cohen) |
| |
| 9/18/07 |
| LUCENE-941: infinite loop for alg: {[AddDoc(4000)]: 4} : * |
| ResetInputsTask fixed to work also after exhaustion. |
| All Reset Tasks now subclas ResetInputsTask. |
| |
| 8/9/07 |
| LUCENE-971: Change enwiki tasks to a doc maker (extending |
| LineDocMaker) that directly processes the Wikipedia XML and produces |
| documents. Intermediate files (one per document) are no longer |
| created. |
| |
| 8/1/07 |
| LUCENE-967: Add "ReadTokensTask" to allow for benchmarking just tokenization. |
| |
| 7/27/07 |
| LUCENE-836: Add support for search quality benchmarking, running |
| a set of queries against a searcher, and, optionally produce a submission |
| report, and, if query judgements are available, compute quality measures: |
| recall, precision_at_N, average_precision, MAP. TREC specific Judge (based |
| on TREC QRels) and TREC Topics reader are included in o.a.l.benchmark.quality.trec |
| but any other format of queries and judgements can be implemented and used. |
| |
| 7/24/07 |
| LUCENE-947: Add support for creating and index "one document per |
| line" from a large text file, which reduces per-document overhead of |
| opening a single file for each document. |
| |
| 6/30/07 |
| LUCENE-848: Added support for Wikipedia benchmarking. |
| |
| 6/25/07 |
| - LUCENE-940: Multi-threaded issues fixed: SimpleDateFormat; logging for addDoc/deleteDoc tasks. |
| - LUCENE-945: tests fail to find data dirs. Added sys-prop benchmark.work.dir and cfg-prop work.dir. |
| (Doron Cohen) |
| |
| 4/17/07 |
| - LUCENE-863: Deprecated StandardBenchmarker in favour of byTask code. |
| (Otis Gospodnetic) |
| |
| 4/13/07 |
| |
| Better error handling and javadocs around "exhaustive" doc making. |
| |
| 3/25/07 |
| |
| LUCENE-849: |
| 1. which HTML Parser is used is configurable with html.parser property. |
| 2. External classes added to classpath with -Dbenchmark.ext.classpath=path. |
| 3. '*' as repeating number now means "exhaust doc maker - no repetitions". |
| |
| 3/22/07 |
| |
| -Moved withRetrieve() call out of the loop in ReadTask |
| -Added SearchTravRetLoadFieldSelectorTask to help benchmark some of the FieldSelector capabilities |
| -Added options to store content bytes on the Reuters Doc (and others, but Reuters is the only one w/ it enabled) |
| |
| 3/21/07 |
| |
| Tests (for benchmarking code correctness) were added - LUCENE-840. |
| To be invoked by "ant test" from contrib/benchmark. (Doron Cohen) |
| |
| 3/19/07 |
| |
| 1. Introduced an AbstractQueryMaker to hold common QueryMaker code. (GSI) |
| 2. Added traversalSize parameter to SearchTravRetTask and SearchTravTask. Changed SearchTravRetTask to extend SearchTravTask. (GSI) |
| 3. Added FileBasedQueryMaker to run queries from a File or resource. (GSI) |
| 4. Modified query-maker generation for read related tasks to make further read tasks addition simpler and safer. (DC) |
| 5. Changed Taks' setParams() to throw UnsupportedOperationException if that task does not suppot command line param. (DC) |
| 6. Improved javadoc to specify all properties command line params currently supported. (DC) |
| 7. Refactored ReportTasks so that it is easy/possible now to create new report tasks. (DC) |
| |
| 01/09/07 |
| |
| 1. Committed Doron Cohen's benchmarking contribution, which provides an easily expandable task based approach to benchmarking. See the javadocs for information. (Doron Cohen via Grant Ingersoll) |
| |
| 2. Added this file. |
| |
| 3. 2/11/07: LUCENE-790 and 788: Fixed Locale issue with date formatter. Fixed some minor issues with benchmarking by task. Added a dependency |
| on the Lucene demo to the build classpath. (Doron Cohen, Grant Ingersoll) |
| |
| 4. 2/13/07: LUCENE-801: build.xml now builds Lucene core and Demo first and has classpath dependencies on the output of that build. (Doron Cohen, Grant Ingersoll) |