# extract-reuters

Name

benchmark-extract-reuters - Splits Reuters SGML documents into simple text files containing: Title, Date, Dateline, Body.

Synopsis

lucene benchmark extract-reuters <INPUT_DIRECTORY> <OUTPUT_DIRECTORY> [?|-h|--help]

Arguments

INPUT_DIRECTORY

Path to the directory containing the Reuters SGML files.

OUTPUT_DIRECTORY

Path to a directory where the output files will be written.

Options

?|-h|--help

Prints a short help message for the command.

Example

Extracts the Reuters SGML files in the z:\input directory and places the content in the z:\output directory.

lucene benchmark extract-reuters z:\input z:\output
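
For readers who want a feel for what this kind of split involves, the following is a minimal Python sketch, not the tool's actual implementation. It assumes the input files use the Reuters-21578 layout (one `<REUTERS>` element per article, with `<TITLE>`, `<DATE>`, `<DATELINE>` and `<BODY>` tags) and a `.sgm` extension. The function names, the output file naming, and the blank-line separator between fields are hypothetical choices made for illustration, and SGML entities such as `&lt;` are left undecoded.

```python
import re
from pathlib import Path

# Fields named in the command description; the order used here is an assumption.
FIELDS = ("TITLE", "DATE", "DATELINE", "BODY")

# One article per <REUTERS ...> ... </REUTERS> element (Reuters-21578 layout assumed).
DOC_RE = re.compile(r"<REUTERS\b.*?</REUTERS>", re.DOTALL)


def split_documents(sgml_text):
    """Return one dict per <REUTERS> element, mapping field name to its text."""
    docs = []
    for match in DOC_RE.finditer(sgml_text):
        block = match.group(0)
        doc = {}
        for field in FIELDS:
            m = re.search(rf"<{field}>(.*?)</{field}>", block, re.DOTALL)
            # Missing fields become empty strings rather than raising an error.
            doc[field] = m.group(1).strip() if m else ""
        docs.append(doc)
    return docs


def extract(input_dir, output_dir):
    """Write one text file per article and return the number of files written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for sgm in sorted(Path(input_dir).glob("*.sgm")):
        # latin-1 decodes any byte sequence, so stray bytes cannot abort the run.
        text = sgm.read_text(encoding="latin-1")
        for i, doc in enumerate(split_documents(text)):
            # Hypothetical naming scheme: <source file stem>-<article index>.txt
            target = out / f"{sgm.stem}-{i}.txt"
            target.write_text("\n\n".join(doc[f] for f in FIELDS), encoding="utf-8")
            count += 1
    return count
```

Calling `extract(r"z:\input", r"z:\output")` would mirror the example above under these assumptions. The files it produces are not guaranteed to match the output of `lucene benchmark extract-reuters` byte for byte.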