---
title: generate-data
parent: Command Reference
grand_parent: Reference
nav_order: 40
---

generate-data

Generates synthetic benchmark data from an existing index schema (JSON mappings) or a custom Python module. The generated corpus can be used in Apache Solr Orbit workloads.

Syntax

solr-orbit generate-data --index-mappings FILE --total-size N --index-name NAME [OPTIONS]
solr-orbit generate-data --custom-module FILE  --total-size N --index-name NAME [OPTIONS]

Exactly one of --index-mappings or --custom-module must be given; the two are mutually exclusive. --total-size and --index-name are always required.

Options

| Option | Short | Required | Default | Description |
|---|---|---|---|---|
| --index-mappings | -i | Yes (or --custom-module) | | Path to a JSON file containing index mappings to use as the schema for generated documents |
| --custom-module | -m | Yes (or --index-mappings) | | Path to a custom Python module that defines document generation logic. The module must contain a generate_synthetic_document() function |
| --total-size | -s | Yes | | Target corpus size in GB |
| --index-name | -n | Yes | | Name for the generated corpus (used in the output file path) |
| --output-path | -p | No | ./generated_corpora | Directory where the generated corpus files will be written |
| --custom-config | -c | No | | Optional config file for overriding synthetic data generation settings or providing values used by a custom module |
| --test-document | -t | No | off | Generate a single document and print it to the console for validation, without writing a full corpus |

Examples

Generate 10 GB of synthetic data from an existing schema:

solr-orbit generate-data \
  --index-mappings /path/to/mappings.json \
  --index-name my_index \
  --total-size 10 \
  --output-path /data/corpora

Preview a single generated document using a custom module:

solr-orbit generate-data \
  --custom-module /path/to/my_generator.py \
  --index-name my_index \
  --total-size 1 \
  --test-document
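
When the command is driven from a script, the argument rules above can be checked before the tool is invoked. The sketch below is one possible way to do that. The helper name build_generate_data_command and its parameter names are hypothetical; only the flags documented on this page are used.

```python
import shlex


def build_generate_data_command(
    index_name,
    total_size_gb,
    index_mappings=None,
    custom_module=None,
    output_path=None,
    custom_config=None,
    test_document=False,
):
    """Build the argv list for `solr-orbit generate-data` (hypothetical helper)."""
    # --index-mappings and --custom-module are mutually exclusive,
    # and exactly one of them must be supplied.
    if (index_mappings is None) == (custom_module is None):
        raise ValueError("pass exactly one of index_mappings or custom_module")

    cmd = ["solr-orbit", "generate-data"]
    if index_mappings is not None:
        cmd += ["--index-mappings", str(index_mappings)]
    else:
        cmd += ["--custom-module", str(custom_module)]

    # Required options.
    cmd += ["--index-name", index_name, "--total-size", str(total_size_gb)]

    # Optional options are added only when set.
    if output_path is not None:
        cmd += ["--output-path", str(output_path)]
    if custom_config is not None:
        cmd += ["--custom-config", str(custom_config)]
    if test_document:
        cmd.append("--test-document")
    return cmd


cmd = build_generate_data_command(
    "my_index",
    10,
    index_mappings="/path/to/mappings.json",
    output_path="/data/corpora",
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to execute the command. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.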

See also