This file provides guidance to AI coding agents working with this repository.
Prerequisites: pyenv, JDK 21, Docker, docker-compose, jq
Optional: pbzip2 (parallel bzip2 — install via apt install pbzip2 or brew install pbzip2). Without it, .bz2 corpus decompression falls back to Python stdlib (slower).
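The fallback described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the function name `decompress_bz2` and its return value are assumptions made for the example. It relies only on the standard `-d` (decompress) and `-c` (write to stdout) flags of pbzip2 and on Python's `bz2` module.

```python
import bz2
import shutil
import subprocess


def decompress_bz2(src: str, dest: str) -> str:
    """Decompress a .bz2 file to dest and return the backend used.

    Hypothetical helper: prefers pbzip2 when it is on PATH, otherwise
    falls back to the (single-threaded) Python standard library.
    """
    pbzip2 = shutil.which("pbzip2")
    if pbzip2:
        # -d: decompress, -c: write to stdout (the input file is left in place)
        with open(dest, "wb") as out:
            subprocess.run([pbzip2, "-d", "-c", src], stdout=out, check=True)
        return "pbzip2"
    # Fallback: stream through bz2 in 1 MiB chunks to keep memory use flat
    with bz2.open(src, "rb") as inp, open(dest, "wb") as out:
        shutil.copyfileobj(inp, out, length=1024 * 1024)
    return "bz2"
```

Either branch produces the same output file; only the wall-clock time differs on large corpora.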
    make develop                 # Install Python 3.12 via pyenv, create .venv, install all deps
    source .venv/bin/activate    # Activate virtual environment
    make lint         # Run ruff check on all Python source files
    make test         # Run unit tests (pytest tests/)
    pytest tests/path/to/test_file.py::TestClass::test_method   # Run a single test
    make it           # Run integration tests via tox (requires Java, Docker; ~30 min)
    make it312        # Integration tests for Python 3.12 only
    make benchmark    # Run performance benchmarks (pytest benchmarks/)
    make build        # Build distribution wheel
    make clean        # Remove build artifacts, caches, tox environments
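To make the single-test command concrete, here is a small sketch of how a pytest node ID maps onto a test file. The file path, class, and helper below are purely illustrative and do not exist in the repository.

```python
# Hypothetical contents of tests/utils/test_versions.py (illustrative only)


def major_version(version: str) -> int:
    """Toy helper standing in for real code under test."""
    return int(version.split(".")[0])


class TestMajorVersion:
    def test_parses_release(self):
        assert major_version("9.4.1") == 9

    def test_parses_snapshot(self):
        assert major_version("10.0.0-SNAPSHOT") == 10
```

With that file in place, `pytest tests/utils/test_versions.py::TestMajorVersion::test_parses_release` would run only the first method: the node ID is the file path, the class name, and the method name joined by `::`.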
- Ruff is configured in pyproject.toml under [tool.ruff].
- Run make lint before committing; CI enforces this on every PR.

Apache Solr Orbit is a macrobenchmarking framework for Apache Solr clusters. It uses an actor-based concurrent execution model built on the Thespian library.
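For readers new to the actor model, the idea can be sketched with the standard library alone. Note that this deliberately does not use Thespian's API; it swaps in `threading` and `queue` to show the pattern (private state, a mailbox, one message at a time). The `Actor`, `Worker`, and `run_batches` names and the message shapes are assumptions made for the example, not code from solrorbit/actor.py.

```python
import queue
import threading


class Actor(threading.Thread):
    """Toy actor: owns a mailbox and handles one message at a time."""

    def __init__(self):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()

    def send(self, message, reply_to=None):
        # Messages are the only way to interact with an actor
        self.mailbox.put((message, reply_to))

    def run(self):
        while True:
            message, reply_to = self.mailbox.get()
            if message is None:  # poison pill stops the actor
                break
            result = self.receive(message)
            if reply_to is not None:
                reply_to.put(result)

    def receive(self, message):
        raise NotImplementedError


class Worker(Actor):
    """Hypothetical worker that pretends to execute a batch of operations."""

    def receive(self, message):
        return {"worker": message["worker_id"], "ops_done": len(message["ops"])}


def run_batches(batches):
    """Fan out one batch per worker, then gather the replies."""
    replies = queue.Queue()
    workers = [Worker() for _ in batches]
    for worker_id, (worker, ops) in enumerate(zip(workers, batches)):
        worker.start()
        worker.send({"worker_id": worker_id, "ops": ops}, reply_to=replies)
    results = [replies.get(timeout=5) for _ in batches]
    for worker in workers:
        worker.send(None)
        worker.join(timeout=5)
    return sorted(results, key=lambda r: r["worker"])
```

A coordinator that only exchanges messages with its workers never shares mutable state with them, which is the property that lets an actor framework move workers to other processes or hosts.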
- solr-orbit / sb → solrorbit/benchmark.py:main — CLI for running benchmarks
- solr-orbitd / sbd → solrorbit/benchmarkd.py:main — Daemon for distributed worker nodes

Modules under solrorbit/

Orchestration layer:
- benchmark.py — CLI arg parsing, subcommands: run, list, info, generate, convert-workload
- test_run_orchestrator.py — Pipeline execution: prepares, launches cluster, runs workload, publishes results
- actor.py — Thespian actor system setup for parallel/distributed execution
- config.py — Configuration loading and management

Cluster management (builder/):
- solr_provisioner.py — Download, install and launch Solr (from distribution, sources, or Docker)
- provisioners/ — Generic node provisioning infrastructure
- downloaders/ — Download Solr distributions
- installers/ — Install Solr on provisioned nodes
- launchers/ — Start/stop cluster nodes
- executors/ — Execute remote commands on cluster nodes
- configs/ — Jinja2 templates for cluster configuration

Benchmark execution:
- workload/ — Load and manage workload definitions (test procedures, operations, schedules)
- worker_coordinator/ — Coordinate distributed worker nodes; driver.py drives actual load
- worker_coordinator/runner.py — Solr operation runners (SolrBulkIndex, SolrSearch, SolrCreateCollection, etc.)
- metrics.py — Collect, store, and aggregate benchmark metrics (filesystem-backed; no external store)
- telemetry.py — Solr-specific telemetry devices (JVM, node, collection, query, indexing, cache stats)
- publisher.py — Publish and format benchmark results
- result_writer.py — Write results to local filesystem (JSON/CSV)

Data and connectivity:
- client.py — SolrAdminClient and SolrClient (HTTP via requests/pysolr; Collections API, /select, /update)
- synthetic_data_generator/ — Generate synthetic test datasets
- workload_generator/ — Generate workload definition files from existing Solr collections

Workload conversion:
- conversion/workload_converter.py — Convert an OpenSearch Benchmark workload directory to Solr format
- conversion/detector.py — Detect whether a workload uses OpenSearch-only operations/query DSL
- conversion/query.py — Translate OpenSearch Query DSL to Solr JSON Query DSL
- conversion/schema.py — Translate OpenSearch index mappings to Solr managed-schema.xml

Utilities:
- utils/ — IO, process management, console output, network, version parsing, options handling
- cloud_provider/ — Cloud provider integrations (AWS via boto3, GCP via google-auth)
- visualizations/ — Result visualization

Tests and benchmarks:
- tests/ — Unit tests mirroring solrorbit/ structure
- it/ — Integration tests (spin up real Solr clusters via Docker/provisioning)
- benchmarks/ — Performance benchmarks for Solr Orbit itself

Workloads are defined as JSON/YAML files.
Workloads must be in Solr format. Use solr-orbit convert-workload to convert from OpenSearch Benchmark format. Workloads can be loaded from a local path (--workload-path) or from a git workload repository (--workload-repository).
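To give a feel for what converting a query involves, the sketch below maps a few OpenSearch Query DSL clauses onto Solr's JSON Query DSL. It is an illustration under stated assumptions, not the logic in conversion/query.py: the function name `to_solr_query` is hypothetical, only `match_all`, `match`, `term`, and `bool` are handled, and options such as `minimum_should_match` are ignored. The Solr side uses the `lucene`, `term`, and `bool` query parsers.

```python
def to_solr_query(os_query: dict):
    """Translate a small subset of OpenSearch Query DSL (hypothetical helper)."""
    if len(os_query) != 1:
        raise ValueError("expected exactly one top-level query clause")
    ((kind, body),) = os_query.items()

    if kind == "match_all":
        return "*:*"

    if kind == "match":
        ((field, text),) = body.items()
        if isinstance(text, dict):  # long form: {"field": {"query": "..."}}
            text = text["query"]
        # Analyzed full-text match: lucene parser with a default field
        return {"lucene": {"df": field, "query": str(text)}}

    if kind == "term":
        ((field, value),) = body.items()
        if isinstance(value, dict):  # long form: {"field": {"value": "..."}}
            value = value["value"]
        # Exact, unanalyzed match: Solr's term query parser
        return {"term": {"f": field, "query": str(value)}}

    if kind == "bool":
        out = {}
        for occur in ("must", "should", "must_not", "filter"):
            clauses = body.get(occur)
            if clauses is None:
                continue
            if isinstance(clauses, dict):  # a single clause may be given bare
                clauses = [clauses]
            out[occur] = [to_solr_query(c) for c in clauses]
        return {"bool": out}

    raise NotImplementedError(f"unsupported query type: {kind}")
```

The result would go under the `query` key of a Solr JSON request body, for example `{"query": to_solr_query(q)}`. Unsupported clause types raise rather than being silently dropped, so a converted workload never runs a different query than the source intended.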
- Dependencies: pysolr (data ops), requests (HTTP admin), psutil (I/O metrics), thespian (actor model), pytest (tests), tabulate (console output)
- ~/.solr-orbit/, SQLite test-runs store
- Documentation lives in docs/; deployed to GitHub Pages via .github/workflows/docs.yml