title: SpatialBench

SpatialBench is a benchmark for assessing the performance of geospatial SQL analytics queries across database systems. It makes it easy to run tests on a realistic dataset with any query engine.

The methodology is unbiased, so you can run the benchmarks in any environment and compare the relative performance of different runtimes.

Why SpatialBench

SpatialBench was created because standard database benchmarks don't adequately test the unique demands of geospatial queries. SpatialBench provides an open-source, standardized, and scalable framework designed specifically for geospatial analytics.

Inspired by the Star Schema Benchmark (SSB) and NYC taxi data, SpatialBench combines realistic urban mobility scenarios with a star schema extended with spatial attributes like pickup/dropoff points, zones, and building footprints.

This design enables evaluation of the following geospatial operations:

Spatial joins
Distance queries
Aggregations
Point-in-polygon analysis

Let's dive into the advantages of SpatialBench.

Key Features

To ensure fair and comprehensive testing, SpatialBench provides the following advantages:

Features realistic spatial datasets with native geometry columns.
Includes a suite of queries that test various operations such as spatial predicates and joins.
Provides a built-in synthetic data generator for creating consistent test data.
Offers a configurable scale factor to benchmark performance across various environments, from a single local machine to a large-scale cloud cluster.
Ensures consistent and reproducible benchmark results across all environments.
Utilizes a fully documented and unbiased methodology to facilitate fair comparisons.
Is open source and community-driven, which fosters transparency and continuous improvement.
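As a rough illustration of how repeatable timings might be collected when comparing engines, here is a minimal sketch in Python. It is not part of SpatialBench and does not describe its official methodology. The helper names `time_query` and `relative_speed`, the warm-up count, and the use of the median are all assumptions made for this example, and the two lambdas are stand-ins for real query calls.

```python
import statistics
import time


def time_query(run_query, warmups=1, repeats=5):
    """Return the median wall-clock seconds of run_query over `repeats` timed runs.

    `run_query` is any zero-argument callable (hypothetical: in practice it
    would submit one benchmark query to the engine under test).
    """
    # Untimed warm-up runs, so one-off startup costs do not skew the samples.
    for _ in range(warmups):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)


def relative_speed(baseline_seconds, candidate_seconds):
    """Ratio of baseline time to candidate time; above 1 means the candidate is faster."""
    return baseline_seconds / candidate_seconds


# Stand-in workloads; replace these with calls into the engines being compared.
baseline = time_query(lambda: sum(range(200_000)))
candidate = time_query(lambda: sum(range(50_000)))
print(f"candidate is {relative_speed(baseline, candidate):.2f}x the baseline speed")
```

Running the same harness, with the same scale factor and the same number of repeats, against each engine keeps the comparison like for like.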

Generate synthetic data

Here's how you can install the synthetic data generator:

cargo install --path ./spatialbench-cli

Here's how you can generate the synthetic dataset:

spatialbench-cli -s 1 --format=parquet

See the project repository README for the complete data generation instructions.

Example query

Here's an example query that counts the number of trips that start within 500 meters of each building:

SELECT
    b.b_buildingkey,
    b.b_name,
    COUNT(*) AS nearby_pickup_count
FROM trip t
JOIN building b
ON ST_DWithin(t.t_pickup_loc, b.b_boundary, 500)
GROUP BY b.b_buildingkey, b.b_name
ORDER BY nearby_pickup_count DESC;

This query performs a distance join followed by an aggregation. It's a good example of a query for benchmarking the performance of a spatial engine that processes vector geometries.
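To make the semantics of the distance join concrete, here is a minimal brute-force sketch in Python of what the query above computes. It assumes planar coordinates in metres and simple polygons without holes. A real engine's `ST_DWithin` may measure distance in the units of the coordinate reference system or geodesically, and would use a spatial index rather than a nested loop. The function names and the toy data are hypothetical and are not taken from SpatialBench.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the segment's endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, ring):
    """Ray-casting test; ring lists the vertices without repeating the first one."""
    px, py = p
    inside = False
    for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1]):
        if (ay > py) != (by > py):  # edge straddles the horizontal ray through p
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def within_distance(p, ring, limit):
    """True if p lies inside the polygon or within `limit` of its boundary."""
    if point_in_polygon(p, ring):
        return True
    edges = zip(ring, ring[1:] + ring[:1])
    return min(point_segment_distance(p, a, b) for a, b in edges) <= limit


def nearby_pickup_counts(pickups, buildings, limit):
    """Nested-loop distance join followed by a count per building, largest first."""
    counts = Counter()
    for key, ring in buildings.items():
        for p in pickups:
            if within_distance(p, ring, limit):
                counts[key] += 1
    return counts.most_common()


# Toy data (hypothetical): three 100 m square buildings and four pickup points.
buildings = {
    1: [(0, 0), (100, 0), (100, 100), (0, 100)],
    2: [(2000, 2000), (2100, 2000), (2100, 2100), (2000, 2100)],
    3: [(9000, 9000), (9100, 9000), (9100, 9100), (9000, 9100)],
}
pickups = [(50, 50), (400, 50), (700, 50), (2050, 2500)]

result = nearby_pickup_counts(pickups, buildings, 500)
print(result)  # [(1, 2), (2, 1)]
```

Building 3 has no pickups within range, so it is absent from the result, which mirrors the inner join in the SQL. The nested loop compares every trip with every building, and this quadratic cost is what makes the query a meaningful test of an engine's spatial join strategy.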

Join the community

Feel free to start a GitHub Discussion or join the Discord community to ask the developers any questions you may have.

We look forward to collaborating with you on these benchmarks!