SpatialBench is a benchmark for assessing geospatial SQL analytics query performance across database systems, making it easy to run tests on a realistic dataset with any query engine.
The methodology is designed to be unbiased: you can run the benchmarks in any environment and compare relative performance across runtimes.
SpatialBench was created because standard database benchmarks don't adequately test the unique demands of geospatial queries. SpatialBench provides an open-source, standardized, and scalable framework designed specifically for geospatial analytics.
Inspired by the Star Schema Benchmark (SSB) and NYC taxi data, SpatialBench combines realistic urban mobility scenarios with a star schema extended with spatial attributes like pickup/dropoff points, zones, and building footprints.
This design enables evaluation of the following geospatial operations:
Let's dive into the advantages of SpatialBench.
To ensure fair and comprehensive testing, SpatialBench provides the following advantages:
Here's how you can install the synthetic data generator from a local checkout of the project repository:
cargo install --path ./spatialbench-cli
Here's how you can generate the synthetic dataset:
spatialbench-cli -s 1 --format=parquet
See the project repository README for the complete data generation instructions.
Here's an example query that counts the number of trips that start within 500 meters of each building:
SELECT
    b.b_buildingkey,
    b.b_name,
    COUNT(*) AS nearby_pickup_count
FROM trip t
JOIN building b
    ON ST_DWithin(t.t_pickup_loc, b.b_boundary, 500)
GROUP BY b.b_buildingkey, b.b_name
ORDER BY nearby_pickup_count DESC;
This query performs a distance join followed by an aggregation. It's a useful query for benchmarking the performance of a spatial engine that processes vector geometries.
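To make the semantics of the distance join and aggregation concrete, here is a minimal pure-Python sketch of what the query computes. It is not how SpatialBench or any particular engine evaluates the query. The toy data, the `nearby_pickup_counts` helper, and the tuple layouts are all hypothetical. The sketch also assumes that coordinates are in a meter-based projected plane, that each building is reduced to a single point (the benchmark's `b_boundary` is a footprint geometry), and that the distance test is inclusive.

```python
from collections import Counter
from math import hypot

# Hypothetical toy data: (key, name, (x, y)) with coordinates in meters.
# Assumption: buildings are simplified to points instead of footprints.
buildings = [
    (1, "depot", (0.0, 0.0)),
    (2, "tower", (1000.0, 0.0)),
]

# Hypothetical trips: (trip key, pickup location (x, y)) in meters.
trips = [
    (101, (100.0, 0.0)),      # 100 m from depot, 900 m from tower
    (102, (300.0, 400.0)),    # exactly 500 m from depot
    (103, (900.0, 0.0)),      # 100 m from tower, 900 m from depot
    (104, (5000.0, 5000.0)),  # not near any building
]


def nearby_pickup_counts(trips, buildings, radius):
    """Brute-force distance join followed by a grouped count.

    Mirrors the shape of the SQL: join every trip to every building
    within `radius`, group by building, and sort by count descending.
    """
    counts = Counter()
    for _trip_key, (tx, ty) in trips:
        for b_key, b_name, (bx, by) in buildings:
            # Inclusive distance test (assumed to match ST_DWithin).
            if hypot(tx - bx, ty - by) <= radius:
                counts[(b_key, b_name)] += 1
    # Buildings with no nearby pickups never enter the counter,
    # just as an inner join drops them from the SQL result.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


result = nearby_pickup_counts(trips, buildings, 500)
for (b_key, b_name), count in result:
    print(b_key, b_name, count)
```

The nested loop compares every trip with every building, so its cost grows with the product of the two table sizes. A spatial engine would typically avoid that with a spatial index or partitioning, which is the kind of work this type of query is meant to exercise.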
Feel free to start a GitHub Discussion or join the Discord community to ask the developers any questions you may have.
We look forward to collaborating with you on these benchmarks!