[DOCS] Fit and finish fixes (#110)
Co-authored-by: Jia Yu <jiayu198910@gmail.com>
diff --git a/README.md b/README.md
index dfb7e78..dde5ab0 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,11 @@
## Install
-You can install Python SedonaDB with `pip install apache-sedona[db]`.
+You can install Python SedonaDB from PyPI:
+
+```sh
+pip install "apache-sedona[db]"
+```
## Overture buildings example
diff --git a/docs/contributors-guide.md b/docs/contributors-guide.md
new file mode 100644
index 0000000..2183c65
--- /dev/null
+++ b/docs/contributors-guide.md
@@ -0,0 +1,240 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Contributors Guide
+
+This guide details how to set up your development environment as a SedonaDB contributor.
+
+## Fork and clone the repository
+
+Your first step is to create a personal copy of the repository and connect it to the main project.
+
+1. Fork the repository
+
+ * Navigate to the official [Apache SedonaDB GitHub repository](https://github.com/apache/sedona-db).
+ * Click the **Fork** button in the top-right corner. This creates a complete copy of the project in your own GitHub account.
+
+1. Clone your fork
+
+ * Next, clone your newly created fork to your local machine. This command downloads the repository into a new folder named `sedona-db`.
+ * Replace `YourUsername` with your actual GitHub username.
+
+ ```shell
+ git clone https://github.com/YourUsername/sedona-db.git
+ cd sedona-db
+ ```
+
+1. Configure the remotes
+
+ * Your local repository needs to know where the original project is so you can pull in updates. You'll add a remote link, traditionally named **`upstream`**, to the main Apache SedonaDB repository.
+ * Your fork is automatically configured as the **`origin`** remote.
+
+ ```shell
+ # Add the main repository as the "upstream" remote
+ git remote add upstream https://github.com/apache/sedona-db.git
+ ```
+
+1. Verify the configuration
+
+ * Run the following command to verify that you have two remotes configured correctly: `origin` (your fork) and `upstream` (the main repository).
+
+ ```shell
+ git remote -v
+ ```
+
+ * The output should look like this:
+
+ ```shell
+ origin https://github.com/YourUsername/sedona-db.git (fetch)
+ origin https://github.com/YourUsername/sedona-db.git (push)
+ upstream https://github.com/apache/sedona-db.git (fetch)
+ upstream https://github.com/apache/sedona-db.git (push)
+ ```
+
+## Rust
+
+SedonaDB is written in Rust and is a standard `cargo` workspace.
+
+You can install a recent version of the Rust compiler and cargo from
+[rustup.rs](https://rustup.rs/) and run tests using `cargo test`.
+
+A local development version of the CLI can be run with `cargo run --bin sedona-cli`.
+
+### Test data setup
+
+Some tests require submodules that contain test data or pinned versions of
+external dependencies. These submodules can be initialized with:
+
+```shell
+git submodule init
+git submodule update --recursive
+```
+
+Additionally, some of the data required in the tests can be downloaded by running the following script.
+
+```bash
+python submodules/download-assets.py
+```
+
+### System dependencies
+
+Some crates wrap external native libraries and require system dependencies
+to build.
+
+!!!note "`sedona-s2geography`"
+ At this time, the only crate that requires this is the `sedona-s2geography`
+ crate, which requires [CMake](https://cmake.org),
+ [Abseil](https://github.com/abseil/abseil-cpp) and OpenSSL.
+
+#### macOS
+
+These can be installed on macOS with [Homebrew](https://brew.sh):
+
+```shell
+brew install abseil openssl cmake geos
+```
+
+#### Linux and Windows
+
+On Linux and Windows, it is recommended to use [vcpkg](https://github.com/microsoft/vcpkg)
+to provide external dependencies. This can be done by setting the `CMAKE_TOOLCHAIN_FILE`
+environment variable:
+
+```shell
+export CMAKE_TOOLCHAIN_FILE=/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake
+```
+
+#### Visual Studio Code (VSCode) Configuration
+
+When using VSCode, it may be necessary to set this environment variable in `settings.json`
+so that rust-analyzer can find it when running build/run tasks:
+
+```json
+{
+ "rust-analyzer.runnables.extraEnv": {
+ "CMAKE_TOOLCHAIN_FILE": "/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"
+ },
+ "rust-analyzer.cargo.extraEnv": {
+ "CMAKE_TOOLCHAIN_FILE": "/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"
+ }
+}
+```
+
+## Python
+
+Python bindings to SedonaDB are built with the [Maturin](https://www.maturin.rs) build
+backend.
+
+To install a development version of the main Python bindings for the first time, run the following commands:
+
+```shell
+cd python/sedonadb
+pip install -e ".[test]"
+```
+
+If editing Rust code in either SedonaDB or the Python bindings, you can recompile the
+native component with:
+
+```shell
+maturin develop
+```
+
+## Debugging
+
+### Rust
+
+Debugging Rust code is most easily done by writing or finding a test that triggers
+the desired behavior and running it using the *Debug* selection in
+[VSCode](https://code.visualstudio.com/) with the
+[rust-analyzer](https://marketplace.visualstudio.com/items?itemName=rust-lang.rust-analyzer)
+extension. Rust code can also be debugged using the CLI by finding the `main()` function in
+`sedona-cli` and choosing the *Debug* run option.
+
+### Python, C, and C++
+
+Installation of Python bindings with `maturin develop` ensures a debug-friendly build for
+debugging Rust, Python, or C/C++ code. Python code can be debugged using breakpoints in
+any IDE that supports debugging an editable Python package installation (e.g., VSCode);
+Rust, C, or C++ code can be debugged using the
+[CodeLLDB](https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb)
+*Attach to Process...* command from the command palette in VSCode.
+
+## Low-level benchmarking
+
+Low-level Rust benchmarks use [criterion](https://github.com/bheisler/criterion.rs).
+In general, there is at least one benchmark for every implementation of a function
+(some functions have more than one implementation provided by different libraries),
+and a few other benchmarks for low-level iteration where work was done to optimize
+specific cases.
+
+### Running benchmarks
+
+Benchmarks for a specific crate can be run with `cargo bench`:
+
+```shell
+cd rust/sedona-geo
+cargo bench
+```
+
+Benchmarks for a specific function can be run with a filter. These can be run
+from the workspace or a specific crate (although the output is usually easier
+to read for a specific crate).
+
+```shell
+cargo bench -- st_area
+```
+
+### Managing results
+
+By default, criterion saves the last run and will report the difference between the
+current benchmark and the last time it was run (although there are options to
+save and load various baselines).
+
+A report of the latest results for all benchmarks can be opened with the following command:
+
+=== "macOS"
+ ```shell
+ open target/criterion/report/index.html
+ ```
+=== "Ubuntu"
+ ```shell
+ xdg-open target/criterion/report/index.html
+ ```
+
+All previously saved benchmark runs can be cleared with:
+
+```shell
+rm -rf target/criterion
+```
+
+## Documentation
+
+To contribute to the SedonaDB documentation:
+
+1. Fork the repository and clone your fork.
+1. Install the documentation dependencies:
+ ```sh
+ pip install -r docs/requirements.txt
+ ```
+1. Make your changes to the documentation files.
+1. Preview your changes locally using these commands:
+ * `mkdocs serve` - Start the live-reloading docs server.
+ * `mkdocs build` - Build the documentation site.
+ * `mkdocs -h` - Print help message and exit.
+1. Push your changes and open a pull request.
diff --git a/docs/development.md b/docs/development.md
deleted file mode 100644
index 58f7178..0000000
--- a/docs/development.md
+++ /dev/null
@@ -1,154 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-# Development
-
-## Rust
-
-SedonaDB is written and Rust and is a standard `cargo` workspace. You can
-install a recent version of the Rust compiler and cargo from
-[rustup.rs](https://rustup.rs/) and run tests using `cargo test`. A local
-development version of the CLI can be run with `cargo run --bin sedona-cli`.
-
-Some tests require submodules that contain test data or pinned versions of
-external dependencies. These submodules can be initialized with:
-
-```shell
-git submodule init
-git submodule update --recursive
-```
-
-Additionally, some of the data required in the tests can be downloaded by running the following script.
-
-```bash
-python submodules/download-assets.py
-```
-
-Some crates wrap external native libraries and require system dependencies
-to build. At this time the only crate that requires this is the sedona-s2geography
-crate, which requires [CMake](https://cmake.org),
-[Abseil](https://github.com/abseil/abseil-cpp) and OpenSSL. These can be installed
-on MacOS with [Homebrew](https://brew.sh):
-
-```shell
-brew install abseil openssl cmake geos
-```
-
-On Linux and Windows, it is recommended to use [vcpkg](https://github.com/microsoft/vcpkg)
-to provide external dependencies. This can be done by setting the `CMAKE_TOOLCHAIN_FILE`
-environment variable:
-
-```shell
-export CMAKE_TOOLCHAIN_FILE=/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake
-```
-
-When using VSCode, it may be necessary to set this environment variable in settings.json
-such that it can be found by rust-analyzer when running build/run tasks:
-
-```json
-{
- "rust-analyzer.runnables.extraEnv": {
- "CMAKE_TOOLCHAIN_FILE": "/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"
- },
- "rust-analyzer.cargo.extraEnv": {
- "CMAKE_TOOLCHAIN_FILE": "/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"
- }
-}
-```
-
-## Python
-
-Python bindings to SedonaDB are built with the [Maturin](https://www.maturin.rs) build
-backend. Installing a development version of the main Python bindings the first time
-can be done with:
-
-```shell
-cd python/sedonadb
-pip install -e ".[test]"
-```
-
-If editing Rust code in either SedonaDB or the Python bindings, you can recompile the
-native component with:
-
-```shell
-maturin develop
-```
-
-## Debugging
-
-Debugging Rust code is most easily done by writing or finding a test that triggers
-the desired behavior and running it using the *Debug* selection in
-[VSCode](https://code.visualstudio.com/) with the
-[rust-analyzer](https://marketplace.visualstudio.com/items?itemName=rust-lang.rust-analyzer)
-extension. Rust code can also debugged using the CLI by finding the `main()` function in
-sedona-cli and choosing the *Debug* run option.
-
-Installation of Python bindings with `maturin develop` ensures a debug-friendly build for
-debugging Rust, Python, or C/C++ code. Python code can be debugged using breakpoints in
-any IDE that supports debugging an editable Python package installation (e.g., VSCode);
-Rust, C, or C++ code can be debugged using the
-[CodeLLDB](https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb)
-*Attach to Process...* command from the command palette in VSCode.
-
-## Low-level benchmarking
-
-Low-level Rust benchmarks use [criterion](https://github.com/bheisler/criterion.rs).
-In general, there is at least one benchmark for every implementation of a function
-(some functions have more than one implementation provided by different libraries),
-and a few other benchmarks for low-level iteration where work was done to optimize
-specific cases.
-
-Briefly, benchmarks for a specific crate can be run with `cargo bench`:
-
-```shell
-cd rust/sedona-geo
-cargo bench
-```
-
-Benchmarks for a specific function can be run with a filter. These can be run
-from the workspace or a specific crate (although the output is usually easier
-to read for a specific crate).
-
-```shell
-cargo bench -- st_area
-```
-
-By default, criterion saves the last run and will report the difference between the
-current benchmark and the last time it was run (although there are options to
-save and load various baselines). A report containing the last run for any
-benchmark that was ever run can be opened with:
-
-```shell
-# MacOS
-open target/criterion/report/index.html
-# Ubuntu
-xdg-open target/criterion/report/index.html
-```
-
-All previous saved benchmark runs can be cleared with:
-
-```shell
-rm -rf target/criterion
-```
-
-## Documentation
-
-* `mkdocs serve` - Start the live-reloading docs server.
-* `mkdocs build` - Build the documentation site.
-* `mkdocs -h` - Print help message and exit.
diff --git a/docs/index.md b/docs/index.md
index 45b2119..62a8bfc 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,6 +1,5 @@
---
hide:
- - navigation
title: Introducing SedonaDB
---
@@ -24,30 +23,45 @@
under the License.
-->
-SedonaDB is a high-performance, dependency-free geospatial compute engine designed for single-node processing, making it ideal for smaller datasets on local machines or cloud instances.
+SedonaDB is a single-node analytical database engine that treats geospatial data as a first-class citizen.
+
+Fast and dependency-free, SedonaDB is ideal for working with smaller datasets located on local machines or cloud instances.
The initial `0.1` release supports a core set of vector operations, with comprehensive vector and raster computation capabilities planned for the near future.
+For distributed workloads, you can still leverage the power of SedonaSpark, SedonaFlink, or SedonaSnow.
+
## Key features
SedonaDB has several advantages:
* **Exceptional Performance:** Built in Rust to process massive geospatial datasets with exceptional speed.
* **Unified Geospatial Toolkit:** Access a comprehensive suite of functions for both vector and raster data in a single, powerful library.
-* **Seamless Ecosystem Integration:** Built on Apache Arrow for smooth interoperability with popular data science libraries like GeoPandas, DuckDB, and Polars.
+* **Extensive Ecosystem Integration:** Built on Apache Arrow for smooth interoperability with popular data science libraries like GeoPandas, DuckDB, and Polars.
* **Flexible APIs:** Effortlessly switch between Python and SQL interfaces to match your preferred workflow and skill set.
* **Guaranteed CRS Propagation:** Automatically manages coordinate reference systems (CRS) to ensure spatial accuracy and prevent common errors.
* **Broad File Format Support:** Work with a wide range of both modern and legacy geospatial file formats like geoparquet.
* **Highly Extensible:** Easily customize and extend the library's functionality to meet your project's unique requirements.
-## Run a query in SQL, Python, or Rust
+## Install SedonaDB
-SedonaDB offers a flexible query interface in SQL, Python, or Rust.
+Here's how to install SedonaDB for Python or R:
-Engineered for speed, SedonaDB provides performant geospatial processing on a single machine. This makes it perfect for the rapid analysis of smaller datasets, whether you're working locally or on a cloud server. While the initial release focuses on core vector operations, a full suite of vector and raster computations is on the roadmap.
+=== "pip"
-For massive, distributed workloads, you can leverage the power of SedonaSpark,
-SedonaFlink, or SedonaSnow.
+ ```bash
+ pip install "apache-sedona[db]"
+ ```
+
+=== "R"
+
+ ```r
+ install.packages("sedonadb", repos = "https://community.r-multiverse.org")
+ ```
+
+## Run a query in SQL, Python, Rust, or R
+
+SedonaDB offers a flexible query interface.
=== "SQL"
@@ -58,7 +72,7 @@
=== "Python"
```python
- import seonda.db
+ import sedona.db
sd = sedona.db.connect()
sd.sql("SELECT ST_Point(0, 1) as geom")
@@ -86,21 +100,6 @@
sd_sql("SELECT ST_Point(0, 1) as geom")
```
-## Install SedonaDB
-
-Here's how to install SedonaDB with various build tools:
-
-=== "pip"
-
- ```bash
- pip install "apache-sedona[db]"
- ```
-
-=== "R"
-
- ```bash
- install.packages("sedonadb", repos = "https://community.r-multiverse.org")
- ```
## Have questions?
diff --git a/docs/programming-guide.ipynb b/docs/programming-guide.ipynb
index 0c3867d..13e36d1 100644
--- a/docs/programming-guide.ipynb
+++ b/docs/programming-guide.ipynb
@@ -24,14 +24,18 @@
" under the License.\n",
"-->\n",
"\n",
- "# SedonaDB Guide\n",
+ "# Working with Vector Data\n",
"\n",
- "This page explains how to process vector data with SedonaDB.\n",
+ "This guide shows how to process vector data with SedonaDB. You will learn to create DataFrames, run spatial queries, and manage file I/O.\n",
"\n",
- "You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.\n",
- "\n",
- "Let's start by establishing a SedonaDB connection.\n",
- "\n",
+ "Let's start by establishing a SedonaDB connection."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "119fcbae",
+ "metadata": {},
+ "source": [
"## Establish SedonaDB connection\n",
"\n",
"Here's how to create the SedonaDB connection:"
@@ -137,7 +141,7 @@
"source": [
"Now, let's run some spatial queries.\n",
"\n",
- "**Read from GeoPandas DataFrame**\n",
+ "### Read from GeoPandas DataFrame\n",
"\n",
"This section shows how to convert a GeoPandas DataFrame into a SedonaDB DataFrame.\n",
"\n",
@@ -146,7 +150,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": null,
"id": "b81549f2-0f58-49e4-9011-8de6578c2b0e",
"metadata": {},
"outputs": [],
@@ -202,7 +206,7 @@
"\n",
"Let's see how to run spatial operations like filtering, joins, and clustering algorithms.\n",
"\n",
- "**Spatial filtering**\n",
+ "### Spatial filtering\n",
"\n",
"Let's run a spatial filtering operation to fetch all the objects in the following polygon:"
]
@@ -249,11 +253,11 @@
"id": "32076e01-d807-40ed-8457-9d8c4244e89f",
"metadata": {},
"source": [
- "You can see it only includes the divisions in the Nova Scotia area. Skip to the visualization section to see how this data can be graphed on a map.\n",
+ "You can see it only includes the divisions in the Nova Scotia area.\n",
"\n",
- "**K-nearest neighbors (KNN) joins**\n",
+ "### K-nearest neighbors (KNN) joins\n",
"\n",
- "Create `restaurants` and `customers` tables so we can demonstrate the KNN join functionality."
+ "Create `restaurants` and `customers` views so we can demonstrate the KNN join functionality."
]
},
{
@@ -370,22 +374,6 @@
"source": [
"Notice how each customer has two rows - one for each of the two closest restaurants."
]
- },
- {
- "cell_type": "markdown",
- "id": "3cb1e53b",
- "metadata": {},
- "source": [
- "## GeoParquet support\n",
- "\n",
- "You can also read GeoParquet files with SedonaDB with `read_parquet()`\n",
- "\n",
- "```python\n",
- "df = sd.read_parquet(\"DATA_FILE.parquet\")\n",
- "```\n",
- "\n",
- "Once you read the file, you can easily expose it as a view and query it with spatial SQL, as we demonstrated in the example above.\n"
- ]
}
],
"metadata": {
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 493603a..7da3c5f 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -17,11 +17,9 @@
under the License.
-->
-# SedonaDB Guide
+# Working with Vector Data
-This page explains how to process vector data with SedonaDB.
-
-You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.
+This guide shows how to process vector data with SedonaDB. You will learn to create DataFrames, run spatial queries, and manage file I/O.
Let's start by establishing a SedonaDB connection.
@@ -82,7 +80,7 @@
Now, let's run some spatial queries.
-**Read from GeoPandas DataFrame**
+### Read from GeoPandas DataFrame
This section shows how to convert a GeoPandas DataFrame into a SedonaDB DataFrame.
@@ -120,7 +118,7 @@
Let's see how to run spatial operations like filtering, joins, and clustering algorithms.
-**Spatial filtering**
+### Spatial filtering
Let's run a spatial filtering operation to fetch all the objects in the following polygon:
@@ -151,11 +149,11 @@
└──────────┴──────────┴────────────────────────────────────────────────────────────────────────────┘
-You can see it only includes the divisions in the Nova Scotia area. Skip to the visualization section to see how this data can be graphed on a map.
+You can see it only includes the divisions in the Nova Scotia area.
-**K-nearest neighbors (KNN) joins**
+### K-nearest neighbors (KNN) joins
-Create `restaurants` and `customers` tables so we can demonstrate the KNN join functionality.
+Create `restaurants` and `customers` views so we can demonstrate the KNN join functionality.
```python
@@ -234,13 +232,3 @@
Notice how each customer has two rows - one for each of the two closest restaurants.
-
-## GeoParquet support
-
-You can also read GeoParquet files with SedonaDB with `read_parquet()`
-
-```python
-df = sd.read_parquet("DATA_FILE.parquet")
-```
-
-Once you read the file, you can easily expose it as a view and query it with spatial SQL, as we demonstrated in the example above.
diff --git a/docs/quickstart-python.ipynb b/docs/quickstart-python.ipynb
index 56dcc17..3558e22 100644
--- a/docs/quickstart-python.ipynb
+++ b/docs/quickstart-python.ipynb
@@ -250,7 +250,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": null,
"id": "6dd816c7-fd3f-4358-b628-ef5e6940c95c",
"metadata": {},
"outputs": [],
diff --git a/docs/reference/read-parquet-files.md b/docs/reference/read-parquet-files.md
deleted file mode 100644
index 6dc4836..0000000
--- a/docs/reference/read-parquet-files.md
+++ /dev/null
@@ -1,71 +0,0 @@
-
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-# Reading Parquet Files
-
-To read a Parquet file, you must use the dedicated `sd.read_parquet()` method. You cannot query a file path directly within the `sd.sql()` `FROM` clause.
-
-The `sd.sql()` function is designed to query tables that have already been registered in the session. When you pass a path like `'s3://...'` to `FROM`, the SQL engine searches for a registered table with that literal name and fails when it's not found, producing a `table not found` error.
-
-## Usage
-
-The correct process is a two-step approach:
-
-1. **Load** the Parquet file into a data frame using `sd.read_parquet()`.
-1. **Register** the data frame view with `to_view()`.
-1. **Query** the view using `sd.sql()`.
-
-```python linenums="1" title="Read a parquet file with SedonaDB"
-
-import sedona.db
-sd = sedona.db.connect()
-
-df = sd.read_parquet(
- 's3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/'
- 'building/building.parquet'
-)
-
-# Load the Parquet file, which creates a Pandas data frame
-df = sd.read_parquet('s3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/building/building.parquet')
-
-# Convert the Pandas data frame to a Spark data frame AND
-# register it as a temporary view in a single line.
-spark.createDataFrame(df).to_view("zone")
-
-# Now, query the view using SQL
-sd.sql("SELECT * FROM zone LIMIT 10").show()
-```
-
-### Common Errors
-
-Directly using a file path within `sd.sql()` is a common mistake that will result in an error.
-
-**Incorrect Code:**
-
-```python
-# This will fail because the SQL engine looks for a table named 's3://...'
-sd.sql("SELECT * FROM 's3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/building/building.parquet'")
-```
-
-**Resulting Error:**
-
-```bash
-sedonadb._lib.SedonaError: Error during planning: table '...s3://...' not found
-```
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index b18d0b4..b4651e0 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -75,3 +75,14 @@
padding: 0 0.9rem;
font-size: 0.65rem; /* NEW: Adjust font size */
}
+
+/* ==========================================================================
+ Mobile Navigation Styles
+ ========================================================================== */
+
+/* This targets the main container of the slide-out navigation on mobile */
+.md-nav--primary .md-nav__title,
+.md-nav__source {
+ background-color: var(--color-red); /* Use your red color */
+ box-shadow: none; /* Optional: removes the shadow */
+}
diff --git a/docs/working-with-parquet-files.ipynb b/docs/working-with-parquet-files.ipynb
new file mode 100644
index 0000000..40aedaf
--- /dev/null
+++ b/docs/working-with-parquet-files.ipynb
@@ -0,0 +1,166 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Working with Parquet Files\n",
+ "\n",
+ "The easiest way to read a GeoParquet or Parquet file is to use `sd.read_parquet()`. Alternatively, you can query these files directly by their path in SQL."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Install SedonaDB\n",
+ "\n",
+ "Use pip to install SedonaDB from the Python Package Index (PyPI)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> **Note**: Before running this notebook on your local machine, you must have SedonaDB installed in your environment. You can install SedonaDB with the following command: `pip install \"apache-sedona[db]\"`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Implementation\n",
+ "\n",
+ "A common workflow for working with GeoParquet and/or Parquet files is:\n",
+ "\n",
+ "1. **Load** the Parquet file into a data frame using `sd.read_parquet()`.\n",
+ "2. **Register** the data frame as a view with `to_view()`.\n",
+ "3. **Query** the view using `sd.sql()`.\n",
+ "4. **Write** your results to a Parquet file with `.to_parquet()` or use `.to_pandas()` to export your results to a DataFrame or GeoDataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the sedona.db module and connect to SedonaDB\n",
+ "import sedona.db\n",
+ "\n",
+ "sd = sedona.db.connect()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "┌──────────────┬───────────────────────────────┐\n",
+ "│ name ┆ geometry │\n",
+ "│ utf8 ┆ geometry │\n",
+ "╞══════════════╪═══════════════════════════════╡\n",
+ "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Lobamba ┆ POINT(31.1999971 -26.4666675) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Luxembourg ┆ POINT(6.1300028 49.6116604) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Palikir ┆ POINT(158.1499743 6.9166437) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Majuro ┆ POINT(171.3800002 7.1030043) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Funafuti ┆ POINT(179.2166471 -8.516652) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Melekeok ┆ POINT(134.6265485 7.4873962) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Bir Lehlou ┆ POINT(-9.6525222 26.1191667) │\n",
+ "└──────────────┴───────────────────────────────┘\n"
+ ]
+ }
+ ],
+ "source": [
+ "# 1. Load the Parquet file\n",
+ "df = sd.read_parquet(\n",
+ " \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/\"\n",
+ " \"natural-earth/files/natural-earth_cities_geo.parquet\"\n",
+ ")\n",
+ "\n",
+ "# 2. Register the data frame as a view\n",
+ "df.to_view(\"zone\")\n",
+ "\n",
+ "# 3. Query the view and store the result in a new DataFrame\n",
+ "query_result_df = sd.sql(\"SELECT * FROM zone LIMIT 10\")\n",
+ "query_result_df.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Verifying the written file at 'query_results.parquet'...\n",
+ "┌──────────────┬───────────────────────────────┐\n",
+ "│ name ┆ geometry │\n",
+ "│ utf8 ┆ geometry │\n",
+ "╞══════════════╪═══════════════════════════════╡\n",
+ "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Lobamba ┆ POINT(31.1999971 -26.4666675) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Luxembourg ┆ POINT(6.1300028 49.6116604) │\n",
+ "└──────────────┴───────────────────────────────┘\n"
+ ]
+ }
+ ],
+ "source": [
+ "# 4. Write the result to a new Parquet file\n",
+ "output_path = \"query_results.parquet\"\n",
+ "query_result_df.to_parquet(output_path)\n",
+ "\n",
+ "# (Optional) Verify the written file\n",
+ "print(f\"\\nVerifying the written file at '{output_path}'...\")\n",
+ "verified_df = sd.read_parquet(output_path)\n",
+ "verified_df.show(5)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv (3.13.3)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.13.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/docs/working-with-parquet-files.md b/docs/working-with-parquet-files.md
new file mode 100644
index 0000000..ea28931
--- /dev/null
+++ b/docs/working-with-parquet-files.md
@@ -0,0 +1,116 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Working with Parquet Files
+
+The easiest way to read a GeoParquet or Parquet file is to use `sd.read_parquet()`. Alternatively, you can query these files directly by their path in SQL.
+
+## Install SedonaDB
+
+Use pip to install SedonaDB from the Python Package Index (PyPI).
+
+> **Note**: Before running this notebook locally, install SedonaDB in your environment: `pip install "apache-sedona[db]"`
+
+## Implementation
+
+A common workflow for GeoParquet and Parquet files is:
+
+1. **Load** the Parquet file into a DataFrame using `sd.read_parquet()`.
+2. **Register** the DataFrame as a view with `.to_view()`.
+3. **Query** the view using `sd.sql()`.
+4. **Write** the results to a Parquet file with `.to_parquet()`, or export them to a pandas DataFrame or GeoPandas GeoDataFrame with `.to_pandas()`.
+
+
+```python
+# Import the sedona.db module and connect to SedonaDB
+import sedona.db
+
+sd = sedona.db.connect()
+```
+
+
+```python
+# 1. Load the Parquet file
+df = sd.read_parquet(
+ "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/"
+ "natural-earth/files/natural-earth_cities_geo.parquet"
+)
+
+# 2. Register the data frame as a view
+df.to_view("zone")
+
+# 3. Query the view and store the result in a new DataFrame
+query_result_df = sd.sql("SELECT * FROM zone LIMIT 10")
+query_result_df.show()
+```
+
+ ┌──────────────┬───────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪═══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Lobamba ┆ POINT(31.1999971 -26.4666675) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Luxembourg ┆ POINT(6.1300028 49.6116604) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Palikir ┆ POINT(158.1499743 6.9166437) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Majuro ┆ POINT(171.3800002 7.1030043) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Funafuti ┆ POINT(179.2166471 -8.516652) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Melekeok ┆ POINT(134.6265485 7.4873962) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Bir Lehlou ┆ POINT(-9.6525222 26.1191667) │
+ └──────────────┴───────────────────────────────┘
+
+
+
+```python
+# 4. Write the result to a new Parquet file
+output_path = "query_results.parquet"
+query_result_df.to_parquet(output_path)
+
+# (Optional) Verify the written file
+print(f"\nVerifying the written file at '{output_path}'...")
+verified_df = sd.read_parquet(output_path)
+verified_df.show(5)
+```
+
+
+ Verifying the written file at 'query_results.parquet'...
+ ┌──────────────┬───────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪═══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Lobamba ┆ POINT(31.1999971 -26.4666675) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Luxembourg ┆ POINT(6.1300028 49.6116604) │
+ └──────────────┴───────────────────────────────┘
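The new page mentions two paths that its examples do not exercise: querying a Parquet file directly by its path in SQL, and exporting results with `.to_pandas()`. A minimal sketch of both follows. It assumes SedonaDB accepts a single-quoted path as a table reference (as DataFusion's CLI does); the page does not confirm the exact syntax. The helper names `build_path_query` and `preview_parquet` are hypothetical and not part of SedonaDB.

```python
def build_path_query(path: str, limit: int = 10) -> str:
    """Build a SELECT statement that reads a Parquet file by its path.

    Assumption: the engine accepts a single-quoted path in the FROM clause.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # Double any single quotes so the path stays a valid SQL string literal.
    quoted = path.replace("'", "''")
    return f"SELECT * FROM '{quoted}' LIMIT {int(limit)}"


def preview_parquet(path: str, limit: int = 10):
    """Run the path query and export the result with .to_pandas().

    Requires `pip install "apache-sedona[db]"`; the import is deferred so
    that build_path_query can be used without SedonaDB installed.
    """
    import sedona.db

    sd = sedona.db.connect()
    return sd.sql(build_path_query(path, limit)).to_pandas()


print(build_path_query("query_results.parquet", 5))
```

Calling `preview_parquet("query_results.parquet", 5)` after step 4 of the page would, under the assumption above, return the first five rows of the written file without registering a view first.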
diff --git a/mkdocs.yml b/mkdocs.yml
index 621f1bc..233ce78 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -20,19 +20,20 @@
site_url: https://sedona.apache.org/sedonadb/
nav:
- SedonaDB: index.md
+ - Python Quickstart: quickstart-python.md
- SedonaDB Guides:
- - Python Quickstart: quickstart-python.md
- - SedonaDB Guide: programming-guide.md
+ - Working with Vector Data: programming-guide.md
- Working with GeoPandas: geopandas-interop.md
- Working with Overture: overture-examples.md
- - Development: development.md
+ - Working with Parquet Files: working-with-parquet-files.md
+ - Contributors Guide: contributors-guide.md
+
- SedonaDB Reference:
- Python:
- Python Functions: reference/python.md
- SQL:
- SQL Functions: reference/sql.md
- Spatial Joins: reference/sql-joins.md
- - Read Parquet Files: reference/read-parquet-files.md
- Blog: "https://sedona.apache.org/latest/blog/"
- Community: "https://sedona.apache.org/latest/community/contact/"
- Apache Software Foundation: "https://sedona.apache.org/latest/asf/asf/"
@@ -50,7 +51,7 @@
primary: custom
accent: 'green'
favicon: image/sedona_logo_symbol.png
- logo: image/sedona_logo_symbol_white.svg
+ logo: image/sedona_logo_symbol.png
icon:
logo: fontawesome/solid/earth-americas
repo: fontawesome/brands/github