{
"cells": [
{
"cell_type": "markdown",
"id": "caa5f2d5-28bb-4ce9-8a11-92646b3a9f6c",
"metadata": {},
"source": [
"<!---\n",
" Licensed to the Apache Software Foundation (ASF) under one\n",
" or more contributor license agreements. See the NOTICE file\n",
" distributed with this work for additional information\n",
" regarding copyright ownership. The ASF licenses this file\n",
" to you under the Apache License, Version 2.0 (the\n",
" \"License\"); you may not use this file except in compliance\n",
" with the License. You may obtain a copy of the License at\n",
"\n",
" http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
" Unless required by applicable law or agreed to in writing,\n",
" software distributed under the License is distributed on an\n",
" \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" KIND, either express or implied. See the License for the\n",
" specific language governing permissions and limitations\n",
" under the License.\n",
"-->\n",
"\n",
"# GeoPandas Interoperability\n",
"\n",
"> Note: Before running this notebook, ensure that you have installed SedonaDB: `pip install \"apache-sedona[db]\"`\n",
"\n",
"This notebook shows how to leverage GeoPandas with SedonaDB for large-scale geospatial analysis.\n",
"\n",
"You'll learn how to:\n",
"\n",
"- Read common geospatial file formats like GeoJSON and FlatGeobuf into a GeoPandas GeoDataFrame\n",
"- Convert these data from these input formats into a SedonaDB DataFrame for large-scale analysis.\n",
"\n",
"Any file type that can be read by GeoPandas can also be read into a SedonaDB DataFrame!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0434bead-2628-4844-a3f6-2f9c15a21899",
"metadata": {},
"outputs": [],
"source": [
"import sedona.db\n",
"import geopandas as gpd\n",
"\n",
"sd = sedona.db.connect()"
]
},
{
"cell_type": "markdown",
"id": "618b5d1d-ac2b-4786-ae5b-0d10efd6a8d4",
"metadata": {},
"source": [
"### Read a GeoJSON file with GeoPandas"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2691bd24-9b2d-4cf9-958d-4ef01d967cb3",
"metadata": {},
"outputs": [],
"source": [
"gdf = gpd.read_file(\"sample_geometries.json\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cd367a73-acd3-41cf-b892-7d863c370d5f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>prop0</th>\n",
" <th>prop1</th>\n",
" <th>geometry</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>value0</td>\n",
" <td>None</td>\n",
" <td>POINT (102 0.5)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>value1</td>\n",
" <td>0.0</td>\n",
" <td>LINESTRING (102 0, 103 1, 104 0, 105 1)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>value2</td>\n",
" <td>{ \"this\": \"that\" }</td>\n",
" <td>POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" prop0 prop1 geometry\n",
"0 value0 None POINT (102 0.5)\n",
"1 value1 0.0 LINESTRING (102 0, 103 1, 104 0, 105 1)\n",
"2 value2 { \"this\": \"that\" } POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "454e08a3-de65-4151-9d29-5d5ee8cf31d3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'geopandas.geodataframe.GeoDataFrame'>\n",
"RangeIndex: 3 entries, 0 to 2\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 prop0 3 non-null object \n",
" 1 prop1 2 non-null object \n",
" 2 geometry 3 non-null geometry\n",
"dtypes: geometry(1), object(2)\n",
"memory usage: 204.0+ bytes\n"
]
}
],
"source": [
"gdf.info()"
]
},
{
"cell_type": "markdown",
"id": "a5837268-1620-4b2b-bf37-cb6e282daedf",
"metadata": {},
"source": [
"### Convert the GeoPandas DataFrame to a SedonaDB DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "385f6333-411d-4d1f-a09b-8816cccceabc",
"metadata": {},
"outputs": [],
"source": [
"df = sd.create_data_frame(gdf)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "186059ae-4cf8-48ec-878a-71e7a39ac07e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"┌────────┬────────────────────┬──────────────────────────────────────────┐\n",
"│ prop0 ┆ prop1 ┆ geometry │\n",
"│ utf8 ┆ utf8 ┆ geometry │\n",
"╞════════╪════════════════════╪══════════════════════════════════════════╡\n",
"│ value0 ┆ ┆ POINT(102 0.5) │\n",
"├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
"│ value1 ┆ 0.0 ┆ LINESTRING(102 0,103 1,104 0,105 1) │\n",
"├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
"│ value2 ┆ { \"this\": \"that\" } ┆ POLYGON((100 0,101 0,101 1,100 1,100 0)) │\n",
"└────────┴────────────────────┴──────────────────────────────────────────┘\n"
]
}
],
"source": [
"df.show()"
]
},
{
"cell_type": "markdown",
"id": "2f09cbbe-86b5-4eb4-b920-1b12f018d1a6",
"metadata": {},
"source": [
"## Read and Convert Data From a FlatGeobuf file\n",
"\n",
"This code demonstrates how to read a FlatGeobuf file with GeoPandas and then convert it to a SedonaDB DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "965ae9f3-293b-4e8e-92bf-1359a482bca3",
"metadata": {},
"outputs": [],
"source": [
"# Read a FlatGeobuf file with GeoPandas\n",
"path = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb\"\n",
"gdf = gpd.read_file(path)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "372c937f-da36-4e4b-98da-347890318a80",
"metadata": {},
"outputs": [],
"source": [
"# Convert the GeoPandas DataFrame to a SedonaDB DataFrame\n",
"df = sd.create_data_frame(gdf)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "d99f4474-da3a-4834-9675-184a667b2a90",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"┌──────────────┬──────────────────────────────┐\n",
"│ name ┆ geometry │\n",
"│ utf8 ┆ geometry │\n",
"╞══════════════╪══════════════════════════════╡\n",
"│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
"│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
"│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
"└──────────────┴──────────────────────────────┘\n"
]
}
],
"source": [
"df.show(3)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv (3.13.3)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}