| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "```\n", |
| "Licensed to the Apache Software Foundation (ASF) under one\n", |
| "or more contributor license agreements. See the NOTICE file\n", |
| "distributed with this work for additional information\n", |
| "regarding copyright ownership. The ASF licenses this file\n", |
| "to you under the Apache License, Version 2.0 (the\n", |
| "\"License\"); you may not use this file except in compliance\n", |
| "with the License. You may obtain a copy of the License at\n", |
| " http://www.apache.org/licenses/LICENSE-2.0\n", |
| "Unless required by applicable law or agreed to in writing,\n", |
| "software distributed under the License is distributed on an\n", |
| "\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", |
| "KIND, either express or implied. See the License for the\n", |
| "specific language governing permissions and limitations\n", |
| "under the License.\n", |
| "```" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import os\n", |
| "\n", |
| "import geopandas as gpd\n", |
| "from pyspark.sql import SparkSession\n", |
| "from pyspark.sql.functions import col, expr, when, explode, hex\n", |
| "\n", |
| "\n", |
| "from sedona.spark import *\n", |
| "from utilities import getConfig" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Setup Sedona environment" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "config = (\n", |
| " SedonaContext.builder()\n", |
| " .config(\n", |
| " \"spark.jars.packages\",\n", |
| " \"uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4\",\n", |
| " )\n", |
| " .master(\"spark://localhost:7077\")\n", |
| " .getOrCreate()\n", |
| ")\n", |
| "\n", |
| "sedona = SedonaContext.create(config)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Read countries shapefile into a Sedona DataFrame \n", |
| "Data link: https://www.naturalearthdata.com/downloads/50m-cultural-vectors/" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "countries_df = sedona.read.format(\"shapefile\").load(\n", |
| " \"data/ne_50m_admin_0_countries_lakes\"\n", |
| ")\n", |
| "countries_df.createOrReplaceTempView(\"country\")\n", |
| "countries_df.printSchema()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Read airports shapefile into a Sedona DataFrame \n", |
| "Data link: https://www.naturalearthdata.com/downloads/50m-cultural-vectors/" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "airports_df = sedona.read.format(\"shapefile\").load(\"data/ne_50m_airports/\")\n", |
| "airports_df.createOrReplaceTempView(\"airport\")\n", |
| "airports_df.printSchema()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "\n", |
| "\n", |
| "## Run Spatial Join using SQL API" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "result = sedona.sql(\n", |
| " \"SELECT c.geometry as country_geom, c.NAME_EN, a.geometry as airport_geom, a.name FROM country c, airport a WHERE ST_Contains(c.geometry, a.geometry)\"\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Run Spatial Join using RDD API" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from sedona.spark import StructuredAdapter\n", |
| "\n", |
| "airports_rdd = StructuredAdapter.toSpatialRdd(airports_df, \"geometry\")\n", |
| "# Drop the duplicate name column in countries_df\n", |
| "countries_df = countries_df.drop(\"NAME\")\n", |
| "countries_rdd = StructuredAdapter.toSpatialRdd(countries_df, \"geometry\")\n", |
| "\n", |
| "airports_rdd.analyze()\n", |
| "countries_rdd.analyze()\n", |
| "\n", |
| "# 4 is the num partitions used in spatial partitioning. This is an optional parameter\n", |
| "airports_rdd.spatialPartitioning(GridType.KDBTREE, 4)\n", |
| "countries_rdd.spatialPartitioning(airports_rdd.getPartitioner())\n", |
| "\n", |
| "buildOnSpatialPartitionedRDD = True\n", |
| "usingIndex = True\n", |
| "considerBoundaryIntersection = True\n", |
| "airports_rdd.buildIndex(IndexType.QUADTREE, buildOnSpatialPartitionedRDD)\n", |
| "\n", |
| "result_pair_rdd = JoinQueryRaw.SpatialJoinQueryFlat(\n", |
| " airports_rdd, countries_rdd, usingIndex, considerBoundaryIntersection\n", |
| ")\n", |
| "\n", |
| "result2 = StructuredAdapter.pairRddToDf(\n", |
| " result_pair_rdd, countries_df.schema, airports_df.schema, sedona\n", |
| ")\n", |
| "\n", |
| "result2.createOrReplaceTempView(\"join_result_with_all_cols\")\n", |
| "# Select the columns needed in the join\n", |
| "result2 = sedona.sql(\"SELECT * FROM join_result_with_all_cols\")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Print spatial join results" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# The result of SQL API\n", |
| "result.show()\n", |
| "# The result of RDD API\n", |
| "result2.show()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Group airports by country" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "scrolled": true |
| }, |
| "outputs": [], |
| "source": [ |
| "result.createOrReplaceTempView(\"result\")\n", |
| "groupedresult = sedona.sql(\n", |
| " \"SELECT c.NAME_EN, c.country_geom, count(*) as AirportCount FROM result c GROUP BY c.NAME_EN, c.country_geom\"\n", |
| ")\n", |
| "groupedresult.show()\n", |
| "groupedresult.createOrReplaceTempView(\"grouped_result\")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Visualize the number of airports in each country" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Visualize using SedonaKepler" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "sedona_kepler_map = SedonaKepler.create_map(\n", |
| " df=groupedresult, name=\"AirportCount\", config=getConfig()\n", |
| ")\n", |
| "sedona_kepler_map" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Visualize using SedonaPyDeck\n", |
| "The above visualization is generated by a pre-set config informing SedonaKepler that the map to be rendered has to be a choropleth map with choropleth of the `AirportCount` column value.\n", |
| "\n", |
| "This can be also be achieved using [SedonaPyDeck](https://sedona.apache.org/1.5.0/tutorial/sql/#sedonapydeck) and its `create_choropleth_map` API." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "tags": [] |
| }, |
| "outputs": [], |
| "source": [ |
| "sedona_pydeck_map = SedonaPyDeck.create_choropleth_map(\n", |
| " df=groupedresult, plot_col=\"AirportCount\"\n", |
| ")\n", |
| "sedona_pydeck_map" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Visualize Uber H3 cells using SedonaKepler\n", |
| "The following tutorial depicts how Uber H3 cells can be generated using Sedona and visualized using SedonaKepler." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Generate H3 cell IDs\n", |
| "[ST_H3CellIDs](https://sedona.apache.org/1.5.0/api/flink/Function/#st_h3cellids) can be used to generated cell IDs for given geometries" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "tags": [] |
| }, |
| "outputs": [], |
| "source": [ |
| "h3_df = sedona.sql(\n", |
| " \"SELECT g.NAME_EN, g.country_geom, ST_H3CellIDs(g.country_geom, 3, false) as h3_cellID from grouped_result g\"\n", |
| ")\n", |
| "h3_df.show(2)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Since each geometry can have multiple H3 cell IDs, let's explode the generated H3 cell ID array to get individual cells" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "tags": [] |
| }, |
| "outputs": [], |
| "source": [ |
| "exploded_h3 = h3_df.select(\n", |
| " h3_df.NAME_EN, h3_df.country_geom, explode(h3_df.h3_cellID).alias(\"h3\")\n", |
| ")\n", |
| "exploded_h3.show(2)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Convert generated long H3 cell ID to a hex cell ID\n", |
| "SedonaKepler accepts each H3 cell ID as a hexadecimal to automatically visualize them. Also, let us sample the data to be able to visualize sparse cells on the map." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "tags": [] |
| }, |
| "outputs": [], |
| "source": [ |
| "exploded_h3 = exploded_h3.sample(0.3)\n", |
| "exploded_h3.createOrReplaceTempView(\"exploded_h3\")\n", |
| "hex_exploded_h3 = exploded_h3.select(\n", |
| " exploded_h3.NAME_EN, hex(exploded_h3.h3).alias(\"ex_h3\")\n", |
| ")\n", |
| "hex_exploded_h3.show(2)\n", |
| "hex_exploded_h3.printSchema()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Visualize using SedonaKepler\n", |
| "Now, simply provide the final df to SedonaKepler.create_map and you can automagically visualize the H3 cells on the map!" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "tags": [] |
| }, |
| "outputs": [], |
| "source": [ |
| "sedona_kepler_h3 = SedonaKepler.create_map(df=hex_exploded_h3, name=\"h3\")\n", |
| "sedona_kepler_h3" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": "Python 3 (ipykernel)", |
| "language": "python", |
| "name": "python3" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 3 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython3", |
| "version": "3.10.12" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 4 |
| } |