Apache Sedona extends PySpark with spatial functions that depend on several Python libraries. You need to install these packages if your system does not already have them. Sedona now uses uv for Python dependency management; see the dependency definitions in our pyproject.toml.
```bash
pip install apache-sedona
```

To install with the `spark` extra:

```bash
pip install 'apache-sedona[spark]'
```
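To confirm the package landed in the active environment, you can check its installed version with the standard library; this is a generic sketch, not a Sedona-specific API:

```python
from importlib.metadata import version

# Prints the installed apache-sedona release,
# or raises PackageNotFoundError if the install failed
print(version("apache-sedona"))
```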
To install from source, clone the Sedona GitHub repository and run the following commands:

```bash
cd python
python3 -m pip install .
```
Sedona Python needs one additional jar file, either sedona-spark-shaded or sedona-spark, to work properly. Please make sure you use the version that matches your Spark and Scala versions.
Please use the Spark major.minor version number in artifact names; for example, sedona-spark-shaded-3.3_2.12 targets Spark 3.3 built with Scala 2.12.
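As a quick illustration, the snippet below derives a matching artifact coordinate from the PySpark version installed in your environment. This is a minimal sketch: the Scala version and the Sedona release string are placeholders you must replace with the values that match your Spark distribution.

```python
import pyspark

# Sedona artifacts are versioned by Spark major.minor and Scala version,
# e.g. sedona-spark-shaded-3.3_2.12 for Spark 3.3 built with Scala 2.12.
spark_major_minor = ".".join(pyspark.__version__.split(".")[:2])
scala_version = "2.12"        # assumption: match your Spark build's Scala version
sedona_version = "<version>"  # replace with the Sedona release you want

coordinate = (
    f"org.apache.sedona:sedona-spark-shaded-{spark_major_minor}"
    f"_{scala_version}:{sedona_version}"
)
print(coordinate)  # pass this to spark.jars.packages
```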
You can get it using one of the following methods:
- Shaded jar: download the sedona-spark-shaded jar and the geotools-wrapper jar from Maven Central, and put them in the SPARK_HOME/jars/ folder.
- Unshaded jar: call the Maven Central coordinate in your Python program, as shown below.

**Sedona >= 1.4.1**

```python
from sedona.spark import *

config = (
    SedonaContext.builder()
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-spark-3.3_2.12:{{ sedona.current_version }},"
        "org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}",
    )
    .config(
        "spark.jars.repositories",
        "https://artifacts.unidata.ucar.edu/repository/unidata-all",
    )
    .getOrCreate()
)

sedona = SedonaContext.create(config)
```
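Once the context is created, a one-line query is enough to confirm that the spatial functions are registered. The check below is a sketch using ST_Point and ST_AsText, two of Sedona's SQL functions, against the `sedona` object created above:

```python
# Smoke test: spatial SQL functions are available once SedonaContext.create() succeeds
sedona.sql("SELECT ST_AsText(ST_Point(1.0, 2.0)) AS wkt").show()
```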
**Sedona < 1.4.1**

SedonaRegistrator is deprecated in Sedona 1.4.1 and later versions. Please use the above method instead.

```python
from pyspark.sql import SparkSession
from sedona.spark import SedonaRegistrator
from sedona.spark import SedonaKryoRegistrator, KryoSerializer

spark = (
    SparkSession.builder.appName("appName")
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-spark-shaded-3.3_2.12:{{ sedona.current_version }},"
        "org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}",
    )
    .getOrCreate()
)

SedonaRegistrator.registerAll(spark)
```
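On these older versions, the same kind of smoke test works through the plain SparkSession once registerAll has run; this is a sketch, not part of the official example:

```python
# After SedonaRegistrator.registerAll(spark), Sedona SQL functions are available
spark.sql("SELECT ST_AsText(ST_Point(1.0, 2.0)) AS wkt").show()
```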
If you manually copy the sedona-spark-shaded jar to the SPARK_HOME/jars/ folder, you need to set up two environment variables:
```bash
export SPARK_HOME=~/Downloads/spark-3.0.1-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python
```
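To confirm both variables are visible to the Python interpreter you plan to use, a quick check like the following (a sketch, not a required step) can save debugging time:

```python
import os

# Both variables must be set in the environment that launches Python/Jupyter
for var in ("SPARK_HOME", "PYTHONPATH"):
    print(var, "=", os.environ.get(var, "<not set>"))
```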
You can then play with the Sedona Python Jupyter notebook.