This page shows how to read GeoPackage files with Apache Sedona and Spark.
You’ll learn about the advantages and disadvantages of the GeoPackage file format and how to use it in production settings.
Let’s start by creating a GeoPackage file and then reading it.
Let’s create a GeoPackage file with a few rows of data.
Start by creating a GeoPandas DataFrame:
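```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two points and one polygon, each with a name and a value
point1 = Point(0, 0)
point2 = Point(1, 1)
polygon1 = Polygon([(5, 5), (6, 6), (7, 5), (6, 4)])

data = {
    "name": ["Point A", "Point B", "Polygon A"],
    "value": [10, 20, 30],
    "geometry": [point1, point2, polygon1],
}
gdf = gpd.GeoDataFrame(data, geometry="geometry")
```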
Now write the GeoPandas DataFrame to a GeoPackage file:
```python
gdf.to_file("/tmp/my_file.gpkg", layer="my_layer", driver="GPKG")
```
GeoPandas knows to write this to a GeoPackage file because the code sets the driver to GPKG.
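A GeoPackage is an SQLite database, and the GeoPackage specification (version 1.2 and later) stores the ASCII bytes "GPKG" in the SQLite application_id header field. As a quick sanity check that the driver produced a GeoPackage, a sketch like the one below reads that field with Python's built-in sqlite3 module. The helper name is_geopackage is hypothetical, and files written against older versions of the specification use different application_id values, so treat a False result as a hint rather than proof.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# The ASCII bytes "GPKG" read as a big-endian 32-bit integer.
GPKG_APPLICATION_ID = 0x47504B47


def is_geopackage(path: str) -> bool:
    """Return True if the SQLite header of `path` carries the "GPKG" application_id.

    Hypothetical helper for illustration; it is not part of Sedona or GeoPandas.
    """
    # Open read-only through a file: URI so a missing path raises an error
    # instead of silently creating an empty database.
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        (app_id,) = conn.execute("PRAGMA application_id").fetchone()
    return app_id == GPKG_APPLICATION_ID


# Usage with the file written above:
# print(is_geopackage("/tmp/my_file.gpkg"))
```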
You can think of the layer as the table name.
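The layers in a file, and therefore the values you can pass as tableName, are recorded in the gpkg_contents table that the GeoPackage specification requires, with data_type set to "features" for vector layers and "tiles" for raster tile layers. A minimal sketch using Python's sqlite3 module to list them; list_layers is a hypothetical helper, not a Sedona API. For the file written above, it should report my_layer as a features layer, matching the metadata Sedona shows later on this page.

```python
import sqlite3


def list_layers(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (table_name, data_type) for every layer in a GeoPackage.

    Hypothetical helper: it relies only on the gpkg_contents table that the
    GeoPackage specification requires in every file.
    """
    rows = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return [(name, data_type) for name, data_type in rows]


# Usage against the file written above:
# conn = sqlite3.connect("/tmp/my_file.gpkg")
# print(list_layers(conn))
# conn.close()
```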
Now let’s read the GeoPackage file with Apache Sedona and Spark:
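```python
from sedona.spark import SedonaContext

# Create a Sedona-enabled Spark session (skip this if `sedona` already exists;
# assumes the Sedona packages are installed and on the Spark classpath)
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Read the my_layer table from the GeoPackage file
df = (
    sedona.read.format("geopackage")
    .option("tableName", "my_layer")
    .load("/tmp/my_file.gpkg")
)
df.show()
```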
Here are the contents of the DataFrame:
```
+---+--------------------+---------+-----+
|fid|                geom|     name|value|
+---+--------------------+---------+-----+
|  1|         POINT (0 0)|  Point A|   10|
|  2|         POINT (1 1)|  Point B|   20|
|  3|POLYGON ((5 5, 6 ...|Polygon A|   30|
+---+--------------------+---------+-----+
```
The geom column can hold many kinds of geometries, such as points, linestrings, and polygons.
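Inside the file, each value in the geometry column is stored as a GeoPackage binary blob: a short header (the magic bytes GP, a version byte, a flags byte, an srs_id, and an optional envelope) followed by standard WKB. Sedona decodes this for you. The sketch below only illustrates the layout described in the GeoPackage specification and handles just 2D points; parse_gpkg_point is a hypothetical helper, not Sedona's implementation.

```python
import struct

# Number of doubles in the optional header envelope, keyed by the
# envelope-contents indicator stored in bits 1-3 of the flags byte.
ENVELOPE_DOUBLES = {0: 0, 1: 4, 2: 6, 3: 6, 4: 8}


def parse_gpkg_point(blob: bytes) -> tuple[int, float, float]:
    """Decode (srs_id, x, y) from a GeoPackage binary 2D point (illustrative only)."""
    if blob[:2] != b"GP":
        raise ValueError("missing GeoPackage magic bytes 'GP'")
    flags = blob[3]  # blob[2] is the version byte, unused here
    # Bit 0 of flags gives the byte order of the header values (1 = little endian).
    header_order = "<" if flags & 0b1 else ">"
    envelope_code = (flags >> 1) & 0b111
    if envelope_code not in ENVELOPE_DOUBLES:
        raise ValueError(f"invalid envelope indicator {envelope_code}")
    (srs_id,) = struct.unpack_from(header_order + "i", blob, 4)
    # 8 fixed header bytes, then 8 bytes per envelope double, then WKB.
    wkb = blob[8 + 8 * ENVELOPE_DOUBLES[envelope_code]:]
    # The first WKB byte carries its own byte order (1 = little endian).
    wkb_order = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(wkb_order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"only 2D points (WKB type 1) handled, got {geom_type}")
    x, y = struct.unpack_from(wkb_order + "2d", wkb, 5)
    return srs_id, x, y
```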
You can also see the metadata of the GeoPackage file:
```python
df = (
    sedona.read.format("geopackage")
    .option("showMetadata", "true")
    .load("/tmp/my_file.gpkg")
)
df.show()
```
Here are the contents:
```
+----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
|table_name|data_type|identifier|description|         last_change|min_x|min_y|max_x|max_y|srs_id|
+----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
|  my_layer| features|  my_layer|           |2025-02-25 06:28:...|  0.0|  0.0|  7.0|  6.0| 99999|
+----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
```
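These columns mirror the gpkg_contents table from the GeoPackage specification, where srs_id refers to a row in the gpkg_spatial_ref_sys table. A minimal sketch that looks up one layer's bounding box and coordinate reference system directly with sqlite3; layer_metadata is a hypothetical helper, and the LEFT JOIN keeps the layer even if its srs_id has no matching row.

```python
import sqlite3


def layer_metadata(conn: sqlite3.Connection, table_name: str) -> dict:
    """Return the bounding box and CRS details recorded for one layer.

    Hypothetical helper: joins gpkg_contents with gpkg_spatial_ref_sys on
    srs_id; both tables are required by the GeoPackage specification.
    """
    row = conn.execute(
        """
        SELECT c.min_x, c.min_y, c.max_x, c.max_y, c.srs_id,
               s.srs_name, s.organization, s.organization_coordsys_id
        FROM gpkg_contents AS c
        LEFT JOIN gpkg_spatial_ref_sys AS s ON s.srs_id = c.srs_id
        WHERE c.table_name = ?
        """,
        (table_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no layer named {table_name!r}")
    keys = ("min_x", "min_y", "max_x", "max_y", "srs_id",
            "srs_name", "organization", "organization_coordsys_id")
    return dict(zip(keys, row))


# Usage:
# conn = sqlite3.connect("/tmp/my_file.gpkg")
# print(layer_metadata(conn, "my_layer"))
# conn.close()
```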
You can also read many GeoPackage files with Sedona. Suppose you have the following GeoPackage files:
```
/tmp/gpkgs/
  my_file1.gpkg
  my_file2.gpkg
```
Here’s how you can read all the files:
```python
df = (
    sedona.read.format("geopackage")
    .option("tableName", "my_layer")
    .load("/tmp/gpkgs")
)
df.show()
```
Here are the results:
```
+---+--------------------+---------+-----+
|fid|                geom|     name|value|
+---+--------------------+---------+-----+
|  1|         POINT (5 5)|  Point C|   30|
|  2|POLYGON ((5 5, 6 ...|Polygon A|   40|
|  1|         POINT (0 0)|  Point A|   10|
|  2|         POINT (1 1)|  Point B|   20|
+---+--------------------+---------+-----+
```
You just need to supply the directory containing the GeoPackage files, and Sedona can read all of them into a DataFrame.
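The tableName option applies to every file in the directory. As a defensive pre-flight check (how Sedona behaves when one file lacks the layer is not covered on this page), you could confirm that each file lists the layer in its gpkg_contents table before loading. files_missing_layer is a hypothetical helper that only inspects top-level .gpkg files; a file without a gpkg_contents table raises sqlite3.OperationalError.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def files_missing_layer(directory: str, layer: str) -> list[str]:
    """Return names of *.gpkg files in `directory` whose gpkg_contents lacks `layer`.

    Hypothetical pre-flight check; it does not look into subdirectories.
    """
    missing = []
    for path in sorted(Path(directory).glob("*.gpkg")):
        # Open read-only so the check never modifies the files.
        uri = f"{path.resolve().as_uri()}?mode=ro"
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            found = conn.execute(
                "SELECT 1 FROM gpkg_contents WHERE table_name = ?", (layer,)
            ).fetchone()
        if found is None:
            missing.append(path.name)
    return missing


# Usage before calling sedona.read...load("/tmp/gpkgs"):
# print(files_missing_layer("/tmp/gpkgs", "my_layer"))
```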
Sedona is an excellent option for analyzing many GeoPackage files because it can read and process them in parallel.
You can also load data from raster tables in a GeoPackage file by passing the raster table's name as the tableName option:
```python
df = (
    sedona.read.format("geopackage")
    .option("tableName", "raster_table")
    .load("/path/to/geopackage")
)
df.show()
```
Here are the contents of the DataFrame:
```
+---+----------+-----------+--------+--------------------+
| id|zoom_level|tile_column|tile_row|           tile_data|
+---+----------+-----------+--------+--------------------+
|  1|        11|        428|     778|GridCoverage2D["c...|
|  2|        11|        429|     778|GridCoverage2D["c...|
|  3|        11|        428|     779|GridCoverage2D["c...|
|  4|        11|        429|     779|GridCoverage2D["c...|
|  5|        11|        427|     777|GridCoverage2D["c...|
+---+----------+-----------+--------+--------------------+
```
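The zoom_level, tile_column, and tile_row columns locate each tile in a tile matrix. In the GeoPackage specification, the gpkg_tile_matrix table gives matrix_width and matrix_height for each zoom level, the gpkg_tile_matrix_set table gives the overall bounding box, and tile row 0 is the top row. A minimal sketch of computing one tile's extent from those values, assuming the matrix exactly covers the tile matrix set bounding box; tile_bounds and Bounds are hypothetical names, and this is not how Sedona builds the GridCoverage2D values.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def tile_bounds(matrix_set: Bounds, matrix_width: int, matrix_height: int,
                tile_column: int, tile_row: int) -> Bounds:
    """Compute one tile's extent in the tile matrix set's CRS.

    Assumes the GeoPackage convention that tile (0, 0) is the upper-left
    tile and that the tile matrix spans the tile matrix set bounding box.
    matrix_width and matrix_height come from the zoom level's gpkg_tile_matrix row.
    """
    if not (0 <= tile_column < matrix_width and 0 <= tile_row < matrix_height):
        raise ValueError("tile is outside the tile matrix")
    span_x = (matrix_set.max_x - matrix_set.min_x) / matrix_width
    span_y = (matrix_set.max_y - matrix_set.min_y) / matrix_height
    min_x = matrix_set.min_x + tile_column * span_x
    # Rows count downward from the top edge of the matrix set.
    max_y = matrix_set.max_y - tile_row * span_y
    return Bounds(min_x, max_y - span_y, min_x + span_x, max_y)
```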
Known limitations (v1.7.0):
All points above should be resolved soon; stay tuned!
The GeoPackage file format has many advantages:
However, the GeoPackage file format also has the following disadvantages:
GeoPackage is a solid file format if you’re using SQLite.
It’s excellent that Sedona can read GeoPackage files produced by SQLite-based workflows: you can read many files in parallel, analyze massive datasets, and scale out by running Sedona on a cluster.
If you don’t already use GeoPackage, you should probably use file formats like GeoParquet or Iceberg.