| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| !!!note |
| Sedona uses 1-based indexing for all raster functions except [map algebra function](../api/sql/Raster-map-algebra.md), which uses 0-based indexing. |
| |
| !!!note |
| Sedona assumes geographic coordinates to be in longitude/latitude order. If your data is lat/lon order, please use `ST_FlipCoordinates` to swap X and Y. |
| |
| Starting from `v1.1.0`, Sedona SQL supports raster data sources and raster operators in DataFrame and SQL. Raster support is available in all Sedona language bindings including ==Scala, Java, Python, and R==. |
| |
| This page outlines the steps to manage raster data using SedonaSQL. |
| |
| === "Scala" |
| |
| ```scala |
| var myDataFrame = sedona.sql("YOUR_SQL") |
| myDataFrame.createOrReplaceTempView("rasterDf") |
| ``` |
| |
| === "Java" |
| |
| ```java |
| Dataset<Row> myDataFrame = sedona.sql("YOUR_SQL") |
| myDataFrame.createOrReplaceTempView("rasterDf") |
| ``` |
| |
| === "Python" |
| |
| ```python |
| myDataFrame = sedona.sql("YOUR_SQL") |
| myDataFrame.createOrReplaceTempView("rasterDf") |
| ``` |
| |
| Detailed SedonaSQL APIs are available here: [SedonaSQL API](../api/sql/Overview.md). You can find example raster data in [Sedona GitHub repo](https://github.com/apache/sedona/blob/0eae42576c2588fe278f75cef3b17fee600eac90/spark/common/src/test/resources/raster/raster_with_no_data/test5.tiff). |
| |
| ## Set up dependencies |
| |
| === "Scala/Java" |
| |
| 1. Read [Sedona Maven Central coordinates](../setup/maven-coordinates.md) and add Sedona dependencies in build.sbt or pom.xml. |
| 2. Add [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql) in build.sbt or pom.xml. |
| 3. Please see [SQL example project](demo.md) |
| |
| === "Python" |
| |
| 1. Please read [Quick start](../setup/install-python.md) to install Sedona Python. |
| 2. This tutorial is based on [Sedona SQL Jupyter Notebook example](jupyter-notebook.md). |
| |
| ## Create Sedona config |
| |
| Use the following code to create your Sedona config at the beginning. If you already have a SparkSession (usually named `spark`) created by Wherobots/AWS EMR/Databricks, please skip this step and use `spark` directly. |
| |
| You can add additional Spark runtime config to the config builder. For example, `SedonaContext.builder().config("spark.sql.autoBroadcastJoinThreshold", "10485760")` |
| |
| === "Scala" |
| |
| ```scala |
| import org.apache.sedona.spark.SedonaContext |
| |
| val config = SedonaContext.builder() |
| .master("local[*]") // Delete this if run in cluster mode |
| .appName("readTestScala") // Change this to a proper name |
| .getOrCreate() |
| ``` |
| If you use SedonaViz together with SedonaSQL, please add the following line after `SedonaContext.builder()` to enable Sedona Kryo serializer: |
| ```scala |
| .config("spark.kryo.registrator", classOf[SedonaVizKryoRegistrator].getName) // org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator |
| ``` |
| |
| === "Java" |
| |
| ```java |
| import org.apache.sedona.spark.SedonaContext; |
| |
| SparkSession config = SedonaContext.builder() |
| .master("local[*]") // Delete this if run in cluster mode |
| .appName("readTestScala") // Change this to a proper name |
| .getOrCreate() |
| ``` |
| If you use SedonaViz together with SedonaSQL, please add the following line after `SedonaContext.builder()` to enable Sedona Kryo serializer: |
| ```scala |
| .config("spark.kryo.registrator", SedonaVizKryoRegistrator.class.getName) // org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator |
| ``` |
| |
| === "Python" |
| |
| ```python |
| from sedona.spark import * |
| |
| config = SedonaContext.builder() .\ |
| config('spark.jars.packages', |
| 'org.apache.sedona:sedona-spark-shaded-3.3_2.12:{{ sedona.current_version }},' |
| 'org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}'). \ |
| getOrCreate() |
| ``` |
| Please replace the `3.3` in the package name of sedona-spark-shaded with the corresponding major.minor version of Spark, such as `sedona-spark-shaded-3.4_2.12:{{ sedona.current_version }}`. |
| |
| ## Initiate SedonaContext |
| |
| Add the following line after creating the Sedona config. If you already have a SparkSession (usually named `spark`) created by Wherobots/AWS EMR/Databricks, please call `SedonaContext.create(spark)` instead. |
| |
| === "Scala" |
| |
| ```scala |
| import org.apache.sedona.spark.SedonaContext |
| |
| val sedona = SedonaContext.create(config) |
| ``` |
| |
| === "Java" |
| |
| ```java |
| import org.apache.sedona.spark.SedonaContext; |
| |
| SparkSession sedona = SedonaContext.create(config) |
| ``` |
| |
| === "Python" |
| |
| ```python |
| from sedona.spark import * |
| |
| sedona = SedonaContext.create(config) |
| ``` |
| |
| You can also register everything by passing `--conf spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` to `spark-submit` or `spark-shell`. |
| |
| ## Load data from files |
| |
| Assume we have a single raster data file called rasterData.tiff, [at Path](https://github.com/apache/sedona/blob/0eae42576c2588fe278f75cef3b17fee600eac90/spark/common/src/test/resources/raster/raster_with_no_data/test5.tiff). |
| |
| Use the following code to load the data and create a raw Dataframe. |
| |
| === "Scala" |
| ```scala |
| var rawDf = sedona.read.format("binaryFile").load(path_to_raster_data) |
| rawDf.createOrReplaceTempView("rawdf") |
| rawDf.show() |
| ``` |
| |
| === "Java" |
| ```java |
| Dataset<Row> rawDf = sedona.read.format("binaryFile").load(path_to_raster_data) |
| rawDf.createOrReplaceTempView("rawdf") |
| rawDf.show() |
| ``` |
| |
| === "Python" |
| ```python |
| rawDf = sedona.read.format("binaryFile").load(path_to_raster_data) |
| rawDf.createOrReplaceTempView("rawdf") |
| rawDf.show() |
| ``` |
| |
| The output will look like this: |
| |
| ``` |
| | path| modificationTime|length| content| |
| +--------------------+--------------------+------+--------------------+ |
| |file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...| |
| ``` |
| |
| For multiple raster data files use the following code to load the data [from path](https://github.com/apache/sedona/blob/0eae42576c2588fe278f75cef3b17fee600eac90/spark/common/src/test/resources/raster/) and create raw DataFrame. |
| |
| !!!note |
| The above code works too for loading multiple raster data files. If the raster files are in separate directories and the option also makes sure that only `.tif` or `.tiff` files are being loaded. |
| |
| === "Scala" |
| ```scala |
| var rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder) |
| rawDf.createOrReplaceTempView("rawdf") |
| rawDf.show() |
| ``` |
| |
| === "Java" |
| ```java |
| Dataset<Row> rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup", "true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder); |
| rawDf.createOrReplaceTempView("rawdf"); |
| rawDf.show(); |
| ``` |
| |
| === "Python" |
| ```python |
| rawDf = ( |
| sedona.read.format("binaryFile") |
| .option("recursiveFileLookup", "true") |
| .option("pathGlobFilter", "*.tif*") |
| .load(path_to_raster_data_folder) |
| ) |
| rawDf.createOrReplaceTempView("rawdf") |
| rawDf.show() |
| ``` |
| |
| The output will look like this: |
| |
| ``` |
| | path| modificationTime|length| content| |
| +--------------------+--------------------+------+--------------------+ |
| |file:/Download/ra...|2023-09-06 16:24:...|209199|[4D 4D 00 2A 00 0...| |
| |file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...| |
| |file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...| |
| |file:/Download/ra...|2023-09-06 16:24:...| 6619|[49 49 2A 00 08 0...| |
| ``` |
| |
| The content column in the raster table is still in the raw form, binary form. |
| |
| ## Create a Raster type column |
| |
| All raster operations in SedonaSQL require Raster type objects. Therefore, this should be the next step after loading the data. |
| |
| ### From Geotiff |
| |
| ```sql |
| SELECT RS_FromGeoTiff(content) AS rast, modificationTime, length, path FROM rawdf |
| ``` |
| |
| To verify this, use the following code to print the schema of the DataFrame: |
| |
| ```sql |
| rasterDf.printSchema() |
| ``` |
| |
| The output will be like this: |
| |
| ``` |
| root |
| |-- rast: raster (nullable = true) |
| |-- modificationTime: timestamp (nullable = true) |
| |-- length: long (nullable = true) |
| |-- path: string (nullable = true) |
| ``` |
| |
| ### From Arc Grid |
| |
| The raster data is loaded the same way as `tiff` file, but the raster data is stored with the extension `.asc`, ASCII format. The following code creates a Raster type objects from binary data: |
| |
| ```sql |
| SELECT RS_FromArcInfoAsciiGrid(content) AS rast, modificationTime, length, path FROM rawdf |
| ``` |
| |
| ## Raster's metadata |
| |
| Sedona has a function to get the metadata for the raster, and also a function to get the world file of the raster. |
| |
| ### Metadata |
| |
| This function will return an array of metadata, it will have all the necessary information about the raster, Please refer to [RS_MetaData](../api/sql/Raster-operators.md#rs_metadata). |
| |
| ```sql |
| SELECT RS_MetaData(rast) FROM rasterDf |
| ``` |
| |
| Output for the following function will be: |
| |
| ``` |
| [-1.3095817809482181E7, 4021262.7487925636, 512.0, 517.0, 72.32861272132695, -72.32861272132695, 0.0, 0.0, 3857.0, 1.0] |
| ``` |
| |
| The first two elements of the array represent the real-world geographic coordinates (like longitude/latitude) of the raster image's top left pixel, while the next two elements represent the pixel dimensions of the raster. |
| |
| ### World File |
| |
| There are two kinds of georeferences, GDAL and ESRI seen in [world files](https://en.wikipedia.org/wiki/World_file). For more information please refer to [RS_GeoReference](../api/sql/Raster-operators.md#rs_georeference). |
| |
| ```sql |
| SELECT RS_GeoReference(rast, "ESRI") FROM rasterDf |
| ``` |
| |
| The Output will be as follows: |
| |
| ``` |
| 72.328613 |
| 0.000000 |
| 0.000000 |
| -72.328613 |
| -13095781.645176 |
| 4021226.584486 |
| ``` |
| |
| World files are used to georeference and geolocate images by establishing an image-to-world coordinate transformation that assigns real-world geographic coordinates to the pixels of the image. |
| |
| ## Raster Manipulation |
| |
| Since `v1.5.0` there have been many additions to manipulate raster data, we will show you a few example queries. |
| |
| !!!note |
| Read [SedonaSQL Raster operators](../api/sql/Raster-operators.md) to learn how you can use Sedona for raster manipulation. |
| |
| ### Coordinate translation |
| |
| Sedona allows you to translate coordinates as per your needs. It can translate pixel locations to world coordinates and vice versa. |
| |
| #### PixelAsPoint |
| |
| Use [RS_PixelAsPoint](../api/sql/Raster-operators.md#rs_pixelaspoint) to translate pixel coordinates to world location. |
| |
| ```sql |
| SELECT RS_PixelAsPoint(rast, 450, 400) FROM rasterDf |
| ``` |
| |
| Output: |
| |
| ``` |
| POINT (-13063342 3992403.75) |
| ``` |
| |
| #### World to Raster Coordinate |
| |
| Use [RS_WorldToRasterCoord](../api/sql/Raster-operators.md#rs_worldtorastercoord) to translate world location to pixel coordinates. To just get X coordinate use [RS_WorldToRasterCoordX](../api/sql/Raster-operators.md#rs_worldtorastercoordx) and for just Y coordinate use [RS_WorldToRasterCoordY](../api/sql/Raster-operators.md#rs_worldtorastercoordy). |
| |
| ```sql |
| SELECT RS_WorldToRasterCoord(rast, -1.3063342E7, 3992403.75) |
| ``` |
| |
| Output: |
| |
| ``` |
| POINT (450 400) |
| ``` |
| |
| ### Pixel Manipulation |
| |
| Use [RS_Values](../api/sql/Raster-operators.md#rs_values) to fetch values for a specified array of Point Geometries. The coordinates in the point geometry are indicative of real-world location. |
| |
| ```sql |
| SELECT RS_Values(rast, Array(ST_Point(-13063342, 3992403.75), ST_Point(-13074192, 3996020))) |
| ``` |
| |
| Output: |
| |
| ``` |
| [132.0, 148.0] |
| ``` |
| |
| To change values over a grid or area defined by geometry, we will use [RS_SetValues](../api/sql/Raster-operators.md#rs_setvalues). |
| |
| ```sql |
| SELECT RS_SetValues( |
| rast, 1, 250, 260, 3, 3, |
| Array(10, 12, 17, 26, 28, 37, 43, 64, 66) |
| ) |
| ``` |
| |
| Follow the links to get more information on how to use the functions appropriately. |
| |
| ### Band Manipulation |
| |
| Sedona provides APIs to select specific bands from a raster image and create a new raster. For example, to select 2 bands from a raster, you can use the [RS_Band](../api/sql/Raster-operators.md#rs_band) API to retrieve the desired multi-band raster. |
| |
| Let's use a [multi-band raster](https://github.com/apache/sedona/blob/2a0b36989aa895c0781f9a10c907dd726506d0b7/spark/common/src/test/resources/raster_geotiff_color/FAA_UTM18N_NAD83.tif) for this example. The process of loading and converting it to raster type is the same. |
| |
| ```sql |
| SELECT RS_Band(colorRaster, Array(1, 2)) |
| ``` |
| |
| Let's say you have many single-banded rasters and want to add a band to the raster to perform [map algebra operations](#execute-map-algebra-operations). You can do so using [RS_AddBand](../api/sql/Raster-operators.md#rs_addband) Sedona function. |
| |
| ```sql |
| SELECT RS_AddBand(raster1, raster2, 1, 2) |
| ``` |
| |
| This will result in `raster1` having `raster2`'s specified band. |
| |
| ### Resample raster data |
| |
| Sedona allows you to resample raster data using different interpolation methods like the nearest neighbor, bilinear, and bicubic to change the cell size or align raster grids, using [RS_Resample](../api/sql/Raster-operators.md#rs_resample). |
| |
| ```sql |
| SELECT RS_Resample(rast, 50, -50, -13063342, 3992403.75, true, "bicubic") |
| ``` |
| |
| For more information please follow the link. |
| |
| ## Execute map algebra operations |
| |
| Map algebra is a way to perform raster calculations using mathematical expressions. The expression can be a simple arithmetic operation or a complex combination of multiple operations. |
| |
| The Normalized Difference Vegetation Index (NDVI) is a simple graphical indicator that can be used to analyze remote sensing measurements from a space platform and assess whether the target being observed contains live green vegetation or not. |
| |
| ``` |
| NDVI = (NIR - Red) / (NIR + Red) |
| ``` |
| |
| where NIR is the near-infrared band and Red is the red band. |
| |
| ```sql |
| SELECT RS_MapAlgebra(raster, 'D', 'out = (rast[3] - rast[0]) / (rast[3] + rast[0]);') as ndvi FROM raster_table |
| ``` |
| |
| For more information please refer to [Map Algebra API](../api/sql/Raster-map-algebra.md). |
| |
| ## Interoperability between raster and vector data |
| |
| ### Geometry As Raster |
| |
| Sedona allows you to rasterize a geometry by using [RS_AsRaster](../api/sql/Raster-writer.md#rs_asraster). |
| |
| ```sql |
| SELECT RS_AsRaster( |
| ST_GeomFromWKT('POLYGON((150 150, 220 260, 190 300, 300 220, 150 150))'), |
| RS_MakeEmptyRaster(1, 'b', 4, 6, 1, -1, 1), |
| 'b', 230 |
| ) |
| ``` |
| |
| The image created is as below for the vector: |
| |
|  |
| |
| !!!note |
| The vector coordinates are buffed up to showcase the output, the real use case, may or may not match the example. |
| |
| ### Spatial range query |
| |
| Sedona provides raster predicates to do a range query using a geometry window, for example, let's use [RS_Intersects](../api/sql/Raster-operators.md#rs_intersects). |
| |
| ```sql |
| SELECT rast FROM rasterDf WHERE RS_Intersect(rast, ST_GeomFromWKT('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))')) |
| ``` |
| |
| ### Spatial join query |
| |
| Sedona's raster predicates also can do a spatial join using the raster column and geometry column, using the same function as above. |
| |
| ```sql |
| SELECT r.rast, g.geom FROM rasterDf r, geomDf g WHERE RS_Interest(r.rast, g.geom) |
| ``` |
| |
| !!!note |
| These range and join queries will filter rasters using the provided geometric boundary and the spatial boundary of the raster. |
| |
| Sedona offers more raster predicates to do spatial range queries and spatial join queries. Please refer to [raster predicates docs](../api/sql/Raster-operators.md#raster-predicates). |
| |
| ## Visualize raster images |
| |
| Sedona provides APIs to visualize raster data in an image form. |
| |
| ### Base64 String |
| |
| The [RS_AsBase64](../api/sql/Raster-visualizer.md#rs_asbase64) encodes the raster data as a Base64 string and can be visualized using [online decoder](https://base64-viewer.onrender.com/). |
| |
| ```sql |
| SELECT RS_AsBase64(rast) FROM rasterDf |
| ``` |
| |
| ### HTML Image |
| |
| The [RS_AsImage](../api/sql/Raster-visualizer.md#rs_asimage) returns an HTML image tag, that can be visualized using an HTML viewer or in Jupyter Notebook. For more information please click on the link. |
| |
| ```sql |
| SELECT RS_AsImage(rast, 500) FROM rasterDf |
| ``` |
| |
| The output looks like this: |
| |
|  |
| |
| ### 2-D Matrix |
| |
| Sedona offers an API to visualize raster data that is not sufficient for the other APIs mentioned above. |
| |
| ```sql |
| SELECT RS_AsMatrix(rast) FROM rasterDf |
| ``` |
| |
| Output will be as follows: |
| |
| ```sql |
| | 1 3 4 0| |
| | 2 9 10 11| |
| | 3 4 5 6| |
| ``` |
| |
| Please refer to [Raster visualizer docs](../api/sql/Raster-visualizer.md) to learn how to make the most of the visualizing APIs. |
| |
| ## Save to permanent storage |
| |
| Sedona has APIs that can save an entire raster column to files in a specified location. Before saving, the raster type column needs to be converted to a binary format. Sedona provides several functions to convert a raster column into a binary column suitable for file storage. Once in binary format, the raster data can then be written to files on disk using the Sedona file storage APIs. |
| |
| ```sparksql |
| rasterDf.write.format("raster").option("rasterField", "raster").option("fileExtension", ".tiff").mode(SaveMode.Overwrite).save(dirPath) |
| ``` |
| |
| Sedona has a few writer functions that create the binary DataFrame necessary for saving the raster images. |
| |
| ### As Arc Grid |
| |
| Use [RS_AsArcGrid](../api/sql/Raster-writer.md#rs_asarcgrid) to get the binary Dataframe of the raster in Arc Grid format. |
| |
| ```sql |
| SELECT RS_AsArcGrid(raster) |
| ``` |
| |
| ### As GeoTiff |
| |
| Use [RS_AsGeoTiff](../api/sql/Raster-writer.md#rs_asgeotiff) to get the binary Dataframe of the raster in GeoTiff format. |
| |
| ```sql |
| SELECT RS_AsGeoTiff(raster) |
| ``` |
| |
| ### As PNG |
| |
| Use [RS_AsPNG](../api/sql/Raster-writer.md#rs_aspng) to get the binary Dataframe of the raster in PNG format. |
| |
| ```sql |
| SELECT RS_AsPNG(raster) |
| ``` |
| |
| Please refer to [Raster writer docs](../api/sql/Raster-writer.md) for more details. |
| |
| ## Collecting raster Dataframes and working with them locally in Python |
| |
| Sedona allows collecting Dataframes with raster columns and working with them locally in Python since `v1.6.0`. |
| The raster objects are represented as `SedonaRaster` objects in Python, which can be used to perform raster operations. |
| |
| ```python |
| df_raster = ( |
| sedona.read.format("binaryFile") |
| .load("/path/to/raster.tif") |
| .selectExpr("RS_FromGeoTiff(content) as rast") |
| ) |
| rows = df_raster.collect() |
| raster = rows[0].rast |
| raster # <sedona.raster.sedona_raster.InDbSedonaRaster at 0x1618fb1f0> |
| ``` |
| |
| You can retrieve the metadata of the raster by accessing the properties of the `SedonaRaster` object. |
| |
| ```python |
| raster.width # width of the raster |
| raster.height # height of the raster |
| raster.affine_trans # affine transformation matrix |
| raster.crs_wkt # coordinate reference system as WKT |
| ``` |
| |
| You can get a numpy array containing the band data of the raster using the `as_numpy` or `as_numpy_masked` method. The |
| band data is organized in CHW order. |
| |
| ```python |
| raster.as_numpy() # numpy array of the raster |
| raster.as_numpy_masked() # numpy array with nodata values masked as nan |
| ``` |
| |
| If you want to work with the raster data using `rasterio`, you can retrieve a `rasterio.DatasetReader` object using the |
| `as_rasterio` method. |
| |
| !!!note |
| You need to have the `rasterio` package installed (version >= 1.2.10) to use this method. You can install it using `pip install rasterio`. |
| |
| ```python |
| ds = raster.as_rasterio() # rasterio.DatasetReader object |
| # Work with the raster using rasterio |
| band1 = ds.read(1) # read the first band |
| ``` |
| |
| ## Writing Python UDF to work with raster data |
| |
| You can write Python UDFs to work with raster data in Python. The UDFs can take `SedonaRaster` objects as input and |
| return any Spark data type as output. This is an example of a Python UDF that calculates the mean of the raster data. |
| |
| ```python |
| from pyspark.sql.types import DoubleType |
| |
| |
| def mean_udf(raster): |
| return float(raster.as_numpy().mean()) |
| |
| |
| sedona.udf.register("mean_udf", mean_udf, DoubleType()) |
| df_raster.withColumn("mean", expr("mean_udf(rast)")).show() |
| ``` |
| |
| ``` |
| +--------------------+------------------+ |
| | rast| mean| |
| +--------------------+------------------+ |
| |GridCoverage2D["g...|1542.8092886117788| |
| +--------------------+------------------+ |
| ``` |
| |
| It is much trickier to write an UDF that returns a raster object, since Sedona does not support serializing Python raster |
| objects yet. However, you can write a UDF that returns the band data as an array and then construct the raster object using |
| `RS_MakeRaster`. This is an example of a Python UDF that creates a mask raster based on the first band of the input raster. |
| |
| ```python |
| from pyspark.sql.types import ArrayType, DoubleType |
| import numpy as np |
| |
| |
| def mask_udf(raster): |
| band1 = raster.as_numpy()[0, :, :] |
| mask = (band1 < 1400).astype(np.float64) |
| return mask.flatten().tolist() |
| |
| |
| sedona.udf.register("mask_udf", band_udf, ArrayType(DoubleType())) |
| df_raster.withColumn("mask", expr("mask_udf(rast)")).withColumn( |
| "mask_rast", expr("RS_MakeRaster(rast, 'I', mask)") |
| ).show() |
| ``` |
| |
| ``` |
| +--------------------+--------------------+--------------------+ |
| | rast| mask| mask_rast| |
| +--------------------+--------------------+--------------------+ |
| |GridCoverage2D["g...|[0.0, 0.0, 0.0, 0...|GridCoverage2D["g...| |
| +--------------------+--------------------+--------------------+ |
| ``` |
| |
| ## Performance optimization |
| |
| When working with large raster datasets, refer to the [documentation on storing raster geometries in Parquet format](storing-blobs-in-parquet.md) for recommendations to optimize performance. |