---
title: "Using raster data in Apache Sedona for R"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using raster data in Apache Sedona for R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>"
)
```
Raster data in GeoTiff and ArcInfoAsciiGrid formats can be read into and written from Spark.
# Using the RasterUDT
## Read
Raster data in GeoTiff and ArcInfoAsciiGrid formats can be loaded into Spark by reading the files as binary with `sparklyr::spark_read_binary` and then applying the Sedona constructors `RS_FromGeoTiff` or `RS_FromArcInfoAsciiGrid` to the binary content.
```{r include=FALSE}
Sys.setenv("SEDONA_JAR_FILES" = "~/WORK/MISC_CODE/sedona/spark-shaded/target/sedona-spark-shaded-3.0_2.12-1.4.0-SNAPSHOT.jar")
```
```{r message=FALSE, warning=FALSE}
library(dplyr)
library(sparklyr)
library(apache.sedona)

sc <- spark_connect(master = "local")

# Load each file in the directory as one row with a binary `content` column
data_tbl <- spark_read_binary(sc, dir = here::here("/../spark/common/src/test/resources/raster/"), name = "data")

# Parse the binary content into a Sedona raster column
raster <-
  data_tbl %>%
  mutate(raster = RS_FromGeoTiff(content))

raster
raster %>% sdf_schema()
```
Once the data is loaded, raster functions are available in dplyr workflows:
* [Raster operators](../../../api/sql/Raster-operators/)
* [Raster input and output](../../../api/sql/Raster-loader/)
Functions that take a `raster: Raster` argument, such as `RS_Value`, `RS_Values`, and `RS_Envelope`, are meant to be used with data loaded with this reader. Functions that take a `Band: Array[Double]` argument work with data loaded using the Sedona Geotiff DataFrame loader (see [below](#Using the Sedona Geotiff Dataframe Loader)).
For example, getting the number of bands:
```{r}
raster %>%
  mutate(
    nbands = RS_NumBands(raster)
  ) %>%
  select(path, nbands) %>%
  collect() %>%
  mutate(path = path %>% basename())
```
Or getting the envelope:
```{r}
raster %>%
  mutate(
    env = RS_Envelope(raster) %>% st_astext()
  ) %>%
  select(path, env) %>%
  collect() %>%
  mutate(path = path %>% basename())
```
Or getting values at specific points:
```{r}
raster %>%
  mutate(
    val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802))
  ) %>%
  select(path, val) %>%
  collect() %>%
  mutate(path = path %>% basename())
```
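`RS_Values`, mentioned earlier, is the multi-point counterpart of `RS_Value`. The chunk below is a minimal sketch rather than a tested recipe. It reuses the `raster` table from the chunks above, and it assumes that `RS_Values` accepts the raster together with an array of point geometries and that dplyr passes Spark SQL's `array()` through unchanged. The second point is a hypothetical coordinate chosen only for illustration.

```{r}
point_values <-
  raster %>%
  mutate(
    # array() is Spark SQL's array constructor (assumed to pass through dplyr);
    # the second point is a hypothetical coordinate near the first one
    vals = RS_Values(
      raster,
      array(
        ST_Point(-13077301.685, 4002565.802),
        ST_Point(-13077201.685, 4002465.802)
      )
    )
  ) %>%
  select(path, vals) %>%
  collect() %>%
  mutate(path = path %>% basename())

point_values
```

If this works as assumed, `vals` should come back as a list column with one entry per input point.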
## Write
Writing a Sedona Raster DataFrame to raster files takes two steps: (1) convert the raster column to a binary column with one of the `RS_AsXXX` functions, and (2) write the resulting binary DataFrame to files using Sedona's built-in `raster` data source.
In R, the second step is done with the `spark_write_raster` function:
```{r}
dest_file <- tempfile()
raster %>%
  mutate(content = RS_AsGeoTiff(raster)) %>%
  spark_write_raster(path = dest_file)
dir(dest_file, recursive = TRUE)
```
The available options are listed below; see [Raster writer](../../../api/sql/Raster-writer/) for details:
* rasterField: the binary column to be saved. If the DataFrame has only one binary column, it is used by default; otherwise this option must be specified.
* fileExtension: `.tiff` by default; also accepts `.png`, `.jpeg`, `.asc`.
* pathField: the name of a column that holds the path of each raster file. If it is not set, random UUIDs are generated as file names.
```{r}
dest_file <- tempfile()
raster %>%
  mutate(content = RS_AsArcGrid(raster)) %>%
  spark_write_raster(
    path = dest_file,
    options = list(
      "rasterField" = "content",
      "fileExtension" = ".asc",
      "pathField" = "path"
    )
  )
dir(dest_file, recursive = TRUE)
```
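One way to check the result is to read the written files back into Spark. The chunk below is a sketch under a few assumptions. It reuses `sc` and `dest_file` from the chunk above. It assumes the writer places the `.asc` files somewhere beneath `dest_file`, hence the recursive lookup. It also relies on the `path_glob_filter` and `recursive_file_lookup` arguments of `spark_read_binary`. The table name `written` is arbitrary.

```{r}
# Read every .asc file found under the output directory as binary content
written_tbl <- spark_read_binary(
  sc,
  dir = dest_file,
  name = "written",
  path_glob_filter = "*.asc",
  recursive_file_lookup = TRUE
)

# Parse the content back into rasters and inspect the number of bands
written_tbl %>%
  mutate(raster = RS_FromArcInfoAsciiGrid(content)) %>%
  mutate(nbands = RS_NumBands(raster)) %>%
  select(path, nbands) %>%
  collect() %>%
  mutate(path = path %>% basename())
```

The glob filter also keeps Spark's bookkeeping files in the output directory out of the result.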