| --- |
| title: "Using raster data in Apache Sedona for R" |
| output: rmarkdown::html_vignette |
| vignette: > |
| %\VignetteIndexEntry{Using raster data in Apache Sedona for R} |
| %\VignetteEngine{knitr::rmarkdown} |
| %\VignetteEncoding{UTF-8} |
| --- |
| |
| ```{r, include = FALSE} |
| knitr::opts_chunk$set( |
| collapse = TRUE, |
| eval = FALSE, |
| comment = "#>" |
| ) |
| ``` |
| |
| |
| Raster data in GeoTiff and ArcInfo ASCII Grid formats can be read into Spark and written back out to files. |
| |
| # Using the RasterUDT |
| |
| ## Read |
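| Raster data in GeoTiff and ArcInfo ASCII Grid formats can be loaded directly into Spark using `sparklyr::spark_read_binary` and the Sedona constructors `RS_FromGeoTiff` and `RS_FromArcInfoAsciiGrid`. `spark_read_binary` reads each file into a binary `content` column, which the constructors then convert into a raster column. |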
| |
| ```{r include=FALSE} |
| Sys.setenv("SEDONA_JAR_FILES" = "~/WORK/MISC_CODE/sedona/spark-shaded/target/sedona-spark-shaded-3.0_2.12-1.4.0-SNAPSHOT.jar") |
| ``` |
| |
| ```{r message=FALSE, warning=FALSE} |
| library(dplyr) |
| library(sparklyr) |
| library(apache.sedona) |
| |
| sc <- spark_connect(master = "local") |
| |
| data_tbl <- spark_read_binary(sc, dir = here::here("/../spark/common/src/test/resources/raster/"), name = "data") |
| |
| raster <- |
| data_tbl %>% |
| mutate(raster = RS_FromGeoTiff(content)) |
| |
| raster |
| |
| raster %>% sdf_schema() |
| ``` |
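| |
| The same pattern should apply to ArcInfo ASCII Grid files with `RS_FromArcInfoAsciiGrid`. The sketch below assumes that the directory read into `data_tbl` also contains ArcInfo ASCII Grid files and that their `path` values end in `.asc`; the `raster_asc` name is only illustrative. |
| |
| ```{r} |
| # Assumption: ArcInfo ASCII Grid files in `data_tbl` have paths ending in ".asc" |
| raster_asc <- |
| data_tbl %>% |
| filter(path %like% "%.asc") %>% |
| mutate(raster = RS_FromArcInfoAsciiGrid(content)) |
| |
| raster_asc %>% sdf_schema() |
| ``` |
| |
| Filtering first keeps files of other formats away from a constructor that is not meant to parse them. |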
| |
| Once the data is loaded, raster functions are available in dplyr workflows: |
| |
| * [Raster operators](../../../api/sql/Raster-operators/) |
| * [Raster input and output](../../../api/sql/Raster-loader/) |
| |
| Functions that take a `raster: Raster` argument, such as `RS_Value`, `RS_Values` and `RS_Envelope`, are meant to be used with data loaded with this reader. Functions that take a `Band: Array[Double]` argument work with data loaded using the Sedona Geotiff DataFrame loader (see [below](#Using the Sedona Geotiff Dataframe Loader)). |
| |
| |
| For example, getting the number of bands: |
| ```{r} |
| raster %>% |
| mutate( |
| nbands = RS_NumBands(raster) |
| ) %>% |
| select(path, nbands) %>% |
| collect() %>% |
| mutate(path = path %>% basename()) |
| ``` |
| |
| Or getting the envelope: |
| ```{r} |
| raster %>% |
| mutate( |
| env = RS_Envelope(raster) %>% st_astext() |
| ) %>% |
| select(path, env) %>% |
| collect() %>% |
| mutate(path = path %>% basename()) |
| ``` |
| |
| Or getting values at specific points: |
| ```{r} |
| raster %>% |
| mutate( |
| val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802)) |
| ) %>% |
| select(path, val) %>% |
| collect() %>% |
| mutate(path = path %>% basename()) |
| ``` |
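| |
| `RS_Values` can be used in the same way for several points at once. This is a sketch that assumes `RS_Values` accepts the raster and an array of point geometries, built here with Spark SQL's `array()`; the second coordinate pair and the `point_values` name are made up for illustration. |
| |
| ```{r} |
| # Assumption: RS_Values(raster, <array of points>) returns one value per point |
| point_values <- |
| raster %>% |
| mutate( |
| vals = RS_Values(raster, array( |
| ST_Point(-13077301.685, 4002565.802), |
| ST_Point(-13077300.0, 4002566.0) # hypothetical second point |
| )) |
| ) %>% |
| select(path, vals) %>% |
| collect() %>% |
| mutate(path = path %>% basename()) |
| |
| point_values |
| ``` |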
| |
| |
| ## Write |
| |
| |
| To write a Sedona Raster DataFrame to raster files, (1) convert the raster column to a binary column using one of the `RS_AsXXX` functions (for example `RS_AsGeoTiff` or `RS_AsArcGrid`), then (2) write the resulting binary DataFrame to raster files using Sedona's built-in `raster` data source. |
| |
| In R, the second step is done with the `spark_write_raster` function: |
| |
| ```{r} |
| dest_file <- tempfile() |
| raster %>% |
| mutate(content = RS_AsGeoTiff(raster)) %>% |
| spark_write_raster(path = dest_file) |
| |
| dir(dest_file, recursive = TRUE) |
| ``` |
| |
| |
| The following options are available (see [Raster writer](../../../api/sql/Raster-writer/)): |
| |
| * `rasterField`: the binary column to save. If the DataFrame has only one binary column, it is used by default; otherwise it must be specified. |
| * `fileExtension`: `.tiff` by default; also accepts `.png`, `.jpeg` and `.asc`. |
| * `pathField`: the name of a column that holds the path of each raster file. If it is not set, random UUIDs are generated. |
| |
| ```{r} |
| dest_file <- tempfile() |
| raster %>% |
| mutate(content = RS_AsArcGrid(raster)) %>% |
| spark_write_raster(path = dest_file, |
| options = list("rasterField" = "content", |
| "fileExtension" = ".asc", |
| "pathField" = "path" |
| )) |
| |
| dir(dest_file, recursive = TRUE) |
| ``` |