We recommend Sedona 1.3.1-incubating and above for EMR. This tutorial uses AWS Elastic MapReduce (EMR) 6.9.0, which has the following applications installed: Hadoop 3.3.3, JupyterEnterpriseGateway 2.6.0, Livy 0.7.1, Spark 3.3.0.
!!!tip
    Wherobots Cloud provides a free tool to deploy Apache Sedona to AWS EMR. Please sign up [here](https://www.wherobots.services/).
This tutorial is tested on EMR on EC2 with EMR Studio (notebooks). EMR on EC2 uses YARN to manage resources.
## Prepare initialization script
In your S3 bucket, add a script that has the following content:
```bash
#!/bin/bash
# EMR clusters only have ephemeral local storage. It does not really matter where we store the jars.
sudo mkdir /jars
# Download Sedona jar
sudo curl -o /jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar"
# Download GeoTools jar
sudo curl -o /jars/geotools-wrapper-{{ sedona.current_geotools }}.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar"
# Install necessary python libraries
sudo python3 -m pip install pandas shapely==1.8.5
sudo python3 -m pip install pandas geopandas==0.10.2
sudo python3 -m pip install attrs matplotlib descartes apache-sedona=={{ sedona.current_version }}
```
When you create an EMR cluster, specify the location of this script in the `bootstrap action`.
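
If you prefer the AWS CLI to the console, the same step can be sketched roughly as follows. The bucket name, script file name, cluster name, bootstrap action name, and instance settings are placeholders chosen for illustration, not values from this tutorial. The script only prints the commands so that you can review them before running them yourself.

```bash
#!/bin/bash
# Placeholders (assumptions): replace with your own bucket and script file name.
BUCKET="my-example-bucket"
SCRIPT_NAME="sedona-init.sh"
BOOTSTRAP_PATH="s3://${BUCKET}/${SCRIPT_NAME}"

# Upload the initialization script to S3.
UPLOAD_CMD="aws s3 cp ${SCRIPT_NAME} ${BOOTSTRAP_PATH}"

# Create an EMR 6.9.0 cluster that runs the script as a bootstrap action.
# The cluster name, instance type, and instance count are illustrative only.
CREATE_CMD="aws emr create-cluster \
--name sedona-example \
--release-label emr-6.9.0 \
--applications Name=Hadoop Name=Spark Name=Livy Name=JupyterEnterpriseGateway \
--bootstrap-actions Path=${BOOTSTRAP_PATH},Name=InstallSedona \
--instance-type m5.xlarge \
--instance-count 3 \
--use-default-roles"

# Only print the commands so they can be reviewed before running them by hand.
echo "${UPLOAD_CMD}"
echo "${CREATE_CMD}"
```

The bootstrap action runs on every node while the cluster is being provisioned, so the jars and Python packages are in place before Spark starts.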
## Add software configuration
When you create an EMR cluster, add the following content to the software configuration:
```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar,/jars/geotools-wrapper-{{ sedona.current_geotools }}.jar",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
      "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
    }
  }
]
```
!!!note
    If you use Sedona 1.3.1-incubating, please use the `sedona-python-adapter-3.0_2.12` jar in the content above, instead of `sedona-spark-shaded-3.0_2.12`.
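
A stray comma or quote in the software configuration is easy to introduce and hard to spot, so it can help to check the syntax before creating the cluster. The sketch below assumes the JSON is saved locally in a file hypothetically named `sedona-config.json` and that `python3` is available on your machine. `validate_json` is a helper defined here for illustration, not part of any AWS or Sedona tool.

```bash
#!/bin/bash
# Hypothetical local file holding the JSON software configuration.
CONFIG_FILE="sedona-config.json"

# Helper defined for this sketch: succeeds only if the file parses as JSON.
validate_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

if [ ! -f "${CONFIG_FILE}" ]; then
  echo "${CONFIG_FILE} not found; save the configuration there first"
elif validate_json "${CONFIG_FILE}"; then
  echo "${CONFIG_FILE} parses as JSON"
else
  echo "${CONFIG_FILE} is not valid JSON"
fi
```

With the AWS CLI, such a file can then be passed to `aws emr create-cluster` through `--configurations file://sedona-config.json`. This check covers JSON syntax only; it does not verify that the jar paths or class names are correct.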