
 <!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
  -->

 # Sedona JupyterLab Docker Image

 Sedona Docker images are available on [Sedona official DockerHub repo](https://hub.docker.com/r/apache/sedona).

 We provide a Docker image for Apache Sedona that bundles Python JupyterLab, Apache Zeppelin, and a Spark cluster with 1 master node and 1 worker node.

 ## How to use

 ### Pull the image from DockerHub

 Format:

 ```bash
 docker pull apache/sedona:<sedona_version>
 ```

 Example 1: Pull the latest image of Sedona master branch

 ```bash
 docker pull apache/sedona:latest
 ```

 Example 2: Pull the image of a specific Sedona release

 ```bash
 docker pull apache/sedona:{{ sedona.current_version }}
 ```

 ### Start the container

 Format:

 ```bash
 docker run -d -e DRIVER_MEM=<driver_mem> -e EXECUTOR_MEM=<executor_mem> -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 -p 8085:8085 apache/sedona:<sedona_version>
 ```

 Driver memory and executor memory are optional. If they are not given, the container uses 4GB of RAM for the driver and 4GB of RAM for the executor. The `-d` (or `--detach`) flag runs the container in detached mode, that is, in the background.
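
The defaults described above can be sketched with shell parameter expansion. The helper below is hypothetical (`sedona_run_cmd` is not part of the image or the Sedona repository). It only prints the `docker run` command it would use, filling in `4g` when `DRIVER_MEM` or `EXECUTOR_MEM` is unset, so the result can be inspected before it is run. The container applies its own defaults regardless; this merely makes them explicit on the host side.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the docker run command for a given image tag.
# The 4g fallbacks mirror the documented container defaults.
sedona_run_cmd() {
    local tag="${1:-latest}"
    local driver_mem="${DRIVER_MEM:-4g}"
    local executor_mem="${EXECUTOR_MEM:-4g}"
    echo "docker run -d -e DRIVER_MEM=${driver_mem} -e EXECUTOR_MEM=${executor_mem}" \
         "-p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 -p 8085:8085" \
         "apache/sedona:${tag}"
}

# Print the command using whatever memory settings are currently in effect.
sedona_run_cmd latest

# Override the driver memory inside a subshell so the change does not leak out.
(DRIVER_MEM=6g; sedona_run_cmd latest)
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without Docker; pipe the output to `sh` once it looks right.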

 Example 1:

 ```bash
 docker run -d -e DRIVER_MEM=6g -e EXECUTOR_MEM=8g -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 -p 8085:8085 apache/sedona:latest
 ```

 This command will start a container with 6GB RAM for the driver and 8GB RAM for the executor and use the latest Sedona image. The container will run in detached mode.

 This command will bind the container's ports 8888, 8080, 8081, 4040, 8085 to the host's ports 8888, 8080, 8081, 4040, 8085 respectively.

 Example 2:

 ```bash
 docker run -d -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 -p 8085:8085 apache/sedona:{{ sedona.current_version }}
 ```

 This command will start a container with 4GB RAM for the driver and 4GB RAM for the executor and use the Sedona {{ sedona.current_version }} image.

 This command will bind the container's ports 8888, 8080, 8081, 4040, 8085 to the host's ports 8888, 8080, 8081, 4040, 8085 respectively.

 Example 3: Persisting `/opt` (Jupyter & Zeppelin Data) with Docker Volume

 To ensure that **Jupyter workspace, Zeppelin notebooks, and configurations persist**, mount `/opt` as a **Docker volume**:

 ```bash
 docker run -d -e DRIVER_MEM=6g -e EXECUTOR_MEM=8g \
     -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 -p 8085:8085 \
     -v sedona_opt:/opt \
     apache/sedona:latest
 ```

 - The `-v sedona_opt:/opt` flag **creates (if not existing) and mounts a Docker volume named `sedona_opt`** to the `/opt` directory inside the container.
 - This ensures that **Jupyter and Zeppelin notebooks, configurations, and workspaces persist** even if the container is stopped or removed.
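
To check that the volume exists, or to copy its contents out, the standard Docker volume commands apply. The following is a minimal sketch, wrapped in functions so that nothing runs until called. The function names, the `ubuntu` helper image, and the archive name `sedona_opt.tar.gz` are illustrative assumptions, not part of the Sedona image.

```bash
#!/usr/bin/env bash
# Show metadata (including the host mountpoint) for the named volume.
inspect_sedona_volume() {
    docker volume inspect sedona_opt
}

# Archive the volume contents into the current directory using a
# throwaway container. The volume is mounted read-only at /data.
backup_sedona_volume() {
    docker run --rm \
        -v sedona_opt:/data:ro \
        -v "$(pwd)":/backup \
        ubuntu tar czf /backup/sedona_opt.tar.gz -C /data .
}
```

Call `inspect_sedona_volume` after starting the container, and `backup_sedona_volume` before removing the volume with `docker volume rm sedona_opt`.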

 ### Start coding

 Open your browser and go to [http://localhost:8888/](http://localhost:8888/) to start coding with Sedona in JupyterLab. You can also access Apache Zeppelin at [http://localhost:8085/classic/](http://localhost:8085/classic/).

 ### Notes

 - This container assumes you have at least 8GB RAM. It takes all your CPU cores and 8GB RAM: the 1 worker takes 4GB and the Jupyter program takes the remaining 4GB.
 - Sedona in this container runs in the cluster mode. Only 1 notebook can be run at a time. If you want to run another notebook, please shut down the kernel of the current notebook first ([How?](https://jupyterlab.readthedocs.io/en/stable/user/running.html)).

 ## How to build

 Clone the Sedona GitHub repository and run the commands below from its root directory.

 ### Build the image against a Sedona release

 Requirements: docker ([How?](https://docs.docker.com/engine/install/))

 Format:

 ```bash
 ./docker/build.sh <spark_version> <sedona_version> <build_mode>
 ```

 Example:

 ```bash
 ./docker/build.sh 3.4.1 {{ sedona.current_version }}
 ```

 `build_mode` is optional. If its value is not given or is `local`, the script will build the image locally. Otherwise, it will start a cross-platform compilation and push images directly to DockerHub.
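
The branching described above can be sketched as follows. This illustrates the documented behaviour only and is not the contents of `docker/build.sh`; the function name `describe_build_mode` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of how the optional third argument is interpreted.
describe_build_mode() {
    local build_mode="${1:-local}"   # a missing argument behaves like "local"
    if [ "$build_mode" = "local" ]; then
        echo "build locally"
    else
        echo "cross-platform build and push to DockerHub"
    fi
}

describe_build_mode            # no argument given
describe_build_mode local
describe_build_mode release
```

Because any value other than `local` triggers a push, it is worth double-checking the third argument before running the real script.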

 ### Build the image against the latest Sedona master

 Requirements: docker ([How?](https://docs.docker.com/engine/install/)), JDK <= 19, Maven 3

 Format:

 ```bash
 ./docker/build.sh <spark_version> latest <build_mode>
 ```

 Example:

 ```bash
 ./docker/build.sh 3.4.1 latest
 ```

 `build_mode` is optional. If its value is not given or is `local`, the script will build the image locally. Otherwise, it will start a cross-platform compilation and push images directly to DockerHub.

 ### Notes

 This Docker image can only be built against Sedona 1.7.0+ and Spark 3.3+.
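
A pre-flight check for these minimum versions can be sketched in shell. This is an example under stated assumptions rather than something `build.sh` is known to do: the function names are hypothetical, and the comparison relies on `sort -V` (version sort), which is available in GNU coreutils.

```bash
#!/usr/bin/env bash
# Return success when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check of the documented minimums before calling build.sh.
check_build_versions() {
    local spark_version="$1"
    local sedona_version="$2"
    if ! version_ge "$spark_version" 3.3; then
        echo "Spark $spark_version is older than 3.3" >&2
        return 1
    fi
    # "latest" refers to the master branch, so there is no number to compare.
    if [ "$sedona_version" != "latest" ] && ! version_ge "$sedona_version" 1.7.0; then
        echo "Sedona $sedona_version is older than 1.7.0" >&2
        return 1
    fi
    echo "versions OK"
}

check_build_versions 3.4.1 latest
```

A typical use would be `check_build_versions "$spark" "$sedona" && ./docker/build.sh "$spark" "$sedona"`, where `spark` and `sedona` are variables you set yourself.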

 ## Cluster Configuration

 ### Software

 - OS: Ubuntu 22.04
 - JDK: openjdk-19
 - Python: 3.10
 - Spark: 3.5.5

 ### Web UI

 - JupyterLab: http://localhost:8888/
 - Spark master URL: spark://localhost:7077
 - Spark job UI: http://localhost:4040
 - Spark master web UI: http://localhost:8080/
 - Spark worker web UI: http://localhost:8081/
 - Apache Zeppelin: http://localhost:8085/

 A Zeppelin tutorial notebook is bundled with Sedona tutorials. See [Sedona-Zeppelin tutorial](../tutorial/zeppelin.md) for details.
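
Once the container is running, the web UI endpoints listed above can be probed from the host. The sketch below assumes `curl` is installed and that the default port bindings were used; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Print "name url" pairs for the web UIs published by the container.
# The host defaults to localhost, matching the default port bindings.
sedona_ui_urls() {
    local host="${1:-localhost}"
    echo "JupyterLab http://${host}:8888/"
    echo "SparkJobUI http://${host}:4040"
    echo "SparkMasterUI http://${host}:8080/"
    echo "SparkWorkerUI http://${host}:8081/"
    echo "Zeppelin http://${host}:8085/"
}

# Probe each URL with curl and report whether it answered.
check_sedona_uis() {
    sedona_ui_urls "$@" | while read -r name url; do
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "$name up"
        else
            echo "$name down"
        fi
    done
}
```

Run `check_sedona_uis` a short while after `docker run`, since the services need time to start. The Spark job UI on port 4040 typically responds only while a Spark application is running, so a "down" result there is not necessarily a problem.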

 ## How to push to DockerHub

 Format:

 ```bash
 docker login
 ./docker/build.sh <spark_version> <sedona_version> release
 ```

 Example:

 ```bash
 docker login
 ./docker/build.sh 3.4.1 {{ sedona.current_version }} release
 ```