[SPARK-22994][K8S] Use a single image for all Spark containers.
This change allows a user to submit a Spark application on Kubernetes
by providing a single image, instead of one image for each type
of container. The image's entry point now takes an extra argument that
identifies the process being started.
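As a rough sketch (the entrypoint path follows the COPY destination in the
new spark/Dockerfile; in a real pod Kubernetes supplies these as container
args rather than a manual command line):

    # Single shared entrypoint; the first argument selects the process.
    /opt/entrypoint.sh driver                  # driver JVM
    /opt/entrypoint.sh executor                # CoarseGrainedExecutorBackend
    /opt/entrypoint.sh init <properties-file>  # SparkPodInitContainer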
The configuration still allows the user to provide different images
for each container type if they so desire.
On top of that, the entry point was simplified to share more code across
container types; most notably, a single env variable (SPARK_CLASSPATH) is now
used to propagate the user-defined classpath to the different containers.
Aside from being modified to match the new behavior, the
'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh'
to better reflect its purpose; the old name was a little awkward and is no
longer accurate now that there is a single image. The script was also moved
to 'bin' since it's not necessarily an admin tool.
Docs have been updated to match the new behavior.
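For illustration (repository, tag, and API server address below are
placeholders), the single-image workflow looks like:

    # Build and publish one image for all Spark containers.
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push

    # Use it for the driver, executor and init containers; the per-type
    # spark.kubernetes.{driver,executor,initContainer}.image settings remain
    # available as optional overrides.
    bin/spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --deploy-mode cluster \
      --conf spark.kubernetes.container.image=docker.io/myrepo/spark:v2.3.0 \
      ...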
Tested locally with minikube.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20192 from vanzin/SPARK-22994.
(cherry picked from commit 0b2eefb674151a0af64806728b38d9410da552ec)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
diff --git a/sbin/build-push-docker-images.sh b/bin/docker-image-tool.sh
similarity index 63%
rename from sbin/build-push-docker-images.sh
rename to bin/docker-image-tool.sh
index b953259..0714063 100755
--- a/sbin/build-push-docker-images.sh
+++ b/bin/docker-image-tool.sh
@@ -24,29 +24,11 @@
exit 1
}
-# Detect whether this is a git clone or a Spark distribution and adjust paths
-# accordingly.
if [ -z "${SPARK_HOME}" ]; then
SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"
-if [ -f "$SPARK_HOME/RELEASE" ]; then
- IMG_PATH="kubernetes/dockerfiles"
- SPARK_JARS="jars"
-else
- IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
- SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
-fi
-
-if [ ! -d "$IMG_PATH" ]; then
- error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
-fi
-
-declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
- [spark-executor]="$IMG_PATH/executor/Dockerfile" \
- [spark-init]="$IMG_PATH/init-container/Dockerfile" )
-
function image_ref {
local image="$1"
local add_repo="${2:-1}"
@@ -60,35 +42,49 @@
}
function build {
- docker build \
- --build-arg "spark_jars=$SPARK_JARS" \
- --build-arg "img_path=$IMG_PATH" \
- -t spark-base \
- -f "$IMG_PATH/spark-base/Dockerfile" .
- for image in "${!path[@]}"; do
- docker build -t "$(image_ref $image)" -f ${path[$image]} .
- done
+ local BUILD_ARGS
+ local IMG_PATH
+
+ if [ ! -f "$SPARK_HOME/RELEASE" ]; then
+ # Set image build arguments accordingly if this is a source repo and not a distribution archive.
+ IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
+ BUILD_ARGS=(
+ --build-arg
+ img_path=$IMG_PATH
+ --build-arg
+ spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
+ )
+ else
+ # Not passed as an argument to docker, but used to validate the Spark directory.
+ IMG_PATH="kubernetes/dockerfiles"
+ fi
+
+ if [ ! -d "$IMG_PATH" ]; then
+ error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
+ fi
+
+ docker build "${BUILD_ARGS[@]}" \
+ -t $(image_ref spark) \
+ -f "$IMG_PATH/spark/Dockerfile" .
}
function push {
- for image in "${!path[@]}"; do
- docker push "$(image_ref $image)"
- done
+ docker push "$(image_ref spark)"
}
function usage {
cat <<EOF
Usage: $0 [options] [command]
-Builds or pushes the built-in Spark Docker images.
+Builds or pushes the built-in Spark Docker image.
Commands:
- build Build images.
- push Push images to a registry. Requires a repository address to be provided, both
- when building and when pushing the images.
+ build Build image. Requires a repository address to be provided if the image will be
+ pushed to a different registry.
+ push Push a pre-built image to a registry. Requires a repository address to be provided.
Options:
-r repo Repository address.
- -t tag Tag to apply to built images, or to identify images to be pushed.
+ -t tag Tag to apply to the built image, or to identify the image to be pushed.
-m Use minikube's Docker daemon.
Using minikube when building images will do so directly into minikube's Docker daemon.
@@ -100,10 +96,10 @@
https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
Examples:
- - Build images in minikube with tag "testing"
+ - Build image in minikube with tag "testing"
$0 -m -t testing build
- - Build and push images with tag "v2.3.0" to docker.io/myrepo
+ - Build and push image with tag "v2.3.0" to docker.io/myrepo
$0 -r docker.io/myrepo -t v2.3.0 build
$0 -r docker.io/myrepo -t v2.3.0 push
EOF
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 2d69f63..08ec34c 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -53,20 +53,17 @@
Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
-frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
-and built for your usage.
+frequently used with Kubernetes. Spark (starting with version 2.3) ships with a Dockerfile that can be used for this
+purpose, or customized to match an individual application's needs. It can be found in the `kubernetes/dockerfiles/`
+directory.
-You may build these docker images from sources.
-There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push
-customized Spark distribution images consisting of all the above components.
+Spark also ships with a `bin/docker-image-tool.sh` script that can be used to build and publish the Docker images to
+use with the Kubernetes backend.
Example usage is:
- ./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
- ./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
-
-Docker files are under the `kubernetes/dockerfiles/` directory and can be customized further before
-building using the supplied script, or manually.
+ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
+ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
## Cluster Mode
@@ -79,8 +76,7 @@
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
- --conf spark.kubernetes.driver.container.image=<driver-image> \
- --conf spark.kubernetes.executor.container.image=<executor-image> \
+ --conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar
```
@@ -126,13 +122,7 @@
### Using Remote Dependencies
When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
-the dependencies so the driver and executor containers can use them locally. This requires users to specify the container
-image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users
-simply add the following option to the `spark-submit` command to specify the init-container image:
-
-```
---conf spark.kubernetes.initContainer.image=<init-container image>
-```
+the dependencies so the driver and executor containers can use them locally.
The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
@@ -147,9 +137,7 @@
--jars https://path/to/dependency1.jar,https://path/to/dependency2.jar
--files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2
--conf spark.executor.instances=5 \
- --conf spark.kubernetes.driver.container.image=<driver-image> \
- --conf spark.kubernetes.executor.container.image=<executor-image> \
- --conf spark.kubernetes.initContainer.image=<init-container image>
+ --conf spark.kubernetes.container.image=<spark-image> \
https://path/to/examples.jar
```
@@ -322,21 +310,27 @@
</td>
</tr>
<tr>
- <td><code>spark.kubernetes.driver.container.image</code></td>
+ <td><code>spark.kubernetes.container.image</code></td>
<td><code>(none)</code></td>
<td>
- Container image to use for the driver.
- This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
- This configuration is required and must be provided by the user.
+ Container image to use for the Spark application.
+ This is usually of the form <code>example.com/repo/spark:v1.0.0</code>.
+ This configuration is required and must be provided by the user, unless explicit
+ images are provided for each different container type.
+ </td>
+</tr>
+<tr>
+ <td><code>spark.kubernetes.driver.container.image</code></td>
+ <td><code>(value of spark.kubernetes.container.image)</code></td>
+ <td>
+ Custom container image to use for the driver.
</td>
</tr>
<tr>
<td><code>spark.kubernetes.executor.container.image</code></td>
- <td><code>(none)</code></td>
+ <td><code>(value of spark.kubernetes.container.image)</code></td>
<td>
- Container image to use for the executors.
- This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
- This configuration is required and must be provided by the user.
+ Custom container image to use for executors.
</td>
</tr>
<tr>
@@ -643,9 +637,9 @@
</tr>
<tr>
<td><code>spark.kubernetes.initContainer.image</code></td>
- <td>(none)</td>
+ <td><code>(value of spark.kubernetes.container.image)</code></td>
<td>
- Container image for the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">init-container</a> of the driver and executors for downloading dependencies. This is usually of the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional and must be provided by the user if any non-container local dependency is used and must be downloaded remotely.
+ Custom container image for the init container of both driver and executors.
</td>
</tr>
<tr>
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index e5d79d9..471196a 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -29,17 +29,23 @@
.stringConf
.createWithDefault("default")
+ val CONTAINER_IMAGE =
+ ConfigBuilder("spark.kubernetes.container.image")
+ .doc("Container image to use for Spark containers. Individual container types " +
+ "(e.g. driver or executor) can also be configured to use different images if desired, " +
+ "by setting the container type-specific image name.")
+ .stringConf
+ .createOptional
+
val DRIVER_CONTAINER_IMAGE =
ConfigBuilder("spark.kubernetes.driver.container.image")
.doc("Container image to use for the driver.")
- .stringConf
- .createOptional
+ .fallbackConf(CONTAINER_IMAGE)
val EXECUTOR_CONTAINER_IMAGE =
ConfigBuilder("spark.kubernetes.executor.container.image")
.doc("Container image to use for the executors.")
- .stringConf
- .createOptional
+ .fallbackConf(CONTAINER_IMAGE)
val CONTAINER_IMAGE_PULL_POLICY =
ConfigBuilder("spark.kubernetes.container.image.pullPolicy")
@@ -148,8 +154,7 @@
val INIT_CONTAINER_IMAGE =
ConfigBuilder("spark.kubernetes.initContainer.image")
.doc("Image for the driver and executor's init-container for downloading dependencies.")
- .stringConf
- .createOptional
+ .fallbackConf(CONTAINER_IMAGE)
val INIT_CONTAINER_MOUNT_TIMEOUT =
ConfigBuilder("spark.kubernetes.mountDependencies.timeout")
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
index 111cb2a..9411956 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
@@ -60,10 +60,9 @@
val ENV_APPLICATION_ID = "SPARK_APPLICATION_ID"
val ENV_EXECUTOR_ID = "SPARK_EXECUTOR_ID"
val ENV_EXECUTOR_POD_IP = "SPARK_EXECUTOR_POD_IP"
- val ENV_EXECUTOR_EXTRA_CLASSPATH = "SPARK_EXECUTOR_EXTRA_CLASSPATH"
val ENV_MOUNTED_CLASSPATH = "SPARK_MOUNTED_CLASSPATH"
val ENV_JAVA_OPT_PREFIX = "SPARK_JAVA_OPT_"
- val ENV_SUBMIT_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
+ val ENV_CLASSPATH = "SPARK_CLASSPATH"
val ENV_DRIVER_MAIN_CLASS = "SPARK_DRIVER_CLASS"
val ENV_DRIVER_ARGS = "SPARK_DRIVER_ARGS"
val ENV_DRIVER_JAVA_OPTS = "SPARK_DRIVER_JAVA_OPTS"
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala
index dfeccf9..f6a57df 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala
@@ -77,6 +77,7 @@
.withMountPath(INIT_CONTAINER_PROPERTIES_FILE_DIR)
.endVolumeMount()
.addToVolumeMounts(sharedVolumeMounts: _*)
+ .addToArgs("init")
.addToArgs(INIT_CONTAINER_PROPERTIES_FILE_PATH)
.build()
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala
index eca46b8..164e2e5 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala
@@ -66,7 +66,7 @@
override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
val driverExtraClasspathEnv = driverExtraClasspath.map { classPath =>
new EnvVarBuilder()
- .withName(ENV_SUBMIT_EXTRA_CLASSPATH)
+ .withName(ENV_CLASSPATH)
.withValue(classPath)
.build()
}
@@ -133,6 +133,7 @@
.addToLimits("memory", driverMemoryLimitQuantity)
.addToLimits(maybeCpuLimitQuantity.toMap.asJava)
.endResources()
+ .addToArgs("driver")
.build()
val baseDriverPod = new PodBuilder(driverSpec.driverPod)
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
index bcacb39..141bd28 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
@@ -128,7 +128,7 @@
.build()
val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
new EnvVarBuilder()
- .withName(ENV_EXECUTOR_EXTRA_CLASSPATH)
+ .withName(ENV_CLASSPATH)
.withValue(cp)
.build()
}
@@ -181,6 +181,7 @@
.endResources()
.addAllToEnv(executorEnv.asJava)
.withPorts(requiredPorts.asJava)
+ .addToArgs("executor")
.build()
val executorPod = new PodBuilder()
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala
index f193b1f..65274c6 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala
@@ -34,8 +34,7 @@
private val SECRET_MOUNT_PATH = "/etc/secrets/driver"
test("Base submission steps with a main app resource.") {
- val sparkConf = new SparkConf(false)
- .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+ val sparkConf = new SparkConf(false).set(CONTAINER_IMAGE, DRIVER_IMAGE)
val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")
val orchestrator = new DriverConfigOrchestrator(
APP_ID,
@@ -55,8 +54,7 @@
}
test("Base submission steps without a main app resource.") {
- val sparkConf = new SparkConf(false)
- .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+ val sparkConf = new SparkConf(false).set(CONTAINER_IMAGE, DRIVER_IMAGE)
val orchestrator = new DriverConfigOrchestrator(
APP_ID,
LAUNCH_TIME,
@@ -75,8 +73,8 @@
test("Submission steps with an init-container.") {
val sparkConf = new SparkConf(false)
- .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
- .set(INIT_CONTAINER_IMAGE, IC_IMAGE)
+ .set(CONTAINER_IMAGE, DRIVER_IMAGE)
+ .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
.set("spark.jars", "hdfs://localhost:9000/var/apps/jars/jar1.jar")
val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")
val orchestrator = new DriverConfigOrchestrator(
@@ -98,7 +96,7 @@
test("Submission steps with driver secrets to mount") {
val sparkConf = new SparkConf(false)
- .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+ .set(CONTAINER_IMAGE, DRIVER_IMAGE)
.set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_FOO", SECRET_MOUNT_PATH)
.set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_BAR", SECRET_MOUNT_PATH)
val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStepSuite.scala
index 8ee629a..b136f2c0 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStepSuite.scala
@@ -47,7 +47,7 @@
.set(KUBERNETES_DRIVER_LIMIT_CORES, "4")
.set(org.apache.spark.internal.config.DRIVER_MEMORY.key, "256M")
.set(org.apache.spark.internal.config.DRIVER_MEMORY_OVERHEAD, 200L)
- .set(DRIVER_CONTAINER_IMAGE, "spark-driver:latest")
+ .set(CONTAINER_IMAGE, "spark-driver:latest")
.set(s"$KUBERNETES_DRIVER_ANNOTATION_PREFIX$CUSTOM_ANNOTATION_KEY", CUSTOM_ANNOTATION_VALUE)
.set(s"$KUBERNETES_DRIVER_ENV_KEY$DRIVER_CUSTOM_ENV_KEY1", "customDriverEnv1")
.set(s"$KUBERNETES_DRIVER_ENV_KEY$DRIVER_CUSTOM_ENV_KEY2", "customDriverEnv2")
@@ -79,7 +79,7 @@
.asScala
.map(env => (env.getName, env.getValue))
.toMap
- assert(envs(ENV_SUBMIT_EXTRA_CLASSPATH) === "/opt/spark/spark-examples.jar")
+ assert(envs(ENV_CLASSPATH) === "/opt/spark/spark-examples.jar")
assert(envs(ENV_DRIVER_MEMORY) === "256M")
assert(envs(ENV_DRIVER_MAIN_CLASS) === MAIN_CLASS)
assert(envs(ENV_DRIVER_ARGS) === "arg1 arg2 \"arg 3\"")
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/InitContainerConfigOrchestratorSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/InitContainerConfigOrchestratorSuite.scala
index 20f2e5b..09b42e4 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/InitContainerConfigOrchestratorSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/InitContainerConfigOrchestratorSuite.scala
@@ -40,7 +40,7 @@
test("including basic configuration step") {
val sparkConf = new SparkConf(true)
- .set(INIT_CONTAINER_IMAGE, DOCKER_IMAGE)
+ .set(CONTAINER_IMAGE, DOCKER_IMAGE)
.set(s"$KUBERNETES_DRIVER_LABEL_PREFIX$CUSTOM_LABEL_KEY", CUSTOM_LABEL_VALUE)
val orchestrator = new InitContainerConfigOrchestrator(
@@ -59,7 +59,7 @@
test("including step to mount user-specified secrets") {
val sparkConf = new SparkConf(false)
- .set(INIT_CONTAINER_IMAGE, DOCKER_IMAGE)
+ .set(CONTAINER_IMAGE, DOCKER_IMAGE)
.set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_FOO", SECRET_MOUNT_PATH)
.set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_BAR", SECRET_MOUNT_PATH)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala
index 7cfbe54..a3c615b 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala
@@ -54,7 +54,7 @@
baseConf = new SparkConf()
.set(KUBERNETES_DRIVER_POD_NAME, driverPodName)
.set(KUBERNETES_EXECUTOR_POD_NAME_PREFIX, executorPrefix)
- .set(EXECUTOR_CONTAINER_IMAGE, executorImage)
+ .set(CONTAINER_IMAGE, executorImage)
}
test("basic executor pod has reasonable defaults") {
@@ -107,7 +107,7 @@
checkEnv(executor,
Map("SPARK_JAVA_OPT_0" -> "foo=bar",
- "SPARK_EXECUTOR_EXTRA_CLASSPATH" -> "bar=baz",
+ ENV_CLASSPATH -> "bar=baz",
"qux" -> "quux"))
checkOwnerReferences(executor, driverPodUid)
}
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
deleted file mode 100644
index 45fbcd9..0000000
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
+++ /dev/null
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-driver:latest -f kubernetes/dockerfiles/driver/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
- env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
- readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt && \
- if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
- if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
- if ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
- ${JAVA_HOME}/bin/java "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
deleted file mode 100644
index 0f806cf..0000000
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
+++ /dev/null
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
- env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
- readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
- if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
- if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
- if ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
- ${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
deleted file mode 100644
index 047056a..0000000
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
+++ /dev/null
@@ -1,24 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
-# command should be invoked from the top level directory of the Spark distribution. E.g.:
-# docker build -t spark-init:latest -f kubernetes/dockerfiles/init-container/Dockerfile .
-
-ENTRYPOINT [ "/opt/entrypoint.sh", "/opt/spark/bin/spark-class", "org.apache.spark.deploy.k8s.SparkPodInitContainer" ]
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/entrypoint.sh b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/entrypoint.sh
deleted file mode 100755
index 8255988..0000000
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/entrypoint.sh
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# echo commands to the terminal output
-set -ex
-
-# Check whether there is a passwd entry for the container UID
-myuid=$(id -u)
-mygid=$(id -g)
-uidentry=$(getent passwd $myuid)
-
-# If there is no passwd entry for the container UID, attempt to create one
-if [ -z "$uidentry" ] ; then
- if [ -w /etc/passwd ] ; then
- echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
- else
- echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
- fi
-fi
-
-# Execute the container CMD under tini for better hygiene
-/sbin/tini -s -- "$@"
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
similarity index 87%
rename from resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
rename to resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index da1d6b9..491b7cf 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -17,15 +17,15 @@
FROM openjdk:8-alpine
-ARG spark_jars
-ARG img_path
+ARG spark_jars=jars
+ARG img_path=kubernetes/dockerfiles
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
-# docker build -t spark-base:latest -f kubernetes/dockerfiles/spark-base/Dockerfile .
+# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
apk upgrade --no-cache && \
@@ -41,7 +41,9 @@
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY conf /opt/spark/conf
-COPY ${img_path}/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark/entrypoint.sh /opt/
+COPY examples /opt/spark/examples
+COPY data /opt/spark/data
ENV SPARK_HOME /opt/spark
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
new file mode 100755
index 0000000..0c28c75
--- /dev/null
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+ if [ -w /etc/passwd ] ; then
+ echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+ else
+ echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
+ fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+ echo "No command to execute has been provided." 1>&2
+ exit 1
+fi
+shift 1
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt
+if [ -n "$SPARK_MOUNTED_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_MOUNTED_CLASSPATH"
+fi
+if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then
+ cp -R "$SPARK_MOUNTED_FILES_DIR/." .
+fi
+
+case "$SPARK_K8S_CMD" in
+ driver)
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_DRIVER_JAVA_OPTS[@]}"
+ -cp "$SPARK_CLASSPATH"
+ -Xms$SPARK_DRIVER_MEMORY
+ -Xmx$SPARK_DRIVER_MEMORY
+ -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS
+ $SPARK_DRIVER_CLASS
+ $SPARK_DRIVER_ARGS
+ )
+ ;;
+
+ executor)
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms$SPARK_EXECUTOR_MEMORY
+ -Xmx$SPARK_EXECUTOR_MEMORY
+ -cp "$SPARK_CLASSPATH"
+ org.apache.spark.executor.CoarseGrainedExecutorBackend
+ --driver-url $SPARK_DRIVER_URL
+ --executor-id $SPARK_EXECUTOR_ID
+ --cores $SPARK_EXECUTOR_CORES
+ --app-id $SPARK_APPLICATION_ID
+ --hostname $SPARK_EXECUTOR_POD_IP
+ )
+ ;;
+
+ init)
+ CMD=(
+ "$SPARK_HOME/bin/spark-class"
+ "org.apache.spark.deploy.k8s.SparkPodInitContainer"
+ "$@"
+ )
+ ;;
+
+ *)
+ echo "Unknown command: $SPARK_K8S_CMD" 1>&2
+ exit 1
+esac
+
+# Execute the container CMD under tini for better hygiene
+exec /sbin/tini -s -- "${CMD[@]}"