blob: 62e214af7b0f14e93c44b5920c75ebfdf19a1129 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Apache Ozone Documentation">
<title>Documentation for Apache Ozone</title>
<link href="../css/bootstrap.min.css" rel="stylesheet">
<link href="../css/ozonedoc.css" rel="stylesheet">
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../index.html" class="navbar-left ozone-logo">
<img src="../ozone-logo-small.png"/>
</a>
<a class="navbar-brand hidden-xs" href="../index.html">
Apache Ozone/HDDS documentation
</a>
<a class="navbar-brand visible-xs-inline" href="#">Apache Ozone</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/apache/hadoop-ozone">Source</a></li>
<li><a href="https://hadoop.apache.org">Apache Hadoop</a></li>
<li><a href="https://apache.org">ASF</a></li>
</ul>
</div>
</div>
</nav>
<div class="wrapper">
<div class="container-fluid">
<div class="row">
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
<li class="">
<a href="../index.html">
<span>Overview</span>
</a>
</li>
<li class="">
<a href="../start.html">
<span>Getting Started</span>
</a>
</li>
<li class="">
<a href="../concept.html">
<span>Architecture</span>
</a>
<ul class="nav">
<li class="">
<a href="../concept/overview.html">Overview</a>
</li>
<li class="">
<a href="../concept/ozonemanager.html">Ozone Manager</a>
</li>
<li class="">
<a href="../concept/storagecontainermanager.html">Storage Container Manager</a>
</li>
<li class="">
<a href="../concept/containers.html">Containers</a>
</li>
<li class="">
<a href="../concept/datanodes.html">Datanodes</a>
</li>
<li class="">
<a href="../concept/recon.html">Recon</a>
</li>
</ul>
</li>
<li class="">
<a href="../feature.html">
<span>Features</span>
</a>
<ul class="nav">
<li class="">
<a href="../feature/ha.html">High Availability</a>
</li>
<li class="">
<a href="../feature/topology.html">Topology awareness</a>
</li>
<li class="">
<a href="../feature/quota.html">Quota in Ozone</a>
</li>
<li class="">
<a href="../feature/recon.html">Recon Server</a>
</li>
<li class="">
<a href="../feature/observability.html">Observability</a>
</li>
</ul>
</li>
<li class="">
<a href="../interface.html">
<span>Client Interfaces</span>
</a>
<ul class="nav">
<li class="">
<a href="../interface/ofs.html">Ofs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/o3fs.html">O3fs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/s3.html">S3 Protocol</a>
</li>
<li class="">
<a href="../interface/cli.html">Command Line Interface</a>
</li>
<li class="">
<a href="../interface/reconapi.html">Recon API</a>
</li>
<li class="">
<a href="../interface/javaapi.html">Java API</a>
</li>
<li class="">
<a href="../interface/csi.html">CSI Protocol</a>
</li>
</ul>
</li>
<li class="">
<a href="../security.html">
<span>Security</span>
</a>
<ul class="nav">
<li class="">
<a href="../security/secureozone.html">Securing Ozone</a>
</li>
<li class="">
<a href="../security/securingtde.html">Transparent Data Encryption</a>
</li>
<li class="">
<a href="../security/gdpr.html">GDPR in Ozone</a>
</li>
<li class="">
<a href="../security/securingdatanodes.html">Securing Datanodes</a>
</li>
<li class="">
<a href="../security/securingozonehttp.html">Securing HTTP</a>
</li>
<li class="">
<a href="../security/securings3.html">Securing S3</a>
</li>
<li class="">
<a href="../security/securityacls.html">Ozone ACLs</a>
</li>
<li class="">
<a href="../security/securitywithranger.html">Apache Ranger</a>
</li>
</ul>
</li>
<li class="">
<a href="../tools.html">
<span>Tools</span>
</a>
</li>
<li class="">
<a href="../recipe.html">
<span>Recipes</span>
</a>
</li>
<li><a href="../design.html"><span><b>Design docs</b></span></a></li>
<li class="visible-xs"><a href="#">References</a>
<ul class="nav">
<li><a href="https://github.com/apache/hadoop"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li>
<li><a href="https://hadoop.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Hadoop</a></li>
<li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li>
</ul></li>
</ul>
</div>
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="../index.html">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="../recipe.html">Recipes</a></li>
<li class="breadcrumb-item active" aria-current="page">Spark in Kubernetes with OzoneFS</li>
</ol>
</nav>
<div class="pull-right">
<a href="../zh/recipe/sparkozonefsk8s.html"><span class="label label-success">中文</span></a>
</div>
<div class="col-md-9">
<h1>Spark in Kubernetes with OzoneFS</h1>
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>This recipe shows how Ozone object store can be used from Spark using:</p>
<ul>
<li>OzoneFS (Hadoop compatible file system)</li>
<li>Hadoop 2.7 (included in the Spark distribution)</li>
<li>Kubernetes Spark scheduler</li>
<li>Local spark client</li>
</ul>
<h2 id="requirements">Requirements</h2>
<p>Download latest Spark and Ozone distribution and extract them. This method is
tested with the <code>spark-2.4.6-bin-hadoop2.7</code> distribution.</p>
<p>You also need the following:</p>
<ul>
<li>A container repository to push and pull the spark+ozone images. (In this recipe we will use the dockerhub)</li>
<li>A repo/name for the custom containers (in this recipe <em>myrepo/ozone-spark</em>)</li>
<li>A dedicated namespace in kubernetes (we use <em>yournamespace</em> in this recipe)</li>
</ul>
<h2 id="create-the-docker-image-for-drivers">Create the docker image for drivers</h2>
<h3 id="create-the-base-spark-driverexecutor-image">Create the base Spark driver/executor image</h3>
<p>First of all create a docker image with the Spark image creator.
Execute the following from the Spark distribution</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -r myrepo -t 2.4.6 build
</code></pre></div><p><em>Note</em>: if you use Minikube add the <code>-m</code> flag to use the docker daemon of the Minikube image:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -m -r myrepo -t 2.4.6 build
</code></pre></div><p><code>./bin/docker-image-tool.sh</code> is an official Spark tool to create container images and this step will create multiple Spark container images with the name <em>myrepo/spark</em>. The first container will be used as a base container in the following steps.</p>
<h3 id="customize-the-docker-image">Customize the docker image</h3>
<p>Create a new directory for customizing the created docker image.</p>
<p>Copy the <code>ozone-site.xml</code> from the cluster:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml .
</code></pre></div><p>And create a custom <code>core-site.xml</code>.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-xml" data-lang="xml"><span style="color:#f92672">&lt;configuration&gt;</span>
<span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>fs.AbstractFileSystem.o3fs.impl<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>org.apache.hadoop.fs.ozone.OzFs<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
<span style="color:#f92672">&lt;/configuration&gt;</span>
</code></pre></div><p>Copy the <code>ozonefs.jar</code> file from an ozone distribution (<strong>use the hadoop2 version!</strong>)</p>
<pre><code>kubectl cp om-0:/opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-hadoop2-VERSION.jar hadoop-ozone-filesystem-hadoop2.jar
</code></pre><p>Create a new Dockerfile and build the image:</p>
<pre><code>FROM myrepo/spark:2.4.6
ADD core-site.xml /opt/hadoop/conf/core-site.xml
ADD ozone-site.xml /opt/hadoop/conf/ozone-site.xml
ENV HADOOP_CONF_DIR=/opt/hadoop/conf
ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf
ADD hadoop-ozone-filesystem-hadoop2.jar /opt/hadoop-ozone-filesystem-hadoop2.jar
</code></pre><div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker build -t myrepo/spark-ozone
</code></pre></div><p>For remote Kubernetes cluster you may need to push it:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker push myrepo/spark-ozone
</code></pre></div><h2 id="create-a-bucket">Create a bucket</h2>
<p>Download any text file and put it to the <code>/tmp/alice.txt</code> first.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl port-forward s3g-0 9878:9878
aws s3api --endpoint http://localhost:9878 create-bucket --bucket<span style="color:#f92672">=</span>test
aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt
</code></pre></div><h2 id="create-service-account-to-use">Create service account to use</h2>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl create serviceaccount spark -n yournamespace
kubectl create clusterrolebinding spark-role --clusterrole<span style="color:#f92672">=</span>edit --serviceaccount<span style="color:#f92672">=</span>yournamespace:spark --namespace<span style="color:#f92672">=</span>yournamespace
</code></pre></div><h2 id="execute-the-job">Execute the job</h2>
<p>Execute the following spark-submit command, but change at least the following values:</p>
<ul>
<li>the Kubernetes master url (you can check your <em>~/.kube/config</em> to find the actual value)</li>
<li>the Kubernetes namespace (<em>yournamespace</em> in this example)</li>
<li>serviceAccountName (you can use the <em>spark</em> value if you followed the previous steps)</li>
<li>container.image (in this example this is <em>myrepo/spark-ozone</em>. This is pushed to the registry in the previous steps)</li>
</ul>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/spark-submit <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --master k8s://https://kubernetes:6443 <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --deploy-mode cluster <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --name spark-word-count <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --class org.apache.spark.examples.JavaWordCount <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.executor.instances<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.namespace<span style="color:#f92672">=</span>yournamespace <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.authenticate.driver.serviceAccountName<span style="color:#f92672">=</span>spark <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image<span style="color:#f92672">=</span>myrepo/spark-ozone <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image.pullPolicy<span style="color:#f92672">=</span>Always <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --jars /opt/hadoop-ozone-filesystem-hadoop2.jar <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt
</code></pre></div><p>Check the available <code>spark-word-count-...</code> pods with <code>kubectl get pod</code></p>
<p>Check the output of the calculation with<br>
<code>kubectl logs spark-word-count-1549973913699-driver</code></p>
<p>You should see the output of the wordcount job. For example:</p>
<pre><code>...
name: 8
William: 3
this,': 1
SOUP!': 1
`Silence: 1
`Mine: 1
ordered.: 1
considering: 3
muttering: 3
candle: 2
...
</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="push"></div>
</div>
<footer class="footer">
<div class="container">
<span class="small text-muted">
Version: 1.1.0, Last Modified: July 27, 2020 <a class="hide-child link primary-color" href="https://github.com/apache/ozone/commit/182c344ea3e1240d5a5b2bd6a05fb3afc24eab9a">182c344ea</a>
</span>
</div>
</footer>
<script src="../js/jquery-3.5.1.min.js"></script>
<script src="../js/ozonedoc.js"></script>
<script src="../js/bootstrap.min.js"></script>
</body>
</html>