| |
| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <meta name="description" content="Hadoop Ozone Documentation"> |
| |
| <title>Documentation for Apache Hadoop Ozone</title> |
| |
| |
| <link href="../css/bootstrap.min.css" rel="stylesheet"> |
| |
| |
| <link href="../css/ozonedoc.css" rel="stylesheet"> |
| |
| </head> |
| |
| |
| <body> |
| |
| |
| <nav class="navbar navbar-inverse navbar-fixed-top"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a href="#" class="navbar-left" style="height: 50px; padding: 5px 5px 5px 0;"> |
| <img src="../ozone-logo-small.png" width="40"/> |
| </a> |
| <a class="navbar-brand hidden-xs" href="#"> |
| Apache Hadoop Ozone/HDDS documentation |
| </a> |
| <a class="navbar-brand visible-xs-inline" href="#">Hadoop Ozone</a> |
| </div> |
| <div id="navbar" class="navbar-collapse collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="https://github.com/apache/hadoop-ozone">Source</a></li> |
| <li><a href="https://hadoop.apache.org">Apache Hadoop</a></li> |
| <li><a href="https://apache.org">ASF</a></li> |
| </ul> |
| </div> |
| </div> |
| </nav> |
| |
| |
| <div class="container-fluid"> |
| <div class="row"> |
| |
| <div class="col-sm-2 col-md-2 sidebar" id="sidebar"> |
| <ul class="nav nav-sidebar"> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../index.html"> |
| |
| |
| |
| <span>Overview</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../start.html"> |
| |
| |
| |
| <span>Getting Started</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="../concept.html"> |
| |
| <span>Architecture</span> |
| </a> |
| <ul class="nav"> |
| |
| <li class=""> |
| |
| <a href="../concept/overview.html">Overview</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../concept/ozonemanager.html">Ozone Manager</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../concept/storagecontainermanager.html">Storage Container Manager</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../concept/containers.html">Containers</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../concept/datanodes.html">Datanodes</a> |
| |
| </li> |
| |
| </ul> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="../feature.html"> |
| |
| <span>Features</span> |
| </a> |
| <ul class="nav"> |
| |
| <li class=""> |
| |
| <a href="../feature/ha.html">High Availability</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../feature/topology.html">Topology awareness</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../feature/gdpr.html">GDPR in Ozone</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../feature/recon.html">Recon</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../feature/observability.html">Observability</a> |
| |
| </li> |
| |
| </ul> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="../interface.html"> |
| |
| <span>Client Interfaces</span> |
| </a> |
| <ul class="nav"> |
| |
| <li class=""> |
| |
| <a href="../interface/ofs.html">Ofs (Hadoop compatible)</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../interface/o3fs.html">O3fs (Hadoop compatible)</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../interface/s3.html">S3 Protocol</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../interface/cli.html">Command Line Interface</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../interface/javaapi.html">Java API</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../interface/csi.html">CSI Protocol</a> |
| |
| </li> |
| |
| </ul> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="../security.html"> |
| |
| <span>Security</span> |
| </a> |
| <ul class="nav"> |
| |
| <li class=""> |
| |
| <a href="../security/secureozone.html">Securing Ozone</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securingtde.html">Transparent Data Encryption</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securingdatanodes.html">Securing Datanodes</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securingozonehttp.html">Securing HTTP</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securings3.html">Securing S3</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securityacls.html">Ozone ACLs</a> |
| |
| </li> |
| |
| <li class=""> |
| |
| <a href="../security/securitywithranger.html">Apache Ranger</a> |
| |
| </li> |
| |
| </ul> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../tools.html"> |
| |
| |
| |
| <span>Tools</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../recipe.html"> |
| |
| |
| |
| <span>Recipes</span> |
| </a> |
| </li> |
| |
| |
| <li><a href="../design.html"><span><b>Design docs</b></span></a></li> |
| <li class="visible-xs"><a href="#">References</a> |
| <ul class="nav"> |
| <li><a href="https://github.com/apache/hadoop"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li> |
| <li><a href="https://hadoop.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Hadoop</a></li> |
| <li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li> |
| </ul></li> |
| </ul> |
| |
| </div> |
| |
| <div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main"> |
| |
| |
| |
| <div class="col-md-9"> |
| <nav aria-label="breadcrumb"> |
| <ol class="breadcrumb"> |
| <li class="breadcrumb-item"><a href="../">Home</a></li> |
| <li class="breadcrumb-item"><a href="../recipe.html">Recipes</a></li> |
| <li class="breadcrumb-item active" aria-current="page">Spark in Kubernetes with OzoneFS</li> |
| </ol> |
| </nav> |
| |
| |
| |
| <div class="pull-right"> |
| |
| |
| |
| |
| <a href="../zh/recipe/sparkozonefsk8s.html"><span class="label label-success">中文</span></a> |
| |
| |
| </div> |
| |
| |
| <div class="col-md-9"> |
| <h1>Spark in Kubernetes with OzoneFS</h1> |
| |
| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>This recipe shows how to use the Ozone object store from Spark with:</p> |
| <ul> |
| <li>OzoneFS (Hadoop compatible file system)</li> |
| <li>Hadoop 2.7 (included in the Spark distribution)</li> |
| <li>Kubernetes Spark scheduler</li> |
| <li>Local Spark client</li> |
| </ul> |
| <h2 id="requirements">Requirements</h2> |
| <p>Download the latest Spark and Ozone distributions and extract them. This method was |
| tested with the <code>spark-2.4.6-bin-hadoop2.7</code> distribution.</p> |
| <p>You also need the following:</p> |
| <ul> |
| <li>A container repository to push and pull the Spark+Ozone images (this recipe uses Docker Hub)</li> |
| <li>A repo/name for the custom containers (<em>myrepo/spark-ozone</em> in this recipe)</li> |
| <li>A dedicated namespace in Kubernetes (<em>yournamespace</em> in this recipe)</li> |
| </ul> |
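<p>The commands in this recipe rely on a few local tools (<code>docker</code>, <code>kubectl</code> and the <code>aws</code> CLI). As a minimal sketch, and not part of the official recipe, the following script reports which of them are missing from the <code>PATH</code>. The function name <code>missing_tools</code> is a hypothetical helper introduced only for illustration.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># Print every command from the argument list that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>/dev/null; then
      echo "$tool"
    fi
  done
}

# Tools used later in this recipe (spark-submit is run from the Spark directory).
missing="$(missing_tools docker kubectl aws)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
</code></pre></div>
<p>If the script lists a tool as missing, install it before continuing with the next steps.</p>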
| <h2 id="create-the-docker-image-for-drivers">Create the docker image for drivers</h2> |
| <h3 id="create-the-base-spark-driverexecutor-image">Create the base Spark driver/executor image</h3> |
| <p>First, create a Docker image with the Spark image creator. |
| Execute the following from the Spark distribution directory:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -r myrepo -t 2.4.6 build |
| </code></pre></div><p><em>Note</em>: if you use Minikube, add the <code>-m</code> flag to build the image with Minikube's Docker daemon:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -m -r myrepo -t 2.4.6 build |
| </code></pre></div><p><code>./bin/docker-image-tool.sh</code> is an official Spark tool for creating container images. This step creates several Spark container images; the first one, <em>myrepo/spark</em>, is used as the base image in the following steps.</p> |
| <h3 id="customize-the-docker-image">Customize the docker image</h3> |
| <p>Create a new directory for customizing the created docker image.</p> |
| <p>Copy the <code>ozone-site.xml</code> from the cluster:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml . |
| </code></pre></div><p>Then create a custom <code>core-site.xml</code>:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-xml" data-lang="xml"><span style="color:#f92672"><configuration></span> |
| <span style="color:#f92672"><property></span> |
| <span style="color:#f92672"><name></span>fs.AbstractFileSystem.o3fs.impl<span style="color:#f92672"></name></span> |
| <span style="color:#f92672"><value></span>org.apache.hadoop.fs.ozone.OzFs<span style="color:#f92672"></value></span> |
| <span style="color:#f92672"></property></span> |
| <span style="color:#f92672"></configuration></span> |
| </code></pre></div><p>Copy the <code>ozonefs.jar</code> file from an Ozone distribution (<strong>use the hadoop2 version!</strong>):</p> |
| <pre><code>kubectl cp om-0:/opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-hadoop2-VERSION.jar hadoop-ozone-filesystem-hadoop2.jar |
| </code></pre><p>Create a new Dockerfile and build the image:</p> |
| <pre><code>FROM myrepo/spark:2.4.6 |
| ADD core-site.xml /opt/hadoop/conf/core-site.xml |
| ADD ozone-site.xml /opt/hadoop/conf/ozone-site.xml |
| ENV HADOOP_CONF_DIR=/opt/hadoop/conf |
| ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf |
| ADD hadoop-ozone-filesystem-hadoop2.jar /opt/hadoop-ozone-filesystem-hadoop2.jar |
| </code></pre><div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker build -t myrepo/spark-ozone . |
| </code></pre></div><p>For a remote Kubernetes cluster you may need to push the image:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker push myrepo/spark-ozone |
| </code></pre></div><h2 id="create-a-bucket">Create a bucket</h2> |
| <p>First, download any text file and save it as <code>/tmp/alice.txt</code>. Then create a bucket through the S3 gateway and upload the file:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl port-forward s3g-0 9878:9878 |
| aws s3api --endpoint http://localhost:9878 create-bucket --bucket<span style="color:#f92672">=</span>test |
| aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt |
| </code></pre></div><h2 id="create-service-account-to-use">Create service account to use</h2> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl create serviceaccount spark -n yournamespace |
| kubectl create clusterrolebinding spark-role --clusterrole<span style="color:#f92672">=</span>edit --serviceaccount<span style="color:#f92672">=</span>yournamespace:spark --namespace<span style="color:#f92672">=</span>yournamespace |
| </code></pre></div><h2 id="execute-the-job">Execute the job</h2> |
| <p>Execute the following spark-submit command, but change at least the following values:</p> |
| <ul> |
| <li>the Kubernetes master URL (you can check your <em>~/.kube/config</em> to find the actual value)</li> |
| <li>the Kubernetes namespace (<em>yournamespace</em> in this example)</li> |
| <li>serviceAccountName (you can use the <em>spark</em> value if you followed the previous steps)</li> |
| <li>container.image (<em>myrepo/spark-ozone</em> in this example, the image pushed to the registry in the previous steps)</li> |
| </ul> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/spark-submit <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --master k8s://https://kubernetes:6443 <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --deploy-mode cluster <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --name spark-word-count <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --class org.apache.spark.examples.JavaWordCount <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.executor.instances<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.namespace<span style="color:#f92672">=</span>yournamespace <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.authenticate.driver.serviceAccountName<span style="color:#f92672">=</span>spark <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image<span style="color:#f92672">=</span>myrepo/spark-ozone <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image.pullPolicy<span style="color:#f92672">=</span>Always <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --jars /opt/hadoop-ozone-filesystem-hadoop2.jar <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> local:///opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt |
| </code></pre></div><p>Check the available <code>spark-word-count-...</code> pods with <code>kubectl get pod</code>.</p> |
| <p>Check the output of the calculation in the driver pod's log (replace the pod name with the one from the previous step):<br> |
| <code>kubectl logs spark-word-count-1549973913699-driver</code></p> |
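<p>The driver pod name can also be looked up instead of being copied by hand. The sketch below assumes the pod naming seen in this recipe (application name, a number, then <code>-driver</code>), which may differ in other Spark versions. <code>pick_driver_pod</code> is a hypothetical helper, and the pod names in the demonstration are made up.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># Read pod names (one per line, as printed by "kubectl get pod -o name")
# from stdin and print the last driver pod of the given Spark application.
pick_driver_pod() {
  local app_name="$1"
  sed 's|^pod/||' | grep -E "^${app_name}-[0-9]+-driver\$" | sort | tail -n 1
}

# Usage against a real cluster (namespace taken from this recipe):
#   driver="$(kubectl get pod -n yournamespace -o name | pick_driver_pod spark-word-count)"
#   kubectl logs -n yournamespace "$driver"

# Offline demonstration with made-up pod names:
printf '%s\n' \
  pod/spark-word-count-1549973913699-driver \
  pod/spark-word-count-1549973913699-exec-1 \
  pod/s3g-0 | pick_driver_pod spark-word-count
</code></pre></div>
<p>The demonstration prints <code>spark-word-count-1549973913699-driver</code>. Sorting the names is only a heuristic for picking the most recent run when the numeric parts have the same length.</p>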
| <p>You should see the output of the wordcount job. For example:</p> |
| <pre><code>... |
| name: 8 |
| William: 3 |
| this,': 1 |
| SOUP!': 1 |
| `Silence: 1 |
| `Mine: 1 |
| ordered.: 1 |
| considering: 3 |
| muttering: 3 |
| candle: 2 |
| ... |
| </code></pre> |
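<p>The last argument of the <code>spark-submit</code> command is an o3fs URL of the form <code>o3fs://BUCKET.VOLUME.OM_ADDRESS/KEY</code>. As a rough illustration, the hypothetical helper <code>o3fs_url</code> below rebuilds the path used in this recipe. It assumes that buckets created through the S3 gateway live in the <code>s3v</code> volume (the default in recent Ozone releases) and that <code>ozone-om-0.ozone-om:9862</code> is the Ozone Manager address of your cluster.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># Compose an o3fs URL from bucket, volume, Ozone Manager address and key.
# A leading slash in the key is dropped so the result has exactly one.
o3fs_url() {
  local bucket="$1" volume="$2" om_address="$3" key="$4"
  echo "o3fs://${bucket}.${volume}.${om_address}/${key#/}"
}

# Values from this recipe: bucket "test", volume "s3v" (assumed S3 gateway
# default), Ozone Manager address and key as used in the spark-submit command.
o3fs_url test s3v ozone-om-0.ozone-om:9862 alice.txt
</code></pre></div>
<p>This prints <code>o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt</code>, the input path passed to the word count job. Adjust the bucket, volume and address if your cluster uses different values.</p>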
| |
| |
| </div> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| |
| |
| |
| <script src="../js/jquery-3.5.1.min.js"></script> |
| <script src="../js/ozonedoc.js"></script> |
| <script src="../js/bootstrap.min.js"></script> |
| |
| |
| </body> |
| |
| </html> |