| |
| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <meta name="description" content="Hadoop Ozone Documentation"> |
| |
| <title>Documentation for Apache Hadoop Ozone</title> |
| |
| |
| <link href="../../css/bootstrap.min.css" rel="stylesheet"> |
| |
| |
| <link href="../../css/ozonedoc.css" rel="stylesheet"> |
| |
| </head> |
| |
| |
| <body> |
| |
| |
| <nav class="navbar navbar-inverse navbar-fixed-top"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a href="#" class="navbar-left" style="height: 50px; padding: 5px 5px 5px 0;"> |
| <img src="../../ozone-logo-small.png" width="40"/> |
| </a> |
| <a class="navbar-brand hidden-xs" href="#"> |
| Apache Hadoop Ozone/HDDS documentation |
| </a> |
| <a class="navbar-brand visible-xs-inline" href="#">Hadoop Ozone</a> |
| </div> |
| <div id="navbar" class="navbar-collapse collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="https://github.com/apache/hadoop-ozone">Source</a></li> |
| <li><a href="https://hadoop.apache.org">Apache Hadoop</a></li> |
| <li><a href="https://apache.org">ASF</a></li> |
| </ul> |
| </div> |
| </div> |
| </nav> |
| |
| |
| <div class="container-fluid"> |
| <div class="row"> |
| |
| <div class="col-sm-2 col-md-2 sidebar" id="sidebar"> |
| <ul class="nav nav-sidebar"> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/"> |
| |
| |
| |
| <span>Overview</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/start.html"> |
| |
| |
| |
| <span>Getting Started</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/interface.html"> |
| |
| |
| |
| <span>Programming Interfaces</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/feature.html"> |
| |
| |
| |
| <span>GDPR</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/security.html"> |
| |
| |
| |
| <span>Security</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/concept.html"> |
| |
| |
| |
| <span>Concepts</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/tools.html"> |
| |
| |
| |
| <span>Tools</span> |
| </a> |
| </li> |
| |
| |
| |
| <li class=""> |
| |
| <a href="../../zh/recipe.html"> |
| |
| |
| |
| <span>Recipes</span> |
| </a> |
| </li> |
| |
| |
| <li><a href="../../design.html"><span><b>Design docs</b></span></a></li> |
| <li class="visible-xs"><a href="#">References</a> |
| <ul class="nav"> |
| <li><a href="https://github.com/apache/hadoop"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li> |
| <li><a href="https://hadoop.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Hadoop</a></li> |
| <li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li> |
| </ul></li> |
| </ul> |
| |
| </div> |
| |
| <div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main"> |
| |
| |
| |
| <div class="col-md-9"> |
| <nav aria-label="breadcrumb"> |
| <ol class="breadcrumb"> |
| <li class="breadcrumb-item"><a href="../../">Home</a></li> |
| <li class="breadcrumb-item" aria-current="page"><a href="../../zh/recipe.html">Recipes</a></li> |
| <li class="breadcrumb-item active" aria-current="page">Spark in Kubernetes with OzoneFS</li> |
| </ol> |
| </nav> |
| |
| |
| |
| <div class="pull-right"> |
| |
| |
| <a href="../../recipe/sparkozonefsk8s.html"><span class="label label-success">English</span></a> |
| |
| |
| |
| |
| </div> |
| |
| |
| <div class="col-md-9"> |
| <h1>Spark in Kubernetes with OzoneFS</h1> |
| |
| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
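| <p>This page shows how to use the Ozone object store from Spark with the following components:</p> |
| <ul> |
| <li>OzoneFS (a Hadoop-compatible file system)</li> |
| <li>Hadoop 2.7 (bundled with the Spark distribution)</li> |
| <li>The Kubernetes scheduler for Spark</li> |
| <li>A local Spark client</li> |
| </ul> |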
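| <h2 id="准备">Preparation</h2> |
| <p>Download the latest Spark and Ozone distributions and extract them. This recipe was tested with <code>spark-2.4.6-bin-hadoop2.7</code>.</p> |
| <p>You also need the following:</p> |
| <ul> |
| <li>A container registry for pushing and pulling the spark+ozone image (this document uses Docker Hub)</li> |
| <li>A name for the custom image, in the form repo/name (this document uses <em>myrepo/ozone-spark</em>)</li> |
| <li>A dedicated Kubernetes namespace (this document uses <em>yournamespace</em>)</li> |
| </ul> |
| <h2 id="为-driver-创建-docker-镜像">Create the docker image for drivers</h2> |
| <h3 id="创建-spark-driverexecutor-基础镜像">Create the base Spark driver/executor image</h3> |
| <p>First, build an image with Spark's image creation tool. |
| Run the following command from the Spark distribution:</p> |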
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -r myrepo -t 2.4.6 build |
| </code></pre></div><p><em>Note</em>: If you use Minikube, add the <code>-m</code> flag so that the image is built with the docker daemon of the Minikube image.</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -m -r myrepo -t 2.4.6 build |
| </code></pre></div><p><code>./bin/docker-image-tool.sh</code> is Spark's official tool for building images. The step above creates several Spark images named <em>myrepo/spark</em>; the first of them serves as the base image for the following steps.</p> |
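<p>The two build commands above differ only in the <code>-m</code> flag. As a rough sketch, a small helper can print the right variant for a given setup; the function name <code>image_tool_cmd</code> and its third argument are hypothetical and are not part of Spark.</p>

```shell
#!/usr/bin/env bash
# Sketch: print the docker-image-tool.sh command for a plain or Minikube setup.
# image_tool_cmd and its use_minikube argument are hypothetical helpers.
image_tool_cmd() {
  local repo="$1" tag="$2" use_minikube="${3:-false}"
  local cmd="./bin/docker-image-tool.sh"
  if [ "$use_minikube" = "true" ]; then
    # -m makes the tool use the docker daemon inside Minikube
    cmd="$cmd -m"
  fi
  echo "$cmd -r $repo -t $tag build"
}

# Print both variants with the repository and tag used in this document.
image_tool_cmd myrepo 2.4.6
image_tool_cmd myrepo 2.4.6 true
```

<p>The helper only prints the command, so the output can be reviewed before it is run from the Spark distribution directory.</p>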
| <h3 id="定制镜像">Customize the image</h3> |
| <p>Create a new directory for customizing the image.</p> |
| <p>Copy <code>ozone-site.xml</code> from the cluster:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml . |
| </code></pre></div><p>Create a custom <code>core-site.xml</code>:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-xml" data-lang="xml"><span style="color:#f92672">&lt;configuration&gt;</span> |
| <span style="color:#f92672">&lt;property&gt;</span> |
| <span style="color:#f92672">&lt;name&gt;</span>fs.AbstractFileSystem.o3fs.impl<span style="color:#f92672">&lt;/name&gt;</span> |
| <span style="color:#f92672">&lt;value&gt;</span>org.apache.hadoop.fs.ozone.OzFs<span style="color:#f92672">&lt;/value&gt;</span> |
| <span style="color:#f92672">&lt;/property&gt;</span> |
| <span style="color:#f92672">&lt;/configuration&gt;</span> |
| </code></pre></div><p>Copy <code>ozonefs.jar</code> from the Ozone distribution (<strong>use the hadoop2 version!</strong>):</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-hadoop2-VERSION.jar hadoop-ozone-filesystem-hadoop2.jar |
| </code></pre></div><p>Write a new Dockerfile and build the image:</p> |
| <pre><code>FROM myrepo/spark:2.4.6 |
| ADD core-site.xml /opt/hadoop/conf/core-site.xml |
| ADD ozone-site.xml /opt/hadoop/conf/ozone-site.xml |
| ENV HADOOP_CONF_DIR=/opt/hadoop/conf |
| ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf |
| ADD hadoop-ozone-filesystem-hadoop2.jar /opt/hadoop-ozone-filesystem-hadoop2.jar |
| </code></pre> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker build -t myrepo/spark-ozone . |
| </code></pre></div><p>For a remote Kubernetes cluster, you may need to push the image:</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker push myrepo/spark-ozone |
| </code></pre></div><h2 id="创建桶并获取-ozonefs-路径">Create a bucket and get the OzoneFS path</h2> |
| <p>Download any text file and save it as <code>/tmp/alice.txt</code>.</p> |
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl port-forward s3g-0 9878:9878 |
| aws s3api --endpoint http://localhost:9878 create-bucket --bucket<span style="color:#f92672">=</span>test |
| aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt |
| </code></pre></div><p>Note the Ozone file system URI; it is needed for the spark-submit command below.</p> |
| <h2 id="创建服务账号">Create a service account</h2> |
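<p>The URI passed to spark-submit later on this page has the shape <code>o3fs://bucket.volume.om-host:port/key</code>. The sketch below assembles it from its parts. It assumes that the bucket created through the S3 gateway lives in the <code>s3v</code> volume and that the Ozone Manager is reachable at <code>ozone-om-0.ozone-om:9862</code>, as in the example command on this page; the function name <code>o3fs_uri</code> is hypothetical, and the values need adjusting for other clusters.</p>

```shell
#!/usr/bin/env bash
# Sketch: compose an o3fs URI as o3fs://<bucket>.<volume>.<om-host>:<port>/<key>.
# o3fs_uri is a hypothetical helper, not an Ozone command.
o3fs_uri() {
  local bucket="$1" volume="$2" om_host="$3" om_port="$4" key="$5"
  printf 'o3fs://%s.%s.%s:%s/%s\n' "$bucket" "$volume" "$om_host" "$om_port" "$key"
}

# Values taken from the example on this page (assumed, cluster-specific).
o3fs_uri test s3v ozone-om-0.ozone-om 9862 alice.txt
```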
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl create serviceaccount spark -n yournamespace |
| kubectl create clusterrolebinding spark-role --clusterrole<span style="color:#f92672">=</span>edit --serviceaccount<span style="color:#f92672">=</span>yournamespace:spark --namespace<span style="color:#f92672">=</span>yournamespace |
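| </code></pre></div><h2 id="运行任务">Run the job</h2> |
| <p>Run the spark-submit command below after adjusting the following values:</p> |
| <ul> |
| <li>the Kubernetes master URL (check <em>~/.kube/config</em> for the actual value)</li> |
| <li>the Kubernetes namespace (<em>yournamespace</em> in this example)</li> |
| <li>serviceAccountName (<em>spark</em>, if you followed the steps above)</li> |
| <li>container.image (<em>myrepo/spark-ozone</em> in this example; this is the image pushed to the registry in the earlier step)</li> |
| <li>the location of the input file (o3fs://…): use the Ozone file system URI noted above, that is, the string printed by the <code>ozone s3 path &lt;bucket-name&gt;</code> command</li> |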
| </ul> |
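<p>The master URL in the first item is the API server address from the kubeconfig, prefixed with <code>k8s://</code>. As a minimal sketch, it can be read from the first <code>server:</code> entry of the file; the function name <code>spark_master_from_kubeconfig</code> and the demo kubeconfig content are hypothetical, and a kubeconfig with several clusters needs a more careful lookup.</p>

```shell
#!/usr/bin/env bash
# Sketch: derive the Spark master URL from the first "server:" entry of a kubeconfig.
# spark_master_from_kubeconfig is a hypothetical helper.
spark_master_from_kubeconfig() {
  local kubeconfig="$1" server
  server=$(awk '$1 == "server:" { print $2; exit }' "$kubeconfig")
  # Fail if the file has no server entry
  [ -n "$server" ] || return 1
  echo "k8s://$server"
}

# Demo with a throwaway kubeconfig (hypothetical content).
demo=$(mktemp)
cat > "$demo" <<'EOF'
apiVersion: v1
clusters:
- cluster:
    server: https://kubernetes:6443
  name: demo
EOF
spark_master_from_kubeconfig "$demo"
rm -f "$demo"
```

<p>In practice the argument would be <code>~/.kube/config</code>; <code>kubectl config view --minify</code> shows the same server entry for the current context.</p>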
| <div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/spark-submit <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --master k8s://https://kubernetes:6443 <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --deploy-mode cluster <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --name spark-word-count <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --class org.apache.spark.examples.JavaWordCount <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.executor.instances<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.namespace<span style="color:#f92672">=</span>yournamespace <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.authenticate.driver.serviceAccountName<span style="color:#f92672">=</span>spark <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image<span style="color:#f92672">=</span>myrepo/spark-ozone <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image.pullPolicy<span style="color:#f92672">=</span>Always <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> --jars /opt/hadoop-ozone-filesystem-hadoop2.jar <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> local:///opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar <span style="color:#ae81ff">\ |
| </span><span style="color:#ae81ff"></span> o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt |
| </code></pre></div><p>Use <code>kubectl get pod</code> to list the available <code>spark-word-count-...</code> pods.</p> |
| <p>Use <code>kubectl logs spark-word-count-1549973913699-driver</code> to view the result of the computation.</p> |
| <p>The output should look similar to the following:</p> |
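<p>The driver pod name contains a generated number, so it differs on every run. The sketch below picks driver pod names out of a pod listing read from standard input. The function name <code>driver_pods</code> is hypothetical, and the listing in the demo is made-up sample data shaped like typical <code>kubectl get pod</code> output.</p>

```shell
#!/usr/bin/env bash
# Sketch: print the names of pods whose name ends in "-driver".
# Reads `kubectl get pod` style output (header line first) from stdin.
driver_pods() {
  awk 'NR > 1 && $1 ~ /-driver$/ { print $1 }'
}

# Demo on a hypothetical pod listing.
cat <<'EOF' | driver_pods
NAME                                    READY   STATUS      RESTARTS   AGE
spark-word-count-1549973913699-driver   0/1     Completed   0          2m
spark-word-count-1549973913699-exec-1   0/1     Completed   0          2m
EOF
```

<p>Against a real cluster, the input would come from <code>kubectl get pod -n yournamespace | driver_pods</code>.</p>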
| <pre><code>... |
| name: 8 |
| William: 3 |
| this,': 1 |
| SOUP!': 1 |
| `Silence: 1 |
| `Mine: 1 |
| ordered.: 1 |
| considering: 3 |
| muttering: 3 |
| candle: 2 |
| ... |
| </code></pre> |
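<p>The word counts arrive in no particular order. As a rough sketch, the <code>word: count</code> lines can be filtered out of the driver log and sorted by count; the function name <code>top_words</code> is hypothetical, and the filter assumes that each result line ends in a colon, a space and a number, as in the sample output above.</p>

```shell
#!/usr/bin/env bash
# Sketch: keep "word: count" lines from stdin and sort them by count, highest first.
# top_words is a hypothetical helper; lines not ending in ": <number>" are dropped.
top_words() {
  awk -F': ' 'NF >= 2 && $NF ~ /^[0-9]+$/ { print $NF "\t" $0 }' \
    | sort -s -k1,1nr \
    | cut -f2-
}

# Demo on a few lines taken from the sample output above.
printf '%s\n' '...' 'name: 8' 'William: 3' "this,': 1" 'candle: 2' | top_words
```

<p>With a real run, the input would be the driver log, for example <code>kubectl logs spark-word-count-1549973913699-driver | top_words | head</code>.</p>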
| |
| |
| </div> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| |
| |
| |
| <script src="../../js/jquery-3.5.1.min.js"></script> |
| <script src="../../js/ozonedoc.js"></script> |
| <script src="../../js/bootstrap.min.js"></script> |
| |
| |
| </body> |
| |
| </html> |