<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Hadoop Ozone Documentation">
<title>Documentation for Apache Hadoop Ozone</title>
<link href="../../css/bootstrap.min.css" rel="stylesheet">
<link href="../../css/ozonedoc.css" rel="stylesheet">
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="#" class="navbar-left" style="height: 50px; padding: 5px 5px 5px 0;">
<img src="../../ozone-logo-small.png" width="40"/>
</a>
<a class="navbar-brand hidden-xs" href="#">
Apache Hadoop Ozone/HDDS documentation
</a>
<a class="navbar-brand visible-xs-inline" href="#">Hadoop Ozone</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/apache/hadoop-ozone">Source</a></li>
<li><a href="https://hadoop.apache.org">Apache Hadoop</a></li>
<li><a href="https://apache.org">ASF</a></li>
</ul>
</div>
</div>
</nav>
<div class="container-fluid">
<div class="row">
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
<li class="">
<a href="../../zh/">
<span>Overview</span>
</a>
</li>
<li class="">
<a href="../../zh/start.html">
<span>Getting Started</span>
</a>
</li>
<li class="">
<a href="../../zh/interface.html">
<span>Programming Interfaces</span>
</a>
</li>
<li class="">
<a href="../../zh/feature.html">
<span>GDPR</span>
</a>
</li>
<li class="">
<a href="../../zh/security.html">
<span>Security</span>
</a>
</li>
<li class="">
<a href="../../zh/concept.html">
<span>Concepts</span>
</a>
</li>
<li class="">
<a href="../../zh/tools.html">
<span>Tools</span>
</a>
</li>
<li class="">
<a href="../../zh/recipe.html">
<span>Recipes</span>
</a>
</li>
<li><a href="../../design.html"><span><b>Design docs</b></span></a></li>
<li class="visible-xs"><a href="#">References</a>
<ul class="nav">
<li><a href="https://github.com/apache/hadoop"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li>
<li><a href="https://hadoop.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Hadoop</a></li>
<li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li>
</ul></li>
</ul>
</div>
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="../../">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="../../zh/recipe.html">使用配方</a></li>
<li class="breadcrumb-item active" aria-current="page">Kubernetes 上运行 Spark 和 OzoneFS</li>
</ol>
</nav>
<div class="pull-right">
<a href="../../recipe/sparkozonefsk8s.html"><span class="label label-success">English</span></a>
</div>
<div class="col-md-9">
<h1>Running Spark and OzoneFS on Kubernetes</h1>
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>This page describes how to use the Ozone object store from Spark with the following components:</p>
<ul>
<li>OzoneFS (the Hadoop-compatible file system)</li>
<li>Hadoop 2.7 (bundled with the Spark distribution)</li>
<li>The Spark scheduler for Kubernetes</li>
<li>A local Spark client</li>
</ul>
<h2 id="准备">准备</h2>
<p>Download the latest Spark and Ozone distributions and extract them. This recipe has been tested with <code>spark-2.4.6-bin-hadoop2.7</code>.</p>
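<p>A minimal sketch of fetching and extracting the tested Spark release; the mirror URL and exact version are assumptions, so adjust them to the release you actually use:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># assumption: download from the Apache archive; any mirror works
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz
tar xzf spark-2.4.6-bin-hadoop2.7.tgz
cd spark-2.4.6-bin-hadoop2.7
</code></pre></div>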
<p>You will also need the following:</p>
<ul>
<li>A registry to push the spark+ozone images to and pull them from (this document uses Docker Hub)</li>
<li>A name for the custom image, in the form repo/name (this document uses <em>myrepo/ozone-spark</em>)</li>
<li>A dedicated Kubernetes namespace (this document uses <em>yournamespace</em>; the sketch after this list shows one way to set it up)</li>
</ul>
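<p>A minimal sketch of preparing the registry login and the namespace, assuming Docker Hub and a kubectl context that already points at the target cluster:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># log in to the registry that the custom image will be pushed to (Docker Hub here)
docker login
# create the dedicated namespace used throughout this document
kubectl create namespace yournamespace
</code></pre></div>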
<h2 id="为-driver-创建-docker-镜像">为 driver 创建 docker 镜像</h2>
<h3 id="创建-spark-driverexecutor-基础镜像">创建 Spark driver/executor 基础镜像</h3>
<p>First, create an image with Spark's image creation tool.
Run the following command from the Spark distribution directory:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -r myrepo -t 2.4.6 build
</code></pre></div><p><em>Note</em>: if you use Minikube, add the <code>-m</code> flag to use the Docker daemon of the Minikube image:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">./bin/docker-image-tool.sh -m -r myrepo -t 2.4.6 build
</code></pre></div><p><code>./bin/docker-image-tool.sh</code> is Spark's official tool for creating images. The step above builds multiple Spark images named <em>myrepo/spark</em>, and the first one is used as the base image for the following steps.</p>
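<p>To confirm that the base images exist before customizing them, you can list them; this quick check is not part of the original steps:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># the tool builds myrepo/spark (plus variants such as myrepo/spark-py) tagged 2.4.6
docker images | grep myrepo/spark
</code></pre></div>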
<h3 id="定制镜像">定制镜像</h3>
<p>Create a new directory for the customized image.</p>
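<p>For example (the directory name <em>spark-ozone-image</em> is only an illustration), and run the remaining copy and build commands from inside it:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># hypothetical directory name; use whatever you like
mkdir spark-ozone-image
cd spark-ozone-image
</code></pre></div>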
<p>Copy the <code>ozone-site.xml</code> from the cluster:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml .
</code></pre></div><p>Create a custom <code>core-site.xml</code> that declares the OzoneFS implementation:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-xml" data-lang="xml"><span style="color:#f92672">&lt;configuration&gt;</span>
<span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>fs.AbstractFileSystem.o3fs.impl<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>org.apache.hadoop.fs.ozone.OzFs<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
<span style="color:#f92672">&lt;/configuration&gt;</span>
</code></pre></div><p>Copy the <code>ozonefs.jar</code> from an Ozone distribution (<strong>use the hadoop2 version!</strong>):</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl cp om-0:/opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-hadoop2-VERSION.jar hadoop-ozone-filesystem-hadoop2.jar
</code></pre></div><p>Create a new Dockerfile and build the image:</p>
<pre><code>FROM myrepo/spark:2.4.6
ADD core-site.xml /opt/hadoop/conf/core-site.xml
ADD ozone-site.xml /opt/hadoop/conf/ozone-site.xml
ENV HADOOP_CONF_DIR=/opt/hadoop/conf
ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf
ADD hadoop-ozone-filesystem-hadoop2.jar /opt/hadoop-ozone-filesystem-hadoop2.jar
</code></pre>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker build -t myrepo/spark-ozone .
</code></pre></div><p>For a remote Kubernetes cluster you may need to push the image:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">docker push myrepo/spark-ozone
</code></pre></div><h2 id="创建桶并获取-ozonefs-路径">创建桶并获取 OzoneFS 路径</h2>
<p>Download any text file and save it as <code>/tmp/alice.txt</code>.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl port-forward s3g-0 9878:9878
aws s3api --endpoint http://localhost:9878 create-bucket --bucket<span style="color:#f92672">=</span>test
aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt
</code></pre></div><p>Note down the Ozone file system URI; it will be needed in the spark-submit command below.</p>
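<p>One way to look up that URI is the <code>ozone s3 path</code> command referenced in the job section below; a sketch, assuming the command is run inside the <em>om-0</em> pod and the bucket is named <em>test</em>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># expected to print the o3fs:// URI for the S3 bucket "test"
kubectl exec -it om-0 -- ozone s3 path test
</code></pre></div>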
<h2 id="创建服务账号">创建服务账号</h2>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">kubectl create serviceaccount spark -n yournamespace
kubectl create clusterrolebinding spark-role --clusterrole<span style="color:#f92672">=</span>edit --serviceaccount<span style="color:#f92672">=</span>yournamespace:spark --namespace<span style="color:#f92672">=</span>yournamespace
</code></pre></div><h2 id="运行任务">运行任务</h2>
<p>Run the following spark-submit command, but adjust these values first:</p>
<ul>
<li>the Kubernetes master URL (check <em>~/.kube/config</em> to find the actual value)</li>
<li>the Kubernetes namespace (<em>yournamespace</em> in this example)</li>
<li>the serviceAccountName (you can use <em>spark</em> if you followed the steps above)</li>
<li>the container.image (<em>myrepo/spark-ozone</em> in this example, which was pushed to the registry in the previous step)</li>
<li>the location of the input file (o3fs://&hellip;); use the string from the output of the <code>ozone s3 path &lt;bucketname&gt;</code> command above</li>
</ul>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/spark-submit <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --master k8s://https://kubernetes:6443 <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --deploy-mode cluster <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --name spark-word-count <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --class org.apache.spark.examples.JavaWordCount <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.executor.instances<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.namespace<span style="color:#f92672">=</span>yournamespace <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.authenticate.driver.serviceAccountName<span style="color:#f92672">=</span>spark <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image<span style="color:#f92672">=</span>myrepo/spark-ozone <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --conf spark.kubernetes.container.image.pullPolicy<span style="color:#f92672">=</span>Always <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> --jars /opt/hadoop-ozone-filesystem-hadoop2.jar <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt
</code></pre></div><p>Check the available <code>spark-word-count-...</code> pods with the <code>kubectl get pod</code> command.</p>
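<p>Because the driver pod name contains a timestamp, it can also be looked up by the labels Spark sets on driver pods; a sketch, assuming the job ran in <em>yournamespace</em>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"># Spark on Kubernetes labels driver pods with spark-role=driver
kubectl get pod -n yournamespace -l spark-role=driver
</code></pre></div>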
<p>Check the results of the calculation with the <code>kubectl logs spark-word-count-1549973913699-driver</code> command.</p>
<p>The output should be similar to the following:</p>
<pre><code>...
name: 8
William: 3
this,': 1
SOUP!': 1
`Silence: 1
`Mine: 1
ordered.: 1
considering: 3
muttering: 3
candle: 2
...
</code></pre>
</div>
</div>
</div>
</div>
</div>
<script src="../../js/jquery-3.5.1.min.js"></script>
<script src="../../js/ozonedoc.js"></script>
<script src="../../js/bootstrap.min.js"></script>
</body>
</html>