<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Apache Ozone Documentation">
<title>Documentation for Apache Ozone</title>
<link href="../../css/bootstrap.min.css" rel="stylesheet">
<link href="../../css/ozonedoc.css" rel="stylesheet">
<link href="../../swagger-resources/swagger-ui.css" rel="stylesheet">
<script>
var _paq = window._paq = window._paq || [];
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="//analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '34']);
var d=document, g=d.createElement('script'),
s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../../zh/index.html" class="navbar-left ozone-logo">
<img src="../../ozone-logo-small.png"/>
</a>
<a class="navbar-brand hidden-xs" href="../../zh/index.html">
Apache Ozone/HDDS Documentation
</a>
<a class="navbar-brand visible-xs-inline" href="#">Apache Ozone</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/apache/ozone">Source</a></li>
<li><a href="https://ozone.apache.org">Apache Ozone</a></li>
<li><a href="https://apache.org">ASF</a></li>
</ul>
</div>
</div>
</nav>
<div class="wrapper">
<div class="container-fluid">
<div class="row">
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
<li class="">
<a href="../../zh/index.html">
<span>Overview</span>
</a>
</li>
<li class="">
<a href="../../zh/start.html">
<span>Getting Started</span>
</a>
</li>
<li class="">
<a href="../../zh/concept.html">
<span>Concepts</span>
</a>
<ul class="nav">
<li class="">
<a href="../../zh/concept/overview.html">Overview</a>
</li>
<li class="">
<a href="../../zh/concept/ozonemanager.html">Ozone Manager</a>
</li>
<li class="">
<a href="../../zh/concept/storagecontainermanager.html">Storage Container Manager</a>
</li>
<li class="">
<a href="../../zh/concept/datanodes.html">Datanodes</a>
</li>
<li class="">
<a href="../../zh/concept/containers.html">Containers</a>
</li>
<li class="">
<a href="../../zh/concept/recon.html">Recon</a>
</li>
</ul>
</li>
<li class="">
<a href="../../zh/feature.html">
<span>Features</span>
</a>
<ul class="nav">
<li class="">
<a href="../../zh/feature/decommission.html">Decommissioning</a>
</li>
<li class="">
<a href="../../zh/feature/erasurecoding.html">Erasure Coding</a>
</li>
<li class="">
<a href="../../zh/feature/om-ha.html">OM High Availability</a>
</li>
<li class="">
<a href="../../zh/feature/scm-ha.html">SCM High Availability</a>
</li>
<li class="">
<a href="../../zh/feature/dn-merge-rocksdb.html">Merging Container RocksDB on the DataNode</a>
</li>
<li class="">
<a href="../../zh/feature/prefixfso.html">Prefix-based File System Optimization</a>
</li>
<li class="">
<a href="../../zh/feature/topology.html">Topology Awareness</a>
</li>
<li class="">
<a href="../../zh/feature/quota.html">Quota in Ozone</a>
</li>
<li class="">
<a href="../../zh/feature/recon.html">Recon Server</a>
</li>
<li class="">
<a href="../../zh/feature/reconfigurability.html">Dynamic Configuration Reloading</a>
</li>
</ul>
</li>
<li class="">
<a href="../../zh/security.html">
<span>Security</span>
</a>
<ul class="nav">
<li class="">
<a href="../../zh/security/secureozone.html">Securing Ozone</a>
</li>
<li class="">
<a href="../../zh/security/securingtde.html">Transparent Data Encryption</a>
</li>
<li class="">
<a href="../../zh/security/gdpr.html">GDPR in Ozone</a>
</li>
<li class="">
<a href="../../zh/security/securingdatanodes.html">Securing Datanodes</a>
</li>
<li class="">
<a href="../../zh/security/securings3.html">Securing S3</a>
</li>
<li class="">
<a href="../../zh/security/securityacls.html">Ozone ACLs</a>
</li>
<li class="">
<a href="../../zh/security/securitywithranger.html">Apache Ranger</a>
</li>
</ul>
</li>
<li class="">
<a href="../../zh/interface.html">
<span>Client Interfaces</span>
</a>
<ul class="nav">
<li class="">
<a href="../../zh/interface/javaapi.html">Java API</a>
</li>
<li class="">
<a href="../../zh/interface/o3fs.html">Ozone File System</a>
</li>
<li class="">
<a href="../../zh/interface/csi.html">CSI Protocol</a>
</li>
<li class="">
<a href="../../zh/interface/s3.html">S3 Protocol</a>
</li>
<li class="">
<a href="../../zh/interface/reconapi.html">Recon API</a>
</li>
</ul>
</li>
<li class="">
<a href="../../zh/tools.html">
<span>Tools</span>
</a>
</li>
<li class="">
<a href="../../zh/recipe.html">
<span>Recipes</span>
</a>
</li>
<li><a href="../../design.html"><span><b>Design docs</b></span></a></li>
<li class="visible-xs"><a href="#">References</a>
<ul class="nav">
<li><a href="https://github.com/apache/ozone"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li>
<li><a href="https://ozone.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Ozone</a></li>
<li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li>
</ul></li>
</ul>
</div>
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main-content">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="../../zh/index.html">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="../../zh/feature.html">Features</a></li>
<li class="breadcrumb-item active" aria-current="page">Observability</li>
</ol>
</nav>
<div class="pull-right">
<a href="../../feature/observability.html"><span class="label label-success">English</span></a>
</div>
<div class="col-md-9">
<h1>Observability</h1>
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Ozone provides multiple tools to get more information about the current state of the cluster.</p>
<h2 id="prometheus">Prometheus</h2>
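<p>Ozone natively supports Prometheus integration. All internal metrics (collected by the Hadoop metrics framework) are published under the <code>/prom</code> HTTP endpoint (for example, http://localhost:9876/prom for SCM).</p>
<p>The Prometheus endpoint is enabled by default, but it can be turned off with the <code>hdds.prometheus.endpoint.enabled</code> configuration variable.</p>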
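<p>To inspect the endpoint without a Prometheus server, the text it returns can be parsed directly. The following is a minimal Python sketch, assuming the response uses the plain Prometheus text exposition format without timestamps. The function names and the metric names in the sample are hypothetical and only illustrate the shape of the data.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">import urllib.request


def parse_prom_text(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Assumption: samples carry no trailing timestamp, so the last
    whitespace-separated token on each line is the value.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(None, 1)
        metrics[series] = float(value)
    return metrics


def fetch_metrics(url="http://localhost:9876/prom"):
    """Fetch and parse the metrics page (URL taken from the SCM example)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_prom_text(response.read().decode("utf-8"))


# Hypothetical sample; real metric names depend on the component.
SAMPLE = """# TYPE example_open_connections gauge
example_open_connections{hostname="scm"} 0
example_received_bytes{hostname="scm"} 1267
"""

if __name__ == "__main__":
    for series, value in parse_prom_text(SAMPLE).items():
        print(series, value)
</code></pre></div>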
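<p>In a secure environment the page is protected with SPNEGO authentication, which Prometheus does not support. To enable monitoring in a secure environment, a specific authentication token can be configured.</p>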
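<p>Prometheus sends a configured bearer token in the <code>Authorization</code> header of each scrape. The Python sketch below builds the same kind of request by hand, which may help to check a token before wiring up Prometheus. It assumes the endpoint accepts the standard <code>Authorization: Bearer</code> header; the helper name, URL and token are placeholders.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">import urllib.request


def build_prom_request(url, token):
    """Build a GET request that carries the token as a bearer token."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Bearer " + token)
    return request


if __name__ == "__main__":
    # Placeholder values: replace with the real host and configured token.
    req = build_prom_request("http://localhost:9876/prom", "putyourtokenhere")
    with urllib.request.urlopen(req, timeout=10) as response:
        print(response.status)
</code></pre></div>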
<p>Example <code>ozone-site.xml</code> configuration:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>hdds.prometheus.endpoint.token<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>putyourtokenhere<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>Example Prometheus configuration:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-YAML" data-lang="YAML"><span style="color:#f92672">scrape_configs</span>:
  - <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">ozone</span>
    <span style="color:#f92672">bearer_token</span>: <span style="color:#ae81ff">&lt;putyourtokenhere&gt;</span>
    <span style="color:#f92672">metrics_path</span>: <span style="color:#ae81ff">/prom</span>
    <span style="color:#f92672">static_configs</span>:
      - <span style="color:#f92672">targets</span>:
          - <span style="color:#e6db74">&#34;127.0.0.1:9876&#34;</span>
</code></pre></div><h2 id="分布式跟踪">Distributed tracing</h2>
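<p>Distributed tracing helps to identify performance bottlenecks by visualizing end-to-end performance.</p>
<p>Ozone uses the <a href="https://jaegertracing.io">jaeger</a> tracing library to collect traces, which can be sent to any compatible backend (Zipkin, …).</p>
<p>Tracing is turned off by default, but it can be turned on with the <code>hdds.tracing.enabled</code> configuration variable in <code>ozone-site.xml</code>.</p>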
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>hdds.tracing.enabled<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>true<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>The Jaeger client can be configured with environment variables, as described in <a href="https://github.com/jaegertracing/jaeger-client-java/blob/master/jaeger-core/README.md">this</a> document.</p>
<p>For example:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">JAEGER_SAMPLER_PARAM<span style="color:#f92672">=</span>0.01
JAEGER_SAMPLER_TYPE<span style="color:#f92672">=</span>probabilistic
JAEGER_AGENT_HOST<span style="color:#f92672">=</span>jaeger
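</code></pre></div><p>This configuration records 1% of the requests to limit the performance overhead. For more information about Jaeger sampling, see the <a href="https://www.jaegertracing.io/docs/1.18/sampling/#client-sampling-configuration">documentation</a>.</p>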
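<p>To make the sampling arithmetic concrete: with a probabilistic sampler and a parameter of 0.01, roughly 100 out of 10,000 requests are traced. The Python sketch below builds the three environment variables from the example for a child process and computes that expectation. The helper names are hypothetical and are not part of Ozone or Jaeger.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">import os


def jaeger_env(agent_host, sample_rate, base_env=None):
    """Return a copy of the environment with the Jaeger variables set."""
    # Clamp-and-compare rejects rates outside the 0.0 to 1.0 range.
    if min(max(sample_rate, 0.0), 1.0) != sample_rate:
        raise ValueError("sample_rate must be between 0 and 1")
    env = dict(os.environ if base_env is None else base_env)
    env["JAEGER_SAMPLER_TYPE"] = "probabilistic"
    env["JAEGER_SAMPLER_PARAM"] = str(sample_rate)
    env["JAEGER_AGENT_HOST"] = agent_host
    return env


def expected_traces(request_count, sample_rate):
    """Expected number of sampled requests for a probabilistic sampler."""
    return round(request_count * sample_rate)


if __name__ == "__main__":
    env = jaeger_env("jaeger", 0.01, base_env={})
    print(env["JAEGER_SAMPLER_PARAM"])      # 0.01
    print(expected_traces(10000, 0.01))     # 100
</code></pre></div>
<p>The returned dictionary can be passed as the <code>env</code> argument of <code>subprocess.run</code> when a service is started from a script.</p>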
<h2 id="ozone-insight">Ozone Insight</h2>
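<p>Ozone Insight is a tool for checking the current state of an Ozone cluster. It can show the logging, metrics and configuration of a particular component.</p>
<p>Use the <code>ozone insight list</code> command to check the available components:</p>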
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight list
Available insight points:
  scm.node-manager                     SCM Datanode management related information.
  scm.replica-manager                  SCM closed container replication manager
  scm.event-queue                      Information about the internal async event delivery
  scm.protocol.block-location          SCM Block location protocol endpoint
  scm.protocol.container-location      SCM Container location protocol endpoint
  scm.protocol.security                SCM Block location protocol endpoint
  om.key-manager                       OM Key Manager
  om.protocol.client                   Ozone Manager RPC endpoint
  datanode.pipeline                    More information about one ratis datanode ring.
</code></pre></div><h2 id="配置">Configuration</h2>
<p><code>ozone insight config</code> shows the configuration related to a specific component (supported only for selected components).</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight config scm.replica-manager
Configuration <span style="color:#66d9ef">for</span> <span style="color:#e6db74">`</span>scm.replica-manager<span style="color:#e6db74">`</span> <span style="color:#f92672">(</span>SCM closed container replication manager<span style="color:#f92672">)</span>
&gt;&gt;&gt; hdds.scm.replication.thread.interval
default: 300s
current: 300s
There is a replication monitor thread running inside SCM which takes care of replicating the containers in the cluster. This property is used to configure the interval in which that thread runs.
&gt;&gt;&gt; hdds.scm.replication.event.timeout
default: 30m
current: 30m
Timeout <span style="color:#66d9ef">for</span> the container replication/deletion commands sent to datanodes. After this timeout the command will be retried.
</code></pre></div><h2 id="指标">Metrics</h2>
<p><code>ozone insight metrics</code> shows the metrics related to a specific component (supported only for selected components).</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight metrics scm.protocol.block-location
Metrics <span style="color:#66d9ef">for</span> <span style="color:#e6db74">`</span>scm.protocol.block-location<span style="color:#e6db74">`</span> <span style="color:#f92672">(</span>SCM Block location protocol endpoint<span style="color:#f92672">)</span>
RPC connections
Open connections: <span style="color:#ae81ff">0</span>
Dropped connections: <span style="color:#ae81ff">0</span>
Received bytes: <span style="color:#ae81ff">1267</span>
Sent bytes: <span style="color:#ae81ff">2420</span>
RPC queue
RPC average queue time: 0.0
RPC call queue length: <span style="color:#ae81ff">0</span>
RPC performance
RPC processing time average: 0.0
Number of slow calls: <span style="color:#ae81ff">0</span>
Message type counters
Number of AllocateScmBlock: ???
Number of DeleteScmKeyBlocks: ???
Number of GetScmInfo: ???
Number of SortDatanodes: ???
</code></pre></div><h2 id="日志">Logs</h2>
<p><code>ozone insight logs</code> connects to the required service and shows the DEBUG/TRACE logs related to one specific component. For example, to display RPC messages:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight logs om.protocol.client
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:31:49,988 <span style="color:#f92672">[</span>DEBUG|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> OzoneProtocol ServiceList request is received
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:31:50,095 <span style="color:#f92672">[</span>DEBUG|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> OzoneProtocol CreateVolume request is received
</code></pre></div><p>With the <code>-v</code> flag, the content of the protobuf messages can also be shown (TRACE-level logs):</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">ozone insight logs -v om.protocol.client
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:33:28,463 <span style="color:#f92672">[</span>TRACE|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>service<span style="color:#f92672">=</span>OzoneProtocol<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>type<span style="color:#f92672">=</span>CreateVolume<span style="color:#f92672">]</span> request is received:
cmdType: CreateVolume
traceID: <span style="color:#e6db74">&#34;&#34;</span>
clientId: <span style="color:#e6db74">&#34;client-A31DF5C6ECF2&#34;</span>
createVolumeRequest <span style="color:#f92672">{</span>
volumeInfo <span style="color:#f92672">{</span>
adminName: <span style="color:#e6db74">&#34;hadoop&#34;</span>
ownerName: <span style="color:#e6db74">&#34;hadoop&#34;</span>
volume: <span style="color:#e6db74">&#34;vol1&#34;</span>
quotaInBytes: <span style="color:#ae81ff">1152921504606846976</span>
volumeAcls <span style="color:#f92672">{</span>
type: USER
name: <span style="color:#e6db74">&#34;hadoop&#34;</span>
rights: <span style="color:#e6db74">&#34;200&#34;</span>
aclScope: ACCESS
<span style="color:#f92672">}</span>
volumeAcls <span style="color:#f92672">{</span>
type: GROUP
name: <span style="color:#e6db74">&#34;users&#34;</span>
rights: <span style="color:#e6db74">&#34;200&#34;</span>
aclScope: ACCESS
<span style="color:#f92672">}</span>
creationTime: <span style="color:#ae81ff">1595939608460</span>
objectID: <span style="color:#ae81ff">0</span>
updateID: <span style="color:#ae81ff">0</span>
modificationTime: <span style="color:#ae81ff">0</span>
<span style="color:#f92672">}</span>
<span style="color:#f92672">}</span>
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:33:28,474 <span style="color:#f92672">[</span>TRACE|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>service<span style="color:#f92672">=</span>OzoneProtocol<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>type<span style="color:#f92672">=</span>CreateVolume<span style="color:#f92672">]</span> request is processed. Response:
cmdType: CreateVolume
traceID: <span style="color:#e6db74">&#34;&#34;</span>
success: false
message: <span style="color:#e6db74">&#34;Volume already exists&#34;</span>
status: VOLUME_ALREADY_EXISTS
</code></pre></div><div class="alert alert-warning" role="alert">
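<p>Under the hood, <code>ozone insight</code> retrieves the required information through HTTP endpoints (the <code>/conf</code>, <code>/prom</code> and <code>/logLevel</code> endpoints). It is not yet supported in secure environments.</p>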
</div>
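<p>As an illustration of the kind of data the <code>/conf</code> endpoint serves, the following Python sketch reads properties from a Hadoop-style configuration document. It assumes the endpoint responds with the same <code>configuration</code>/<code>property</code> XML layout as <code>ozone-site.xml</code>. The helper names are hypothetical, and the sample document is generated in the example itself, reusing a key and value from the <code>ozone insight config</code> output.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">import xml.etree.ElementTree as ET


def parse_configuration(xml_text):
    """Map property names to values from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


def build_sample():
    """Build a small sample document (assumed layout, not real output)."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "hdds.scm.replication.thread.interval"
    ET.SubElement(prop, "value").text = "300s"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name, value in parse_configuration(build_sample()).items():
        print(name, "=", value)
</code></pre></div>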
</div>
</div>
</div>
</div>
</div>
<div class="push"></div>
</div>
<footer class="footer">
<div class="container">
<span class="small text-muted">
Version: 1.5.0-SNAPSHOT, Last Modified: February 26, 2024 <a class="hide-child link primary-color" href="https://github.com/apache/ozone/commit/1b48186a0107711235abcd2636977ae0242f6be8">1b48186</a>
</span>
</div>
</footer>
<script src="../../js/jquery-3.5.1.min.js"></script>
<script src="../../js/ozonedoc.js"></script>
<script src="../../js/bootstrap.min.js"></script>
</body>
</html>