<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Apache Ozone Documentation">
<title>Documentation for Apache Ozone</title>
<link href="../css/bootstrap.min.css" rel="stylesheet">
<link href="../css/ozonedoc.css" rel="stylesheet">
<link href="../swagger-resources/swagger-ui.css" rel="stylesheet">
<script>
var _paq = window._paq = window._paq || [];
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="//analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '34']);
var d=document, g=d.createElement('script'),
s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../index.html" class="navbar-left ozone-logo">
<img src="../ozone-logo-small.png"/>
</a>
<a class="navbar-brand hidden-xs" href="../index.html">
Apache Ozone/HDDS Documentation
</a>
<a class="navbar-brand visible-xs-inline" href="#">Apache Ozone</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/apache/ozone">Source</a></li>
<li><a href="https://ozone.apache.org">Apache Ozone</a></li>
<li><a href="https://apache.org">ASF</a></li>
</ul>
</div>
</div>
</nav>
<div class="wrapper">
<div class="container-fluid">
<div class="row">
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
<li class="">
<a href="../index.html">
<span>Overview</span>
</a>
</li>
<li class="">
<a href="../start.html">
<span>Getting Started</span>
</a>
</li>
<li class="">
<a href="../concept.html">
<span>Architecture</span>
</a>
<ul class="nav">
<li class="">
<a href="../concept/overview.html">Overview</a>
</li>
<li class="">
<a href="../concept/ozonemanager.html">Ozone Manager</a>
</li>
<li class="">
<a href="../concept/storagecontainermanager.html">Storage Container Manager</a>
</li>
<li class="">
<a href="../concept/containers.html">Containers</a>
</li>
<li class="">
<a href="../concept/datanodes.html">Datanodes</a>
</li>
<li class="">
<a href="../concept/recon.html">Recon</a>
</li>
</ul>
</li>
<li class="">
<a href="../feature.html">
<span>Features</span>
</a>
<ul class="nav">
<li class="">
<a href="../feature/decommission.html">Decommissioning</a>
</li>
<li class="">
<a href="../feature/om-ha.html">OM High Availability</a>
</li>
<li class="">
<a href="../feature/erasurecoding.html">Ozone Erasure Coding</a>
</li>
<li class="">
<a href="../feature/snapshot.html">Ozone Snapshot</a>
</li>
<li class="">
<a href="../feature/scm-ha.html">SCM High Availability</a>
</li>
<li class="">
<a href="../feature/streaming-write-pipeline.html">Streaming Write Pipeline</a>
</li>
<li class="">
<a href="../feature/dn-merge-rocksdb.html">Merge Container RocksDB in DN</a>
</li>
<li class="">
<a href="../feature/prefixfso.html">Prefix based File System Optimization</a>
</li>
<li class="">
<a href="../feature/topology.html">Topology awareness</a>
</li>
<li class="">
<a href="../feature/quota.html">Quota in Ozone</a>
</li>
<li class="">
<a href="../feature/recon.html">Recon Server</a>
</li>
<li class="active">
<a href="../feature/observability.html">Observability</a>
</li>
<li class="">
<a href="../feature/nonrolling-upgrade.html">Non-Rolling Upgrades and Downgrades</a>
</li>
<li class="">
<a href="../feature/s3-multi-tenancy.html">
<span>S3 Multi-Tenancy</span>
</a>
<ul class="nav">
<li class="">
<a href="../feature/s3-multi-tenancy-setup.html">Setup</a>
</li>
<li class="">
<a href="../feature/s3-tenant-commands.html">Tenant commands</a>
</li>
<li class="">
<a href="../feature/s3-multi-tenancy-access-control.html">Access Control</a>
</li>
</ul>
</li>
<li class="">
<a href="../feature/reconfigurability.html">Reconfigurability</a>
</li>
</ul>
</li>
<li class="">
<a href="../interface.html">
<span>Client Interfaces</span>
</a>
<ul class="nav">
<li class="">
<a href="../interface/ofs.html">Ofs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/o3fs.html">O3fs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/s3.html">S3 Protocol</a>
</li>
<li class="">
<a href="../interface/cli.html">Command Line Interface</a>
</li>
<li class="">
<a href="../interface/reconapi.html">Recon API</a>
</li>
<li class="">
<a href="../interface/javaapi.html">Java API</a>
</li>
<li class="">
<a href="../interface/csi.html">CSI Protocol</a>
</li>
<li class="">
<a href="../interface/httpfs.html">HttpFS Gateway</a>
</li>
</ul>
</li>
<li class="">
<a href="../security.html">
<span>Security</span>
</a>
<ul class="nav">
<li class="">
<a href="../security/secureozone.html">Securing Ozone</a>
</li>
<li class="">
<a href="../security/securingtde.html">Transparent Data Encryption</a>
</li>
<li class="">
<a href="../security/gdpr.html">GDPR in Ozone</a>
</li>
<li class="">
<a href="../security/securingdatanodes.html">Securing Datanodes</a>
</li>
<li class="">
<a href="../security/securingozonehttp.html">Securing HTTP</a>
</li>
<li class="">
<a href="../security/securings3.html">Securing S3</a>
</li>
<li class="">
<a href="../security/securityacls.html">Ozone ACLs</a>
</li>
<li class="">
<a href="../security/securitywithranger.html">Apache Ranger</a>
</li>
</ul>
</li>
<li class="">
<a href="../tools.html">
<span>Tools</span>
</a>
</li>
<li class="">
<a href="../recipe.html">
<span>Recipes</span>
</a>
</li>
<li><a href="../design.html"><span><b>Design docs</b></span></a></li>
<li class="visible-xs"><a href="#">References</a>
<ul class="nav">
<li><a href="https://github.com/apache/ozone"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li>
<li><a href="https://ozone.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Ozone</a></li>
<li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li>
</ul></li>
</ul>
</div>
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main-content">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="../index.html">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="../feature.html">Features</a></li>
<li class="breadcrumb-item active" aria-current="page">Observability</li>
</ol>
</nav>
<div class="pull-right">
<a href="../zh/feature/observability.html"><span class="label label-success">中文</span></a>
</div>
<div class="col-md-9">
<h1>Observability</h1>
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Ozone provides multiple tools to get more information about the current state of the cluster.</p>
<h2 id="prometheus">Prometheus</h2>
<p>Ozone has native support for Prometheus integration. All internal metrics (collected by the Hadoop metrics framework) are published under the <code>/prom</code> HTTP endpoint (for example, http://localhost:9876/prom for SCM).</p>
<p>The Prometheus endpoint is enabled by default and can be disabled by setting the <code>hdds.prometheus.endpoint.enabled</code> configuration property to <code>false</code>.</p>
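<p>To inspect the raw output, the endpoint can be fetched with <code>curl</code>. It uses the Prometheus text format, in which lines starting with <code>#</code> carry <code>HELP</code>/<code>TYPE</code> metadata and all other lines are samples. The sketch below assumes the SCM HTTP address <code>localhost:9876</code> from the example above. <code>prom_grep</code> is a hypothetical helper name and is not part of Ozone.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># prom_grep (hypothetical helper): drop the "# HELP" / "# TYPE" metadata lines
# of the Prometheus text format and keep only the sample lines matching $1.
prom_grep() {
  grep -v '^#' | grep -e "$1"
}

# Usage against a running SCM (address taken from the example above):
#   curl -s http://localhost:9876/prom | prom_grep rpc
</code></pre></div>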
<p>In a secure environment, the page is protected by SPNEGO authentication, which Prometheus does not support. To enable monitoring in a secure environment, configure a dedicated authentication token. Prometheus then sends it as a bearer token with each scrape request.</p>
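<p>Any hard-to-guess string can serve as the token, provided the same value is used in <code>ozone-site.xml</code> and in the Prometheus configuration. The following minimal sketch generates one from <code>/dev/urandom</code> and stores it in a file that only the current user can read. The file name <code>ozone-prom.token</code> is an arbitrary choice.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># 32 random bytes rendered as 64 lowercase hex characters.
token=$(od -An -N32 -tx1 /dev/urandom | tr -d ' \n')

# Save it with owner-only permissions; the subshell keeps the umask change local.
( umask 077; printf '%s\n' "$token" > ozone-prom.token )

# Use this value for hdds.prometheus.endpoint.token and for bearer_token.
echo "$token"
</code></pre></div>
<p>Instead of an inline <code>bearer_token</code>, Prometheus can also read the token from a file through the <code>bearer_token_file</code> scrape option.</p>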
<p>Example <code>ozone-site.xml</code>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>hdds.prometheus.endpoint.token<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>putyourtokenhere<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>Example Prometheus configuration:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-YAML" data-lang="YAML"><span style="color:#f92672">scrape_configs</span>:
  - <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">ozone</span>
    <span style="color:#f92672">bearer_token</span>: <span style="color:#ae81ff">putyourtokenhere</span>
    <span style="color:#f92672">metrics_path</span>: <span style="color:#ae81ff">/prom</span>
    <span style="color:#f92672">static_configs</span>:
      - <span style="color:#f92672">targets</span>:
          - <span style="color:#e6db74">&#34;127.0.0.1:9876&#34;</span>
</code></pre></div><h2 id="grafana">Grafana</h2>
<p>Once Prometheus is up and running, Grafana can be configured to monitor and visualize Ozone metrics.</p>
<h3 id="add-prometheus-as-a-data-source">Add Prometheus as a data source</h3>
<p>In the Grafana web UI, go to <code>Add Data Sources</code> and then select <code>Prometheus</code>.</p>
<p>Enter the Prometheus URL in the <code>HTTP</code> section, for example http://localhost:9094. To verify the port, check the Prometheus command-line flag <code>--web.listen-address</code>. Its value is also shown in the Prometheus web UI under Status → Command-Line Flags.</p>
<p>Choose Prometheus type: <code>Prometheus</code></p>
<p>Choose the Prometheus version that matches your server, for example <code>2.37.x</code>.</p>
<p>Finish the setup by clicking on <code>Save and Test</code>.</p>
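<p>As an alternative to the UI steps, the data source can be declared in a Grafana provisioning file. The following minimal sketch assumes the Prometheus URL from the example above (<code>http://localhost:9094</code>) and a provisioning directory given in <code>GRAFANA_PROVISIONING</code>. The default <code>./provisioning</code> is for illustration only; package installs commonly use <code>/etc/grafana/provisioning</code>. The file name <code>ozone-prometheus.yaml</code> is arbitrary. Grafana applies provisioning files when it starts.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># Provisioning directory (assumed; adjust to your Grafana installation).
dir="${GRAFANA_PROVISIONING:-./provisioning}/datasources"
mkdir -p "$dir"

# Prometheus data source pointing at the Prometheus server used above.
printf '%s\n' \
  'apiVersion: 1' \
  'datasources:' \
  '  - name: Prometheus' \
  '    type: prometheus' \
  '    access: proxy' \
  '    url: http://localhost:9094' \
  > "$dir/ozone-prometheus.yaml"
</code></pre></div>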
<h3 id="import-a-grafana-dashboard-for-ozone">Import a Grafana dashboard for Ozone</h3>
<p>Apache Ozone comes with a default Grafana dashboard. Follow the instructions below to import it:</p>
<p>Download the dashboard JSON file:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">wget https://raw.githubusercontent.com/apache/ozone/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20Overall%20Metrics.json
</code></pre></div><p>Open the Grafana portal, click <code>Dashboards</code> in the left menu, and select <code>Import</code>.</p>
<p>Click <code>Upload JSON file</code> and select the downloaded file <code>Ozone - Overall Metrics.json</code>.</p>
<p>The dashboard is now imported.</p>
<p>
<img src="GrafanaOzoneOverall.png" alt='Overall dashboard' class="img-responsive" /></p>
<p>Repeat these steps for the <a href="https://raw.githubusercontent.com/Xushaohong/ozone/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20Object%20Metrics.json">Object Metrics</a> and <a href="https://raw.githubusercontent.com/Xushaohong/ozone/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20RPC%20Metrics.json">RPC Metrics</a> dashboards.</p>
<p>
<img src="GrafanaOzoneObjectMetrics.png" alt='Object dashboard' class="img-responsive" /></p>
<p>
<img src="GrafanaOzoneRPCMetrics.png" alt='RPC dashboard' class="img-responsive" /></p>
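<p>Instead of using the <code>Import</code> dialog, the downloaded JSON files can also be loaded through Grafana's file-based dashboard provisioning. This minimal sketch makes three assumptions:</p>
<ul>
<li>The provisioning directory is given in <code>GRAFANA_PROVISIONING</code>. The default here, <code>./provisioning</code>, is for illustration; package installs commonly use <code>/etc/grafana/provisioning</code>.</li>
<li>The JSON files are placed in <code>DASHBOARD_DIR</code>.</li>
<li>The dashboards resolve their Prometheus data source without the data source prompt of the Import dialog.</li>
</ul>
<p>The provider name <code>ozone</code> and folder name <code>Ozone</code> are arbitrary. Restart Grafana after adding the provider file.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># Grafana provisioning directory (assumed; adjust to your installation).
prov="${GRAFANA_PROVISIONING:-./provisioning}"
# Absolute directory that will hold the downloaded "Ozone - *.json" files.
dashboards="${DASHBOARD_DIR:-$(pwd)/dashboards}"
mkdir -p "$prov/dashboards" "$dashboards"

# File provider: Grafana loads every dashboard JSON found under options.path.
printf '%s\n' \
  'apiVersion: 1' \
  'providers:' \
  '  - name: ozone' \
  '    folder: Ozone' \
  '    type: file' \
  '    options:' \
  "      path: $dashboards" \
  > "$prov/dashboards/ozone.yaml"

# Then move the downloaded files there, for example:
#   mv 'Ozone - Overall Metrics.json' "$dashboards"/
</code></pre></div>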
<h2 id="distributed-tracing">Distributed tracing</h2>
<p>Distributed tracing helps to locate performance bottlenecks by visualizing where time is spent along the end-to-end path of a request.</p>
<p>Ozone uses the <a href="https://jaegertracing.io">Jaeger</a> tracing library to collect traces, which can be sent to any compatible backend (Zipkin, &hellip;).</p>
<p>Tracing is turned off by default, but can be turned on with <code>hdds.tracing.enabled</code> in <code>ozone-site.xml</code>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>hdds.tracing.enabled<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>true<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>The Jaeger client can be configured with environment variables, as documented <a href="https://github.com/jaegertracing/jaeger-client-java/blob/master/jaeger-core/README.md">here</a>.</p>
<p>For example:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">JAEGER_SAMPLER_PARAM<span style="color:#f92672">=</span>0.01
JAEGER_SAMPLER_TYPE<span style="color:#f92672">=</span>probabilistic
JAEGER_AGENT_HOST<span style="color:#f92672">=</span>jaeger
</code></pre></div><p>This configuration samples 1% of the requests to limit the performance overhead, and reports the spans to the Jaeger agent on the host <code>jaeger</code>. For more information about Jaeger sampling, <a href="https://www.jaegertracing.io/docs/1.18/sampling/#client-sampling-configuration">check the documentation</a>.</p>
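<p>The Jaeger client reads these variables from the environment of the Ozone process. They must therefore be exported in the shell (or service manager) that starts the service. A plain <code>NAME=value</code> line in an interactive shell is not, by itself, passed on to child processes. The following sketch covers one service and assumes <code>hdds.tracing.enabled</code> is already set as shown above.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># Export so that the Ozone process started from this shell inherits them.
export JAEGER_SAMPLER_TYPE=probabilistic   # sample a fixed fraction of requests
export JAEGER_SAMPLER_PARAM=0.01           # 1% of requests
export JAEGER_AGENT_HOST=jaeger            # host name of the Jaeger agent

# Then start (or restart) the service from the same shell, for example:
#   ozone --daemon start om
</code></pre></div>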
<h2 id="ozone-insight">ozone insight</h2>
<p>Ozone insight is a Swiss Army knife for checking the current state of an Ozone cluster. It can show the logs, metrics, and configuration of a particular component.</p>
<p>To list the available components, use <code>ozone insight list</code>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight list
Available insight points:
scm.node-manager SCM Datanode management related information.
scm.replica-manager SCM closed container replication manager
scm.event-queue Information about the internal async event delivery
scm.protocol.block-location SCM Block location protocol endpoint
scm.protocol.container-location SCM Container location protocol endpoint
scm.protocol.security SCM Block location protocol endpoint
om.key-manager OM Key Manager
om.protocol.client Ozone Manager RPC endpoint
datanode.pipeline More information about one ratis datanode ring.
</code></pre></div><h3 id="configuration">Configuration</h3>
<p><code>ozone insight config</code> can show configuration related to a specific component (supported only for selected components).</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight config scm.replica-manager
Configuration <span style="color:#66d9ef">for</span> <span style="color:#e6db74">`</span>scm.replica-manager<span style="color:#e6db74">`</span> <span style="color:#f92672">(</span>SCM closed container replication manager<span style="color:#f92672">)</span>
&gt;&gt;&gt; hdds.scm.replication.thread.interval
default: 300s
current: 300s
There is a replication monitor thread running inside SCM which takes care of replicating the containers in the cluster. This property is used to configure the interval in which that thread runs.
&gt;&gt;&gt; hdds.scm.replication.event.timeout
default: 30m
current: 30m
Timeout <span style="color:#66d9ef">for</span> the container replication/deletion commands sent to datanodes. After this timeout the command will be retried.
</code></pre></div><h3 id="metrics">Metrics</h3>
<p><code>ozone insight metrics</code> can show metrics related to a specific component (supported only for selected components).</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight metrics scm.protocol.block-location
Metrics <span style="color:#66d9ef">for</span> <span style="color:#e6db74">`</span>scm.protocol.block-location<span style="color:#e6db74">`</span> <span style="color:#f92672">(</span>SCM Block location protocol endpoint<span style="color:#f92672">)</span>
RPC connections
Open connections: <span style="color:#ae81ff">0</span>
Dropped connections: <span style="color:#ae81ff">0</span>
Received bytes: <span style="color:#ae81ff">1267</span>
Sent bytes: <span style="color:#ae81ff">2420</span>
RPC queue
RPC average queue time: 0.0
RPC call queue length: <span style="color:#ae81ff">0</span>
RPC performance
RPC processing time average: 0.0
Number of slow calls: <span style="color:#ae81ff">0</span>
Message type counters
Number of AllocateScmBlock: ???
Number of DeleteScmKeyBlocks: ???
Number of GetScmInfo: ???
Number of SortDatanodes: ???
</code></pre></div><h3 id="logs">Logs</h3>
<p><code>ozone insight logs</code> connects to the required service and shows the DEBUG/TRACE logs related to one specific component. For example, to display RPC messages:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight logs om.protocol.client
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:31:49,988 <span style="color:#f92672">[</span>DEBUG|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> OzoneProtocol ServiceList request is received
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:31:50,095 <span style="color:#f92672">[</span>DEBUG|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> OzoneProtocol CreateVolume request is received
</code></pre></div><p>With the <code>-v</code> flag, the content of the protobuf messages is also displayed (TRACE-level log):</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">&gt; ozone insight logs -v om.protocol.client
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:33:28,463 <span style="color:#f92672">[</span>TRACE|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>service<span style="color:#f92672">=</span>OzoneProtocol<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>type<span style="color:#f92672">=</span>CreateVolume<span style="color:#f92672">]</span> request is received:
cmdType: CreateVolume
traceID: <span style="color:#e6db74">&#34;&#34;</span>
clientId: <span style="color:#e6db74">&#34;client-A31DF5C6ECF2&#34;</span>
createVolumeRequest <span style="color:#f92672">{</span>
volumeInfo <span style="color:#f92672">{</span>
adminName: <span style="color:#e6db74">&#34;hadoop&#34;</span>
ownerName: <span style="color:#e6db74">&#34;hadoop&#34;</span>
volume: <span style="color:#e6db74">&#34;vol1&#34;</span>
quotaInBytes: <span style="color:#ae81ff">1152921504606846976</span>
volumeAcls <span style="color:#f92672">{</span>
type: USER
name: <span style="color:#e6db74">&#34;hadoop&#34;</span>
rights: <span style="color:#e6db74">&#34;200&#34;</span>
aclScope: ACCESS
<span style="color:#f92672">}</span>
volumeAcls <span style="color:#f92672">{</span>
type: GROUP
name: <span style="color:#e6db74">&#34;users&#34;</span>
rights: <span style="color:#e6db74">&#34;200&#34;</span>
aclScope: ACCESS
<span style="color:#f92672">}</span>
creationTime: <span style="color:#ae81ff">1595939608460</span>
objectID: <span style="color:#ae81ff">0</span>
updateID: <span style="color:#ae81ff">0</span>
modificationTime: <span style="color:#ae81ff">0</span>
<span style="color:#f92672">}</span>
<span style="color:#f92672">}</span>
<span style="color:#f92672">[</span>OM<span style="color:#f92672">]</span> 2020-07-28 12:33:28,474 <span style="color:#f92672">[</span>TRACE|org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB|OzoneProtocolMessageDispatcher<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>service<span style="color:#f92672">=</span>OzoneProtocol<span style="color:#f92672">]</span> <span style="color:#f92672">[</span>type<span style="color:#f92672">=</span>CreateVolume<span style="color:#f92672">]</span> request is processed. Response:
cmdType: CreateVolume
traceID: <span style="color:#e6db74">&#34;&#34;</span>
success: false
message: <span style="color:#e6db74">&#34;Volume already exists&#34;</span>
status: VOLUME_ALREADY_EXISTS
</code></pre></div><div class="alert alert-warning" role="alert">
<p>Under the hood, <code>ozone insight</code> uses HTTP endpoints (<code>/conf</code>, <code>/prom</code>, and <code>/logLevel</code>) to retrieve the required information. It is not yet supported in secure environments.</p>
</div>
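<p>The same endpoints can also be queried directly, for example from scripts. This minimal sketch assumes the HTTP addresses <code>localhost:9874</code> (OM) and <code>localhost:9876</code> (SCM); override them for your cluster. <code>ozone_http_get</code> is a hypothetical helper name. The logger name is the one that appears in the log examples above.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell"># Assumed HTTP addresses; override with OM_HTTP / SCM_HTTP.
om_http="${OM_HTTP:-localhost:9874}"
scm_http="${SCM_HTTP:-localhost:9876}"

# ozone_http_get (hypothetical helper): GET a path from a service's HTTP server.
# $1 = host:port, $2 = path including any query string.
ozone_http_get() {
  curl -sf "http://$1$2"
}

# Usage against running services (non-secure cluster):
#   ozone_http_get "$scm_http" /conf   # effective configuration
#   ozone_http_get "$scm_http" /prom   # metrics in Prometheus text format
#   ozone_http_get "$om_http" "/logLevel?log=org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB"
</code></pre></div>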
<a class="btn btn-success btn-lg" href="../feature/nonrolling-upgrade.html">Next >></a>
</div>
</div>
</div>
</div>
</div>
<div class="push"></div>
</div>
<footer class="footer">
<div class="container">
<span class="small text-muted">
Version: 1.5.0-SNAPSHOT, Last Modified: February 27, 2024 <a class="hide-child link primary-color" href="https://github.com/apache/ozone/commit/7939faf7d6c904bf1e4ad32baa5d6d0c1de19003">7939faf</a>
</span>
</div>
</footer>
<script src="../js/jquery-3.5.1.min.js"></script>
<script src="../js/ozonedoc.js"></script>
<script src="../js/bootstrap.min.js"></script>
</body>
</html>