| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Mesos - Observability Metrics</title> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <meta property="og:locale" content="en_US"/> |
| <meta property="og:type" content="website"/> |
| <meta property="og:title" content="Apache Mesos"/> |
| <meta property="og:site_name" content="Apache Mesos"/> |
| <meta property="og:url" content="http://mesos.apache.org/"/> |
| <meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta property="og:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <meta name="twitter:card" content="summary"/> |
| <meta name="twitter:site" content="@ApacheMesos"/> |
| <meta name="twitter:title" content="Apache Mesos"/> |
| <meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta name="twitter:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet"> |
| <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml"> |
| <link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" /> |
| |
| |
| |
| <!-- Google Analytics Magic --> |
| <script type="text/javascript"> |
| var _gaq = _gaq || []; |
| _gaq.push(['_setAccount', 'UA-20226872-1']); |
| _gaq.push(['_setDomainName', 'apache.org']); |
| _gaq.push(['_trackPageview']); |
| |
| (function() { |
| var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; |
| ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; |
| var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); |
| })(); |
| </script> |
| |
| </head> |
| <body> |
| <!-- magical breadcrumbs --> |
| <div class="topnav"> |
| <div class="container"> |
| <ul class="breadcrumb"> |
| <li> |
| <div class="dropdown"> |
| <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel"> |
| <li><a href="http://www.apache.org">Apache Homepage</a></li> |
| <li><a href="http://www.apache.org/licenses/">License</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| <li><a href="http://www.apache.org/security/">Security</a></li> |
| </ul> |
| </div> |
| </li> |
| |
| <li><a href="http://mesos.apache.org">Apache Mesos</a></li> |
| |
| |
| <li><a href="/documentation/">Documentation</a></li> |
| |
| |
| </ul><!-- /.breadcrumb --> |
| </div><!-- /.container --> |
| </div><!-- /.topnav --> |
| |
| <!-- navbar excitement --> |
| <div class="navbar navbar-default navbar-static-top" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a> |
| </div><!-- /.navbar-header --> |
| |
| <div class="navbar-collapse collapse" id="mesos-menu"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="/gettingstarted/">Getting Started</a></li> |
| <li><a href="/blog/">Blog</a></li> |
| <li><a href="/documentation/latest/">Documentation</a></li> |
| <li><a href="/downloads/">Downloads</a></li> |
| <li><a href="/community/">Community</a></li> |
| </ul> |
| </div><!-- /#mesos-menu --> |
| </div><!-- /.container --> |
| </div><!-- /.navbar --> |
| |
| <div class="content"> |
| <div class="container"> |
| <div class="row-fluid"> |
| <div class="col-md-4"> |
| <h4>If you're new to Mesos</h4> |
| <p>See the <a href="/gettingstarted/">getting started</a> page for more |
| information about downloading, building, and deploying Mesos.</p> |
| |
| <h4>If you'd like to get involved or you're looking for support</h4> |
| <p>See our <a href="/community/">community</a> page for more details.</p> |
| </div> |
| <div class="col-md-8"> |
| <h1>Mesos Observability Metrics</h1> |
| |
| <p>This document describes the observability metrics provided by Mesos master and |
| agent nodes. This document also provides some initial guidance on which metrics |
| you should monitor to detect abnormal situations in your cluster.</p> |
| |
| <h2>Overview</h2> |
| |
| <p>Mesos master and agent nodes report a set of statistics and metrics that enable |
| cluster operators to monitor resource usage and detect abnormal situations early. The |
| information reported by Mesos includes details about available resources, used |
| resources, registered frameworks, active agents, and task state. You can use |
| this information to create automated alerts and to plot different metrics over |
| time inside a monitoring dashboard.</p> |
| |
| <p>Metric information is not persisted to disk at either master or agent |
| nodes, which means that metrics will be reset when masters and agents |
| are restarted. Similarly, if the current leading master fails and a new |
| leading master is elected, metrics at the new master will be reset.</p> |
| |
| <h2>Metric Types</h2> |
| |
| <p>Mesos provides two different kinds of metrics: counters and gauges.</p> |
| |
| <p><strong>Counters</strong> keep track of discrete events and are monotonically increasing. The |
| value of a metric of this type is always a natural number. Examples include the |
| number of failed tasks and the number of agent registrations. For some metrics |
| of this type, the rate of change is often more useful than the value itself.</p> |
| |
| <p><strong>Gauges</strong> represent an instantaneous sample of some magnitude. Examples include |
| the amount of used memory in the cluster and the number of connected agents. For |
| some metrics of this type, it is often useful to determine whether the value is |
| above or below a threshold for a sustained period of time.</p> |
| |
| <p>The tables in this document indicate the type of each available metric.</p> |
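| <p>The difference between the two types affects how you consume them. The following Python sketch is one possible way to turn counter samples into a rate and to test whether a gauge has stayed above a threshold. The function names, and the choice to treat a decreasing counter as a reset (which can happen after a restart or a master failover), are assumptions of this example and not part of Mesos.</p> |

```python
def counter_rate(previous, current, elapsed_secs):
    """Per-second rate of change of a counter between two samples.

    Counters restart from zero when a master or agent restarts, so a
    drop in value is treated as a reset and `current` is taken as the
    increase since the restart (an assumption of this sketch).
    """
    if elapsed_secs <= 0:
        raise ValueError("elapsed_secs must be positive")
    delta = current - previous if current >= previous else current
    return delta / elapsed_secs


def sustained_above(samples, threshold, min_samples):
    """True if the most recent `min_samples` gauge samples all exceed threshold."""
    if min_samples <= 0 or len(samples) < min_samples:
        return False
    return all(value > threshold for value in samples[-min_samples:])
```

| <p>For example, you might feed successive values of a counter such as <code>master/tasks_lost</code> to <code>counter_rate</code>, and a short history of a gauge such as <code>master/cpus_percent</code> to <code>sustained_above</code>.</p> |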
| |
| <h2>Master Nodes</h2> |
| |
| <p>Metrics from each master node are available via the |
| <a href="/documentation/latest/./endpoints/metrics/snapshot/">/metrics/snapshot</a> master endpoint. The response |
| is a JSON object that contains metric names and values as key-value pairs.</p> |
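| <p>As a minimal sketch of reading this endpoint, the Python code below decodes a response body into a dictionary and filters it by metric name prefix. The helper names and the sample values are made up for illustration, and the base URL you pass to <code>fetch_snapshot</code> (for example a hypothetical <code>http://master.example.com:5050</code>) depends on your deployment.</p> |

```python
import json
from urllib.request import urlopen


def parse_snapshot(body):
    """Decode a /metrics/snapshot response body into a dict of name -> value."""
    metrics = json.loads(body)
    if not isinstance(metrics, dict):
        raise ValueError("expected a JSON object of metric names and values")
    return metrics


def fetch_snapshot(base_url, timeout=5.0):
    """Fetch and decode the snapshot; base_url is supplied by the caller."""
    url = base_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return parse_snapshot(resp.read().decode("utf-8"))


def metrics_with_prefix(metrics, prefix):
    """Return only the metrics whose name starts with the given prefix."""
    return {name: value for name, value in metrics.items()
            if name.startswith(prefix)}


# Illustrative response body; the values are invented for this example.
SAMPLE_BODY = (
    '{"master/elected": 1.0, "master/uptime_secs": 3600.5, '
    '"system/load_1min": 0.42}'
)
```

| <p>Calling <code>metrics_with_prefix(parse_snapshot(SAMPLE_BODY), "master/")</code> keeps only the two <code>master/</code> entries of the sample.</p> |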
| |
| <h3>Observability metrics</h3> |
| |
| <p>This section lists all available metrics from Mesos master nodes grouped by |
| category.</p> |
| |
| <h4>Resources</h4> |
| |
| <p>The following metrics provide information about the total resources available in |
| the cluster and their current usage. High resource usage for sustained periods |
| of time may indicate that you need to add capacity to your cluster or that a |
| framework is misbehaving.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/cpus_percent</code> |
| </td> |
| <td>Percentage of allocated CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/cpus_used</code> |
| </td> |
| <td>Number of allocated CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/cpus_total</code> |
| </td> |
| <td>Number of CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/cpus_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/cpus_revocable_total</code> |
| </td> |
| <td>Number of revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/cpus_revocable_used</code> |
| </td> |
| <td>Number of allocated revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_percent</code> |
| </td> |
| <td>Percentage of allocated disk space</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_used</code> |
| </td> |
| <td>Allocated disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_total</code> |
| </td> |
| <td>Disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable disk space</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_revocable_total</code> |
| </td> |
| <td>Revocable disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/disk_revocable_used</code> |
| </td> |
| <td>Allocated revocable disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_percent</code> |
| </td> |
| <td>Percentage of allocated GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_used</code> |
| </td> |
| <td>Number of allocated GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_total</code> |
| </td> |
| <td>Number of GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_revocable_total</code> |
| </td> |
| <td>Number of revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/gpus_revocable_used</code> |
| </td> |
| <td>Number of allocated revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_percent</code> |
| </td> |
| <td>Percentage of allocated memory</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_used</code> |
| </td> |
| <td>Allocated memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_total</code> |
| </td> |
| <td>Memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable memory</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_revocable_total</code> |
| </td> |
| <td>Revocable memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/mem_revocable_used</code> |
| </td> |
| <td>Allocated revocable memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Master</h4> |
| |
| <p>The following metrics provide information about whether a master is currently |
| elected and how long it has been running. A cluster with no elected master |
| for sustained periods of time indicates a malfunctioning cluster. This |
| points to either leadership election issues (so check the connection to |
| ZooKeeper) or a flapping Master process. A low uptime value indicates that the |
| master has restarted recently.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/elected</code> |
| </td> |
| <td>Whether this is the elected master</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/uptime_secs</code> |
| </td> |
| <td>Uptime in seconds</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>System</h4> |
| |
| <p>The following metrics provide information about the resources available on this |
| master node and their current usage. High resource usage in a master node for |
| sustained periods of time may degrade the performance of the cluster.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>system/cpus_total</code> |
| </td> |
| <td>Number of CPUs available in this master node</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_15min</code> |
| </td> |
| <td>Load average for the past 15 minutes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_5min</code> |
| </td> |
| <td>Load average for the past 5 minutes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_1min</code> |
| </td> |
| <td>Load average for the past minute</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/mem_free_bytes</code> |
| </td> |
| <td>Free memory in bytes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/mem_total_bytes</code> |
| </td> |
| <td>Total memory in bytes</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Agents</h4> |
| |
| <p>The following metrics provide information about agent events, agent counts, and |
| agent states. A low number of active agents may indicate that agents are |
| unhealthy or that they are not able to connect to the elected master.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/slave_registrations</code> |
| </td> |
| <td>Number of agents that were able to cleanly re-join the cluster and |
| connect back to the master after the master is disconnected.</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_removals</code> |
| </td> |
| <td>Number of agents removed for various reasons, including maintenance</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_reregistrations</code> |
| </td> |
| <td>Number of agent re-registrations</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_unreachable_scheduled</code> |
| </td> |
| <td>Number of agents that have failed their health checks and are scheduled |
| to be marked unreachable. Because of the agent removal rate limit, they are |
| not marked unreachable immediately, but <code>master/slave_unreachable_completed</code> |
| starts increasing as they are marked unreachable.</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_unreachable_canceled</code> |
| </td> |
| <td>Number of times that an agent was due to be marked unreachable but this |
| transition was canceled. This happens when the agent removal rate limit |
| is enabled and the agent sends a <code>PONG</code> response message to the |
| master before the rate limit allows the agent to be marked unreachable.</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_unreachable_completed</code> |
| </td> |
| <td>Number of agents that were marked as unreachable because they failed |
| health checks. These are agents which were not heard from despite the |
| agent-removal rate limit, and have been marked as unreachable in the |
| master's agent registry.</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slaves_active</code> |
| </td> |
| <td>Number of active agents</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slaves_connected</code> |
| </td> |
| <td>Number of connected agents</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slaves_disconnected</code> |
| </td> |
| <td>Number of disconnected agents</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slaves_inactive</code> |
| </td> |
| <td>Number of inactive agents</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slaves_unreachable</code> |
| </td> |
| <td>Number of unreachable agents. Unreachable agents are periodically |
| garbage collected from the registry, which will cause this value to |
| decrease.</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Frameworks</h4> |
| |
| <p>The following metrics provide information about the registered frameworks in the |
| cluster. No active or connected frameworks may indicate that a scheduler is not |
| registered or that it is misbehaving.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/frameworks_active</code> |
| </td> |
| <td>Number of active frameworks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/frameworks_connected</code> |
| </td> |
| <td>Number of connected frameworks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/frameworks_disconnected</code> |
| </td> |
| <td>Number of disconnected frameworks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/frameworks_inactive</code> |
| </td> |
| <td>Number of inactive frameworks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/outstanding_offers</code> |
| </td> |
| <td>Number of outstanding resource offers</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Tasks</h4> |
| |
| <p>The following metrics provide information about active and terminated tasks. A |
| high rate of lost tasks may indicate that there is a problem with the cluster. |
| The task states listed here match those of the task state machine.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/tasks_error</code> |
| </td> |
| <td>Number of tasks that were invalid</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_failed</code> |
| </td> |
| <td>Number of failed tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_finished</code> |
| </td> |
| <td>Number of finished tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_killed</code> |
| </td> |
| <td>Number of killed tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_killing</code> |
| </td> |
| <td>Number of tasks currently being killed</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_lost</code> |
| </td> |
| <td>Number of lost tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_running</code> |
| </td> |
| <td>Number of running tasks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_staging</code> |
| </td> |
| <td>Number of staging tasks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_starting</code> |
| </td> |
| <td>Number of starting tasks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/tasks_unreachable</code> |
| </td> |
| <td>Number of unreachable tasks</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Messages</h4> |
| |
| <p>The following metrics provide information about messages between the master and |
| the agents and between the framework and the executors. A high rate of dropped |
| messages may indicate that there is a problem with the network.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/invalid_executor_to_framework_messages</code> |
| </td> |
| <td>Number of invalid executor to framework messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/invalid_framework_to_executor_messages</code> |
| </td> |
| <td>Number of invalid framework to executor messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/invalid_status_update_acknowledgements</code> |
| </td> |
| <td>Number of invalid status update acknowledgements</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/invalid_status_updates</code> |
| </td> |
| <td>Number of invalid status updates</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/dropped_messages</code> |
| </td> |
| <td>Number of dropped messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_authenticate</code> |
| </td> |
| <td>Number of authentication messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_deactivate_framework</code> |
| </td> |
| <td>Number of framework deactivation messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_decline_offers</code> |
| </td> |
| <td>Number of offers declined</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_executor_to_framework</code> |
| </td> |
| <td>Number of executor to framework messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_exited_executor</code> |
| </td> |
| <td>Number of terminated executor messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_framework_to_executor</code> |
| </td> |
| <td>Number of messages from a framework to an executor</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_kill_task</code> |
| </td> |
| <td>Number of kill task messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_launch_tasks</code> |
| </td> |
| <td>Number of launch task messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_reconcile_tasks</code> |
| </td> |
| <td>Number of reconcile task messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_register_framework</code> |
| </td> |
| <td>Number of framework registration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_register_slave</code> |
| </td> |
| <td>Number of agent registration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_reregister_framework</code> |
| </td> |
| <td>Number of framework re-registration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_reregister_slave</code> |
| </td> |
| <td>Number of agent re-registration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_resource_request</code> |
| </td> |
| <td>Number of resource request messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_revive_offers</code> |
| </td> |
| <td>Number of offer revival messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_status_update</code> |
| </td> |
| <td>Number of status update messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_status_update_acknowledgement</code> |
| </td> |
| <td>Number of status update acknowledgement messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_unregister_framework</code> |
| </td> |
| <td>Number of framework unregistration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_unregister_slave</code> |
| </td> |
| <td>Number of agent unregistration messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/messages_update_slave</code> |
| </td> |
| <td>Number of update agent messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/recovery_slave_removals</code> |
| </td> |
| <td>Number of agents not re-registered during master failover</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_removals/reason_registered</code> |
| </td> |
| <td>Number of agents removed when new agents registered at the same address</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_removals/reason_unhealthy</code> |
| </td> |
| <td>Number of agents removed due to failed health checks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/slave_removals/reason_unregistered</code> |
| </td> |
| <td>Number of agents unregistered</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/valid_framework_to_executor_messages</code> |
| </td> |
| <td>Number of valid framework to executor messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/valid_status_update_acknowledgements</code> |
| </td> |
| <td>Number of valid status update acknowledgement messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/valid_status_updates</code> |
| </td> |
| <td>Number of valid status update messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/task_lost/source_master/reason_invalid_offers</code> |
| </td> |
| <td>Number of tasks lost due to invalid offers</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/task_lost/source_master/reason_slave_removed</code> |
| </td> |
| <td>Number of tasks lost due to agent removal</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/task_lost/source_slave/reason_executor_terminated</code> |
| </td> |
| <td>Number of tasks lost due to executor termination</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/valid_executor_to_framework_messages</code> |
| </td> |
| <td>Number of valid executor to framework messages</td> |
| <td>Counter</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Event queue</h4> |
| |
| <p>The following metrics provide information about different types of events in the |
| event queue.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>master/event_queue_dispatches</code> |
| </td> |
| <td>Number of dispatches in the event queue</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/event_queue_http_requests</code> |
| </td> |
| <td>Number of HTTP requests in the event queue</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>master/event_queue_messages</code> |
| </td> |
| <td>Number of messages in the event queue</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Registrar</h4> |
| |
| <p>The following metrics provide information about read and write latency to the |
| agent registrar.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>registrar/state_fetch_ms</code> |
| </td> |
| <td>Registry read latency in ms </td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms</code> |
| </td> |
| <td>Registry write latency in ms </td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/max</code> |
| </td> |
| <td>Maximum registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/min</code> |
| </td> |
| <td>Minimum registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p50</code> |
| </td> |
| <td>Median registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p90</code> |
| </td> |
| <td>90th percentile registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p95</code> |
| </td> |
| <td>95th percentile registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p99</code> |
| </td> |
| <td>99th percentile registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p999</code> |
| </td> |
| <td>99.9th percentile registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>registrar/state_store_ms/p9999</code> |
| </td> |
| <td>99.99th percentile registry write latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Replicated log</h4> |
| |
| <p>The following metrics provide information about the replicated log underneath |
| the registrar, which is the persistent store for masters.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>registrar/log/recovered</code> |
| </td> |
| <td> |
| Whether the replicated log for the registrar has caught up with the other |
| masters in the cluster. A cluster is operational as long as a quorum of |
| "recovered" masters is available in the cluster. |
| </td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Allocator</h4> |
| |
| <p>The following metrics provide information about performance |
| and resource allocations in the allocator.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th> |
| </thead> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms</code> |
| </td> |
| <td>Allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/count</code> |
| </td> |
| <td>Number of allocation algorithm latency measurements in the window</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/max</code> |
| </td> |
| <td>Maximum allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/min</code> |
| </td> |
| <td>Minimum allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p50</code> |
| </td> |
| <td>Median allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p90</code> |
| </td> |
| <td>90th percentile allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p95</code> |
| </td> |
| <td>95th percentile allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p99</code> |
| </td> |
| <td>99th percentile allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p999</code> |
| </td> |
| <td>99.9th percentile allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_run_ms/p9999</code> |
| </td> |
| <td>99.99th percentile allocation algorithm latency in ms</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/allocation_runs</code> |
| </td> |
| <td>Number of times the allocation algorithm has run</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/roles/&lt;role&gt;/shares/dominant</code> |
| </td> |
| <td>Dominant resource share for the role, exposed as a fraction (0.0-1.0)</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/event_queue_dispatches</code> |
| </td> |
| <td>Number of dispatch events in the event queue</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/offer_filters/roles/&lt;role&gt;/active</code> |
| </td> |
| <td>Number of active offer filters for all frameworks within the role</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/quota/roles/&lt;role&gt;/resources/&lt;resource&gt;/offered_or_allocated</code> |
| </td> |
| <td>Amount of resources considered offered or allocated towards |
| a role's quota guarantee</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/quota/roles/&lt;role&gt;/resources/&lt;resource&gt;/guarantee</code> |
| </td> |
| <td>Amount of resources guaranteed for a role via quota</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/cpus/offered_or_allocated</code> |
| </td> |
| <td>Number of CPUs offered or allocated</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/cpus/total</code> |
| </td> |
| <td>Number of CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/disk/offered_or_allocated</code> |
| </td> |
| <td>Allocated or offered disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/disk/total</code> |
| </td> |
| <td>Total disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/mem/offered_or_allocated</code> |
| </td> |
| <td>Allocated or offered memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>allocator/mesos/resources/mem/total</code> |
| </td> |
| <td>Total memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
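| |
| <p>The two quota gauges above can be combined into a per-role utilization figure. |
| The following is a minimal sketch, assuming the master snapshot has already been |
| parsed into a Python dictionary; the function name <code>quota_utilization</code> and |
| the sample values are hypothetical and only serve as an illustration.</p> |

```python
import re

# Matches the two quota metric names listed in the table above.
# The role part may itself contain slashes, so it is matched greedily.
QUOTA_KEY = re.compile(
    r"^allocator/mesos/quota/roles/(?P<role>.+)/resources/(?P<resource>[^/]+)/"
    r"(?P<kind>offered_or_allocated|guarantee)$"
)


def quota_utilization(snapshot):
    """Return {(role, resource): offered_or_allocated / guarantee}."""
    values = {}
    for name, value in snapshot.items():
        match = QUOTA_KEY.match(name)
        if match:
            key = (match["role"], match["resource"])
            values.setdefault(key, {})[match["kind"]] = value
    # Skip entries with a missing or zero guarantee to avoid dividing by zero.
    return {
        key: kinds["offered_or_allocated"] / kinds["guarantee"]
        for key, kinds in values.items()
        if kinds.get("guarantee") and "offered_or_allocated" in kinds
    }


# Hypothetical snapshot values, for illustration only.
sample = {
    "allocator/mesos/quota/roles/analytics/resources/cpus/guarantee": 8.0,
    "allocator/mesos/quota/roles/analytics/resources/cpus/offered_or_allocated": 6.0,
    "allocator/mesos/allocation_runs": 1200.0,
}
print(quota_utilization(sample))
```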
| |
| |
| <h3>Basic Alerts</h3> |
| |
| <p>This section lists some examples of basic alerts that you can use to detect |
| abnormal situations in a cluster.</p> |
| |
| <h4>master/uptime_secs is low</h4> |
| |
| <p>The master has restarted.</p> |
| |
| <h4>master/uptime_secs &lt; 60 for sustained periods of time</h4> |
| |
| <p>The cluster has a flapping master node.</p> |
| |
| <h4>master/tasks_lost is increasing rapidly</h4> |
| |
| <p>Tasks in the cluster are disappearing. Possible causes include hardware |
| failures, bugs in one of the frameworks, or bugs in Mesos.</p> |
| |
| <h4>master/slaves_active is low</h4> |
| |
| <p>Agents are having trouble connecting to the master.</p> |
| |
| <h4>master/cpus_percent > 0.9 for sustained periods of time</h4> |
| |
| <p>Cluster CPU utilization is close to capacity.</p> |
| |
| <h4>master/mem_percent > 0.9 for sustained periods of time</h4> |
| |
| <p>Cluster memory utilization is close to capacity.</p> |
| |
| <h4>master/elected is 0 for sustained periods of time</h4> |
| |
| <p>No master is currently elected.</p> |
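| |
| <p>The threshold alerts above that mention "sustained periods of time" could be |
| evaluated over a series of snapshots roughly as follows. This is only a sketch: the |
| helper names, the alert labels, and the choice of three consecutive samples as |
| "sustained" are assumptions, not part of Mesos.</p> |

```python
def sustained(samples, metric, predicate, count=3):
    """True if `predicate` holds for `metric` in each of the last `count` samples."""
    recent = samples[-count:]
    return len(recent) == count and all(
        metric in sample and predicate(sample[metric]) for sample in recent
    )


def basic_alerts(samples, count=3):
    """Return the labels of the sustained-condition alerts that currently fire."""
    # Thresholds are taken from the alert headings above; labels are hypothetical.
    rules = {
        "flapping master": ("master/uptime_secs", lambda v: v < 60),
        "CPU near capacity": ("master/cpus_percent", lambda v: v > 0.9),
        "memory near capacity": ("master/mem_percent", lambda v: v > 0.9),
        "no elected master": ("master/elected", lambda v: v == 0),
    }
    return [
        label
        for label, (metric, predicate) in rules.items()
        if sustained(samples, metric, predicate, count)
    ]


# Hypothetical master snapshots taken at regular intervals, oldest first.
samples = [
    {"master/uptime_secs": 5000.0, "master/cpus_percent": 0.95,
     "master/mem_percent": 0.50, "master/elected": 1.0},
    {"master/uptime_secs": 5060.0, "master/cpus_percent": 0.93,
     "master/mem_percent": 0.52, "master/elected": 1.0},
    {"master/uptime_secs": 5120.0, "master/cpus_percent": 0.97,
     "master/mem_percent": 0.51, "master/elected": 1.0},
]
print(basic_alerts(samples))
```

| <p>Requiring several consecutive samples keeps a single spike from firing an |
| alert; the sampling interval and sample count should be tuned per cluster.</p> |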
| |
| <h2>Agent Nodes</h2> |
| |
| <p>Metrics from each agent node are available via the |
| <a href="/documentation/latest/./endpoints/metrics/snapshot/">/metrics/snapshot</a> agent endpoint. The response |
| is a JSON object that contains metric names and values as key-value pairs.</p> |
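| |
| <p>As an illustration, the snapshot could be retrieved and decoded with the Python |
| standard library along these lines. The function names are hypothetical, and the |
| agent address in the comment (host name and port 5051) is an assumption about the |
| deployment rather than something this page specifies.</p> |

```python
import json
from urllib.request import urlopen


def parse_snapshot(body):
    """Decode a /metrics/snapshot response body into a {name: value} dict."""
    snapshot = json.loads(body)
    if not isinstance(snapshot, dict):
        raise ValueError("expected a JSON object of metric names to values")
    return snapshot


def fetch_snapshot(base_url, timeout=5.0):
    """GET <base_url>/metrics/snapshot and return the parsed metrics."""
    url = base_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as response:
        return parse_snapshot(response.read().decode("utf-8"))


# Against a live agent (address is an assumption):
#   metrics = fetch_snapshot("http://agent.example.com:5051")
# Offline illustration with a hypothetical response body:
sample_body = '{"slave/registered": 1.0, "slave/uptime_secs": 1234.5}'
print(parse_snapshot(sample_body)["slave/registered"])
```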
| |
| <h3>Observability Metrics</h3> |
| |
| <p>This section lists all available metrics from Mesos agent nodes grouped by |
| category.</p> |
| |
| <h4>Resources</h4> |
| |
| <p>The following metrics provide information about the total resources available on |
| the agent and their current usage.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>slave/cpus_percent</code> |
| </td> |
| <td>Percentage of allocated CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/cpus_used</code> |
| </td> |
| <td>Number of allocated CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/cpus_total</code> |
| </td> |
| <td>Number of CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/cpus_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/cpus_revocable_total</code> |
| </td> |
| <td>Number of revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/cpus_revocable_used</code> |
| </td> |
| <td>Number of allocated revocable CPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_percent</code> |
| </td> |
| <td>Percentage of allocated disk space</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_used</code> |
| </td> |
| <td>Allocated disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_total</code> |
| </td> |
| <td>Disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_percent</code> |
| </td> |
| <td>Percentage of allocated GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_used</code> |
| </td> |
| <td>Number of allocated GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_total</code> |
| </td> |
| <td>Number of GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_revocable_total</code> |
| </td> |
| <td>Number of revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/gpus_revocable_used</code> |
| </td> |
| <td>Number of allocated revocable GPUs</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_percent</code> |
| </td> |
| <td>Percentage of allocated memory</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable disk space</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_revocable_total</code> |
| </td> |
| <td>Revocable disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/disk_revocable_used</code> |
| </td> |
| <td>Allocated revocable disk space in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_used</code> |
| </td> |
| <td>Allocated memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_total</code> |
| </td> |
| <td>Memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_revocable_percent</code> |
| </td> |
| <td>Percentage of allocated revocable memory</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_revocable_total</code> |
| </td> |
| <td>Revocable memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/mem_revocable_used</code> |
| </td> |
| <td>Allocated revocable memory in MB</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
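| |
| <p>The <code>_used</code> and <code>_total</code> gauges can be combined to derive an |
| allocation fraction per resource. The sketch below assumes that the corresponding |
| <code>_percent</code> gauges report the same ratio as a value between 0 and 1; the |
| function name and sample values are hypothetical.</p> |

```python
def allocation_fractions(snapshot, resources=("cpus", "gpus", "disk", "mem")):
    """Return {resource: used / total} for resources the agent actually has."""
    fractions = {}
    for resource in resources:
        used = snapshot.get(f"slave/{resource}_used")
        total = snapshot.get(f"slave/{resource}_total")
        # Skip resources that are missing or have a total of zero.
        if used is None or not total:
            continue
        fractions[resource] = used / total
    return fractions


# Hypothetical agent snapshot values, for illustration only.
sample = {
    "slave/cpus_used": 2.0, "slave/cpus_total": 8.0,
    "slave/mem_used": 4096.0, "slave/mem_total": 16384.0,
    "slave/gpus_used": 0.0, "slave/gpus_total": 0.0,
}
print(allocation_fractions(sample))
```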
| |
| |
| <h4>Agent</h4> |
| |
| <p>The following metrics provide information about whether an agent is currently |
| registered with a master and for how long it has been running.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>slave/registered</code> |
| </td> |
| <td>Whether this agent is registered with a master</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/uptime_secs</code> |
| </td> |
| <td>Uptime in seconds</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>System</h4> |
| |
| <p>The following metrics provide information about the agent system.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>system/cpus_total</code> |
| </td> |
| <td>Number of CPUs available</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_15min</code> |
| </td> |
| <td>Load average for the past 15 minutes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_5min</code> |
| </td> |
| <td>Load average for the past 5 minutes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/load_1min</code> |
| </td> |
| <td>Load average for the past minute</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/mem_free_bytes</code> |
| </td> |
| <td>Free memory in bytes</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>system/mem_total_bytes</code> |
| </td> |
| <td>Total memory in bytes</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
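| |
| <p>Raw load averages and byte counts are easier to compare across agents once they |
| are normalized. A possible normalization, with a hypothetical function name and |
| hypothetical sample values, is sketched here.</p> |

```python
def system_pressure(snapshot):
    """Normalize load by CPU count and express memory use as a fraction."""
    cpus = snapshot["system/cpus_total"]
    total = snapshot["system/mem_total_bytes"]
    free = snapshot["system/mem_free_bytes"]
    return {
        "load_per_cpu_5min": snapshot["system/load_5min"] / cpus,
        "mem_used_fraction": (total - free) / total,
    }


# Hypothetical agent snapshot values, for illustration only.
sample = {
    "system/cpus_total": 4.0,
    "system/load_5min": 6.0,
    "system/mem_total_bytes": 8000000000.0,
    "system/mem_free_bytes": 2000000000.0,
}
print(system_pressure(sample))
```

| <p>A load per CPU that stays above 1.0 suggests more runnable work than the agent |
| has cores for, although the right threshold depends on the workload.</p> |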
| |
| |
| <h4>Executors</h4> |
| |
| <p>The following metrics provide information about the executor instances running |
| on the agent.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>containerizer/mesos/container_destroy_errors</code> |
| </td> |
| <td>Number of containers destroyed due to launch errors</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/container_launch_errors</code> |
| </td> |
| <td>Number of container launch errors</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executors_preempted</code> |
| </td> |
| <td>Number of executors destroyed due to preemption</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/frameworks_active</code> |
| </td> |
| <td>Number of active frameworks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executor_directory_max_allowed_age_secs</code> |
| </td> |
| <td>Maximum allowed age in seconds to delete executor directory</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executors_registering</code> |
| </td> |
| <td>Number of executors registering</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executors_running</code> |
| </td> |
| <td>Number of executors running</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executors_terminated</code> |
| </td> |
| <td>Number of terminated executors</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/executors_terminating</code> |
| </td> |
| <td>Number of terminating executors</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/recovery_errors</code> |
| </td> |
| <td>Number of errors encountered during agent recovery</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
| |
| |
| <h4>Tasks</h4> |
| |
| <p>The following metrics provide information about active and terminated tasks.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>slave/tasks_failed</code> |
| </td> |
| <td>Number of failed tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_finished</code> |
| </td> |
| <td>Number of finished tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_killed</code> |
| </td> |
| <td>Number of killed tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_lost</code> |
| </td> |
| <td>Number of lost tasks</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_running</code> |
| </td> |
| <td>Number of running tasks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_staging</code> |
| </td> |
| <td>Number of staging tasks</td> |
| <td>Gauge</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/tasks_starting</code> |
| </td> |
| <td>Number of starting tasks</td> |
| <td>Gauge</td> |
| </tr> |
| </table> |
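| |
| <p>Counters such as <code>slave/tasks_failed</code> only grow, so they are usually |
| read as a change between two snapshots rather than as an absolute value. The sketch |
| below shows one way to compute that change; the function name is hypothetical, and |
| treating a decrease as a counter reset after an agent restart is an assumption.</p> |

```python
TERMINAL_COUNTERS = (
    "slave/tasks_failed",
    "slave/tasks_finished",
    "slave/tasks_killed",
    "slave/tasks_lost",
)


def counter_deltas(previous, current, names=TERMINAL_COUNTERS):
    """Return how much each counter grew between two snapshots."""
    deltas = {}
    for name in names:
        now = current.get(name, 0)
        delta = now - previous.get(name, 0)
        # A negative delta is assumed to mean the counter restarted from zero.
        deltas[name] = delta if delta >= 0 else now
    return deltas


# Hypothetical snapshots taken one sampling interval apart.
previous = {"slave/tasks_failed": 3, "slave/tasks_finished": 40,
            "slave/tasks_killed": 1, "slave/tasks_lost": 0}
current = {"slave/tasks_failed": 5, "slave/tasks_finished": 52,
           "slave/tasks_killed": 1, "slave/tasks_lost": 0}
print(counter_deltas(previous, current))
```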
| |
| |
| <h4>Messages</h4> |
| |
| <p>The following metrics provide information about messages between the agent and |
| the master it is registered with.</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr><th>Metric</th><th>Description</th><th>Type</th></tr> |
| </thead> |
| <tr> |
| <td> |
| <code>slave/invalid_framework_messages</code> |
| </td> |
| <td>Number of invalid framework messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/invalid_status_updates</code> |
| </td> |
| <td>Number of invalid status updates</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/valid_framework_messages</code> |
| </td> |
| <td>Number of valid framework messages</td> |
| <td>Counter</td> |
| </tr> |
| <tr> |
| <td> |
| <code>slave/valid_status_updates</code> |
| </td> |
| <td>Number of valid status updates</td> |
| <td>Counter</td> |
| </tr> |
| </table> |
| |
| |
| </div> |
| </div> |
| |
| </div><!-- /.container --> |
| </div><!-- /.content --> |
| |
| <hr> |
| |
| |
| |
| <!-- footer --> |
| <div class="footer"> |
| <div class="container"> |
| |
| <div class="col-md-8 trademark"> |
| <p>© 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>. |
| Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation. |
| </p> |
| </div> |
| </div><!-- /.container --> |
| </div><!-- /.footer --> |
| |
| <!-- JS --> |
| <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script> |
| <script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script> |
| </body> |
| </html> |