 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = Introduction: Monitoring and Metrics

 This chapter covers monitoring and metrics for Ignite. We'll start with an overview of the methods available for monitoring, and then we'll delve into the Ignite specifics, including a list of JMX metrics and MBeans.

 == Overview
 Monitoring in Ignite is built around metrics. You can access metrics in several ways:

 - Via link:monitoring-metrics/metrics[JMX]
 - Programmatically
 - Through link:monitoring-metrics/system-views[system views]
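As a rough illustration of the JMX route, the sketch below lists the MBeans registered in a given JMX domain of the current JVM, using only the standard `javax.management` API. The class name `MBeanLister` and the method `listMBeans` are hypothetical names chosen for this example. The domain under which Ignite registers its MBeans is not stated in this chapter, so the domain is left as a parameter and the demo falls back to the JVM's built-in `java.lang` domain.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for illustration; not part of Ignite.
class MBeanLister {
    /** Returns the canonical names of all MBeans registered in the given JMX domain, sorted. */
    static Set<String> listMBeans(String domain) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern;
        try {
            // "<domain>:*" matches every MBean in that domain.
            pattern = new ObjectName(domain + ":*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid JMX domain: " + domain, e);
        }
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(pattern, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Pass the domain you want to inspect as the first argument.
        String domain = args.length > 0 ? args[0] : "java.lang";
        listMBeans(domain).forEach(System.out::println);
    }
}
```

Running this inside the JVM that hosts a node (or adapting it to a remote JMX connection) is one way to discover which beans and attributes are available before wiring them into a monitoring tool.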


 == What to Monitor
 You can start by monitoring:

   - Each node in isolation
   - The connections between nodes
   - The system as a whole

 Note that a node consists of several layers: hardware, the operating system, the virtual machine (JVM, etc.), and the application. You need to check all of these layers, as well as the *network* surrounding them.

   - Hardware (Hypervisor): CPU/Memory/Disk => System Logs/Cloud Provider's Logs
   - Operating System
   - JVM: GC Logs, JMX, Java Flight Recorder, Thread Dumps, Heap dumps, etc.
   - Application: Logs, JMX, Throughput/Latency, Test queries
       * For log-based monitoring, the key is to act proactively: watch the logs for trends rather than checking them only after something breaks.
   - Network: ping monitoring, network hardware monitoring, TCP dumps

 This list is a good starting point for setting up monitoring of your hardware, operating system, and network. To monitor the application layer (the nodes that make up your in-memory computing solution), you need Ignite-specific monitoring based on metrics that you access through JMX MBeans or programmatically.
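For the JVM layer in the list above, a minimal sketch of programmatic access might look like the following. It reads heap usage, thread count, garbage collection totals, and uptime from the standard `java.lang.management` MXBeans. The class name `JvmSnapshot` and the metric keys are hypothetical names chosen for this example, and none of this is Ignite-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it reports on the JVM it runs in.
class JvmSnapshot {
    /** Collects a few JVM-level metrics from the platform MXBeans. */
    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heapUsedBytes", heap.getUsed());
        metrics.put("heapCommittedBytes", heap.getCommitted());

        metrics.put("liveThreads", (long) ManagementFactory.getThreadMXBean().getThreadCount());

        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("gcCount", gcCount);
        metrics.put("gcTimeMs", gcTimeMs);

        metrics.put("uptimeMs", ManagementFactory.getRuntimeMXBean().getUptime());
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Sampling such a snapshot at a fixed interval and keeping the history lets you watch trends (for example, steadily growing GC time) instead of inspecting the JVM only after a failure.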


 == Global vs. Node-specific Metrics

 Metrics differ in scope (applicability), and the values you see may depend on the node from which you read them.
 The following list explains different metric scopes.

 *Global metrics*:: Provide information about the cluster in general, for example: the number of nodes or the state of the cluster. This information is available on any node of the cluster.

 *Node-specific metrics*:: Provide information specific to the node on which you obtain the metrics, for example: memory consumption, data region metrics, WAL size, queue size, etc.

 Cache-related metrics can be global as well as node-specific.
 For example, the total number of entries in a cache is a global metric, and you can obtain it on any node.
 You can also get the number of that cache's entries stored on a specific node, which is a node-specific metric.
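The relationship between the two scopes for a cache can be illustrated with a small worked example. The sketch below does not call any Ignite API. It assumes hypothetical per-node entry counts (node names and numbers are made up) and assumes each entry is counted once, on the node that holds it, with no backup copies included. Under those assumptions, the global entry count is the sum of the node-specific counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model for illustration; the data below is invented.
class CacheSizeScopes {
    /** Global scope: total entries across all nodes, the same answer on every node. */
    static long globalSize(Map<String, Long> entriesPerNode) {
        return entriesPerNode.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Node-specific scope: entries stored on one node, or 0 if the node is unknown. */
    static long nodeSize(Map<String, Long> entriesPerNode, String nodeId) {
        return entriesPerNode.getOrDefault(nodeId, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> entriesPerNode = new LinkedHashMap<>();
        entriesPerNode.put("node-a", 400L);
        entriesPerNode.put("node-b", 350L);
        entriesPerNode.put("node-c", 250L);

        System.out.println("global size = " + globalSize(entriesPerNode));
        entriesPerNode.keySet().forEach(id ->
            System.out.println(id + " size = " + nodeSize(entriesPerNode, id)));
    }
}
```

When you compare readings taken on different nodes, a global metric should agree everywhere, while a node-specific metric is expected to differ from node to node.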