// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Metrics

This page describes metric registers (categories) and the metrics available in each register.


== System


System metrics such as JVM or CPU metrics.

Register name: `sys`

[cols="2,1,3",opts="header"]
|===
|Name    |Type|    Description
|CpuLoad| double|  CPU load.
|CurrentThreadCpuTime  |  long|    ThreadMXBean.getCurrentThreadCpuTime()
|CurrentThreadUserTime|   long   | ThreadMXBean.getCurrentThreadUserTime()
|DaemonThreadCount|   integer| ThreadMXBean.getDaemonThreadCount()
|GcCpuLoad   |double|  GC CPU load.
|PeakThreadCount |integer| ThreadMXBean.getPeakThreadCount()
|SystemLoadAverage|   java.lang.Double|    OperatingSystemMXBean.getSystemLoadAverage()
|ThreadCount |integer| ThreadMXBean.getThreadCount()
|TotalExecutedTasks  |long|    Total executed tasks.
|TotalStartedThreadCount |long|    ThreadMXBean.getTotalStartedThreadCount()
|UpTime|  long  |  RuntimeMXBean.getUptime()
|memory.heap.committed|   long|    MemoryMXBean.getHeapMemoryUsage().getCommitted()
|memory.heap.init |   long|    MemoryMXBean.getHeapMemoryUsage().getInit()
|memory.heap.used    |long|    MemoryMXBean.getHeapMemoryUsage().getUsed()
|memory.nonheap.committed|    long|    MemoryMXBean.getNonHeapMemoryUsage().getCommitted()
|memory.nonheap.init |long  |  MemoryMXBean.getNonHeapMemoryUsage().getInit()
|memory.nonheap.max  |long  |  MemoryMXBean.getNonHeapMemoryUsage().getMax()
|memory.nonheap.used |long  |  MemoryMXBean.getNonHeapMemoryUsage().getUsed()
|===
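
Several `sys` metrics are described only by the JDK management call behind them. The following is a minimal sketch, using only the standard `java.lang.management` API, of how those underlying values can be read directly in a JVM. The class name `SysMetricsSources` is hypothetical, and the sketch does not show how Ignite itself registers or exports the metrics.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reads the JDK values that the `sys` table refers to.
final class SysMetricsSources {
    /** Heap bytes currently in use (the value described for memory.heap.used). */
    static long heapUsed() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Live thread count, daemon and non-daemon (ThreadMXBean.getThreadCount()). */
    static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** JVM uptime in milliseconds (RuntimeMXBean.getUptime()). */
    static long upTimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    /** System load average; negative when the platform does not provide it. */
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("memory.heap.used  = " + heapUsed());
        System.out.println("ThreadCount       = " + threadCount());
        System.out.println("UpTime            = " + upTimeMillis());
        System.out.println("SystemLoadAverage = " + systemLoadAverage());
    }
}
```

The printed values depend on the running JVM, so no expected output is shown.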


== Caches

Cache metrics.

Register name: `cache.{cache_name}.{near}`

[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|CacheEvictions | long|The total number of evictions from the cache.
|CacheGets   |long|The total number of gets to the cache.
|CacheHits   |long|The number of get requests that were satisfied by the cache.
|CacheMisses |long|A miss is a get request that is not satisfied.
|CachePuts   |long|The total number of puts to the cache.
|CacheRemovals  | long|The total number of removals from the cache.
|CacheTxCommits | long|Total number of transaction commits.
|CacheTxRollbacks |long|Total number of transaction rollbacks.
|CacheSize|long|Local cache size.
|CommitTime  |histogram  | Commit time in nanoseconds.
|CommitTimeTotal |long| The total time of commit, in nanoseconds.
|EntryProcessorHits | long|The total number of invocations on keys that exist in the cache.
|EntryProcessorInvokeTimeNanos | long|The total time of cache invocations, in nanoseconds.
|EntryProcessorMaxInvocationTime |long|The maximum time taken to execute a cache invocation so far.
|EntryProcessorMinInvocationTime |long|The minimum time taken to execute a cache invocation so far.
|EntryProcessorMisses |long|The total number of invocations on keys that don't exist in the cache.
|EntryProcessorPuts   |long|The total number of cache invocations that caused an update.
|EntryProcessorReadOnlyInvocations   |long|The total number of cache invocations that caused no updates.
|EntryProcessorRemovals  |long|The total number of cache invocations that caused a removal.
|EstimatedRebalancingKeys|long|Estimated number of keys to be rebalanced.
|GetTime |histogram|   Get time in nanoseconds.
|GetTimeTotal|long|The total time of cache gets, in nanoseconds.
|HeapEntriesCount|long|On-heap entries count.
|IndexRebuildKeysProcessed|long | The number of keys with rebuilt indexes.
|IsIndexRebuildInProgress|boolean | True if index build or rebuild is in progress.
|OffHeapBackupEntriesCount|long|Off-heap backup entries count.
|OffHeapEntriesCount|long|Off-heap entries count.
|OffHeapEvictions|long|The total number of evictions from the off-heap memory.
|OffHeapGets |long|The total number of get requests to the off-heap memory.
|OffHeapHits |long|The number of get requests that were satisfied by the off-heap memory.
|OffHeapMisses   |long|A miss is a get request that is not satisfied by off-heap memory.
|OffHeapPrimaryEntriesCount|long|Off-heap primary entries count.
|OffHeapPuts |long|The total number of put requests to the off-heap memory.
|OffHeapRemovals |long|The total number of removals from the off-heap memory.
|PutTime | histogram|   Put time in nanoseconds.
|PutTimeTotal|long|The total time of cache puts, in nanoseconds.
|QueryCompleted  |long|Count of completed queries.
|QueryExecuted   |long|Count of executed queries.
|QueryFailed |long|Count of failed queries.
|QueryMaximumTime |long| Maximum query execution time.
|QueryMinimalTime |long| Minimum query execution time.
|QuerySumTime |long| Sum of query execution times.
|RebalanceClearingPartitionsLeft |long| Number of partitions that need to be cleared before the actual rebalance starts.
|RebalanceStartTime  |long| Rebalance start time.
|RebalancedKeys |long| Number of already rebalanced keys.
|RebalancingBytesRate|long|Estimated rebalancing speed in bytes.
|RebalancingKeysRate |long|Estimated rebalancing speed in keys.
|RemoveTime  |histogram|   Remove time in nanoseconds.
|RemoveTimeTotal |long|The total time of cache removal, in nanoseconds.
|RollbackTime|histogram|   Rollback time in nanoseconds.
|RollbackTimeTotal   |long|The total time of rollback, in nanoseconds.
|TotalRebalancedBytes|long|Number of already rebalanced bytes.
|===
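
The counters above are often combined into derived values. The following is a minimal sketch of two such derivations: a hit ratio from `CacheHits` and `CacheGets`, and a mean get latency from `GetTimeTotal` and `CacheGets`. It assumes that the counters were sampled at the same moment and that `GetTimeTotal` covers the same requests counted by `CacheGets`. The class name `CacheMetricMath` and the sample values are hypothetical.

```java
// Hypothetical helper: derives ratios from already-sampled cache counters.
final class CacheMetricMath {
    /** Fraction of get requests satisfied by the cache; 0 when there were no gets. */
    static double hitRatio(long cacheHits, long cacheGets) {
        return cacheGets == 0 ? 0.0 : (double) cacheHits / cacheGets;
    }

    /** Mean time per get in nanoseconds; 0 when there were no gets. */
    static double avgGetNanos(long getTimeTotal, long cacheGets) {
        return cacheGets == 0 ? 0.0 : (double) getTimeTotal / cacheGets;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for CacheHits, CacheGets and GetTimeTotal.
        long cacheHits = 750, cacheGets = 1000, getTimeTotal = 2_000_000;

        System.out.println(hitRatio(cacheHits, cacheGets));       // prints 0.75
        System.out.println(avgGetNanos(getTimeTotal, cacheGets)); // prints 2000.0
    }
}
```

Both methods return 0 rather than dividing by zero when no gets have been recorded yet.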

== Cache Groups


Register name: `cacheGroups.{group_name}`

[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|AffinityPartitionsAssignmentMap |java.util.Map|  Affinity partitions assignment map.
|Caches  |java.util.ArrayList| List of caches
|IndexBuildCountPartitionsLeft |  long|    Number of partitions that still need to be processed to finish index creation or rebuilding.
|LocalNodeMovingPartitionsCount  |integer| Count of partitions with state MOVING for this cache group located on this node.
|LocalNodeOwningPartitionsCount  |integer| Count of partitions with state OWNING for this cache group located on this node.
|LocalNodeRentingEntriesCount |   long|    Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.
|LocalNodeRentingPartitionsCount |integer| Count of partitions with state RENTING for this cache group located on this node.
|MaximumNumberOfPartitionCopies | integer| Maximum number of partition copies for all partitions of this cache group.
|MinimumNumberOfPartitionCopies  |integer| Minimum number of partition copies for all partitions of this cache group.
|MovingPartitionsAllocationMap   |java.util.Map|  Allocation map of partitions with state MOVING in the cluster.
|OwningPartitionsAllocationMap   |java.util.Map | Allocation map of partitions with state OWNING in the cluster.
|PartitionIds    |java.util.ArrayList| Local partition ids.
|SparseStorageSize  | long|    Storage space allocated for group adjusted for possible sparsity, in bytes.
|StorageSize |long|    Storage space allocated for group, in bytes.
|TotalAllocatedPages |long|    Cache group total allocated pages.
|TotalAllocatedSize  |long|    Total size of memory allocated for group, in bytes.
|ReencryptionBytesLeft |long| The number of bytes left for re-encryption.
|ReencryptionFinished |boolean| The flag indicates whether re-encryption is finished or not.
|===


== Transactions

Transaction metrics.

Register name: `tx`

[cols="2,1,3",opts="header"]
|===
|Name   | Type |    Description
|AllOwnerTransactions|    java.util.HashMap|   Map of local node owning transactions.
|LockedKeysNumber   | long|    The number of keys locked on the node.
|OwnerTransactionsNumber |long|    The number of active transactions for which this node is the initiator.
|TransactionsHoldingLockNumber |  long|    The number of active transactions holding at least one key lock.
|LastCommitTime  |long|    Last commit time.
|nodeSystemTimeHistogram| histogram|   Transactions system times on node represented as histogram.
|nodeUserTimeHistogram|   histogram|   Transactions user times on node represented as histogram.
|LastRollbackTime|    long|    Last rollback time.
|totalNodeSystemTime |long|    Total transactions system time on node.
|totalNodeUserTime   |long|    Total transactions user time on node.
|txCommits   |integer| Number of transaction commits.
|txRollbacks |integer| Number of transaction rollbacks.
|===


== Partition Map Exchange

Partition map exchange metrics.

Register name: `pme`

[cols="2,1,3",opts="header"]
|===
|Name    |Type |   Description
|CacheOperationsBlockedDuration  |long  |  Current PME cache operations blocked duration in milliseconds.
|CacheOperationsBlockedDurationHistogram |histogram |  Histogram of cache operations blocked PME durations in milliseconds.
|Duration    |long |   Current PME duration in milliseconds.
|DurationHistogram |  histogram  | Histogram of PME durations in milliseconds.
|===
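
Metrics of type `histogram` report how many observed values fall into each of a set of ranges. The following is a minimal sketch of that idea and not Ignite's implementation: it assumes ascending, inclusive upper bounds plus one overflow bucket for larger values. The class name `BucketHistogram` and the bounds of 500 and 1000 milliseconds are hypothetical.

```java
import java.util.Arrays;

// Hypothetical bucketed histogram: one counter per bound plus an overflow counter.
final class BucketHistogram {
    private final long[] bounds; // ascending upper bounds (assumed inclusive)
    private final long[] counts; // counts[i] pairs with bounds[i]; the last entry is overflow

    BucketHistogram(long... bounds) {
        this.bounds = bounds.clone();
        this.counts = new long[bounds.length + 1];
    }

    /** Adds one observation to the first bucket whose upper bound is not exceeded. */
    void record(long value) {
        int i = 0;
        while (i < bounds.length && value > bounds[i])
            i++;
        counts[i]++;
    }

    /** Returns a copy of the per-bucket counters. */
    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        BucketHistogram durations = new BucketHistogram(500, 1000);

        for (long millis : new long[] {100, 700, 700, 5000})
            durations.record(millis);

        System.out.println(Arrays.toString(durations.counts())); // prints [1, 2, 1]
    }
}
```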


== Compute Jobs

Register name: `compute.jobs`

[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|compute.jobs.Active  |long|    Number of active jobs currently executing.
|compute.jobs.Canceled    |long|    Number of cancelled jobs that are still running.
|compute.jobs.ExecutionTime   |long|    Total execution time of jobs.
|compute.jobs.Finished    |long|    Number of finished jobs.
|compute.jobs.Rejected    |long|    Number of jobs rejected after the most recent collision resolution operation.
|compute.jobs.Started |long|    Number of started jobs.
|compute.jobs.Waiting |long|    Number of currently queued jobs waiting to be executed.
|compute.jobs.WaitingTime |long|    Total time jobs spent in the waiting queue.
|===

== Thread Pools

Register name: `threadPools.{thread_pool_name}`

[cols="2,1,3",opts="header"]
|===
|Name |   Type |   Description
|ActiveCount |long  |  Approximate number of threads that are actively executing tasks.
|CompletedTaskCount|  long |   Approximate total number of tasks that have completed execution.
|CorePoolSize    |long  |  The core number of threads.
|KeepAliveTime|   long  |  Thread keep-alive time, which is the amount of time which threads in excess of the core pool size may remain idle before being terminated.
|LargestPoolSize| long  |  Largest number of threads that have ever simultaneously been in the pool.
|MaximumPoolSize |long  |  The maximum allowed number of threads.
|PoolSize    |long|    Current number of threads in the pool.
|QueueSize   |long |   Current size of the execution queue.
|RejectedExecutionHandlerClass|   string | Class name of current rejection handler.
|Shutdown  |  boolean| True if this executor has been shut down.
|TaskCount |  long |   Approximate total number of tasks that have been scheduled for execution.
|Terminated  |boolean| True if all tasks have completed following shut down.
|Terminating |boolean| True if terminating but not yet terminated.
|ThreadFactoryClass|  string|  Class name of thread factory used to create new threads.
|===
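
The metric names in this register closely resemble the getters of the JDK class `java.util.concurrent.ThreadPoolExecutor`. The following is a minimal sketch that collects the corresponding values from a plain executor. It assumes this correspondence only for illustration and does not show how Ignite populates the register. The class name `ThreadPoolGetters` and the pool sizes are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: maps the metric names above to ThreadPoolExecutor getters.
final class ThreadPoolGetters {
    static Map<String, Object> snapshot(ThreadPoolExecutor pool) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("ActiveCount", pool.getActiveCount());
        m.put("CompletedTaskCount", pool.getCompletedTaskCount());
        m.put("CorePoolSize", pool.getCorePoolSize());
        m.put("KeepAliveTime", pool.getKeepAliveTime(TimeUnit.MILLISECONDS));
        m.put("LargestPoolSize", pool.getLargestPoolSize());
        m.put("MaximumPoolSize", pool.getMaximumPoolSize());
        m.put("PoolSize", pool.getPoolSize());
        m.put("QueueSize", pool.getQueue().size());
        m.put("RejectedExecutionHandlerClass", pool.getRejectedExecutionHandler().getClass().getName());
        m.put("Shutdown", pool.isShutdown());
        m.put("TaskCount", pool.getTaskCount());
        m.put("Terminated", pool.isTerminated());
        m.put("Terminating", pool.isTerminating());
        m.put("ThreadFactoryClass", pool.getThreadFactory().getClass().getName());
        return m;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(2, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        pool.submit(() -> { /* no-op task */ });
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);

        snapshot(pool).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The keep-alive time is reported here in milliseconds, which is a choice made for this sketch only.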


== Cache Group IO

Register name: `io.statistics.cacheGroups.{group_name}`


[cols="2,1,3",opts="header"]
|===
|Name |   Type |   Description
|LOGICAL_READS  | long |   Number of logical reads
|PHYSICAL_READS | long |   Number of physical reads
|grpId  | integer | Group id
|name  |  string | Name of the cache group
|startTime  | long |   Statistics collection start time
|===


== Sorted Indexes

Register name: `io.statistics.sortedIndexes.{cache_name}.{index_name}`

[cols="2,1,3",opts="header"]
|===
|Name |    Type |    Description
|LOGICAL_READS_INNER |long|    Number of logical reads for inner tree node
|LOGICAL_READS_LEAF | long  |  Number of logical reads for leaf tree node
|PHYSICAL_READS_INNER|    long|    Number of physical reads for inner tree node
|PHYSICAL_READS_LEAF| long|    Number of physical reads for leaf tree node
|indexName|   string|  Name of the index
|name|    string|  Name of the cache
|startTime|   long|    Statistics collection start time
|===


== Hash Indexes

Register name: `io.statistics.hashIndexes.{cache_name}.{index_name}`


[cols="2,1,3",opts="header"]
|===
|Name |   Type|    Description
|LOGICAL_READS_INNER| long|    Number of logical reads for inner tree node
|LOGICAL_READS_LEAF|  long|    Number of logical reads for leaf tree node
|PHYSICAL_READS_INNER|    long|    Number of physical reads for inner tree node
|PHYSICAL_READS_LEAF| long|    Number of physical reads for leaf tree node
|indexName|   string|  Name of the index
|name|    string|  Name of the cache
|startTime|   long|    Statistics collection start time
|===


== Communication IO

Register name: `io.communication`


[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|ActiveSessionsCount|   integer|   Active TCP sessions count.
|OutboundMessagesQueueSize|   integer| Outbound messages queue size.
|SentMessagesCount  | integer| Sent messages count.
|SentBytesCount | long  |  Sent bytes count.
|ReceivedBytesCount|  long|    Received bytes count.
|ReceivedMessagesCount|   integer| Received messages count.
|RejectedSslSessionsCount|   integer|   Number of TCP sessions rejected due to SSL errors (metric is exported only if SSL is enabled).
|SslEnabled|   boolean|   Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram|   histogram|   Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|===
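
Counters such as `SentBytesCount` and `ReceivedBytesCount` are most useful as rates. The following is a minimal sketch that turns two samples of a counter into a per-second rate. It assumes the counter only grows between the two samples and that the caller measures the sampling interval itself. The class name `CounterRate` and the sample values are hypothetical.

```java
// Hypothetical helper: converts two samples of a growing counter into a rate.
final class CounterRate {
    /** Average increase per second between two samples taken intervalMillis apart. */
    static double perSecond(long earlier, long later, long intervalMillis) {
        if (intervalMillis <= 0)
            throw new IllegalArgumentException("intervalMillis must be positive");

        return (later - earlier) * 1000.0 / intervalMillis;
    }

    public static void main(String[] args) {
        // Made-up SentBytesCount samples taken 30 seconds apart.
        long first = 1_000_000, second = 1_600_000;

        System.out.println(perSecond(first, second, 30_000)); // prints 20000.0
    }
}
```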


== Ignite Thin Client Connector

Register name: `client.connector`


[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|ActiveSessionsCount|   integer|   Active TCP sessions count.
|ReceivedBytesCount|   long|   Received bytes count.
|RejectedSslSessionsCount|   integer|   Number of TCP sessions rejected due to SSL errors (metric is exported only if SSL is enabled).
|RejectedSessionsTimeout|   integer|   TCP sessions count that were rejected due to handshake timeout.
|RejectedSessionsAuthenticationFailed|   integer|   TCP sessions count that were rejected due to failed authentication.
|RejectedSessionsTotal|   integer|   Total number of rejected TCP connections.
|{clientType}.AcceptedSessions|   integer|   Number of successfully established sessions for the client type.
|{clientType}.ActiveSessions|   integer|   Number of active sessions for the client type.
|SentBytesCount|   long|   Sent bytes count.
|SslEnabled|   boolean|   Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram|   histogram|   Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|===


== Ignite REST Client Connector

Register name: `rest.client`


[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|ActiveSessionsCount|   integer|   Active TCP sessions count.
|ReceivedBytesCount|   long|    Received bytes count.
|RejectedSslSessionsCount|   integer|   Number of TCP sessions rejected due to SSL errors (metric is exported only if SSL is enabled).
|SentBytesCount|   long|   Sent bytes count.
|SslEnabled|   boolean|   Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram|   histogram|   Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|===


== Discovery IO

Register name: `io.discovery`


[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|CoordinatorSince|   long|   Timestamp at which the local node became the coordinator (metric is exported only from server nodes).
|Coordinator|   UUID|   Coordinator ID (metric is exported only from server nodes).
|CurrentTopologyVersion|   long|   Current topology version.
|JoinedNodes|   integer|   Joined nodes count.
|LeftNodes|   integer|   Left nodes count.
|MessageWorkerQueueSize|   integer|   Current message worker queue size.
|PendingMessagesRegistered|   integer|   Pending registered messages count.
|RejectedSslConnectionsCount|   integer|   Number of TCP discovery connections rejected due to SSL errors.
|SslEnabled|   boolean|   Indicates whether SSL is enabled.
|TotalProcessedMessages|   integer|   Total processed messages count.
|TotalReceivedMessages|   integer|   Total received messages count.
|===


== Data Region IO

Register name: `io.dataregion.{data_region_name}`

[cols="2,1,3",opts="header"]
|===
|Name |    Type |    Description
|AllocationRate | long|    Allocation rate (pages per second) averaged across rateTimeInterval.
|CheckpointBufferSize |    long |    Checkpoint buffer size in bytes.
|DirtyPages |  long|    Number of pages in memory not yet synchronized with persistent storage.
|EmptyDataPages|  long|    Number of empty data pages in the region. Only completely free pages that can be reused are counted (e.g. pages contained in the reuse bucket of the free list).
|EvictionRate|    long|    Eviction rate (pages per second).
|LargeEntriesPagesCount|  long|    Count of pages that are fully occupied by large entries that exceed the page size.
|OffHeapSize| long|    Offheap size in bytes.
|OffheapUsedSize| long|    Offheap used size in bytes.
|PagesFillFactor| double|  The percentage of the used space.
|PagesRead|   long|    Number of pages read since the last restart.
|PagesReplaceAge| long|    Average age at which pages in memory are replaced with pages from persistent storage (milliseconds).
|PagesReplaceRate|    long|    Rate at which pages in memory are replaced with pages from persistent storage (pages per second).
|PagesReplaced|   long|    Number of pages replaced since the last restart.
|PagesWritten|    long|    Number of pages written since the last restart.
|PhysicalMemoryPages| long|    Number of pages residing in physical RAM.
|PhysicalMemorySize | long|    Total size of pages loaded into RAM, in bytes.
|TotalAllocatedPages |long|    Total number of allocated pages.
|TotalAllocatedSize|  long  |  Total size of memory allocated in the data region, in bytes.
|TotalThrottlingTime| long|    Total time threads were throttled, in milliseconds. Ignite throttles threads that generate dirty pages during an ongoing checkpoint.
|UsedCheckpointBufferSize  |  long|    Used checkpoint buffer size, in bytes.

|===


== Data Storage

Data Storage metrics.

Register name: `io.datastorage`

[cols="2,1,3",opts="header"]
|===
|Name |    Type |    Description
|CheckpointBeforeLockHistogram| histogram |   Histogram of the duration of checkpoint actions performed before taking the write lock, in milliseconds.
|CheckpointFsyncHistogram| histogram |   Histogram of checkpoint fsync duration in milliseconds.
|CheckpointHistogram| histogram |   Histogram of checkpoint duration in milliseconds.
|CheckpointListenersExecuteHistogram| histogram |   Histogram of the duration of checkpoint listener execution under the write lock, in milliseconds.
|CheckpointLockHoldHistogram| histogram |   Histogram of checkpoint lock hold duration in milliseconds.
|CheckpointLockWaitHistogram| histogram |   Histogram of checkpoint lock wait duration in milliseconds.
|CheckpointMarkHistogram| histogram |   Histogram of checkpoint mark duration in milliseconds.
|CheckpointPagesWriteHistogram| histogram |   Histogram of checkpoint pages write duration in milliseconds.
|CheckpointSplitAndSortPagesHistogram| histogram |   Histogram of splitting and sorting checkpoint pages duration in milliseconds.
|CheckpointTotalTime| long |   Total duration of checkpoints.
|CheckpointWalRecordFsyncHistogram| histogram |   Histogram of the duration of the WAL fsync after logging CheckpointRecord at the beginning of a checkpoint, in milliseconds.
|CheckpointWriteEntryHistogram| histogram |   Histogram of entry buffer writing to file duration in milliseconds.
|LastCheckpointBeforeLockDuration|  long |   Duration of the checkpoint actions performed before taking the write lock, in milliseconds.
|LastCheckpointCopiedOnWritePagesNumber|  long |   Number of pages copied to a temporary checkpoint buffer during the last checkpoint.
|LastCheckpointDataPagesNumber|   long  |  Total number of data pages written during the last checkpoint.
|LastCheckpointDuration | long  |  Duration of the last checkpoint in milliseconds.
|LastCheckpointFsyncDuration| long  |  Duration of the sync phase of the last checkpoint in milliseconds.
|LastCheckpointListenersExecuteDuration|  long|    Duration of the checkpoint execution listeners under write lock in milliseconds.
|LastCheckpointLockHoldDuration|  long|    Duration of the checkpoint lock hold in milliseconds.
|LastCheckpointLockWaitDuration|  long|    Duration of the checkpoint lock wait in milliseconds.
|LastCheckpointMarkDuration | long  |  Duration of the checkpoint mark in milliseconds.
|LastCheckpointPagesWriteDuration|    long|    Duration of the checkpoint pages write in milliseconds.
|LastCheckpointTotalPagesNumber|  long|    Total number of pages written during the last checkpoint.
|LastCheckpointSplitAndSortPagesDuration|  long|    Duration of splitting and sorting checkpoint pages of the last checkpoint in milliseconds.
|LastCheckpointStart|  long|    Start timestamp of the last checkpoint.
|LastCheckpointWalRecordFsyncDuration|  long|    Duration of the WAL fsync after logging CheckpointRecord on the start of the last checkpoint in milliseconds.
|LastCheckpointWriteEntryDuration|  long|    Duration of entry buffer writing to file of the last checkpoint in milliseconds.
|SparseStorageSize  | long|    Storage space allocated adjusted for possible sparsity, in bytes.
|StorageSize | long|    Storage space allocated, in bytes.
|WalArchiveSegments | integer| Current number of WAL segments in the WAL archive.
|WalBuffPollSpinsRate|    long  |  WAL buffer poll spins number over the last time interval.
|WalFsyncTimeDuration |   long |   Total duration of fsync
|WalFsyncTimeNum |long  |  Total count of fsync
|WalLastRollOverTime |long |   Time of the last WAL segment rollover.
|WalLoggingRate | long|    Average number of WAL records per second written during the last time interval.
|WalTotalSize|    long  |  Total size of the WAL files in storage, in bytes.
|WalWritingRate|  long  |  Average number of bytes per second written during the last time interval.
|===


== Cluster

Cluster metrics.

Register name: `cluster`


[cols="2,1,3",opts="header"]
|===
|Name|    Type|    Description
|ActiveBaselineNodes| integer | Active baseline nodes count.
|Rebalanced| boolean | True if the cluster has reached a fully rebalanced state. Note that for an inactive cluster this metric is always False, regardless of the actual partition state.
|TotalBaselineNodes| integer | Total baseline nodes count.
|TotalClientNodes| integer | Client nodes count.
|TotalServerNodes| integer | Server nodes count.
|===
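
The `cluster` metrics can be combined into a simple steady-state check. The following is a minimal sketch, assuming the three values were sampled together: it treats the cluster as steady when every baseline node is active and `Rebalanced` is true. The class name `ClusterHealth` and the rule itself are hypothetical and serve only as an illustration.

```java
// Hypothetical helper: a steady-state predicate over sampled cluster metrics.
final class ClusterHealth {
    /** True when all baseline nodes are active and rebalancing has finished. */
    static boolean isSteady(int activeBaselineNodes, int totalBaselineNodes, boolean rebalanced) {
        return rebalanced && activeBaselineNodes == totalBaselineNodes;
    }

    public static void main(String[] args) {
        System.out.println(isSteady(3, 3, true));  // prints true
        System.out.println(isSteady(2, 3, true));  // prints false
        System.out.println(isSteady(3, 3, false)); // prints false
    }
}
```

Because `Rebalanced` is always False for an inactive cluster, this check also reports an inactive cluster as not steady.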
