blob: 7bce59dcb0096c61bf51a7c27905995057cbd50f [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Metrics
This page describes metrics registers (categories) and the metrics available in each register.
== System
System metrics such as JVM or CPU metrics.
Register name: `sys`
[cols="2,1,3",opts="header"]
|===
|Name |Type| Description
|CpuLoad| double| CPU load.
|CurrentThreadCpuTime | long| ThreadMXBean.getCurrentThreadCpuTime()
|CurrentThreadUserTime| long | ThreadMXBean.getCurrentThreadUserTime()
|DaemonThreadCount| integer| ThreadMXBean.getDaemonThreadCount()
|GcCpuLoad |double| GC CPU load.
|PeakThreadCount |integer| ThreadMXBean.getPeakThreadCount
|SystemLoadAverage| java.lang.Double| OperatingSystemMXBean.getSystemLoadAverage()
|ThreadCount |integer| ThreadMXBean.getThreadCount
|TotalExecutedTasks |long| Total executed tasks.
|TotalStartedThreadCount |long| ThreadMXBean.getTotalStartedThreadCount
|UpTime| long | RuntimeMxBean.getUptime()
|memory.heap.committed| long| MemoryUsage.getHeapMemoryUsage().getCommitted()
|memory.heap.init | long| MemoryUsage.getHeapMemoryUsage().getInit()
|memory.heap.used |long| MemoryUsage.getHeapMemoryUsage().getUsed()
|memory.nonheap.committed| long| MemoryUsage.getNonHeapMemoryUsage().getCommitted()
|memory.nonheap.init |long | MemoryUsage.getNonHeapMemoryUsage().getInit()
|memory.nonheap.max |long | MemoryUsage.getNonHeapMemoryUsage().getMax()
|memory.nonheap.used |long | MemoryUsage.getNonHeapMemoryUsage().getUsed()
|===
== Caches
Cache metrics.
Register name: `cache.{cache_name}.{near}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|CacheEvictions | long|The total number of evictions from the cache.
|CacheGets |long|The total number of gets to the cache.
|CacheHits |long|The number of get requests that were satisfied by the cache.
|CacheMisses |long|A miss is a get request that is not satisfied.
|CachePuts |long|The total number of puts to the cache.
|CacheRemovals | long|The total number of removals from the cache.
|CacheTxCommits | long|Total number of transaction commits.
|CacheTxRollbacks |long|Total number of transaction rollbacks.
|CacheSize|long|Local cache size.
|CommitTime |histogram | Commit time in nanoseconds.
|CommitTimeTotal |long| The total time of commit, in nanoseconds.
|EntryProcessorHits | long|The total number of invocations on keys, which exist in cache.
|EntryProcessorInvokeTimeNanos | long | The total time of cache invocations for which this node is the initiator, in nanoseconds.
|EntryProcessorMaxInvocationTime |long | So far, the maximum time to execute cache invokes for which this node is the initiator.
|EntryProcessorMinInvocationTime |long | So far, the minimum time to execute cache invokes for which this node is the initiator.
|EntryProcessorMisses |long|The total number of invocations on keys, which don't exist in cache.
|EntryProcessorPuts |long|The total number of cache invocations, caused update.
|EntryProcessorReadOnlyInvocations |long|The total number of cache invocations, caused no updates.
|EntryProcessorRemovals |long|The total number of cache invocations, caused removals.
|EstimatedRebalancingKeys|long|Number estimated to rebalance keys.
|GetAllTime | histogram | GetAll time for which this node is the initiator, in nanoseconds.
|GetTime | histogram | Get time for which this node is the initiator, in nanoseconds.
|GetTimeTotal | long | The total time of cache gets for which this node is the initiator, in nanoseconds.
|HeapEntriesCount|long|Onheap entries count.
|IndexRebuildKeysProcessed|long | The number of keys with rebuilt indexes.
|IsIndexRebuildInProgress|boolean | True if index build or rebuild is in progress.
|OffHeapBackupEntriesCount|long|Offheap backup entries count.
|OffHeapEntriesCount|long|Offheap entries count.
|OffHeapEvictions|long|The total number of evictions from the off-heap memory.
|OffHeapGets |long|The total number of get requests to the off-heap memory.
|OffHeapHits |long|The number of get requests that were satisfied by the off-heap memory.
|OffHeapMisses |long|A miss is a get request that is not satisfied by off-heap memory.
|OffHeapPrimaryEntriesCount|long|Offheap primary entries count.
|OffHeapPuts |long|The total number of put requests to the off-heap memory.
|OffHeapRemovals |long|The total number of removals from the off-heap memory.
|PutAllTime | histogram | PutAll time for which this node is the initiator, in nanoseconds.
|PutTime | histogram | Put time for which this node is the initiator, in nanoseconds.
|PutTimeTotal | long | The total time of cache puts for which this node is the initiator, in nanoseconds.
|QueryCompleted |long|Count of completed queries.
|QueryExecuted |long|Count of executed queries.
|QueryFailed |long|Count of failed queries.
|QueryMaximumTime |long| Maximum query execution time.
|QueryMinimalTime |long| Minimum query execution time.
|QuerySumTime |long| Query summary time.
|RebalanceClearingPartitionsLeft |long| Number of partitions need to be cleared before actual rebalance start.
|RebalanceStartTime |long| Rebalance start time.
|RebalancedKeys |long| Number of already rebalanced keys.
|RebalancingBytesRate|long|Estimated rebalancing speed in bytes.
|RebalancingKeysRate |long|Estimated rebalancing speed in keys.
|RemoveAllTime | histogram | RemoveAll time for which this node is the initiator, in nanoseconds.
|RemoveTime | histogram | Remove time for which this node is the initiator. in nanoseconds.
|RemoveTimeTotal | long | The total time of cache removal, in nanoseconds.
|RollbackTime|histogram| Rollback time in nanoseconds.
|RollbackTimeTotal |long|The total time of rollback, in nanoseconds.
|TotalRebalancedBytes|long|Number of already rebalanced bytes.
|===
== Cache Groups
Register name: `cacheGroups.{group_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|AffinityPartitionsAssignmentMap |java.util.Map| Affinity partitions assignment map.
|Caches |java.util.ArrayList| List of caches
|IndexBuildCountPartitionsLeft | long| Number of partitions need processed for finished indexes create or rebuilding.
|LocalNodeMovingPartitionsCount |integer| Count of partitions with state MOVING for this cache group located on this node.
|LocalNodeOwningPartitionsCount |integer| Count of partitions with state OWNING for this cache group located on this node.
|LocalNodeRentingEntriesCount | long| Count of entries remains to evict in RENTING partitions located on this node for this cache group.
|LocalNodeRentingPartitionsCount |integer| Count of partitions with state RENTING for this cache group located on this node.
|MaximumNumberOfPartitionCopies | integer| Maximum number of partition copies for all partitions of this cache group.
|MinimumNumberOfPartitionCopies |integer| Minimum number of partition copies for all partitions of this cache group.
|MovingPartitionsAllocationMap |java.util.Map| Allocation map of partitions with state MOVING in the cluster.
|OwningPartitionsAllocationMap |java.util.Map | Allocation map of partitions with state OWNING in the cluster.
|PartitionIds |java.util.ArrayList| Local partition ids.
|SparseStorageSize | long| Storage space allocated for group adjusted for possible sparsity, in bytes.
|StorageSize |long| Storage space allocated for group, in bytes.
|TotalAllocatedPages |long| Cache group total allocated pages.
|TotalAllocatedSize |long| Total size of memory allocated for group, in bytes.
|ReencryptionBytesLeft |long| The number of bytes left for re-encryption.
|ReencryptionFinished |boolean| The flag indicates whether re-encryption is finished or not.
|===
== Transactions
Transaction metrics.
Register name: `tx`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|AllOwnerTransactions| java.util.HashMap| Map of local node owning transactions.
|LockedKeysNumber | long| The number of keys locked on the node.
|OwnerTransactionsNumber |long| The number of active transactions for which this node is the initiator.
|TransactionsHoldingLockNumber | long| The number of active transactions holding at least one key lock.
|LastCommitTime |long| Last commit time.
|nodeSystemTimeHistogram| histogram| Transactions system times on node represented as histogram.
|nodeUserTimeHistogram| histogram| Transactions user times on node represented as histogram.
|LastRollbackTime| long| Last rollback time.
|totalNodeSystemTime |long| Total transactions system time on node.
|totalNodeUserTime |long| Total transactions user time on node.
|txCommits |integer| Number of transaction commits.
|txRollbacks |integer| Number of transaction rollbacks.
|===
== Partition Map Exchange
Partition map exchange metrics.
Register name: `pme`
[cols="2,1,3",opts="header"]
|===
|Name |Type | Description
|CacheOperationsBlockedDuration |long | Current PME cache operations blocked duration in milliseconds.
|CacheOperationsBlockedDurationHistogram |histogram | Histogram of cache operations blocked PME durations in milliseconds.
|Duration |long | Current PME duration in milliseconds.
|DurationHistogram | histogram | Histogram of PME durations in milliseconds.
|===
== Compute Jobs
Register name: `compute.jobs`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|compute.jobs.Active |long| Number of active jobs currently executing.
|compute.jobs.Canceled |long| Number of cancelled jobs that are still running.
|compute.jobs.ExecutionTime |long| Total execution time of jobs.
|compute.jobs.Finished |long| Number of finished jobs.
|compute.jobs.Rejected |long| Number of jobs rejected after more recent collision resolution operation.
|compute.jobs.Started |long| Number of started jobs.
|compute.jobs.Waiting |long| Number of currently queued jobs waiting to be executed.
|compute.jobs.WaitingTime |long| Total time jobs spent on waiting queue.
|===
== Thread Pools
Register name: `threadPools.{thread_pool_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|ActiveCount |long | Approximate number of threads that are actively executing tasks.
|CompletedTaskCount| long | Approximate total number of tasks that have completed execution.
|CorePoolSize |long | The core number of threads.
|KeepAliveTime| long | Thread keep-alive time, which is the amount of time which threads in excess of the core pool size may remain idle before being terminated.
|LargestPoolSize| long | Largest number of threads that have ever simultaneously been in the pool.
|MaximumPoolSize |long | The maximum allowed number of threads.
|PoolSize |long| Current number of threads in the pool.
|QueueSize |long | Current size of the execution queue.
|RejectedExecutionHandlerClass| string | Class name of current rejection handler.
|Shutdown | boolean| True if this executor has been shut down.
|TaskCount | long | Approximate total number of tasks that have been scheduled for execution.
|TaskExecutionTime | histogram | Task execution time, in milliseconds.
|Terminated |boolean| True if all tasks have completed following shut down.
|Terminating |long| True if terminating but not yet terminated.
|ThreadFactoryClass| string| Class name of thread factory used to create new threads.
|===
== Cache Group IO
Register name: `io.statistics.cacheGroups.{group_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|LOGICAL_READS | long | Number of logical reads
|PHYSICAL_READS | long | Number of physical reads
|grpId | integer | Group id
|name | string | Name of the index
|startTime | long | Statistics collect start time
|===
== Sorted Indexes I/O statistics
Register name: `io.statistics.sortedIndexes.{cache_name}.{index_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|LOGICAL_READS_INNER |long| Number of logical reads for inner tree node
|LOGICAL_READS_LEAF | long | Number of logical reads for leaf tree node
|PHYSICAL_READS_INNER| long| Number of physical reads for inner tree node
|PHYSICAL_READS_LEAF| long| Number of physical reads for leaf tree node
|indexName| string| Name of the index
|name| string| Name of the cache
|startTime| long| Statistics collection start time
|===
== Sorted Indexes operations
Contains metrics about low-level operations (such as `Insert`, `Search`, etc.) on pages of sorted secondary indexes.
Register name: `index.{schema_name}.{table_name}.{index_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|{opType}Count| long| Count of {opType} operations on index.
|{opType}Time| long| Total duration (nanoseconds) of {opType} operations on index.
|===
== Hash Indexes I/O statistics
Register name: `io.statistics.hashIndexes.{cache_name}.{index_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type| Description
|LOGICAL_READS_INNER| long| Number of logical reads for inner tree node
|LOGICAL_READS_LEAF| long| Number of logical reads for leaf tree node
|PHYSICAL_READS_INNER| long| Number of physical reads for inner tree node
|PHYSICAL_READS_LEAF| long| Number of physical reads for leaf tree node
|indexName| string| Name of the index
|name| string| Name of the cache
|startTime| long| Statistics collection start time
|===
== Communication IO
Register name: `io.communication`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|ActiveSessionsCount| integer| Active TCP sessions count.
|OutboundMessagesQueueSize| integer| Outbound messages queue size.
|SentMessagesCount | integer| Sent messages count.
|SentBytesCount | long | Sent bytes count.
|ReceivedBytesCount| long| Received bytes count.
|ReceivedMessagesCount| integer| Received messages count.
|RejectedSslSessionsCount| integer| TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled).
|SslEnabled| boolean| Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram| histogram| Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|===
== Ignite Thin Client Connector
Register name: `client.connector`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|ActiveSessionsCount| integer| Active TCP sessions count.
|ReceivedBytesCount| long| Received bytes count.
|RejectedSslSessionsCount| integer| TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled).
|RejectedSessionsTimeout| integer| TCP sessions count that were rejected due to handshake timeout.
|RejectedSessionsAuthenticationFailed| integer| TCP sessions count that were rejected due to failed authentication.
|RejectedSessionsTotal| integer| Total number of rejected TCP connections.
|{clientType}.AcceptedSessions| integer| Number of successfully established sessions for the client type.
|{clientType}.ActiveSessions| integer| Number of active sessions for the client type.
|SentBytesCount| long| Sent bytes count.
|SslEnabled| boolean| Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram| histogram| Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|AffinityKeyRequestsHits| long| The number of affinity-aware cache key requests that were sent to the primary node.
|AffinityKeyRequestsMisses| long| The number of affinity-aware cache key requests that were sent not to the primary node.
|AffinityQueryRequestsHits| long| The number of affinity-aware query requests that were sent to the primary node.
|AffinityQueryRequestsMisses| long| The number of affinity-aware query requests that were sent not to the primary node.
|===
== Ignite REST Client Connector
Register name: `rest.client`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|ActiveSessionsCount| integer| Active TCP sessions count.
|ReceivedBytesCount| long| Received bytes count.
|RejectedSslSessionsCount| integer| TCP sessions count that were rejected due to the SSL errors (metric is exported only if SSL is enabled).
|SentBytesCount| long| Sent bytes count.
|SslEnabled| boolean| Indicates whether SSL is enabled.
|SslHandshakeDurationHistogram| histogram| Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled).
|===
== Discovery IO
Register name: `io.discovery`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|CoordinatorSince| long| Timestamp since which the local node became the coordinator (metric is exported only from server nodes).
|Coordinator| UUID| Coordinator ID (metric is exported only from server nodes).
|CurrentTopologyVersion| long| Current topology version.
|JoinedNodes| integer| Joined nodes count.
|LeftNodes| integer| Left nodes count.
|MessageWorkerQueueSize| integer| Current message worker queue size.
|PendingMessagesRegistered| integer| Pending registered messages count.
|RejectedSslConnectionsCount| integer| TCP discovery connections count that were rejected due to the SSL errors.
|SslEnabled| boolean| Indicates whether SSL is enabled.
|TotalProcessedMessages| integer| Total processed messages count.
|TotalReceivedMessages| integer| Total received messages count.
|===
== Data Region IO
Register name: `io.dataregion.{data_region_name}`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|AllocationRate | hitrate| Allocation rate (pages per second) averaged across rateTimeInterval.
|CheckpointBufferSize | long | Checkpoint buffer size in bytes.
|DirtyPages | long| Number of pages in memory not yet synchronized with persistent storage.
|EmptyDataPages| long| Calculates empty data pages count for region. It counts only totally free pages that can be reused (e. g. pages that are contained in reuse bucket of free list).
|EvictionRate| hitrate| Eviction rate (pages per second).
|LargeEntriesPagesCount| long| Count of pages that fully ocupied by large entries that go beyond page size
|OffHeapSize| long| Offheap size in bytes.
|OffheapUsedSize| long| Offheap used size in bytes.
|PagesFillFactor| double| The percentage of the used space.
|PagesRead| long| Number of pages read from last restart.
|PagesReplaceAge| hitrate| Average age at which pages in memory are replaced with pages from persistent storage (milliseconds).
|PagesReplaceRate| hitrate| Rate at which pages in memory are replaced with pages from persistent storage (pages per second).
|PagesReplaced| long| Number of pages replaced from last restart.
|PagesWritten| long| Number of pages written from last restart.
|PhysicalMemoryPages| long| Number of pages residing in physical RAM.
|PhysicalMemorySize | long| Gets total size of pages loaded to the RAM, in bytes
|TotalAllocatedPages |long| Total number of allocated pages.
|TotalAllocatedSize| long | Gets a total size of memory allocated in the data region, in bytes
|TotalThrottlingTime| long| Total throttling threads time in milliseconds. The Ignite throttles threads that generate dirty pages during the ongoing checkpoint.
|UsedCheckpointBufferSize | long| Gets used checkpoint buffer size in bytes
|===
== Data Storage
Data Storage metrics.
Register name: `io.datastorage`
[cols="2,1,3",opts="header"]
|===
|Name | Type | Description
|CheckpointBeforeLockHistogram| histogram | Histogram of checkpoint action before taken write lock duration in milliseconds.
|CheckpointFsyncHistogram| histogram | Histogram of checkpoint fsync duration in milliseconds.
|CheckpointHistogram| histogram | Histogram of checkpoint duration in milliseconds.
|CheckpointListenersExecuteHistogram| histogram | Histogram of checkpoint execution listeners under write lock duration in milliseconds.
|CheckpointLockHoldHistogram| histogram | Histogram of checkpoint lock hold duration in milliseconds.
|CheckpointLockWaitHistogram| histogram | Histogram of checkpoint lock wait duration in milliseconds.
|CheckpointMarkHistogram| histogram | Histogram of checkpoint mark duration in milliseconds.
|CheckpointPagesWriteHistogram| histogram | Histogram of checkpoint pages write duration in milliseconds.
|CheckpointSplitAndSortPagesHistogram| histogram | Histogram of splitting and sorting checkpoint pages duration in milliseconds.
|CheckpointTotalTime| long | Total duration of checkpoint
|CheckpointWalRecordFsyncHistogram| histogram | Histogram of the WAL fsync after logging ChTotalNodeseckpointRecord on begin of checkpoint duration in milliseconds.
|CheckpointWriteEntryHistogram| histogram | Histogram of entry buffer writing to file duration in milliseconds.
|LastCheckpointBeforeLockDuration| long | Duration of the checkpoint action before taken write lock in milliseconds.
|LastCheckpointCopiedOnWritePagesNumber| long | Number of pages copied to a temporary checkpoint buffer during the last checkpoint.
|LastCheckpointDataPagesNumber| long | Total number of data pages written during the last checkpoint.
|LastCheckpointDuration | long | Duration of the last checkpoint in milliseconds.
|LastCheckpointFsyncDuration| long | Duration of the sync phase of the last checkpoint in milliseconds.
|LastCheckpointListenersExecuteDuration| long| Duration of the checkpoint execution listeners under write lock in milliseconds.
|LastCheckpointLockHoldDuration| long| Duration of the checkpoint lock hold in milliseconds.
|LastCheckpointLockWaitDuration| long| Duration of the checkpoint lock wait in milliseconds.
|LastCheckpointMarkDuration | long | Duration of the checkpoint mark in milliseconds.
|LastCheckpointPagesWriteDuration| long| Duration of the checkpoint pages write in milliseconds.
|LastCheckpointTotalPagesNumber| long| Total number of pages written during the last checkpoint.
|LastCheckpointSplitAndSortPagesDuration| long| Duration of splitting and sorting checkpoint pages of the last checkpoint in milliseconds.
|LastCheckpointStart| long| Start timestamp of the last checkpoint.
|LastCheckpointWalRecordFsyncDuration| long| Duration of the WAL fsync after logging CheckpointRecord on the start of the last checkpoint in milliseconds.
|LastCheckpointWriteEntryDuration| long| Duration of entry buffer writing to file of the last checkpoint in milliseconds.
|SparseStorageSize | long| Storage space allocated adjusted for possible sparsity, in bytes.
|StorageSize | long| Storage space allocated, in bytes.
|WalArchiveSegments | integer| Current number of WAL segments in the WAL archive.
|WalBuffPollSpinsRate| hitrate | WAL buffer poll spins number over the last time interval.
|WalFsyncTimeDuration | hitrate | Total duration of fsync
|WalFsyncTimeNum |hitrate | Total count of fsync
|WalLastRollOverTime |long | Time of the last WAL segment rollover.
|WalLoggingRate | hitrate| Average number of WAL records per second written during the last time interval.
|WalTotalSize| long | Total size in bytes for storage wal files.
|WalWritingRate| hitrate | Average number of bytes per second written during the last time interval.
|===
== Cluster
Cluster metrics.
Register name: `cluster`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|ActiveBaselineNodes| integer | Active baseline nodes count.
|Rebalanced| boolean | True if the cluster has fully achieved rebalanced state. Note that an inactive cluster always has this metric in False regardless of the real partitions state.
|TotalBaselineNodes| integer | Total baseline nodes count.
|TotalClientNodes| integer | Client nodes count.
|TotalServerNodes| integer | Server nodes count.
|===
== Cache processor
Cache processor metrics.
Register name: `cache`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|LastDataVer| long | The latest data version on the node.
|DataVersionClusterId| integer | Data version cluster id.
|===
== SQL parser metrics
Register name: `sql.parser.cache`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|hits| long | The number of SQL queries that were found in the parsers cache (doesn't require to be parsed and planned before execution).
|misses| long | The number of SQL queries that were parsed and planned.
|===
== SQL executor metrics
Register name: `sql.queries.user`
[cols="2,1,3",opts="header"]
|===
|Name| Type| Description
|success| long | The number of succesfully executed SQL queries.
|failed| long | The number of failed SQL queries (including canceled).
|canceled| long | The number of canceled SQL queries.
|===