| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| |
| --- |
| Metrics Guide |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| %{toc} |
| |
| Overview |
| |
| Metrics are statistical information exposed by Hadoop daemons, |
| used for monitoring, performance tuning, and debugging. |
| Many metrics are available by default, |
| and they are very useful for troubleshooting. |
| This page describes the available metrics in detail. |
| |
| Each section below covers one context, the unit into which metrics are grouped. |
| |
| The Metrics 2.0 framework is documented |
| {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}. |
| |
| jvm context |
| |
| * JvmMetrics |
| |
| Each metrics record contains tags such as ProcessName, SessionId, |
| and Hostname as additional information along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemHeapUsedM>>> | Current heap memory used in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemHeapCommittedM>>> | Current heap memory committed in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemHeapMaxM>>> | Max heap memory size in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<MemMaxM>>> | Max memory size in MB |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsNew>>> | Current number of NEW threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsRunnable>>> | Current number of RUNNABLE threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsBlocked>>> | Current number of BLOCKED threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsWaiting>>> | Current number of WAITING threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<ThreadsTerminated>>> | Current number of TERMINATED threads |
| *-------------------------------------+--------------------------------------+ |
| |<<<GcInfo>>> | Total GC count and GC time in msec, grouped by the kind of GC. \ |
| | e.g. GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40, |
| | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0 |
| *-------------------------------------+--------------------------------------+ |
| |<<<GcCount>>> | Total GC count |
| *-------------------------------------+--------------------------------------+ |
| |<<<GcTimeMillis>>> | Total GC time in msec |
| *-------------------------------------+--------------------------------------+ |
| |<<<LogFatal>>> | Total number of FATAL logs |
| *-------------------------------------+--------------------------------------+ |
| |<<<LogError>>> | Total number of ERROR logs |
| *-------------------------------------+--------------------------------------+ |
| |<<<LogWarn>>> | Total number of WARN logs |
| *-------------------------------------+--------------------------------------+ |
| |<<<LogInfo>>> | Total number of INFO logs |
| *-------------------------------------+--------------------------------------+ |
| |
| rpc context |
| |
| * rpc |
| |
| Each metrics record contains tags such as Hostname |
| and port (number to which server is bound) |
| as additional information along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReceivedBytes>>> | Total number of received bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<SentBytes>>> | Total number of sent bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcQueueTimeNumOps>>> | Total number of RPC calls |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as |
| | RpcQueueTimeNumOps) |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcAuthenticationFailures>>> | Total number of authentication failures |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcAuthorizationFailures>>> | Total number of authorization failures |
| *-------------------------------------+--------------------------------------+ |
| |<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes |
| *-------------------------------------+--------------------------------------+ |
| |<<<NumOpenConnections>>> | Current number of open connections |
| *-------------------------------------+--------------------------------------+ |
| |<<<CallQueueLength>>> | Current length of the call queue |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<sNumOps>>> | Shows total number of RPC calls |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> | |
| | | Shows the 50th percentile of RPC queue time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> | |
| | | Shows the 75th percentile of RPC queue time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> | |
| | | Shows the 90th percentile of RPC queue time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> | |
| | | Shows the 95th percentile of RPC queue time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> | |
| | | Shows the 99th percentile of RPC queue time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows total number of RPC calls |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> | |
| | | Shows the 50th percentile of RPC processing time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> | |
| | | Shows the 75th percentile of RPC processing time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> | |
| | | Shows the 90th percentile of RPC processing time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> | |
| | | Shows the 95th percentile of RPC processing time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> | |
| | | Shows the 99th percentile of RPC processing time in milliseconds |
| | | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to |
| | | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
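| |
| The quantile metric names in the table above follow a fixed pattern, so the |
| names to expect for a given configuration can be enumerated. The following is |
| a minimal sketch, not part of Hadoop: the function name is hypothetical, and |
| it assumes that <<<rpc.metrics.percentiles.intervals>>> holds a comma-separated |
| list of integer seconds. |

```python
# Percentiles and name prefixes taken from the rpc table above.
PERCENTILES = (50, 75, 90, 95, 99)
PREFIXES = ("rpcQueueTime", "rpcProcessingTime")


def rpc_quantile_metric_names(intervals):
    """Return the quantile metric names for an interval list.

    `intervals` is assumed to be a comma-separated string of integer
    seconds, e.g. "60,300" (an assumption about the config format).
    """
    names = []
    for num in (int(part) for part in intervals.split(",") if part.strip()):
        for prefix in PREFIXES:
            stem = f"{prefix}{num}s"
            names.append(stem + "NumOps")
            names.extend(f"{stem}{p}thPercentileLatency" for p in PERCENTILES)
    return names


if __name__ == "__main__":
    for name in rpc_quantile_metric_names("60,300"):
        print(name)
```

| Each interval yields twelve names: one NumOps counter and five percentile |
| latencies for each of the two prefixes. |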
| |
| * RetryCache/NameNodeRetryCache |
| |
| RetryCache metrics are useful for monitoring NameNode fail-over. |
| Each metrics record contains the Hostname tag. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheHit>>> | Total number of RetryCache hits |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheCleared>>> | Total number of times the RetryCache was cleared |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheUpdated>>> | Total number of RetryCache updates |
| *-------------------------------------+--------------------------------------+ |
| |
| rpcdetailed context |
| |
| Metrics of the rpcdetailed context are exposed in a unified manner by the RPC |
| layer. Two metrics are exposed for each RPC, based on its name. |
| The metric named "(RPC method name)NumOps" indicates the total number of |
| method calls, and the metric named "(RPC method name)AvgTime" shows the |
| average turnaround time for method calls in milliseconds. |
| |
| * rpcdetailed |
| |
| Each metrics record contains tags such as Hostname |
| and port (number to which server is bound) |
| as additional information along with metrics. |
| |
| Metrics for RPCs that have not been called are not included |
| in the metrics record. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<methodname><<<NumOps>>> | Total number of times the method is called |
| *-------------------------------------+--------------------------------------+ |
| |<methodname><<<AvgTime>>> | Average turnaround time of the method in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
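| |
| Because each RPC contributes a NumOps/AvgTime pair keyed by its method name, |
| a consumer can regroup a flat metrics record by method. The following is a |
| minimal sketch under stated assumptions: the record is assumed to be already |
| available as a plain name-to-value mapping, the function name is hypothetical, |
| and the sample values are invented for illustration. |

```python
SUFFIXES = ("NumOps", "AvgTime")


def group_rpc_detailed(record):
    """Group <methodname>NumOps / <methodname>AvgTime entries by method name.

    Entries that do not end in one of the two suffixes (tags such as
    Hostname or port) are ignored.
    """
    methods = {}
    for name, value in record.items():
        for suffix in SUFFIXES:
            if name.endswith(suffix) and len(name) > len(suffix):
                method = name[: -len(suffix)]
                methods.setdefault(method, {})[suffix] = value
    return methods


# Hypothetical record: method names and values are made up.
sample = {
    "Hostname": "nn1.example.com",
    "port": "8020",
    "GetFileInfoNumOps": 120,
    "GetFileInfoAvgTime": 0.25,
    "MkdirsNumOps": 3,
    "MkdirsAvgTime": 1.5,
}

if __name__ == "__main__":
    for method, stats in sorted(group_rpc_detailed(sample).items()):
        print(method, stats["NumOps"], stats["AvgTime"])
```

| Methods that were never called simply do not appear in the result, matching |
| the note above about uncalled RPCs. |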
| |
| dfs context |
| |
| * namenode |
| |
| Each metrics record contains tags such as ProcessName, SessionId, |
| and Hostname as additional information along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<CreateFileOps>>> | Total number of files created |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesCreated>>> | Total number of files and directories created by create |
| | or mkdir operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesAppended>>> | Total number of files appended |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetBlockLocations>>> | Total number of getBlockLocations operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of |
| | files/dirs renamed) |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetListingOps>>> | Total number of directory listing operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<DeleteFileOps>>> | Total number of delete operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesDeleted>>> | Total number of files and directories deleted by delete |
| | or rename operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo |
| | operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<AddBlockOps>>> | Total number of successful addBlock operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode |
| | operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<CreateSymlinkOps>>> | Total number of createSymlink operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesInGetListingOps>>> | Total number of files and directories listed by |
| | directory listing operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<CreateSnapshotOps>>> | Total number of createSnapshot operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus |
| | operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport |
| | operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<TransactionsNumOps>>> | Total number of Journal transactions |
| *-------------------------------------+--------------------------------------+ |
| |<<<TransactionsAvgTime>>> | Average time of Journal transactions in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<SyncsNumOps>>> | Total number of Journal syncs |
| *-------------------------------------+--------------------------------------+ |
| |<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched |
| | in sync |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockReportNumOps>>> | Total number of block reports processed from |
| | DataNodes |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockReportAvgTime>>> | Average time to process block reports in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheReportNumOps>>> | Total number of cache reports processed from |
| | DataNodes |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheReportAvgTime>>> | Average time to process cache reports in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<SafeModeTime>>> | The interval, in milliseconds, between FSNamesystem startup |
| | and the last time safemode was left. \ |
| | (sometimes not equal to the time spent in safemode, |
| | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}}) |
| *-------------------------------------+--------------------------------------+ |
| |<<<FsImageLoadTime>>> | Time loading FS Image at startup in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetEditNumOps>>> | Total number of edits downloads from SecondaryNameNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetEditAvgTime>>> | Average edits download time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<PutImageNumOps>>> | Total number of fsimage uploads to SecondaryNameNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |
| * FSNamesystem |
| |
| Each metrics record contains tags such as HAState and Hostname |
| as additional information along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<MissingBlocks>>> | Current number of missing blocks |
| *-------------------------------------+--------------------------------------+ |
| |<<<ExpiredHeartbeats>>> | Total number of expired heartbeats |
| *-------------------------------------+--------------------------------------+ |
| |<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since |
| | last checkpoint |
| *-------------------------------------+--------------------------------------+ |
| |<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since last |
| | edit log roll |
| *-------------------------------------+--------------------------------------+ |
| |<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log |
| *-------------------------------------+--------------------------------------+ |
| |<<<LastCheckpointTime>>> | Time in milliseconds since epoch of last checkpoint |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityRemaining>>> | Current remaining capacity in bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityRemainingGB>>> | Current remaining capacity in GB |
| *-------------------------------------+--------------------------------------+ |
| |<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non DFS |
| | purposes in bytes |
| *-------------------------------------+--------------------------------------+ |
| |<<<TotalLoad>>> | Current number of connections |
| *-------------------------------------+--------------------------------------+ |
| |<<<SnapshottableDirectories>>> | Current number of snapshottable directories |
| *-------------------------------------+--------------------------------------+ |
| |<<<Snapshots>>> | Current number of snapshots |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksTotal>>> | Current number of allocated blocks in the system |
| *-------------------------------------+--------------------------------------+ |
| |<<<FilesTotal>>> | Current number of files and directories |
| *-------------------------------------+--------------------------------------+ |
| |<<<PendingReplicationBlocks>>> | Current number of blocks pending to be |
| | replicated |
| *-------------------------------------+--------------------------------------+ |
| |<<<UnderReplicatedBlocks>>> | Current number of under-replicated blocks |
| *-------------------------------------+--------------------------------------+ |
| |<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas. |
| *-------------------------------------+--------------------------------------+ |
| |<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for |
| | replication |
| *-------------------------------------+--------------------------------------+ |
| |<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion |
| *-------------------------------------+--------------------------------------+ |
| |<<<ExcessBlocks>>> | Current number of excess blocks |
| *-------------------------------------+--------------------------------------+ |
| |<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks |
| | whose replication is postponed |
| *-------------------------------------+--------------------------------------+ |
| |<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending |
| | block-related messages for later |
| | processing in the standby NameNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the |
| | standby NameNode last loaded the edit log. |
| | Set to 0 on the active NameNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockCapacity>>> | Current block capacity |
| *-------------------------------------+--------------------------------------+ |
| |<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed |
| | heartbeat |
| *-------------------------------------+--------------------------------------+ |
| |<<<TotalFiles>>> | Current number of files and directories (same as FilesTotal) |
| *-------------------------------------+--------------------------------------+ |
| |
| * JournalNode |
| |
| The server-side metrics for a journal from the JournalNode's perspective. |
| Each metrics record contains the Hostname tag as additional information |
| along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync |
| | | latency in microseconds (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync |
| | | latency in microseconds (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync |
| | | latency in microseconds (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync |
| | | latency in microseconds (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync |
| | | latency in microseconds (1 minute granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync |
| | | latency in microseconds (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync |
| | | latency in microseconds (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th percentile of sync |
| | | latency in microseconds (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync |
| | | latency in microseconds (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync |
| | | latency in microseconds (5 minutes granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync |
| | | latency in microseconds (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync |
| | | latency in microseconds (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync |
| | | latency in microseconds (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync |
| | | latency in microseconds (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync |
| | | latency in microseconds (1 hour granularity) |
| *-------------------------------------+--------------------------------------+ |
| |<<<BatchesWritten>>> | Total number of batches written since startup |
| *-------------------------------------+--------------------------------------+ |
| |<<<TxnsWritten>>> | Total number of transactions written since startup |
| *-------------------------------------+--------------------------------------+ |
| |<<<BytesWritten>>> | Total number of bytes written since startup |
| *-------------------------------------+--------------------------------------+ |
| |<<<BatchesWrittenWhileLagging>>> | Total number of batches written where this |
| | | node was lagging |
| *-------------------------------------+--------------------------------------+ |
| |<<<LastWriterEpoch>>> | Current writer's epoch number |
| *-------------------------------------+--------------------------------------+ |
| |<<<CurrentLagTxns>>> | The number of transactions by which this JournalNode is |
| | | lagging |
| *-------------------------------------+--------------------------------------+ |
| |<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<LastPromisedEpoch>>> | The last epoch number for which this node has promised |
| | | not to accept any lower epoch, or 0 if no promises have been made |
| *-------------------------------------+--------------------------------------+ |
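| |
| The Syncs metric names above encode both the granularity in seconds and, for |
| latency metrics, the percentile. The following is a minimal sketch of |
| recovering those two values from a name with a regular expression. The |
| function name is hypothetical, and the pattern is derived only from the names |
| listed in the table above. |

```python
import re

# Matches e.g. "Syncs60sNumOps" and "Syncs300s95thPercentileLatencyMicros".
_SYNCS_NAME = re.compile(
    r"^Syncs(\d+)s(?:NumOps|(\d+)thPercentileLatencyMicros)$"
)


def parse_syncs_metric(name):
    """Return (interval_seconds, percentile) for a Syncs metric name.

    `percentile` is None for the NumOps counter. Names that do not
    follow the Syncs pattern return None.
    """
    match = _SYNCS_NAME.match(name)
    if match is None:
        return None
    interval = int(match.group(1))
    percentile = int(match.group(2)) if match.group(2) else None
    return interval, percentile


if __name__ == "__main__":
    for metric in ("Syncs60sNumOps", "Syncs3600s99thPercentileLatencyMicros"):
        print(metric, parse_syncs_metric(metric))
```

| Other JournalNode metrics, such as BatchesWritten, do not match the pattern |
| and yield None, so they can be handled separately. |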
| |
| * datanode |
| |
| Each metrics record contains tags such as SessionId and Hostname |
| as additional information along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<BytesWritten>>> | Total number of bytes written to DataNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<BytesRead>>> | Total number of bytes read from DataNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksWritten>>> | Total number of blocks written to DataNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksRead>>> | Total number of blocks read from DataNode |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksReplicated>>> | Total number of blocks replicated |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksRemoved>>> | Total number of blocks removed |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksVerified>>> | Total number of blocks verified |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockVerificationFailures>>> | Total number of verification failures |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksCached>>> | Total number of blocks cached |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksUncached>>> | Total number of blocks uncached |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReadsFromLocalClient>>> | Total number of read operations from local client |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReadsFromRemoteClient>>> | Total number of read operations from remote |
| | client |
| *-------------------------------------+--------------------------------------+ |
| |<<<WritesFromLocalClient>>> | Total number of write operations from local |
| | client |
| *-------------------------------------+--------------------------------------+ |
| |<<<WritesFromRemoteClient>>> | Total number of write operations from remote |
| | client |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path |
| | names of blocks |
| *-------------------------------------+--------------------------------------+ |
| |<<<FsyncCount>>> | Total number of fsync operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<VolumeFailures>>> | Total number of volume failures that have occurred |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReadBlockOpNumOps>>> | Total number of read operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReadBlockOpAvgTime>>> | Average time of read operations in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<WriteBlockOpNumOps>>> | Total number of write operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<WriteBlockOpAvgTime>>> | Average time of write operations in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<CopyBlockOpNumOps>>> | Total number of block copy operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReplaceBlockOpNumOps>>> | Total number of block replace operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<HeartbeatsNumOps>>> | Total number of heartbeats |
| *-------------------------------------+--------------------------------------+ |
| |<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockReportsNumOps>>> | Total number of block report operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<BlockReportsAvgTime>>> | Average time of block report operations in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheReportsNumOps>>> | Total number of cache report operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<CacheReportsAvgTime>>> | Average time of cache report operations in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of ack round trips |
| *-------------------------------------+--------------------------------------+ |
| |<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to |
| | | receive minus the downstream ack time in nanoseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<FlushNanosNumOps>>> | Total number of flushes |
| *-------------------------------------+--------------------------------------+ |
| |<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<FsyncNanosNumOps>>> | Total number of fsync operations |
| *-------------------------------------+--------------------------------------+ |
| |<<<FsyncNanosAvgTime>>> | Average fsync time in nanoseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of packets |
| | sent |
| *-------------------------------------+--------------------------------------+ |
| |<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average waiting time when |
| | | sending packets, in nanoseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<SendDataPacketTransferNanosNumOps>>> | Total number of packets sent |
| *-------------------------------------+--------------------------------------+ |
| |<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time when sending |
| | packets, in nanoseconds |
| *-------------------------------------+--------------------------------------+ |
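In the table above, average times are reported in milliseconds, except for metrics whose names contain <<<Nanos>>>, which are reported in nanoseconds. A minimal sketch of normalizing both kinds to milliseconds follows. The helper name <<<avg_time_to_millis>>> is hypothetical, and the rule "a name containing Nanos means nanoseconds" is an assumption inferred from this table, not a documented Hadoop API.

```python
NANOS_PER_MILLI = 1_000_000.0


def avg_time_to_millis(metric_name: str, value: float) -> float:
    """Return an *AvgTime metric value expressed in milliseconds.

    Assumption: metrics whose names contain "Nanos" are in nanoseconds,
    and every other *AvgTime metric is already in milliseconds.
    """
    if not metric_name.endswith("AvgTime"):
        raise ValueError(f"not an average-time metric: {metric_name}")
    if "Nanos" in metric_name:
        return value / NANOS_PER_MILLI
    return value
```

For example, <<<avg_time_to_millis("FlushNanosAvgTime", 2500000)>>> yields 2.5, while a <<<HeartbeatsAvgTime>>> value is returned unchanged.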
| |
| ugi context |
| |
| * UgiMetrics |
| |
| UgiMetrics is related to user and group information. |
| Each metrics record contains a Hostname tag as additional information |
| along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<LoginSuccessNumOps>>> | Total number of successful kerberos logins |
| *-------------------------------------+--------------------------------------+ |
| |<<<LoginSuccessAvgTime>>> | Average time for successful kerberos logins in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<LoginFailureNumOps>>> | Total number of failed kerberos logins |
| *-------------------------------------+--------------------------------------+ |
| |<<<LoginFailureAvgTime>>> | Average time for failed kerberos logins in |
| | milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroupsNumOps>>> | Total number of group resolutions |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<sNumOps>>> | |
| | | Total number of group resolutions (<num> seconds granularity). <num> is |
| | | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<s50thPercentileLatency>>> | |
| | | Shows the 50th percentile of group resolution time in milliseconds |
| | | (<num> seconds granularity). <num> is specified by |
| | | <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<s75thPercentileLatency>>> | |
| | | Shows the 75th percentile of group resolution time in milliseconds |
| | | (<num> seconds granularity). <num> is specified by |
| | | <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<s90thPercentileLatency>>> | |
| | | Shows the 90th percentile of group resolution time in milliseconds |
| | | (<num> seconds granularity). <num> is specified by |
| | | <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<s95thPercentileLatency>>> | |
| | | Shows the 95th percentile of group resolution time in milliseconds |
| | | (<num> seconds granularity). <num> is specified by |
| | | <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
| |<<<getGroups>>><num><<<s99thPercentileLatency>>> | |
| | | Shows the 99th percentile of group resolution time in milliseconds |
| | | (<num> seconds granularity). <num> is specified by |
| | | <<<hadoop.user.group.metrics.percentiles.intervals>>>. |
| *-------------------------------------+--------------------------------------+ |
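The interval-based names above follow a fixed pattern, so the set of metrics to expect can be derived from the configured intervals. The sketch below assumes that <<<hadoop.user.group.metrics.percentiles.intervals>>> holds a comma-separated list of integers in seconds. The function name is hypothetical and is not part of Hadoop.

```python
from typing import List

# Percentiles listed in the table above.
PERCENTILES = (50, 75, 90, 95, 99)


def get_groups_metric_names(intervals: str) -> List[str]:
    """Build the getGroups<num>s... metric names for each interval.

    `intervals` is assumed to be a comma-separated list of seconds,
    for example "60,300".
    """
    names: List[str] = []
    for raw in intervals.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty entries such as a trailing comma
        num = int(raw)
        names.append(f"getGroups{num}sNumOps")
        names.extend(
            f"getGroups{num}s{p}thPercentileLatency" for p in PERCENTILES
        )
    return names
```

With an interval value of <<<60>>>, the first generated name is <<<getGroups60sNumOps>>>, followed by the five percentile latency names for that interval.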
| |
| metricssystem context |
| |
| * MetricsSystem |
| |
| MetricsSystem shows statistics for metrics snapshot and publish operations. |
| Each metrics record contains a Hostname tag as additional information |
| along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<NumActiveSources>>> | Current number of active metrics sources |
| *-------------------------------------+--------------------------------------+ |
| |<<<NumAllSources>>> | Total number of metrics sources |
| *-------------------------------------+--------------------------------------+ |
| |<<<NumActiveSinks>>> | Current number of active sinks |
| *-------------------------------------+--------------------------------------+ |
| |<<<NumAllSinks>>> | Total number of sinks \ |
| | (BUT usually less than <<<NumActiveSinks>>>, |
| | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}}) |
| *-------------------------------------+--------------------------------------+ |
| |<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from |
| | a metrics source |
| *-------------------------------------+--------------------------------------+ |
| |<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics |
| | from a metrics source |
| *-------------------------------------+--------------------------------------+ |
| |<<<PublishNumOps>>> | Total number of operations to publish statistics to a |
| | sink |
| *-------------------------------------+--------------------------------------+ |
| |<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to |
| | a sink |
| *-------------------------------------+--------------------------------------+ |
| |<<<DroppedPubAll>>> | Total number of dropped publishes |
| *-------------------------------------+--------------------------------------+ |
| |<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the |
| | <instance> |
| *-------------------------------------+--------------------------------------+ |
| |<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink |
| | operations for the <instance> |
| *-------------------------------------+--------------------------------------+ |
| |<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations |
| | for the <instance> |
| *-------------------------------------+--------------------------------------+ |
| |<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \ |
| | (BUT always set to 0 because nothing |
| | increments this metric, see |
| | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}}) |
| *-------------------------------------+--------------------------------------+ |
| |
| default context |
| |
| * StartupProgress |
| |
| StartupProgress metrics show the statistics of NameNode startup. |
| Four metrics are exposed for each startup phase based on its name. |
| The startup <phase>s are <<<LoadingFsImage>>>, <<<LoadingEdits>>>, |
| <<<SavingCheckpoint>>>, and <<<SafeMode>>>. |
| Each metrics record contains a Hostname tag as additional information |
| along with metrics. |
| |
| *-------------------------------------+--------------------------------------+ |
| || Name || Description |
| *-------------------------------------+--------------------------------------+ |
| |<<<ElapsedTime>>> | Total elapsed time in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<<<PercentComplete>>> | Current fraction of NameNode startup completed \ |
| | (The max value is not 100 but 1.0) |
| *-------------------------------------+--------------------------------------+ |
| |<phase><<<Count>>> | Total number of steps completed in the phase |
| *-------------------------------------+--------------------------------------+ |
| |<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds |
| *-------------------------------------+--------------------------------------+ |
| |<phase><<<Total>>> | Total number of steps in the phase |
| *-------------------------------------+--------------------------------------+ |
| |<phase><<<PercentComplete>>> | Current fraction of the phase completed \ |
| | (The max value is not 100 but 1.0) |
| *-------------------------------------+--------------------------------------+ |
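Because <<<PercentComplete>>> ranges from 0 to 1.0 rather than 0 to 100, a dashboard usually needs to scale it before display. The following sketch builds the four per-phase metric names from the phases listed above and converts a value to a percentage. Both helper names are hypothetical.

```python
from typing import List

# Startup phases as listed in this section.
PHASES = ("LoadingFsImage", "LoadingEdits", "SavingCheckpoint", "SafeMode")

# Per-phase metric suffixes from the table above.
PHASE_SUFFIXES = ("Count", "ElapsedTime", "Total", "PercentComplete")


def phase_metric_names(phase: str) -> List[str]:
    """Return the four metric names exposed for one startup phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown startup phase: {phase}")
    return [f"{phase}{suffix}" for suffix in PHASE_SUFFIXES]


def to_percent(percent_complete: float) -> float:
    """Scale a PercentComplete value (0 to 1.0) to the range 0 to 100."""
    if not 0.0 <= percent_complete <= 1.0:
        raise ValueError(f"expected a value in [0, 1.0]: {percent_complete}")
    return percent_complete * 100.0
```

For instance, <<<phase_metric_names("SafeMode")>>> includes <<<SafeModePercentComplete>>>, and a reported value of 0.5 corresponds to 50 percent.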