| = Monitoring |
| |
| Metrics in Cassandra are managed using the |
| http://metrics.dropwizard.io[Dropwizard Metrics] library. These metrics |
| can be queried via JMX or pushed to external monitoring systems using a |
| number of |
| http://metrics.dropwizard.io/3.1.0/getting-started/#other-reporting[built |
| in] and http://metrics.dropwizard.io/3.1.0/manual/third-party/[third |
| party] reporter plugins. |
| |
| Metrics are collected for a single node. It's up to the operator to use |
| an external monitoring system to aggregate them. |
| |
| == Metric Types |
| |
| All metrics reported by cassandra fit into one of the following types. |
| |
| `Gauge`:: |
| An instantaneous measurement of a value. |
| `Counter`:: |
| A gauge for an `AtomicLong` instance. Typically this is consumed by |
| monitoring the change since the last call to see if there is a large |
| increase compared to the norm. |
| `Histogram`:: |
| Measures the statistical distribution of values in a stream of data. |
| + |
| In addition to minimum, maximum, mean, etc., it also measures median, |
| 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles. |
| `Timer`:: |
| Measures both the rate that a particular piece of code is called and |
| the histogram of its duration. |
| `Latency`:: |
| Special type that tracks latency (in microseconds) with a `Timer` plus |
| a `Counter` that tracks the total latency accrued since starting. The |
| former is useful if you track the change in total latency since the |
| last check. Each metric name of this type will have 'Latency' and |
| 'TotalLatency' appended to it. |
| `Meter`:: |
| A meter metric which measures mean throughput and one-, five-, and |
| fifteen-minute exponentially-weighted moving average throughputs. |
| |
| == Table Metrics |
| |
| Each table in Cassandra has metrics responsible for tracking its state |
| and performance. |
| |
| The metric names are all appended with the specific `Keyspace` and |
| `Table` name. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Table.<MetricName>.<Keyspace>.<Table>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Table keyspace=<Keyspace> scope=<Table> name=<MetricName>` |
| |
| [NOTE] |
| .Note |
| ==== |
| There is a special table called '`all`' without a keyspace. This |
| represents the aggregation of metrics across *all* tables and keyspaces |
| on the node. |
| ====[cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |MemtableOnHeapSize |Gauge<Long> |Total amount of data stored in the |
| memtable that resides *on*-heap, including column related overhead and |
| partitions overwritten. |
| |
| |MemtableOffHeapSize |Gauge<Long> |Total amount of data stored in the |
| memtable that resides *off*-heap, including column related overhead and |
| partitions overwritten. |
| |
| |MemtableLiveDataSize |Gauge<Long> |Total amount of live data stored in |
| the memtable, excluding any data structure overhead. |
| |
| |AllMemtablesOnHeapSize |Gauge<Long> |Total amount of data stored in the |
| memtables (2i and pending flush memtables included) that resides |
| *on*-heap. |
| |
| |AllMemtablesOffHeapSize |Gauge<Long> |Total amount of data stored in |
| the memtables (2i and pending flush memtables included) that resides |
| *off*-heap. |
| |
| |AllMemtablesLiveDataSize |Gauge<Long> |Total amount of live data stored |
| in the memtables (2i and pending flush memtables included) that resides |
| off-heap, excluding any data structure overhead. |
| |
| |MemtableColumnsCount |Gauge<Long> |Total number of columns present in |
| the memtable. |
| |
| |MemtableSwitchCount |Counter |Number of times flush has resulted in the |
| memtable being switched out. |
| |
| |CompressionRatio |Gauge<Double> |Current compression ratio for all |
| SSTables. |
| |
| |EstimatedPartitionSizeHistogram |Gauge<long[]> |Histogram of estimated |
| partition size (in bytes). |
| |
| |EstimatedPartitionCount |Gauge<Long> |Approximate number of keys in |
| table. |
| |
| |EstimatedColumnCountHistogram |Gauge<long[]> |Histogram of estimated |
| number of columns. |
| |
| |SSTablesPerReadHistogram |Histogram |Histogram of the number of sstable |
| data files accessed per single partition read. SSTables skipped due to |
| Bloom Filters, min-max key or partition index lookup are not taken into |
| acoount. |
| |
| |ReadLatency |Latency |Local read latency for this table. |
| |
| |RangeLatency |Latency |Local range scan latency for this table. |
| |
| |WriteLatency |Latency |Local write latency for this table. |
| |
| |CoordinatorReadLatency |Timer |Coordinator read latency for this table. |
| |
| |CoordinatorWriteLatency |Timer |Coordinator write latency for this |
| table. |
| |
| |CoordinatorScanLatency |Timer |Coordinator range scan latency for this |
| table. |
| |
| |PendingFlushes |Counter |Estimated number of flush tasks pending for |
| this table. |
| |
| |BytesFlushed |Counter |Total number of bytes flushed since server |
| [re]start. |
| |
| |CompactionBytesWritten |Counter |Total number of bytes written by |
| compaction since server [re]start. |
| |
| |PendingCompactions |Gauge<Integer> |Estimate of number of pending |
| compactions for this table. |
| |
| |LiveSSTableCount |Gauge<Integer> |Number of SSTables on disk for this |
| table. |
| |
| |LiveDiskSpaceUsed |Counter |Disk space used by SSTables belonging to |
| this table (in bytes). |
| |
| |TotalDiskSpaceUsed |Counter |Total disk space used by SSTables |
| belonging to this table, including obsolete ones waiting to be GC'd. |
| |
| |MinPartitionSize |Gauge<Long> |Size of the smallest compacted partition |
| (in bytes). |
| |
| |MaxPartitionSize |Gauge<Long> |Size of the largest compacted partition |
| (in bytes). |
| |
| |MeanPartitionSize |Gauge<Long> |Size of the average compacted partition |
| (in bytes). |
| |
| |BloomFilterFalsePositives |Gauge<Long> |Number of false positives on |
| table's bloom filter. |
| |
| |BloomFilterFalseRatio |Gauge<Double> |False positive ratio of table's |
| bloom filter. |
| |
| |BloomFilterDiskSpaceUsed |Gauge<Long> |Disk space used by bloom filter |
| (in bytes). |
| |
| |BloomFilterOffHeapMemoryUsed |Gauge<Long> |Off-heap memory used by |
| bloom filter. |
| |
| |IndexSummaryOffHeapMemoryUsed |Gauge<Long> |Off-heap memory used by |
| index summary. |
| |
| |CompressionMetadataOffHeapMemoryUsed |Gauge<Long> |Off-heap memory used |
| by compression meta data. |
| |
| |KeyCacheHitRate |Gauge<Double> |Key cache hit rate for this table. |
| |
| |TombstoneScannedHistogram |Histogram |Histogram of tombstones scanned |
| in queries on this table. |
| |
| |LiveScannedHistogram |Histogram |Histogram of live cells scanned in |
| queries on this table. |
| |
| |ColUpdateTimeDeltaHistogram |Histogram |Histogram of column update time |
| delta on this table. |
| |
| |ViewLockAcquireTime |Timer |Time taken acquiring a partition lock for |
| materialized view updates on this table. |
| |
| |ViewReadTime |Timer |Time taken during the local read of a materialized |
| view update. |
| |
| |TrueSnapshotsSize |Gauge<Long> |Disk space used by snapshots of this |
| table including all SSTable components. |
| |
| |RowCacheHitOutOfRange |Counter |Number of table row cache hits that do |
| not satisfy the query filter, thus went to disk. |
| |
| |RowCacheHit |Counter |Number of table row cache hits. |
| |
| |RowCacheMiss |Counter |Number of table row cache misses. |
| |
| |CasPrepare |Latency |Latency of paxos prepare round. |
| |
| |CasPropose |Latency |Latency of paxos propose round. |
| |
| |CasCommit |Latency |Latency of paxos commit round. |
| |
| |PercentRepaired |Gauge<Double> |Percent of table data that is repaired |
| on disk. |
| |
| |BytesRepaired |Gauge<Long> |Size of table data repaired on disk |
| |
| |BytesUnrepaired |Gauge<Long> |Size of table data unrepaired on disk |
| |
| |BytesPendingRepair |Gauge<Long> |Size of table data isolated for an |
| ongoing incremental repair |
| |
| |SpeculativeRetries |Counter |Number of times speculative retries were |
| sent for this table. |
| |
| |SpeculativeFailedRetries |Counter |Number of speculative retries that |
| failed to prevent a timeout |
| |
| |SpeculativeInsufficientReplicas |Counter |Number of speculative retries |
| that couldn't be attempted due to lack of replicas |
| |
| |SpeculativeSampleLatencyNanos |Gauge<Long> |Number of nanoseconds to |
| wait before speculation is attempted. Value may be statically configured |
| or updated periodically based on coordinator latency. |
| |
| |WaitingOnFreeMemtableSpace |Histogram |Histogram of time spent waiting |
| for free memtable space, either on- or off-heap. |
| |
| |DroppedMutations |Counter |Number of dropped mutations on this table. |
| |
| |AnticompactionTime |Timer |Time spent anticompacting before a |
| consistent repair. |
| |
| |ValidationTime |Timer |Time spent doing validation compaction during |
| repair. |
| |
| |SyncTime |Timer |Time spent doing streaming during repair. |
| |
| |BytesValidated |Histogram |Histogram over the amount of bytes read |
| during validation. |
| |
| |PartitionsValidated |Histogram |Histogram over the number of partitions |
| read during validation. |
| |
| |BytesAnticompacted |Counter |How many bytes we anticompacted. |
| |
| |BytesMutatedAnticompaction |Counter |How many bytes we avoided |
| anticompacting because the sstable was fully contained in the repaired |
| range. |
| |
| |MutatedAnticompactionGauge |Gauge<Double> |Ratio of bytes mutated vs |
| total bytes repaired. |
| |=== |
| |
| == Keyspace Metrics |
| |
| Each keyspace in Cassandra has metrics responsible for tracking its |
| state and performance. |
| |
| Most of these metrics are the same as the `Table Metrics` above, only |
| they are aggregated at the Keyspace level. The keyspace specific metrics |
| are specified in the table below. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.keyspace.<MetricName>.<Keyspace>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Keyspace scope=<Keyspace> name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |WriteFailedIdeaCL |Counter |Number of writes that failed to achieve the |
| configured ideal consistency level or 0 if none is configured |
| |
| |IdealCLWriteLatency |Latency |Coordinator latency of writes at the |
| configured ideal consistency level. No values are recorded if ideal |
| consistency level is not configured |
| |
| |RepairTime |Timer |Total time spent as repair coordinator. |
| |
| |RepairPrepareTime |Timer |Total time spent preparing for repair. |
| |=== |
| |
| == ThreadPool Metrics |
| |
| Cassandra splits work of a particular type into its own thread pool. |
| This provides back-pressure and asynchrony for requests on a node. It's |
| important to monitor the state of these thread pools since they can tell |
| you how saturated a node is. |
| |
| The metric names are all appended with the specific `ThreadPool` name. |
| The thread pools are also categorized under a specific type. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.ThreadPools.<MetricName>.<Path>.<ThreadPoolName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=ThreadPools path=<Path> scope=<ThreadPoolName> name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |ActiveTasks |Gauge<Integer> |Number of tasks being actively worked on |
| by this pool. |
| |
| |PendingTasks |Gauge<Integer> |Number of queued tasks queued up on this |
| pool. |
| |
| |CompletedTasks |Counter |Number of tasks completed. |
| |
| |TotalBlockedTasks |Counter |Number of tasks that were blocked due to |
| queue saturation. |
| |
| |CurrentlyBlockedTask |Counter |Number of tasks that are currently |
| blocked due to queue saturation but on retry will become unblocked. |
| |
| |MaxPoolSize |Gauge<Integer> |The maximum number of threads in this |
| pool. |
| |
| |MaxTasksQueued |Gauge<Integer> |The maximum number of tasks queued |
| before a task get blocked. |
| |=== |
| |
| The following thread pools can be monitored. |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Native-Transport-Requests |transport |Handles client CQL requests |
| |
| |CounterMutationStage |request |Responsible for counter writes |
| |
| |ViewMutationStage |request |Responsible for materialized view writes |
| |
| |MutationStage |request |Responsible for all other writes |
| |
| |ReadRepairStage |request |ReadRepair happens on this thread pool |
| |
| |ReadStage |request |Local reads run on this thread pool |
| |
| |RequestResponseStage |request |Coordinator requests to the cluster run |
| on this thread pool |
| |
| |AntiEntropyStage |internal |Builds merkle tree for repairs |
| |
| |CacheCleanupExecutor |internal |Cache maintenance performed on this |
| thread pool |
| |
| |CompactionExecutor |internal |Compactions are run on these threads |
| |
| |GossipStage |internal |Handles gossip requests |
| |
| |HintsDispatcher |internal |Performs hinted handoff |
| |
| |InternalResponseStage |internal |Responsible for intra-cluster |
| callbacks |
| |
| |MemtableFlushWriter |internal |Writes memtables to disk |
| |
| |MemtablePostFlush |internal |Cleans up commit log after memtable is |
| written to disk |
| |
| |MemtableReclaimMemory |internal |Memtable recycling |
| |
| |MigrationStage |internal |Runs schema migrations |
| |
| |MiscStage |internal |Misceleneous tasks run here |
| |
| |PendingRangeCalculator |internal |Calculates token range |
| |
| |PerDiskMemtableFlushWriter_0 |internal |Responsible for writing a spec |
| (there is one of these per disk 0-N) |
| |
| |Sampler |internal |Responsible for re-sampling the index summaries of |
| SStables |
| |
| |SecondaryIndexManagement |internal |Performs updates to secondary |
| indexes |
| |
| |ValidationExecutor |internal |Performs validation compaction or |
| scrubbing |
| |
| |ViewBuildExecutor |internal |Performs materialized views initial build |
| |=== |
| |
| == Client Request Metrics |
| |
| Client requests have their own set of metrics that encapsulate the work |
| happening at coordinator level. |
| |
| Different types of client requests are broken down by `RequestType`. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.ClientRequest.<MetricName>.<RequestType>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=ClientRequest scope=<RequestType> name=<MetricName>` |
| |
| RequestType:: |
| CASRead |
| Description:: |
| Metrics related to transactional read requests. |
| Metrics:: |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Timeouts |Counter |Number of timeouts encountered. |
| |
| |Failures |Counter |Number of transaction failures encountered. |
| |
| |Â |Latency |Transaction read latency. |
| |
| |Unavailables |Counter |Number of unavailable exceptions encountered. |
| |
| |UnfinishedCommit |Counter |Number of transactions that were committed |
| on read. |
| |
| |ConditionNotMet |Counter |Number of transaction preconditions did not |
| match current values. |
| |
| |ContentionHistogram |Histogram |How many contended reads were |
| encountered |
| |=== |
| RequestType:: |
| CASWrite |
| Description:: |
| Metrics related to transactional write requests. |
| Metrics:: |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Timeouts |Counter |Number of timeouts encountered. |
| |
| |Failures |Counter |Number of transaction failures encountered. |
| |
| |Â |Latency |Transaction write latency. |
| |
| |UnfinishedCommit |Counter |Number of transactions that were committed |
| on write. |
| |
| |ConditionNotMet |Counter |Number of transaction preconditions did not |
| match current values. |
| |
| |ContentionHistogram |Histogram |How many contended writes were |
| encountered |
| |
| |MutationSizeHistogram |Histogram |Total size in bytes of the requests |
| mutations. |
| |=== |
| RequestType:: |
| Read |
| Description:: |
| Metrics related to standard read requests. |
| Metrics:: |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Timeouts |Counter |Number of timeouts encountered. |
| |Failures |Counter |Number of read failures encountered. |
| |Â |Latency |Read latency. |
| |Unavailables |Counter |Number of unavailable exceptions encountered. |
| |=== |
| RequestType:: |
| RangeSlice |
| Description:: |
| Metrics related to token range read requests. |
| Metrics:: |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Timeouts |Counter |Number of timeouts encountered. |
| |Failures |Counter |Number of range query failures encountered. |
| |Â |Latency |Range query latency. |
| |Unavailables |Counter |Number of unavailable exceptions encountered. |
| |=== |
| RequestType:: |
| Write |
| Description:: |
| Metrics related to regular write requests. |
| Metrics:: |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Timeouts |Counter |Number of timeouts encountered. |
| |
| |Failures |Counter |Number of write failures encountered. |
| |
| |Â |Latency |Write latency. |
| |
| |Unavailables |Counter |Number of unavailable exceptions encountered. |
| |
| |MutationSizeHistogram |Histogram |Total size in bytes of the requests |
| mutations. |
| |=== |
| RequestType:: |
| ViewWrite |
| Description:: |
| Metrics related to materialized view write wrtes. |
| Metrics:: |
| [cols=",,",] |
| |=== |
| |Timeouts |Counter |Number of timeouts encountered. |
| |
| |Failures |Counter |Number of transaction failures encountered. |
| |
| |Unavailables |Counter |Number of unavailable exceptions encountered. |
| |
| |ViewReplicasAttempted |Counter |Total number of attempted view |
| replica writes. |
| |
| |ViewReplicasSuccess |Counter |Total number of succeded view replica |
| writes. |
| |
| |ViewPendingMutations |Gauge<Long> |ViewReplicasAttempted - |
| ViewReplicasSuccess. |
| |
| |ViewWriteLatency |Timer |Time between when mutation is applied to |
| base table and when CL.ONE is achieved on view. |
| |=== |
| |
| == Cache Metrics |
| |
| Cassandra caches have metrics to track the effectivness of the caches. |
| Though the `Table Metrics` might be more useful. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Cache.<MetricName>.<CacheName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Cache scope=<CacheName> name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Capacity |Gauge<Long> |Cache capacity in bytes. |
| |Entries |Gauge<Integer> |Total number of cache entries. |
| |FifteenMinuteCacheHitRate |Gauge<Double> |15m cache hit rate. |
| |FiveMinuteCacheHitRate |Gauge<Double> |5m cache hit rate. |
| |OneMinuteCacheHitRate |Gauge<Double> |1m cache hit rate. |
| |HitRate |Gauge<Double> |All time cache hit rate. |
| |Hits |Meter |Total number of cache hits. |
| |Misses |Meter |Total number of cache misses. |
| |MissLatency |Timer |Latency of misses. |
| |Requests |Gauge<Long> |Total number of cache requests. |
| |Size |Gauge<Long> |Total size of occupied cache, in bytes. |
| |=== |
| |
| The following caches are covered: |
| |
| [cols=",",options="header",] |
| |=== |
| |Name |Description |
| |CounterCache |Keeps hot counters in memory for performance. |
| |ChunkCache |In process uncompressed page cache. |
| |KeyCache |Cache for partition to sstable offsets. |
| |RowCache |Cache for rows kept in memory. |
| |=== |
| |
| [NOTE] |
| .Note |
| ==== |
| Misses and MissLatency are only defined for the ChunkCache |
| ====== CQL Metrics |
| |
| Metrics specific to CQL prepared statement caching. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.CQL.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=CQL name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |PreparedStatementsCount |Gauge<Integer> |Number of cached prepared |
| statements. |
| |
| |PreparedStatementsEvicted |Counter |Number of prepared statements |
| evicted from the prepared statement cache |
| |
| |PreparedStatementsExecuted |Counter |Number of prepared statements |
| executed. |
| |
| |RegularStatementsExecuted |Counter |Number of *non* prepared statements |
| executed. |
| |
| |PreparedStatementsRatio |Gauge<Double> |Percentage of statements that |
| are prepared vs unprepared. |
| |=== |
| |
| [[dropped-metrics]] |
| == DroppedMessage Metrics |
| |
| Metrics specific to tracking dropped messages for different types of |
| requests. Dropped writes are stored and retried by `Hinted Handoff` |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.DroppedMessage.<MetricName>.<Type>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=DroppedMessage scope=<Type> name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |CrossNodeDroppedLatency |Timer |The dropped latency across nodes. |
| |InternalDroppedLatency |Timer |The dropped latency within node. |
| |Dropped |Meter |Number of dropped messages. |
| |=== |
| |
| The different types of messages tracked are: |
| |
| [cols=",",options="header",] |
| |=== |
| |Name |Description |
| |BATCH_STORE |Batchlog write |
| |BATCH_REMOVE |Batchlog cleanup (after succesfully applied) |
| |COUNTER_MUTATION |Counter writes |
| |HINT |Hint replay |
| |MUTATION |Regular writes |
| |READ |Regular reads |
| |READ_REPAIR |Read repair |
| |PAGED_SLICE |Paged read |
| |RANGE_SLICE |Token range read |
| |REQUEST_RESPONSE |RPC Callbacks |
| |_TRACE |Tracing writes |
| |=== |
| |
| == Streaming Metrics |
| |
| Metrics reported during `Streaming` operations, such as repair, |
| bootstrap, rebuild. |
| |
| These metrics are specific to a peer endpoint, with the source node |
| being the node you are pulling the metrics from. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Streaming.<MetricName>.<PeerIP>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Streaming scope=<PeerIP> name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |IncomingBytes |Counter |Number of bytes streamed to this node from the |
| peer. |
| |
| |OutgoingBytes |Counter |Number of bytes streamed to the peer endpoint |
| from this node. |
| |=== |
| |
| == Compaction Metrics |
| |
| Metrics specific to `Compaction` work. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Compaction.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Compaction name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |BytesCompacted |Counter |Total number of bytes compacted since server |
| [re]start. |
| |
| |PendingTasks |Gauge<Integer> |Estimated number of compactions remaining |
| to perform. |
| |
| |CompletedTasks |Gauge<Long> |Number of completed compactions since |
| server [re]start. |
| |
| |TotalCompactionsCompleted |Meter |Throughput of completed compactions |
| since server [re]start. |
| |
| |PendingTasksByTableName |Gauge<Map<String, Map<String, Integer>>> |
| |Estimated number of compactions remaining to perform, grouped by |
| keyspace and then table name. This info is also kept in `Table Metrics`. |
| |=== |
| |
| == CommitLog Metrics |
| |
| Metrics specific to the `CommitLog` |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.CommitLog.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=CommitLog name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |CompletedTasks |Gauge<Long> |Total number of commit log messages |
| written since [re]start. |
| |
| |PendingTasks |Gauge<Long> |Number of commit log messages written but |
| yet to be fsync'd. |
| |
| |TotalCommitLogSize |Gauge<Long> |Current size, in bytes, used by all |
| the commit log segments. |
| |
| |WaitingOnSegmentAllocation |Timer |Time spent waiting for a |
| CommitLogSegment to be allocated - under normal conditions this should |
| be zero. |
| |
| |WaitingOnCommit |Timer |The time spent waiting on CL fsync; for |
| Periodic this is only occurs when the sync is lagging its sync interval. |
| |=== |
| |
| == Storage Metrics |
| |
| Metrics specific to the storage engine. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Storage.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Storage name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Exceptions |Counter |Number of internal exceptions caught. Under normal |
| exceptions this should be zero. |
| |
| |Load |Counter |Size, in bytes, of the on disk data size this node |
| manages. |
| |
| |TotalHints |Counter |Number of hint messages written to this node since |
| [re]start. Includes one entry for each host to be hinted per hint. |
| |
| |TotalHintsInProgress |Counter |Number of hints attemping to be sent |
| currently. |
| |=== |
| |
| [[handoff-metrics]] |
| == HintedHandoff Metrics |
| |
| Metrics specific to Hinted Handoff. There are also some metrics related |
| to hints tracked in `Storage Metrics` |
| |
| These metrics include the peer endpoint *in the metric name* |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.HintedHandOffManager.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=HintedHandOffManager name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Hints_created-<PeerIP> a| |
| ____ |
| Counter |
| ____ |
| |
| a| |
| ____ |
| Number of hints on disk for this peer. |
| ____ |
| |
| |Hints_not_stored-<PeerIP> a| |
| ____ |
| Counter |
| ____ |
| |
| a| |
| ____ |
| Number of hints not stored for this peer, due to being down past the |
| configured hint window. |
| ____ |
| |
| |=== |
| |
| == HintsService Metrics |
| |
| Metrics specific to the Hints delivery service. There are also some |
| metrics related to hints tracked in `Storage Metrics` |
| |
| These metrics include the peer endpoint *in the metric name* |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.HintsService.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=HintsService name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |HintsSucceeded a| |
| ____ |
| Meter |
| ____ |
| |
| a| |
| ____ |
| A meter of the hints successfully delivered |
| ____ |
| |
| |HintsFailed a| |
| ____ |
| Meter |
| ____ |
| |
| a| |
| ____ |
| A meter of the hints that failed deliver |
| ____ |
| |
| |HintsTimedOut a| |
| ____ |
| Meter |
| ____ |
| |
| a| |
| ____ |
| A meter of the hints that timed out |
| ____ |
| |
| |Hint_delays |Histogram |Histogram of hint delivery delays (in |
| milliseconds) |
| |
| |Hint_delays-<PeerIP> |Histogram |Histogram of hint delivery delays (in |
| milliseconds) per peer |
| |=== |
| |
| == SSTable Index Metrics |
| |
| Metrics specific to the SSTable index metadata. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Index.<MetricName>.RowIndexEntry` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Index scope=RowIndexEntry name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |IndexedEntrySize |Histogram |Histogram of the on-heap size, in bytes, |
| of the index across all SSTables. |
| |
| |IndexInfoCount |Histogram |Histogram of the number of on-heap index |
| entries managed across all SSTables. |
| |
| |IndexInfoGets |Histogram |Histogram of the number index seeks performed |
| per SSTable. |
| |=== |
| |
| == BufferPool Metrics |
| |
| Metrics specific to the internal recycled buffer pool Cassandra manages. |
| This pool is meant to keep allocations and GC lower by recycling on and |
| off heap buffers. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.BufferPool.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=BufferPool name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Size |Gauge<Long> |Size, in bytes, of the managed buffer pool |
| |
| |Misses |Meter a| |
| ____ |
| The rate of misses in the pool. The higher this is the more allocations |
| incurred. |
| ____ |
| |
| |=== |
| |
| == Client Metrics |
| |
| Metrics specifc to client managment. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Client.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Client name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |connectedNativeClients |Gauge<Integer> |Number of clients connected to |
| this nodes native protocol server |
| |
| |connections |Gauge<List<Map<String, String>> |List of all connections |
| and their state information |
| |
| |connectedNativeClientsByUser |Gauge<Map<String, Int> |Number of |
| connnective native clients by username |
| |=== |
| |
| == Batch Metrics |
| |
| Metrics specifc to batch statements. |
| |
| Reported name format: |
| |
| *Metric Name*:: |
| `org.apache.cassandra.metrics.Batch.<MetricName>` |
| *JMX MBean*:: |
| `org.apache.cassandra.metrics:type=Batch name=<MetricName>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |PartitionsPerCounterBatch |Histogram |Distribution of the number of |
| partitions processed per counter batch |
| |
| |PartitionsPerLoggedBatch |Histogram |Distribution of the number of |
| partitions processed per logged batch |
| |
| |PartitionsPerUnloggedBatch |Histogram |Distribution of the number of |
| partitions processed per unlogged batch |
| |=== |
| |
| == JVM Metrics |
| |
| JVM metrics such as memory and garbage collection statistics can either |
| be accessed by connecting to the JVM using JMX or can be exported using |
| link:#metric-reporters[Metric Reporters]. |
| |
| === BufferPool |
| |
| *Metric Name*:: |
| `jvm.buffers.<direct|mapped>.<MetricName>` |
| *JMX MBean*:: |
| `java.nio:type=BufferPool name=<direct|mapped>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Capacity |Gauge<Long> |Estimated total capacity of the buffers in this |
| pool |
| |
| |Count |Gauge<Long> |Estimated number of buffers in the pool |
| |
| |Used |Gauge<Long> |Estimated memory that the Java virtual machine is |
| using for this buffer pool |
| |=== |
| |
| === FileDescriptorRatio |
| |
| *Metric Name*:: |
| `jvm.fd.<MetricName>` |
| *JMX MBean*:: |
| `java.lang:type=OperatingSystem name=<OpenFileDescriptorCount|MaxFileDescriptorCount>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Usage |Ratio |Ratio of used to total file descriptors |
| |=== |
| |
| === GarbageCollector |
| |
| *Metric Name*:: |
| `jvm.gc.<gc_type>.<MetricName>` |
| *JMX MBean*:: |
| `java.lang:type=GarbageCollector name=<gc_type>` |
| |
| [cols=",,",options="header",] |
| |=== |
| |Name |Type |Description |
| |Count |Gauge<Long> |Total number of collections that have occurred |
| |
| |Time |Gauge<Long> |Approximate accumulated collection elapsed time in |
| milliseconds |
| |=== |
| |
| === Memory |
| |
| *Metric Name*:: |
| `jvm.memory.<heap/non-heap/total>.<MetricName>` |
| *JMX MBean*:: |
| `java.lang:type=Memory` |
| |
| [cols=",,",] |
| |=== |
| |Committed |Gauge<Long> |Amount of memory in bytes that is committed for |
| the JVM to use |
| |
| |Init |Gauge<Long> |Amount of memory in bytes that the JVM initially |
| requests from the OS |
| |
| |Max |Gauge<Long> |Maximum amount of memory in bytes that can be used |
| for memory management |
| |
| |Usage |Ratio |Ratio of used to maximum memory |
| |
| |Used |Gauge<Long> |Amount of used memory in bytes |
| |=== |
| |
| === MemoryPool |
| |
| *Metric Name*:: |
| `jvm.memory.pools.<memory_pool>.<MetricName>` |
| *JMX MBean*:: |
| `java.lang:type=MemoryPool name=<memory_pool>` |
| |
| [cols=",,",] |
| |=== |
| |Committed |Gauge<Long> |Amount of memory in bytes that is committed for |
| the JVM to use |
| |
| |Init |Gauge<Long> |Amount of memory in bytes that the JVM initially |
| requests from the OS |
| |
| |Max |Gauge<Long> |Maximum amount of memory in bytes that can be used |
| for memory management |
| |
| |Usage |Ratio |Ratio of used to maximum memory |
| |
| |Used |Gauge<Long> |Amount of used memory in bytes |
| |=== |
| |
| == JMX |
| |
| Any JMX based client can access metrics from cassandra. |
| |
| If you wish to access JMX metrics over http it's possible to download |
| http://mx4j.sourceforge.net/[Mx4jTool] and place `mx4j-tools.jar` into |
| the classpath. On startup you will see in the log: |
| |
| [source,none] |
| ---- |
| HttpAdaptor version 3.0.2 started on port 8081 |
| ---- |
| |
| To choose a different port (8081 is the default) or a different listen |
| address (0.0.0.0 is not the default) edit `conf/cassandra-env.sh` and |
| uncomment: |
| |
| [source,none] |
| ---- |
| #MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0" |
| |
| #MX4J_PORT="-Dmx4jport=8081" |
| ---- |
| |
| == Metric Reporters |
| |
| As mentioned at the top of this section on monitoring the Cassandra |
| metrics can be exported to a number of monitoring system a number of |
| http://metrics.dropwizard.io/3.1.0/getting-started/#other-reporting[built |
| in] and http://metrics.dropwizard.io/3.1.0/manual/third-party/[third |
| party] reporter plugins. |
| |
| The configuration of these plugins is managed by the |
| https://github.com/addthis/metrics-reporter-config[metrics reporter |
| config project]. There is a sample configuration file located at |
| `conf/metrics-reporter-config-sample.yaml`. |
| |
| Once configured, you simply start cassandra with the flag |
| `-Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml`. |
| The specified .yaml file plus any 3rd party reporter jars must all be in |
| Cassandra's classpath. |