HBASE-29039 Seek past delete markers instead of skipping one at a time (#8001)

When a DeleteColumn or DeleteFamily marker is encountered during a normal user scan, the matcher currently returns SKIP, forcing the scanner to advance one cell at a time. This causes read latency to degrade linearly with the number of accumulated delete markers for the same row or column.

Since these are range deletes that mask all remaining versions of the column, seek past the entire column immediately via columns.getNextRowOrNextColumn(). This is safe because cells arrive in descending timestamp order, so any puts newer than the delete have already been processed.

For DeleteFamily, also fix getKeyForNextColumn in ScanQueryMatcher to bypass the empty-qualifier guard (HBASE-18471) when the cell is a DeleteFamily marker. Without this fix, the seek barely advances past the current cell instead of jumping to the first real qualified column.

The optimization is only applied when the plain ScanDeleteTracker is used, and is skipped when:
- seePastDeleteMarkers is true (KEEP_DELETED_CELLS)
- newVersionBehavior is enabled (sequence IDs determine visibility)
- visibility labels are in use (delete/put label mismatch)

---

Seeking is more expensive than skipping. When each row has only one DeleteFamily or DeleteColumn marker (the common case), the seek overhead adds up across many rows and causes a performance regression.

Introduce a counter that tracks consecutive range delete markers per row. Only switch from SKIP to SEEK after seeing SEEK_ON_DELETE_MARKER_THRESHOLD (default 10) markers, which indicates actual accumulation. This preserves skip performance for the common case while still optimizing the accumulation case.

Signed-off-by: Charles Connell <cconnell@apache.org>
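The SKIP-to-SEEK switch described in the commit message can be modeled as a small per-row counter. The following is a minimal Java sketch under stated assumptions, not HBase's actual code: the class DeleteMarkerSeekSketch, its MatchCode enum, and the onRangeDeleteMarker method are hypothetical. It assumes the counter resets only when the row changes and that the threshold-th marker itself triggers the seek; the exact boundary and reset rules in the real patch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the SKIP-vs-SEEK decision for range delete markers.
// None of these names are HBase classes or methods.
final class DeleteMarkerSeekSketch {

  // Hypothetical subset of match codes, named after ScanQueryMatcher.MatchCode
  // but NOT the HBase enum.
  enum MatchCode { SKIP, SEEK_NEXT_COL }

  // Mirrors the default SEEK_ON_DELETE_MARKER_THRESHOLD from the commit message.
  static final int SEEK_ON_DELETE_MARKER_THRESHOLD = 10;

  private final boolean seekEnabled;
  private final int threshold;
  private String currentRow;       // row of the last range delete marker seen
  private int rangeDeletesInRow;   // range delete markers counted in currentRow

  /**
   * @param seePastDeleteMarkers true when KEEP_DELETED_CELLS is set
   * @param newVersionBehavior   true when sequence IDs determine visibility
   * @param plainDeleteTracker   false when another tracker (e.g. for visibility labels) is used
   * @param threshold            markers per row before switching from SKIP to SEEK
   */
  DeleteMarkerSeekSketch(boolean seePastDeleteMarkers, boolean newVersionBehavior,
      boolean plainDeleteTracker, int threshold) {
    // Seeking is only considered when none of the excluded configurations apply.
    this.seekEnabled = !seePastDeleteMarkers && !newVersionBehavior && plainDeleteTracker;
    this.threshold = threshold;
  }

  /** Decides how to handle a DeleteColumn or DeleteFamily marker found in {@code row}. */
  MatchCode onRangeDeleteMarker(String row) {
    if (!row.equals(currentRow)) {
      // New row: start counting accumulated markers from zero.
      currentRow = row;
      rangeDeletesInRow = 0;
    }
    rangeDeletesInRow++;
    if (!seekEnabled) {
      // Excluded configuration: always advance one cell at a time.
      return MatchCode.SKIP;
    }
    // Few markers (common case): cheap SKIP. Many markers: one seek past the column.
    return rangeDeletesInRow >= threshold ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
  }

  public static void main(String[] args) {
    DeleteMarkerSeekSketch matcher =
        new DeleteMarkerSeekSketch(false, false, true, SEEK_ON_DELETE_MARKER_THRESHOLD);
    List<MatchCode> decisions = new ArrayList<>();
    for (int i = 0; i < 12; i++) {
      decisions.add(matcher.onRangeDeleteMarker("row-a"));
    }
    // The counter resets when the scan moves to a new row.
    decisions.add(matcher.onRangeDeleteMarker("row-b"));
    System.out.println(decisions);
  }
}
```

Running main yields nine SKIP decisions, then SEEK_NEXT_COL for the tenth through twelfth markers in row-a, and a final SKIP for the first marker in row-b. Keeping SKIP below the threshold preserves the cheap path for rows with a single marker. The commit performs the actual seek via columns.getNextRowOrNextColumn(), which this sketch does not model.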

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
To get started using HBase, visit the project home page. The HBase Reference Guide has a ‘Quick Start’ section and is where you should begin your exploration of the HBase project.
The latest HBase can be downloaded from the download page.
We use mailing lists to send notices and hold discussions. See the mailing lists and archives for more information.
We use the #hbase channel on the official ASF Slack Workspace for real-time questions and discussions. Please email dev@hbase.apache.org to request an invite.
The source code can be found at https://hbase.apache.org/source-repository
The HBase issue tracker is at https://issues.apache.org/jira/browse/HBASE
Note that public registration for https://issues.apache.org/ has been disabled due to spam. If you want to contribute to HBase, please visit the Request a jira account page to submit your request. Please make sure to select hbase as the ‘ASF project you want to file a ticket’ so we can receive and process your request.
NOTE: we process these requests manually, so it may take some time (for example, up to a week) for us to respond to your request.
Apache HBase is made available under the Apache License, Version 2.0.
The HBase distribution includes cryptographic software. See the export control notice for more information.