TOPN queries refer to queries that involve ORDER BY LIMIT operations, which are common in log retrieval and other detailed query scenarios. Doris automatically optimizes this type of query.
SELECT * FROM tablex WHERE xxx ORDER BY c1,c2 ... LIMIT n
n, it will not take effect if n is greater than topn_opt_limit_threshold.The following two parameters are session variables that can be set for a specific SQL or globally.
topn_opt_limit_threshold: This session variable determines whether TOPN optimization is applied. It defaults to 1024, and setting it to 0 disables the optimization.
enable_two_phase_read_optimization: This session variable determines whether to enable this optimization. It defaults to true, and setting it to false disables the optimization.
To confirm if TOPN query optimization is enabled for a particular SQL, you can use the EXPLAIN statement to get the query plan. An example is as follows:
TOPN OPT indicates that optimization point 1 is applied.VOlapScanNode with SORT LIMIT indicates optimization point 2 is applied.OPT TWO PHASE indicates optimization point 3 is applied.1:VTOP-N(137) | order by: @timestamp18 DESC | TOPN OPT | OPT TWO PHASE | offset: 0 | limit: 10 | distribute expr lists: applicationName5 | 0:VOlapScanNode(106) TABLE: log_db.log_core_all_no_index(log_core_all_no_index), PREAGGREGATION: ON SORT INFO: @timestamp18 SORT LIMIT: 10 TOPN OPT:1 PREDICATES: ZYCFC-TRACE-ID4 like '%flowId-1720055220933%' partitions=1/8 (p20240704), tablets=250/250, tabletList=1727094,1727096,1727098 ... cardinality=345472780, avgRowSize=0.0, numNodes=1 pushAggOp=NONE
First, set topn_opt_limit_threshold to 0 to disable TOPN query optimization and compare the execution time of the SQL with and without optimization enabled.
After enabling TOPN query optimization, search for RuntimePredicate in the query profile and focus on the following metrics:
RowsZonemapRuntimePredicateFiltered: The number of rows filtered out, the higher the better.NumSegmentFiltered: The number of data files filtered out, the higher the better.BlockConditionsFilteredZonemapRuntimePredicateTime: The time taken to filter data, the lower the better.Before version 2.0.3, the RuntimePredicate metrics were not separated out, and the Zonemap metrics can be used as a rough guide.
SegmentIterator:
- BitmapIndexFilterTimer: 46.54us
- BlockConditionsFilteredBloomFilterTime: 10.352us
- BlockConditionsFilteredDictTime: 7.299us
- BlockConditionsFilteredTime: 202.23ms
- BlockConditionsFilteredZonemapRuntimePredicateTime: 0ns
- BlockConditionsFilteredZonemapTime: 402.917ms
- BlockInitSeekCount: 399
- BlockInitSeekTime: 11.309ms
- BlockInitTime: 215.59ms
- BlockLoadTime: 7s567ms
- BlocksLoad: 392.97K (392970)
- CachedPagesNum: 0
- CollectIteratorMergeTime: 0ns
- CollectIteratorNormalTime: 0ns
- CompressedBytesRead: 29.76 MB
- DecompressorTimer: 427.713ms
- ExprFilterEvalTime: 3s930ms
- FirstReadSeekCount: 392.921K (392921)
- FirstReadSeekTime: 528.287ms
- FirstReadTime: 1s134ms
- IOTimer: 51.286ms
- InvertedIndexFilterTime: 49.457us
- InvertedIndexQueryBitmapCopyTime: 0ns
- InvertedIndexQueryBitmapOpTime: 0ns
- InvertedIndexQueryCacheHit: 0
- InvertedIndexQueryCacheMiss: 0
- InvertedIndexQueryTime: 0ns
- InvertedIndexSearcherOpenTime: 0ns
- InvertedIndexSearcherSearchTime: 0ns
- LazyReadSeekCount: 0
- LazyReadSeekTime: 0ns
- LazyReadTime: 106.952us
- NumSegmentFiltered: 0
- NumSegmentTotal: 50
- OutputColumnTime: 61.987ms
- OutputIndexResultColumnTimer: 12.345ms
- RawRowsRead: 3.929151M (3929151)
- RowsBitmapIndexFiltered: 0
- RowsBloomFilterFiltered: 0
- RowsConditionsFiltered: 6.38976M (6389760)
- RowsDictFiltered: 0
- RowsInvertedIndexFiltered: 0
- RowsKeyRangeFiltered: 0
- RowsShortCircuitPredFiltered: 0
- RowsShortCircuitPredInput: 0
- RowsStatsFiltered: 6.38976M (6389760)
- RowsVectorPredFiltered: 0
- RowsVectorPredInput: 0
- RowsZonemapRuntimePredicateFiltered: 6.38976M (6389760)
- SecondReadTime: 0ns
- ShortPredEvalTime: 0ns
- TotalPagesNum: 2.301K (2301)
- UncompressedBytesRead: 137.99 MB
- VectorPredEvalTime: 0ns