Written files may be in order or out of order, and small or large, and the optimal merge algorithm differs from system to system. MergeManager therefore provides several merge strategy interfaces as well as a flexible interface for adding new strategies.
Entry method
Under the squeeze merge strategy, when a series of seq and unseq files is merged without exceeding the time and memory limits, all of the files are merged into a single file named {timestamp}-{version}-{merge times + 1}.tsfile.merge.squeeze
When the time or memory limit is exceeded, file selection is interrupted, and the seq and unseq files selected so far are merged into a single file as described above
The time limit bounds the time spent selecting files; it is not an estimate of how long the merge itself takes. Its purpose is to keep file selection from taking too long when there are very many files
The memory limit applies to an estimate of the maximum memory the selected files will consume while being merged. Selection keeps that estimate at or below a given value, so that the merge does not cause a memory overflow
There are two options when recovering: continue from the previous progress, or give up the previous progress
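The selection limits and the squeeze file name described above can be sketched roughly as follows. This is a minimal illustration, not IoTDB's selector: the class `BudgetedFileSelector`, the `Candidate` type, the per-file memory estimate, and both method names are hypothetical, and the real memory estimation is more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and cost model are illustrative, not IoTDB's code. */
class BudgetedFileSelector {

  /** Stand-in for a candidate file with an assumed memory estimate in bytes. */
  static final class Candidate {
    final String name;
    final long estimatedMemoryBytes;

    Candidate(String name, long estimatedMemoryBytes) {
      this.name = name;
      this.estimatedMemoryBytes = estimatedMemoryBytes;
    }
  }

  /**
   * Walks the candidates in order and stops as soon as the selection-time budget
   * is exceeded or the next file would push the memory estimate over its budget.
   * The files selected so far are returned and would then be merged.
   */
  static List<Candidate> select(
      List<Candidate> candidates, long memoryBudgetBytes, long timeBudgetMillis) {
    List<Candidate> selected = new ArrayList<>();
    long startNanos = System.nanoTime();
    long usedMemory = 0;
    for (Candidate c : candidates) {
      long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
      if (elapsedMillis > timeBudgetMillis) {
        break; // the time limit covers selection only, not the merge itself
      }
      if (usedMemory + c.estimatedMemoryBytes > memoryBudgetBytes) {
        break; // taking this file could overflow memory during the merge
      }
      selected.add(c);
      usedMemory += c.estimatedMemoryBytes;
    }
    return selected;
  }

  /** Builds {timestamp}-{version}-{merge times + 1}.tsfile.merge.squeeze. */
  static String mergedFileName(long timestamp, long version, int mergeTimes) {
    return String.format(
        "%d-%d-%d.tsfile.merge.squeeze", timestamp, version, mergeTimes + 1);
  }
}
```

In this sketch, hitting either limit simply ends selection early; nothing already selected is discarded, which matches the rule that the currently selected files are still merged.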
org.apache.iotdb.db.engine.merge.BaseFileSelector
The base class for the file selection process, which defines the basic framework for selecting files and the methods for calculating file memory consumption in different situations. All custom file selection strategies must extend this class
org.apache.iotdb.db.engine.merge.IRecoverMergeTask
The interface for the recovery process, which declares the recoverMerge method. All custom merge recovery strategies must implement this interface.
In addition, each custom MergeTask needs to implement the Callable<Void> interface to ensure that it can be called back
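As a rough illustration of the Callable<Void> requirement, a custom task might look like the sketch below. `SketchMergeTask`, its completion callback, and `runBlocking` are hypothetical names introduced for this example; only `Callable`, `ExecutorService`, and `Future` are real JDK types.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical custom merge task; not a class from IoTDB. */
class SketchMergeTask implements Callable<Void> {
  private final Runnable onFinished;

  SketchMergeTask(Runnable onFinished) {
    this.onFinished = onFinished;
  }

  @Override
  public Void call() {
    // A real task would merge the selected files here.
    onFinished.run();
    return null; // Callable<Void> carries no result, so null is returned
  }

  /** Submits the task to a single-thread pool and waits for it to finish. */
  static void runBlocking(SketchMergeTask task) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      pool.submit(task).get(); // get() surfaces any failure thrown by call()
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for merge", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("merge task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the task is a `Callable`, any thread pool can accept it and report completion or failure through the returned `Future`.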
org.apache.iotdb.db.engine.merge.manage.MergeContext
Shared context class for the merge process
org.apache.iotdb.db.engine.merge.manage.MergeManager
The thread pool class for the merge process, which manages the execution of multiple merge tasks
org.apache.iotdb.db.engine.merge.manage.MergeResource
Resource class for the merge process, responsible for managing files, readers, writers, measurement schemas, modifications, and other resources during the merge
Under limited memory and time, select the unseq files one by one, and for each unseq file directly select the seq files whose time range overlaps with it
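The overlap test behind this selection step can be sketched as below. This is an assumption-laden simplification: `OverlapSelector` and `FileRange` are hypothetical, each file is reduced to a single closed time interval, and the memory and time limits are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of choosing the seq files that overlap one unseq file. */
class OverlapSelector {

  /** Stand-in for a file, reduced to one closed time interval [startTime, endTime]. */
  static final class FileRange {
    final String name;
    final long startTime;
    final long endTime;

    FileRange(String name, long startTime, long endTime) {
      this.name = name;
      this.startTime = startTime;
      this.endTime = endTime;
    }

    /** Two closed intervals overlap when each one starts before the other ends. */
    boolean overlaps(FileRange other) {
      return startTime <= other.endTime && other.startTime <= endTime;
    }
  }

  /** Returns the seq files whose time range overlaps the given unseq file. */
  static List<FileRange> overlappingSeqFiles(FileRange unseq, List<FileRange> seqFiles) {
    List<FileRange> result = new ArrayList<>();
    for (FileRange seq : seqFiles) {
      if (seq.overlaps(unseq)) {
        result.add(seq);
      }
    }
    return result;
  }
}
```

For example, with seq files covering [0, 10], [11, 20], and [21, 30], an unseq file covering [8, 12] selects the first two seq files only.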
First select all series that need to be merged according to storageGroupName, then create a chunkMetaHeap for each seq file chosen by the selector, and split the merge across multiple sub-threads according to mergeChunkSubThreadNum in the configuration.
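One way to picture the split across sub-threads is the sketch below. It assumes a simple round-robin assignment of series to groups, which the source does not specify; `SeriesPartitioner`, `partition`, and `mergeInSubThreads` are hypothetical names, and the per-series merge is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of dividing series among a fixed number of sub-threads. */
class SeriesPartitioner {

  /** Assigns series to subThreadNum groups in round-robin order (an assumption). */
  static List<List<String>> partition(List<String> seriesPaths, int subThreadNum) {
    if (subThreadNum <= 0) {
      throw new IllegalArgumentException("subThreadNum must be positive");
    }
    List<List<String>> groups = new ArrayList<>();
    for (int i = 0; i < subThreadNum; i++) {
      groups.add(new ArrayList<>());
    }
    for (int i = 0; i < seriesPaths.size(); i++) {
      groups.get(i % subThreadNum).add(seriesPaths.get(i));
    }
    return groups;
  }

  /** Runs one worker per group and returns how many series were processed. */
  static int mergeInSubThreads(List<String> seriesPaths, int subThreadNum) {
    List<List<String>> groups = partition(seriesPaths, subThreadNum);
    ExecutorService pool = Executors.newFixedThreadPool(subThreadNum);
    AtomicInteger merged = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<String> group : groups) {
        futures.add(pool.submit(() -> {
          for (int i = 0; i < group.size(); i++) {
            merged.incrementAndGet(); // placeholder for merging one series
          }
        }));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for every sub-thread before returning
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while merging", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("sub-thread failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return merged.get();
  }
}
```

Here `subThreadNum` plays the role of mergeChunkSubThreadNum: it fixes both the number of groups and the size of the pool.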
Under limited memory and time, select the unseq files one by one, each time selecting the seq files whose time range overlaps with the unseq file, and then retry each seq file in order, taking more seq files where the limits allow
Largely the same as the inplace strategy: first select all series that need to be merged according to storageGroupName, then create a chunkMetaHeap for each seq file chosen by the selector, and split the merge across multiple sub-threads according to mergeChunkSubThreadNum in the configuration
A merge may be forcibly interrupted when the system shuts down or fails suddenly. In that case the system records the interrupted merge, scans the merge.log file the next time a StorageGroupProcessor is created, and merges again according to the configuration. The possible states are listed below; in each case the recovery described is the one that gives priority to abandoning the merge
Almost nothing has been done: during recovery, delete the corresponding merge log directly and wait for the next manual or automatic merge
The files and timeseries to be merged have been selected: during recovery, delete the corresponding merge file, empty the selected files, and clear all other merge-related shared resources
All timeseries have been merged: during recovery, perform cleanUp directly and execute the merge-completed callback
All files have nominally been merged and this merge is complete: in principle, this state does not appear in the merge log
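The four states above map to recovery actions roughly as in the following sketch. The enum constants and action strings are hypothetical labels chosen for this example; the source does not give the actual state names, and the real recovery code operates on files and resources rather than returning descriptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of recovery when abandoning the merge takes priority. */
class MergeRecoverySketch {

  /** Illustrative state labels; the real names in the merge log may differ. */
  enum Progress {
    NOTHING_DONE,
    FILES_SELECTED,
    ALL_SERIES_MERGED,
    MERGE_FINISHED
  }

  /** Describes what recovery does for the state read from the merge log. */
  static List<String> recoveryActions(Progress progress) {
    switch (progress) {
      case NOTHING_DONE:
        return Collections.singletonList("delete merge log");
      case FILES_SELECTED:
        return Arrays.asList(
            "delete merge file", "clear selected files", "release merge resources");
      case ALL_SERIES_MERGED:
        return Arrays.asList("clean up", "run merge-completed callback");
      case MERGE_FINISHED:
        // In principle this state is never found in the merge log.
        return Collections.emptyList();
      default:
        throw new IllegalArgumentException("unknown state: " + progress);
    }
  }
}
```

Only the state in which all timeseries were already merged keeps the work done before the interruption; the earlier states discard it and wait for a later merge.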