Metadata of IoTDB is managed by MManager, including:
Maintain an inverted index for tag: Map<String, Map<String, Set<LeafMNode>>> tagIndex
tag key -> tag value -> timeseries LeafMNode
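The nested map above can be sketched in a few lines of Java. This is only an illustration of the tag key -> tag value -> node lookup: LeafNodeStub, TagIndexSketch, addTag and lookup are hypothetical names, not IoTDB classes or methods.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for LeafMNode: only the full path is kept.
class LeafNodeStub {
  final String fullPath;

  LeafNodeStub(String fullPath) {
    this.fullPath = fullPath;
  }
}

class TagIndexSketch {
  // tag key -> tag value -> timeseries leaf nodes
  private final Map<String, Map<String, Set<LeafNodeStub>>> tagIndex = new HashMap<>();

  // Registers a node under (key, value), creating the inner maps on demand.
  void addTag(String key, String value, LeafNodeStub node) {
    tagIndex
        .computeIfAbsent(key, k -> new HashMap<>())
        .computeIfAbsent(value, v -> new HashSet<>())
        .add(node);
  }

  // Returns the nodes tagged with key=value, or an empty set if there are none.
  Set<LeafNodeStub> lookup(String key, String value) {
    return tagIndex
        .getOrDefault(key, Collections.emptyMap())
        .getOrDefault(value, Collections.emptySet());
  }
}
```

A lookup touches two hash maps and never walks the tree, which is why a tag filter can be answered without a full traversal.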
During initialization, MManager replays the mlog to load the metadata into memory. There are seven types of operation log:
At the beginning of each operation, MManager tries to obtain its write lock, and it releases the lock after the operation.
Create Timeseries
Delete Timeseries
Set Storage Group
Delete Storage Group
Set TTL
Change the offset of Timeseries
Change the alias of Timeseries
In addition to these seven operations that need to be logged, there are another six alter operations on the tag/attribute info of timeseries.
As above, at the beginning of each operation, MManager tries to obtain its write lock, and it releases the lock after the operation.
Rename tag/attribute
Reset tag/attribute value
Drop existing tag/attribute
Add new tags
Add new attributes
Upsert alias/tags/attributes
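The acquire-then-release locking pattern described for these operations can be sketched as follows. The use of java.util.concurrent.locks.ReentrantReadWriteLock is an assumption, since the text only says "write lock", and LockedMetadataSketch with its ttl field is a hypothetical example, not MManager itself.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedMetadataSketch {
  // Assumption: a single read-write lock guards all metadata state.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long ttl; // hypothetical piece of metadata

  // A modifying operation: take the write lock first, release it in finally.
  void setTtl(long newTtl) {
    lock.writeLock().lock();
    try {
      ttl = newTtl;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // A read-only operation only needs the read lock.
  long getTtl() {
    lock.readLock().lock();
    try {
      return ttl;
    } finally {
      lock.readLock().unlock();
    }
  }

  boolean isWriteLocked() {
    return lock.isWriteLocked();
  }
}
```

Releasing in a finally block ensures the lock is freed even when the operation throws.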
There are three types of nodes in MTree: StorageGroupMNode, InternalMNode (non-leaf node), and LeafMNode (leaf node). They all extend MNode.
Each InternalMNode has a read-write lock. When querying metadata information, you need to obtain a read lock for each InternalMNode on the path. When modifying metadata information, if you modify a LeafMNode, you need to obtain the write lock of its parent node. If you modify a non-leaf node, you only need to obtain its own write lock. If the InternalMNode is located in the device layer, it also contains a Map<String, MNode> aliasChildren, which is used to store alias information.
StorageGroupMNode extends InternalMNode and contains metadata information for storage groups, such as TTL.
LeafMNode contains the schema information of the corresponding time series, its alias (null if it has none), and the offset of the time series' tag/attribute information in the tlog file (-1 if there is no tag/attribute information).
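The node hierarchy described above can be sketched as plain Java classes. The class names carry a "Sketch" suffix to mark them as illustrative stand-ins. Field names other than aliasChildren, the getChild helper, and its name-then-alias lookup order are assumptions. Per-node locks and schema details are left out.

```java
import java.util.HashMap;
import java.util.Map;

class MNodeSketch {
  final String name;

  MNodeSketch(String name) {
    this.name = name;
  }
}

class InternalMNodeSketch extends MNodeSketch {
  final Map<String, MNodeSketch> children = new HashMap<>();
  // Used only at the device layer: alias -> child node.
  final Map<String, MNodeSketch> aliasChildren = new HashMap<>();

  InternalMNodeSketch(String name) {
    super(name);
  }

  void addChild(MNodeSketch child) {
    children.put(child.name, child);
  }

  // Assumption: resolve a child by its name first, then by its alias.
  MNodeSketch getChild(String nameOrAlias) {
    MNodeSketch child = children.get(nameOrAlias);
    return child != null ? child : aliasChildren.get(nameOrAlias);
  }
}

class StorageGroupMNodeSketch extends InternalMNodeSketch {
  long dataTTL; // storage-group level metadata such as TTL

  StorageGroupMNodeSketch(String name, long dataTTL) {
    super(name);
    this.dataTTL = dataTTL;
  }
}

class LeafMNodeSketch extends MNodeSketch {
  String alias = null; // null when the time series has no alias
  long offset = -1;    // -1 when there is no tag/attribute record in the tlog file

  LeafMNodeSketch(String name) {
    super(name);
  }
}
```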
example:
The metadata management of IoTDB takes the form of a directory tree. The penultimate layer is the device layer, and the last layer is the sensor layer.
The root node exists by default. Creating storage groups, deleting storage groups, creating time series and deleting time series are all operations on the nodes of the tree.
create storage group(root.a.b.sg)
create timeseries(root.a.b.sg.d.s1)
Deleting a storage group is similar to deleting a time series: the storage group or time series node is deleted from its parent node. For a time series node, its alias must also be deleted from the parent node. If, during the deletion, a node is found to have no remaining child nodes, it is deleted recursively as well.
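The recursive cleanup of empty ancestors can be sketched as below. TreeNodeSketch and TreeDeleteSketch are hypothetical names. Alias removal and locking are omitted, and the sketch assumes the root node is never removed because it exists by default.

```java
import java.util.HashMap;
import java.util.Map;

class TreeNodeSketch {
  final String name;
  final TreeNodeSketch parent; // null for the root
  final Map<String, TreeNodeSketch> children = new HashMap<>();

  TreeNodeSketch(String name, TreeNodeSketch parent) {
    this.name = name;
    this.parent = parent;
  }
}

class TreeDeleteSketch {
  // Creates every missing node along a dotted path such as root.a.b.sg.d.s1
  // and returns the last node. The first path segment is taken to be the root.
  static TreeNodeSketch create(TreeNodeSketch root, String path) {
    String[] parts = path.split("\\.");
    TreeNodeSketch cur = root;
    for (int i = 1; i < parts.length; i++) {
      TreeNodeSketch parent = cur;
      cur = parent.children.computeIfAbsent(parts[i], n -> new TreeNodeSketch(n, parent));
    }
    return cur;
  }

  // Removes the node from its parent, then keeps removing ancestors
  // that are left without children. The root is never removed.
  static void delete(TreeNodeSketch node) {
    TreeNodeSketch cur = node;
    while (cur.parent != null) {
      TreeNodeSketch parent = cur.parent;
      parent.children.remove(cur.name);
      if (!parent.children.isEmpty()) {
        break; // the parent still has other children, stop here
      }
      cur = parent;
    }
  }
}
```

With root.a.b.sg.d.s1 and root.a.b.sg.d.s2 present, deleting s1 leaves the device d in place. Deleting s2 afterwards removes d, sg, b and a in turn.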
To speed up the restarting of IoTDB, we set checkpoints for MTree to avoid reading mlog.bin and executing the commands line by line. There are two ways to create an MTree snapshot:
Automatically, when either of the following conditions holds:
mlog.bin hasn't been updated for more than 1 hour
mlog.bin has reached 100000 lines (could be configured)
Manually, by using create snapshot for schema to create an MTree snapshot
The method for creating a snapshot is MManager.createMTreeSnapshot():
MTree is serialized into a temporary snapshot file (mtree.snapshot.tmp). The serialization of MTree is depth-first from children to parent. The information of each node is converted into a String according to its node type, which is convenient for deserialization.
After serialization, the temporary file is renamed to the formal snapshot file (mtree.snapshot), to guard against a server crash or a serialization failure.
mlog.bin is cleared by the MLogWriter.clear() method, which clears the mlog.bin file and sets logNumber to 0. logNumber records the log number of mlog.bin, which a background thread uses to check whether it is larger than the threshold configured by the user.
The method for recovering from a snapshot is MManager.initFromLog():
Check whether mtree.snapshot.tmp exists. If so, a server crash or a serialization failure may have occurred. Delete the temp file.
Check whether mtree.snapshot exists. If not, use a new MTree; otherwise, deserialize the snapshot to obtain the MTree.
Replay mlog.bin to finish the recovery of MTree. Update lineNumber at the same time and return it, so that the line number of mlog.bin can be recorded afterwards.
All metadata operations are recorded in a metadata log file, which defaults to data/system/schema/mlog.bin.
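Returning to the snapshot files: the write-to-temp-then-rename step and the matching recovery check can be sketched with java.nio.file. SnapshotFileSketch and its methods are hypothetical, and the serialized tree is stood in for by a plain String. Only the two file names come from the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SnapshotFileSketch {
  // Writes the serialized tree to the temp file first, then renames it,
  // so a crash mid-write never leaves a half-written mtree.snapshot.
  static void createSnapshot(Path dir, String serializedTree) throws IOException {
    Path tmp = dir.resolve("mtree.snapshot.tmp");
    Path snapshot = dir.resolve("mtree.snapshot");
    Files.write(tmp, serializedTree.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, snapshot, StandardCopyOption.REPLACE_EXISTING);
  }

  // Returns the serialized tree, or null when no snapshot exists
  // (the caller would then start from a new, empty tree).
  static String recover(Path dir) throws IOException {
    // A leftover temp file means an earlier snapshot attempt did not finish.
    Files.deleteIfExists(dir.resolve("mtree.snapshot.tmp"));
    Path snapshot = dir.resolve("mtree.snapshot");
    if (!Files.exists(snapshot)) {
      return null;
    }
    return new String(Files.readAllBytes(snapshot), StandardCharsets.UTF_8);
  }
}
```

Replaying mlog.bin on top of the recovered tree is not shown here.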
When the system restarts, the logs in mlog are replayed. Until the replay finishes, writeToLog must be set to false. Once the restart is complete, writeToLog must be set back to true.
mlog is stored in a binary encoding. We can use the MlogParser tool to parse mlog.bin into a human-readable txt version.
Schema operation examples and the corresponding parsed mlog record:
set storage group to root.turbine
mlog: 2,root.turbine
format: 2,path
delete storage group root.turbine
mlog: 11,root.turbine
format: 11,path
create timeseries root.turbine.d1.s1(temperature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2)
mlog: 0,root.turbine.d1.s1,3,2,1,,temperature,offset
format: 0,path,TSDataType,TSEncoding,CompressionType,[properties],[alias],[tag-attribute offset]
delete timeseries root.turbine.d1.s1
mlog: 1,root.turbine.d1.s1
format: 1,path
set ttl to root.turbine 10
mlog: 10,root.turbine,10
format: 10,path,ttl
alter timeseries root.turbine.d1.s1 add tags(tag1=v1)
This SQL generates a log only when root.turbine.d1.s1 previously had no tag/attribute information.
mlog: 12,root.turbine.d1.s1,10
format: 12,path,[change offset]
alter timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias
mlog: 13,root.turbine.d1.s1,newAlias
format: 13,path,[new alias]
create schema template temp1( s1 INT32 with encoding=Gorilla and compression=SNAPPY, s2 FLOAT with encoding=RLE and compression=SNAPPY )
mlog: 5,temp1,0,s1,1,8,1
mlog: 5,temp1,0,s2,3,2,1
format: 5,template name,is Aligned Timeseries,measurementId,TSDataType,TSEncoding,CompressionType
set schema template temp1 to root.turbine
mlog: 6,temp1,root.turbine
format: 6,template name,path
Auto create device root.turbine.d1 (after a template is set to a prefix path, a device path is created in the MTree automatically when data is inserted into the device)
mlog: 4,root.turbine.d1
format: 4,path
mark root.turbine.d1 as using a template (after a template is set to a device path, this log shows that the device is using the template)
mlog: 61,root.turbine.d1
format: 61,path
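The parsed text records above are comma-separated, with the operation code in the first field. A minimal sketch of reading one such line follows. MlogRecordSketch is a hypothetical helper, not the MlogParser tool. It covers only a few of the codes shown in the examples and assumes that fields contain no commas.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MlogRecordSketch {
  // A subset of the operation codes that appear in the examples above.
  static final Map<String, String> OPERATIONS = new HashMap<>();

  static {
    OPERATIONS.put("0", "create timeseries");
    OPERATIONS.put("2", "set storage group");
    OPERATIONS.put("10", "set ttl");
    OPERATIONS.put("12", "change offset");
    OPERATIONS.put("13", "change alias");
  }

  final String opCode;
  final List<String> args;

  MlogRecordSketch(String opCode, List<String> args) {
    this.opCode = opCode;
    this.args = args;
  }

  // Splits one parsed-mlog line into its operation code and arguments.
  // The -1 limit keeps empty fields, such as an empty [properties] slot.
  static MlogRecordSketch parse(String line) {
    String[] parts = line.split(",", -1);
    return new MlogRecordSketch(parts[0], Arrays.asList(parts).subList(1, parts.length));
  }

  String operationName() {
    return OPERATIONS.getOrDefault(opCode, "unknown");
  }
}
```

For the create timeseries record shown earlier, args.get(0) is the path and args.get(5) is the alias temperature.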
All timeseries tag/attribute information will be saved in the tag file, which defaults to data/system/schema/tlog.txt.
The total number of bytes persisted for the tags and attributes of each time series is L, which can be configured in iotdb-engine.properties.
persist content: Map<String,String> tags, Map<String,String> attributes. If the content length doesn't reach L, it is padded with blanks.
create timeseries root.turbine.d1.s1(temperature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes (attr1=v1, attr2=v2)
content in tlog:
tagsSize (tag1=v1, tag2=v2) attributesSize (attr1=v1, attr2=v2)
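The fixed-length record layout above (tags size, tags, attributes size, attributes, then padding up to L bytes) can be sketched with a ByteBuffer. The byte-level encoding used here is an assumption made for illustration: 4-byte counts, length-prefixed UTF-8 strings, and zero bytes as padding. TagRecordSketch is a hypothetical name, and the real tlog encoding may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class TagRecordSketch {
  // Writes the entry count, then each key and value as length-prefixed UTF-8.
  static void writeMap(ByteBuffer buf, Map<String, String> map) {
    buf.putInt(map.size());
    for (Map.Entry<String, String> e : map.entrySet()) {
      writeString(buf, e.getKey());
      writeString(buf, e.getValue());
    }
  }

  static void writeString(ByteBuffer buf, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    buf.putInt(bytes.length);
    buf.put(bytes);
  }

  static Map<String, String> readMap(ByteBuffer buf) {
    int size = buf.getInt();
    Map<String, String> map = new LinkedHashMap<>();
    for (int i = 0; i < size; i++) {
      String key = readString(buf);
      String value = readString(buf);
      map.put(key, value);
    }
    return map;
  }

  static String readString(ByteBuffer buf) {
    byte[] bytes = new byte[buf.getInt()];
    buf.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Serializes tags, then attributes, into exactly recordSize bytes.
  // ByteBuffer.allocate zero-fills, so the unused tail acts as padding.
  // Throws BufferOverflowException when the content does not fit.
  static byte[] serialize(
      Map<String, String> tags, Map<String, String> attributes, int recordSize) {
    ByteBuffer buf = ByteBuffer.allocate(recordSize);
    writeMap(buf, tags);
    writeMap(buf, attributes);
    return buf.array();
  }
}
```

Because every record has the same length, a record can be read directly from the offset stored in its LeafMNode, without scanning the file.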
The main logic of the query is in the showTimeseries(ShowTimeSeriesPlan plan) function of MManager.
First, we judge whether we need to order by heat. If so, we call the getAllMeasurementSchemaByHeatOrder function of MTree. Otherwise, we call the getAllMeasurementSchema function.
The heat here is represented by the lastTimeStamp of each time series, so we need to fetch all the satisfying time series, order them by lastTimeStamp, and then cut them by offset and limit.
When ordering by heat is not needed, we pass the limit (if it is absent, the fetch size is used as the limit) and the offset to the function findPath to reduce the memory footprint.
It is a recursive function that collects all the satisfying MNodes in MTree, starting from the root, until the timeseries list has reached the limit or the whole MTree has been traversed.
The filter condition here can only be a tag attribute; otherwise an exception is thrown.
We can quickly fetch all the satisfying MeasurementMNodes through the inverted tag index, without traversing the whole tree.
If the result needs to be ordered by heat, we sort it by lastTimeStamp; otherwise we sort it by the natural order. Then we trim the result by limit and offset.
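The sort-then-trim step for heat ordering can be sketched with Java streams. SeriesInfoSketch and HeatOrderSketch are hypothetical names, and the descending order (most recently written series first) is an assumption about what "heat" implies.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SeriesInfoSketch {
  final String path;
  final long lastTimeStamp;

  SeriesInfoSketch(String path, long lastTimeStamp) {
    this.path = path;
    this.lastTimeStamp = lastTimeStamp;
  }
}

class HeatOrderSketch {
  // Sorts every candidate by lastTimeStamp (assumed: newest first),
  // then applies offset and limit to the sorted list.
  static List<SeriesInfoSketch> orderByHeat(List<SeriesInfoSketch> all, int offset, int limit) {
    return all.stream()
        .sorted(Comparator.comparingLong((SeriesInfoSketch s) -> s.lastTimeStamp).reversed())
        .skip(offset)
        .limit(limit)
        .collect(Collectors.toList());
  }
}
```

The sort needs the complete candidate list, so offset and limit cannot be pushed down into the tree traversal in this case.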
If there is too much metadata, processing a whole show timeseries at once can cause OOM, so a fetch size parameter is added.
When the client interacts with the server, it gets at most fetch_size records at a time.
The intermediate state is saved in ShowTimeseriesDataSet. The queryId -> ShowTimeseriesDataSet key-value pair is saved in TSServiceImpl.
In ShowTimeseriesDataSet, we save the ShowTimeSeriesPlan, the current cursor index, and the cached result list List<RowRecord> result.
Check whether the cursor index is equal to the size of List<RowRecord> result.
If so, call showTimeseries in MManager to fetch results and put them into the cache. [...] fetch size. If hasLimit is false, then reset index to zero.
If index < result.size(), return true.
If index > result.size(), return false.
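A simplified sketch of this cursor-and-cache behaviour is shown below. PagedDataSetSketch is hypothetical: the fetcher function stands in for the call to showTimeseries, rows are plain Strings rather than RowRecord, hasLimit handling is left out, and advancing the offset by the fetch size after each batch is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class PagedDataSetSketch {
  // (offset, fetchSize) -> next batch of rows; stands in for showTimeseries.
  private final BiFunction<Integer, Integer, List<String>> fetcher;
  private final int fetchSize;
  private int offset = 0;
  private List<String> result = new ArrayList<>(); // cached batch
  private int index = 0;                           // cursor into the cached batch

  PagedDataSetSketch(BiFunction<Integer, Integer, List<String>> fetcher, int fetchSize) {
    this.fetcher = fetcher;
    this.fetchSize = fetchSize;
  }

  boolean hasNext() {
    if (index == result.size()) {
      // The cache is exhausted: fetch the next batch and restart the cursor.
      result = fetcher.apply(offset, fetchSize);
      offset += fetchSize; // assumption: the offset advances by the fetch size
      index = 0;
    }
    return index < result.size();
  }

  String next() {
    return result.get(index++);
  }
}
```

At most one batch of fetchSize rows is held in memory at a time, which is the point of the fetch size parameter.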