Options below can be used when create table for analytic engine
enable_ttl, bool. When enable TTL on a table, rows older than ttl will be deleted and can't be querid, default true
ttl, duration, lifetime of a row, only used when enable_ttl is true. default 7d.
storage_format, string. The underlying column's format. Availiable values:
columnar, defaulthybrid, Note: This feature is still in development, and it may change in the future.The meaning of those two values are in Storage format section.
There are mainly two formats supported in analytic engine. One is columnar, which is the traditional columnar format, with one table column in one physical column:
| Timestamp | Device ID | Status Code | Tag 1 | Tag 2 | | --------- |---------- | ----------- | ----- | ----- | | 12:01 | A | 0 | v1 | v1 | | 12:01 | B | 0 | v2 | v2 | | 12:02 | A | 0 | v1 | v1 | | 12:02 | B | 1 | v2 | v2 | | 12:03 | A | 0 | v1 | v1 | | 12:03 | B | 0 | v2 | v2 | | ..... | | | | |
The other one is hybrid, an experimental format used to simulate row-oriented storage in columnar storage to accelerate classic time-series query.
In classic time-series user cases like IoT or DevOps, queries will typically first group their result by series id(or device id), then by timestamp. In order to achieve good performance in those scenarios, the data physical layout should match this style, so the hybrid format is proposed like this:
| Device ID | Timestamp | Status Code | Tag 1 | Tag 2 | minTime | maxTime | |-----------|---------------------|-------------|-------|-------|---------|---------| | A | [12:01,12:02,12:03] | [0,0,0] | v1 | v1 | 12:01 | 12:03 | | B | [12:01,12:02,12:03] | [0,1,0] | v2 | v2 | 12:01 | 12:03 | | ... | | | | | | |
collapsible, those columns will be collapsed into a list. Used to encode fields in time-series tablenon-collapsible, those columns should only contain one distinct value. Used to encode tags in time-series tableminTime and maxTime. Those are used to cut unnecessary rows out in query.CREATE TABLE `device` ( `ts` timestamp NOT NULL, `tag1` string tag, `tag2` string tag, `value1` double, `value2` int, timestamp KEY (ts)) ENGINE=Analytic with ( enable_ttl = 'false', storage_format = 'hybrid' );
This will create a table with hybrid format, users can inspect data format with parquet-tools. The table above should have following parquet schema:
message arrow_schema {
optional group ts (LIST) {
repeated group list {
optional int64 item (TIMESTAMP(MILLIS,false));
}
}
required int64 tsid (INTEGER(64,false));
optional binary tag1 (STRING);
optional binary tag2 (STRING);
optional group value1 (LIST) {
repeated group list {
optional double item;
}
}
optional group value2 (LIST) {
repeated group list {
optional int32 item;
}
}
}