A big usage scenario in the future is similar to the es log storage. In the log scenario, the data will be cut by date. Many data are cold data, with few queries. Therefore, the storage cost of such data needs to be reduced. From the perspective of saving storage costs
Set the freeze time on the partition level to indicate how long the partition will be frozen, and define the location of remote storage stored after the freeze. On the be, the daemon thread will periodically determine whether the table needs to be frozen. If it does, it will upload the data to s3.
The cold and hot separation supports all doris functions, but only places some data on object storage to save costs without sacrificing functions. Therefore, it has the following characteristics:
The storage policy is the entry to use the cold and hot separation function. Users only need to associate a storage policy with a table or partition during table creation or doris use. that is, they can use the cold and hot separation function.
When creating an S3 RESOURCE, the S3 remote link verification will be performed to ensure that the RESOURCE is created correctly.
In addition, fe configuration needs to be added: enable_storage_policy=true
For example:
CREATE RESOURCE "remote_s3"
PROPERTIES
(
"type" = "s3",
"AWS_ENDPOINT" = "bj.s3.com",
"AWS_REGION" = "bj",
"AWS_BUCKET" = "test-bucket",
"AWS_ROOT_PATH" = "path/to/root",
"AWS_ACCESS_KEY" = "bbb",
"AWS_SECRET_KEY" = "aaaa",
"AWS_MAX_CONNECTIONS" = "50",
"AWS_REQUEST_TIMEOUT_MS" = "3000",
"AWS_CONNECTION_TIMEOUT_MS" = "1000"
);
CREATE STORAGE POLICY test_policy
PROPERTIES(
"storage_resource" = "remote_s3",
"cooldown_ttl" = "1d"
);
CREATE TABLE IF NOT EXISTS create_table_use_created_policy
(
k1 BIGINT,
k2 LARGEINT,
v1 VARCHAR(2048)
)
UNIQUE KEY(k1)
DISTRIBUTED BY HASH (k1) BUCKETS 3
PROPERTIES(
"storage_policy" = "test_policy"
);
Or for an existing table, associate the storage policy
ALTER TABLE create_table_not_have_policy set ("storage_policy" = "test_policy");
Or associate a storage policy with an existing partition
ALTER TABLE create_table_partition MODIFY PARTITION (*) SET("storage_policy"="test_policy");
For details, please refer to the resource, policy, create table, alter and other documents in the docs directory
Through show proc ‘/backends’, you can view the size of each object being uploaded to, and the RemoteUsedCapacity item.
Through show tables from tableName, you can view the object size occupied by each table, and the RemoteDataSize item.
As above, cold data introduces the cache in order to optimize query performance. After the first hit after cooling, Doris will reload the cooled data to be's local disk. The cache has the following characteristics:
file_cache_alive_time_sec can set the maximum storage time of the cache data after it has not been accessed. The default is 604800, which is one week.file_cache_max_size_per_disk can set the disk size occupied by the cache. Once this setting is exceeded, the cache that has not been accessed for the longest time will be deleted. The default is 0, means no limit to the size, unit: byte.file_cache_type is optional sub_file_cache (segment the remote file for local caching) and whole_file_cache (the entire remote file for local caching), the default is "", means no file is cached, please set it when caching is required this parameter.