A volume in the datanode stores container data and metadata (a RocksDB instance co-located on the volume). Many operations run in parallel on a volume: container import and export, data block writes and deletes, container repair, and container creation and deletion. The volume DB also needs space to perform compaction at regular intervals. This makes it hard to track exact usage and the free space actually available. A minimum free space therefore has to be configured so that datanode operations can proceed without corruption, the environment does not get stuck, and reads of data remain supported.
This free space is used to allow volume allocation only if required space < (volume available space - free space - reserved space - committed space). Any container creation or container import must ensure that this constraint is met, and block byte writes must ensure that the free space is available. Note: Any issue related to ensuring free space is tracked in a separate JIRA.
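The allocation constraint above can be sketched as a small check. This is a minimal illustration, not the datanode's actual code: the function name `has_space_for`, its parameter names, and the decimal `GB` unit are assumptions made for the example.

```python
GB = 10**9  # decimal gigabyte; an assumption made for readability


def has_space_for(required, available, min_free, reserved=0, committed=0):
    """Return True if `required` bytes satisfy the free-space constraint.

    Hypothetical helper mirroring:
    required < available - min_free - reserved - committed
    """
    # Space that may still be handed out once the minimum free space,
    # reserved space and already-committed space are set aside.
    usable = available - min_free - reserved - committed
    return required < usable


if __name__ == "__main__":
    # 100 GB available, 20 GB minimum free, 10 GB reserved and
    # 30 GB committed leave 40 GB usable.
    print(has_space_for(5 * GB, 100 * GB, 20 * GB, 10 * GB, 30 * GB))   # True
    print(has_space_for(40 * GB, 100 * GB, 20 * GB, 10 * GB, 30 * GB))  # False
```

The comparison is strict, as in the constraint stated above, so a request that would consume exactly the remaining usable space is rejected.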
Two configurations are provided: hdds.datanode.volume.min.free.space and hdds.datanode.volume.min.free.space.percent.
Solution involves
Configuration:
20GB
Minimum free space = Max(<Min free space>, <percent disk space>)
Disk space | Min Free Space (percent: 1%) | Min Free Space (percent: 0.1%) |
---|---|---|
100 GB | 20 GB | 20 GB (min space default) |
1 TB | 20 GB | 20 GB (min space default) |
10 TB | 100 GB | 20 GB (min space default) |
100 TB | 1 TB | 100 GB |
Considering the table above, a 20 GB default should be sufficient for most disks, since observed disk sizes are usually 10-15 TB. Larger disks are rarely used; instead, multiple volumes backed by multiple disks are attached to the same DN.
In this scenario, the minimum free space setting hdds.datanode.volume.min.free.space is sufficient on its own, and the percent-based configuration can be removed.
If hdds.datanode.volume.min.free.space.percent is configured, it should not have any impact, because the default value is increased to 20GB, which covers most use cases.
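The Max-based rule of this first approach can be checked against its table with a short sketch. The function name `min_free_space_max`, its parameters, and the decimal units (1 TB = 1000 GB) are assumptions for illustration; the percent is passed as a string so that `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

GB = 10**9   # decimal units, assumed so the results match the table
TB = 10**12


def min_free_space_max(capacity, min_free=20 * GB, percent="1"):
    """Hypothetical helper: Max(<Min free space>, <percent disk space>)."""
    # Exact rational arithmetic avoids float rounding for values like 0.1%.
    percent_bytes = int(capacity * Fraction(percent) / 100)
    return max(min_free, percent_bytes)


if __name__ == "__main__":
    for capacity in (100 * GB, 1 * TB, 10 * TB, 100 * TB):
        # One column per percent setting, reported in GB.
        row = [min_free_space_max(capacity, percent=p) // GB for p in ("1", "0.1")]
        print(capacity // GB, "GB disk ->", row, "GB")
```

With the 20 GB default, the percent term only takes effect at 10 TB and above for 1%, and at 100 TB for 0.1%, which matches the reasoning that the fixed value alone covers typical disks.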
Configuration:
20GB
Minimum free space = Min(<Max free space>, <percent disk space>)
The difference from approach one is that the Min function is applied over the two configurations above.
Disk space | Min Free Space (20GB, 10% of disk) |
---|---|
10 GB | 1 GB (=Min(20GB, 1GB)) |
100 GB | 10 GB (=Min(20GB, 10GB)) |
1 TB | 20 GB (=Min(20GB, 100GB)) |
10 TB | 20 GB (=Min(20GB, 1TB)) |
100 TB | 20 GB (=Min(20GB, 10TB)) |
This approach is more useful for test environments, where disk space is limited, because it needs no additional configuration.
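The Min-based variant can be sketched the same way. As before, the function name `min_free_space_min`, its parameters, and the decimal units are assumptions for illustration; the 20 GB cap and 10% values are taken from the table header above.

```python
from fractions import Fraction

GB = 10**9   # decimal units, assumed so the results match the table
TB = 10**12


def min_free_space_min(capacity, max_free=20 * GB, percent="10"):
    """Hypothetical helper: Min(<Max free space>, <percent disk space>)."""
    # Exact rational arithmetic keeps the percentage free of float rounding.
    percent_bytes = int(capacity * Fraction(percent) / 100)
    return min(max_free, percent_bytes)


if __name__ == "__main__":
    for capacity in (10 * GB, 100 * GB, 1 * TB, 10 * TB, 100 * TB):
        print(capacity // GB, "GB disk ->", min_free_space_min(capacity) // GB, "GB")
```

On small disks the percent term wins, so a 10 GB test volume keeps only 1 GB free, while larger disks are capped at 20 GB.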