blob: 7c93175d0d6f98f91f9c9aef04376baae386d27f [file] [log] [blame]
---
title: Preventing and Recovering from Disk Full Errors
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
It is important to monitor the disk usage of <%=vars.product_name%> members. If a member lacks sufficient disk space for a disk store, the member attempts to shut down the disk store and its associated cache, and logs an error message. A shutdown due to a member running out of disk space can cause loss of data, data file corruption, log file corruption and other error conditions that can negatively impact your applications.
After you make sufficient disk space available to the member, you can restart the member.
You can prevent disk file errors using the following techniques:
- If you are using ext4 file system, we recommend that you pre-allocate disk store files and disk store metadata files. Pre-allocation reserves disk space for these files and leaves the member in a healthy state when the disk store and regions are shut down, allowing you to restart the member once sufficient disk space has been made available. Pre-allocation is enabled by default.
- Configure critical usage thresholds (disk-usage-warning-percentage and disk-usage-critical-percentage) for the disk. By default, these are set to 90% for warning and 99% for errors that will shut down the cache.
- Follow the recommendations in [Optimizing a System with Disk Stores](../disk_storage/optimize_availability_and_performance.html#optimize_avail_disk_store) for general disk management best practices.
When a disk write fails due to disk full conditions, the member is shutdown and removed from the cluster.
## Recovering from Disk Full Errors
If a member of your cluster fails due to a disk full error condition, add or make additional disk capacity available and attempt to restart the member normally. If the member does not restart and there is a redundant copy of its regions in a disk store on another member, you can restore the member using the following steps:
1. Delete or move the disk store files from the failed member.
2. Use the gfsh `show missing-disk-stores` command to identify any missing data. You may need to manually restore this data.
3. Revoke the missing disk stores using the [revoke missing-disk-store](../../tools_modules/gfsh/command-pages/revoke.html) gfsh command.
4. Restart the member.
See [Handling Missing Disk Stores](../disk_storage/handling_missing_disk_stores.html#handling_missing_disk_stores) for more information.