<% set_title("Configure", product_name_long, "to Handle Network Partitioning") %>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
This section lists configuration considerations relating to network partition detection.
<a id="handling_network_partitioning__section_EAF1957B6446491A938DEFB06481740F"></a>
The system uses a combination of member coordinators and system members, designated as lead members, to detect and resolve network partitioning problems.
- Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html).
- Network partition detection is enabled by default. The default setting in the `gemfire.properties` file is:
``` pre
enable-network-partition-detection=true
```
Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition.
All system members should have the same setting for `enable-network-partition-detection`. If they do not, the system throws a `GemFireConfigException` upon startup.
- The property `enable-network-partition-detection` must be true if you are using either partitioned or persistent regions. If you create a persistent region and `enable-network-partition-detection` is set to false, you will receive the following warning message:
``` pre
Creating persistent region {0}, but enable-network-partition-detection is set to false.
Running with network partition detection disabled can lead to an unrecoverable system in the
event of a network split.
```
- Configure regions you want to protect from network partitioning with a scope setting of `DISTRIBUTED_ACK` or `GLOBAL`. Do not use `DISTRIBUTED_NO_ACK` scope. Using an acknowledged scope prevents operations from being performed throughout the cluster before a network partition is detected.
**Note:**
<%=vars.product_name%> issues an alert if it detects `DISTRIBUTED_NO_ACK` regions when network partition detection is enabled:
``` pre
Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system.
This can lead to cache inconsistencies if there is a network failure.
```
- These other configuration parameters affect or interact with network partition detection. Check whether they are appropriate for your installation and modify as needed.
- If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. **Note:** The weight loss calculation rounds to the nearest integer. Therefore, a value of 50.51 is rounded to 51 and will cause a network partition.
- Failure detection is initiated if a member's `ack-wait-threshold` (default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) periods elapse before the member receives a response to a message. If you modify the `ack-wait-threshold` value, modify `ack-severe-alert-threshold` to match it.
- If the system has clients connecting to it, the clients' `cache.xml` pool `read-timeout` should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties` file. The default pool `read-timeout` setting is 10000 milliseconds.
- You can adjust the default weights of members by specifying the system property `gemfire.member-weight` upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup.
- By default, members that are forced out of the cluster by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../member-reconnect.html).
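
The two numeric guidelines above (the rounded 51 percent weight-loss threshold and the client `read-timeout` of at least three times `member-timeout`) can be sketched as a small sanity check. This is an illustrative sketch only, not the product's internal calculation: the class `PartitionConfigSketch` and its method names are hypothetical, the `>=` comparison against 51 is an assumption based on the note that a rounded value of 51 causes a partition, and the `member-timeout` of 5000 milliseconds is an example value.

```java
// Hypothetical helper for illustration; not part of the product API.
final class PartitionConfigSketch {

    // Threshold stated in the text above: fixed at 51 percent, not configurable.
    static final long WEIGHT_LOSS_THRESHOLD_PERCENT = 51;

    // Percentage of total membership weight that was lost, rounded to the nearest integer.
    static long roundedWeightLossPercent(double lostWeight, double totalWeight) {
        if (totalWeight <= 0) {
            throw new IllegalArgumentException("totalWeight must be positive");
        }
        return Math.round(100.0 * lostWeight / totalWeight);
    }

    // Assumption: reaching the threshold after rounding counts as a network partition.
    static boolean reachesWeightLossThreshold(double lostWeight, double totalWeight) {
        return roundedWeightLossPercent(lostWeight, totalWeight) >= WEIGHT_LOSS_THRESHOLD_PERCENT;
    }

    // Smallest client pool read-timeout (ms) that meets the "three times member-timeout" guideline.
    static long minimumReadTimeoutMillis(long memberTimeoutMillis) {
        return Math.multiplyExact(3L, memberTimeoutMillis);
    }

    static boolean readTimeoutIsSufficient(long readTimeoutMillis, long memberTimeoutMillis) {
        return readTimeoutMillis >= minimumReadTimeoutMillis(memberTimeoutMillis);
    }

    public static void main(String[] args) {
        // A loss of 50.51 out of 100 weight units rounds up to 51 percent.
        System.out.println(roundedWeightLossPercent(50.51, 100.0));
        System.out.println(reachesWeightLossThreshold(50.51, 100.0));

        // Example values: read-timeout of 10000 ms checked against a member-timeout of 5000 ms.
        System.out.println(minimumReadTimeoutMillis(5000L));
        System.out.println(readTimeoutIsSufficient(10000L, 5000L));
    }
}
```

With the example values, a `read-timeout` of 10000 milliseconds falls short of three times a 5000 millisecond `member-timeout`, so the client pool setting would need to be raised to follow the guideline.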