| <% set_title("Configure", product_name_long, "to Handle Network Partitioning") %> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| This section lists configuration considerations relating to network partition detection. |
| |
| <a id="handling_network_partitioning__section_EAF1957B6446491A938DEFB06481740F"></a> |
| The system uses a combination of member coordinators and system members, designated as lead members, to detect and resolve network partitioning problems. |
| |
| - Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html). |
| |
| - Network partition detection is enabled by default. The default setting in the `gemfire.properties` file is |
| |
| ``` pre |
| enable-network-partition-detection=true |
| ``` |
| |
| Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition. |
| |
| All system members should have the same setting for `enable-network-partition-detection`. If they do not, the system throws a `GemFireConfigException` upon startup. |
| |
| - The property `enable-network-partition-detection` must be true if you are using either partitioned or persistent regions. If you create a persistent region and `enable-network-partition-detection` to set to false, you will receive the following warning message: |
| |
| ``` pre |
| Creating persistent region {0}, but enable-network-partition-detection is set to false. |
| Running with network partition detection disabled can lead to an unrecoverable system in the |
| event of a network split." |
| ``` |
| |
| - Configure regions you want to protect from network partitioning with a scope setting of `DISTRIBUTED_ACK` or `GLOBAL`. Do not use `DISTRIBUTED_NO_ACK` scope. This prevents operations from being performed throughout the cluster before a network partition is detected. |
| **Note:** |
| <%=vars.product_name%> issues an alert if it detects `DISTRIBUTED_NO_ACK` regions when network partition detection is enabled: |
| |
| ``` pre |
| Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system. |
| This can lead to cache inconsistencies if there is a network failure. |
| |
| ``` |
| |
| - These other configuration parameters affect or interact with network partitioning detection. Check whether they are appropriate for your installation and modify as needed. |
| - If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. **Note:** The weight loss calculation uses round to nearest. Therefore, a value of 50.51 is rounded to 51 and will cause a network partition. |
| - Failure detection is initiated if a member's `ack-wait-threshold` (default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) properties elapse before receiving a response to a message. If you modify the `ack-wait-threshold` configuration value, you should modify `ack-severe-alert-threshold` to match the other configuration value. |
| - If the system has clients connecting to it, the clients' `cache.xml` pool `read-timeout` should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties` file. The default pool `read-timeout` setting is 10000 milliseconds. |
| - You can adjust the default weights of members by specifying the system property `gemfire.member-weight` upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup. |
| |
| - By default, members that are forced out of the cluster by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../member-reconnect.html). |
| |
| |