blob: 86fbc521d9266a9bd260f971531ff980ae87ac1d [file] [log] [blame]
---
title: Maintaining Cache Consistency
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Maintaining data consistency between caches in a distributed <%=vars.product_name%> system is vital for ensuring its functional integrity and preventing data loss.
## <a id="cache_const__section_lf3_lvn_nr" class="no-quick-link"></a>General Guidelines
**Before Restarting a Region with a Disk Store, Consider the State of the Entire Region**
**Note:**
If you revoke a members disk store, do not restart that member with its disk storesin isolationat a later time.
<%=vars.product_name%> stores information about your persisted data and prevents you from starting a member with a revoked disk store in the running system. But <%=vars.product_name%> cannot stop you from starting a revoked member in isolation, and running with its revoked data. This is an unlikely situation, but it is possible to do:
1. Members A and B are running, both storing Region data to disk.
2. Member A goes down.
3. Member B goes down.
4. At this point, Member B has the most recent disk data.
5. Member B is not usable. Perhaps its host machine is down or cut off temporarily.
6. To get the system up and running, you start Member A, and use the command line tool to revoke Member Bs status as member with the most recent data. The system loads Member As data and you run forward with that.
7. Member A is stopped.
8. At this point, both Member A and Member B have information in their disk files indicating they are the gold copy members.
9. If you start Member B, it will load its data from disk.
10. When you start Member A, the system will recognize the incompatible state and report an exception, but by this point, you have good data in both files, with no way to combine them.
**Understand Cache Transactions**
Understanding the operation of <%=vars.product_name%> transactions can help you minimize situations where the cache could get out of sync.
Transactions do not work in distributed regions with global scope.
Transactions provide consistency within one cache, but the distribution of results to other members is not as consistent.
Multiple transactions in a cache can create inconsistencies because of read committed isolation. Since multiple threads cannot participate in a transaction, most applications will be running multiple transactions.
An in-place change to directly alter a keys value without doing a put can result in cache inconsistencies. With transactions, it creates additional difficulties because it breaks read committed isolation. If at all possible, use copy-on-read instead.
In distributed-no-ack scope, two conflicting transactions in different members can commit simultaneously, overwriting each other as the changes are distributed.
If a cache writer exists during a transaction, then each transaction write operation triggers a cache writers related call. Regardless of the regions scope, a transaction commit can invoke a cache writer only in the local cache and not in the remote caches.
A region in a cache with transactions may not stay in sync with a region of the same name in another cache without transactions.
Two applications running the same sequence of operations in their transactions may get different results. This could occur because operations happening outside a transaction in one of the members can overwrite the transaction, even in the process of committing. This could also occur if the results of a large transaction exceed the machines memory or the capacity of <%=vars.product_name%>. Those limits can vary by machine, so the two members may not be in sync.
## <a id="cache_const__section_qxx_kvn_nr" class="no-quick-link"></a>Guidelines for Multi-Site Deployments
**Optimize socket-buffer-size**
In a multi-site installation using gateways, if the link between sites is not tuned for optimum throughput, it could cause messages to back up in the cache queues. If a queue overflows because of inadequate buffer sizes, it will become out of sync with the sender and the receiver will be unaware of the condition. You can configure the send-receive buffer sizes of the TCP/IP connections used for data transmissions by changing the socket-buffer-size attribute of the gateway-sender and gateway-receiver elements in the `cache.xml` file. Set the buffer size by determining the link bandwidth and then using ping to measure the round-trip time.
When optimizing socket-buffer sizes, use the same value for both gateway senders and gateway receivers.
**Prevent Primary and Secondary Gateway Senders from Going Offline**
In a multi-site installation, if the primary gateway server goes offline, a secondary gateway sender must take over primary responsibilities as the failover system. The existing secondary gateway sender detects that the primary gateway sender has gone offline, and a secondary one becomes the new primary. Because the queue is distributed, its contents are available to all gateway senders. So, when a secondary gateway sender becomes primary, it is able to start processing the queue where the previous primary left off with no loss of data.
If both the primary gateway sender and all its secondary senders go offline and messages are in their queues, data loss could occur, because there is no failover system.
**Verify That isOriginRemote Is Set to False**
The isOriginRemote flag for a server or a multi-site gateway is set to false by default, which ensures that updates are distributed to other members. Setting its value to true in the server or the receiving gateway member applies updates to that member only, so updates are not distributed to peer members.