blob: 2ad71d680a6e9259875b314b96c9071ff8276774 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Cache Groups
For each cache deployed in the cluster, there is always overhead: the cache is split into partitions whose state must be tracked on every cluster node.
If link:persistence/native-persistence[Native Persistence] is enabled, then for every partition there is an open file on the disk that Ignite actively writes to and reads from. Thus, the more caches and partitions you have:
* The more Java heap is occupied by partition maps. Every cache has its own partition map.
* The longer it might take for a new node to join the cluster.
* The longer it might take to initiate rebalancing if a node leaves the cluster.
* The more partition files are kept open and the worse the performance of the checkpointing might be.
Usually, you will not spot any of these problems for deployments with dozens or several hundreds of caches. However, when it comes to thousands the impact can be noticeable.
To avoid this impact, consider using cache groups. Caches within a single cache group share various internal structures such as partitions maps, thus boosting topology events processing and decreasing overall memory usage. Note that from the API standpoint, there is no difference whether a cache is a part of a group or not.
You can create a cache group by setting the `groupName` property of `CacheConfiguration`.
Here is an example of how to assign caches to a specific group:
[tabs]
--
tab:XML[]
[source,xml]
----
include::code-snippets/xml/cache-groups.xml[tags=ignite-config;!discovery, indent=0]
----
tab:Java[]
[source,java]
----
include::{javaCodeDir}/DataPartitioning.java[tag=cfg,indent=0]
----
tab:C#/.NET[]
[source,csharp]
----
include::code-snippets/dotnet/DataModellingDataPartitioning.cs[tag=partitioning,indent=0]
----
tab:C++[unsupported]
--
In the above example, the `Person` and `Organization` caches belong to `group1`.
[NOTE]
====
[discrete]
=== How are key-value pairs distinguished?
If a cache is assigned to a cache group, its data is stored in shared partitions' internal structures.
Every key you put into the cache is enriched with the unique ID of the cache the key belongs to.
The ID is derived from the cache name.
This happens automatically and allows storing data of different caches in the same partitions and B+tree structures.
====
The reason for grouping caches is simple — if you decide to group 1000 caches, then you have 1000x fewer structures that store partitions' data, partition maps, and open partition files.
[NOTE]
====
[discrete]
=== Should cache groups be used all the time?
With all the benefits cache groups have, they might impact the performance of read operations and indexes lookups.
This is caused by the fact that all data and indexes get mixed in shared data structures (partition maps, B+trees), and it will take more time to query over them.
Thus, consider using the cache groups if you have a cluster of dozens and hundreds of nodes and caches, and you spot increased Java heap usage by internal structures, checkpointing performance drop, slow node connectivity to the cluster.
====