blob: ebf8bb96379422257debe82af9306469c9f2ab03 [file] [log] [blame]
---
title: Understanding Custom Partitioning and Data Colocation
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Custom partitioning and data colocation can be used separately or in conjunction with one another.
## <a id="custom_partitioning_and_data_colocation__section_ABFEE9CB17AF44F1AE252AC10FB5E999" class="no-quick-link"></a>Custom Partitioning
Use custom partitioning to group like entries into region buckets within a region. By default, <%=vars.product_name%> assigns new entries to buckets based on the entry key's hash code. With custom partitioning, you can assign your entries to buckets in whatever way you want.
You can generally get better performance if you use custom partitioning to group similar data within a region. For example, a query run on all accounts created in January runs faster if all January account data is hosted by a single member. Grouping all data for a single customer can improve performance of data operations that work on customer data. Data aware function execution also takes advantage of custom partitioning.
With custom partitioning, you have two choices:
- **Standard custom partitioning**. With standard custom partitioning, you group entries into buckets, but you do not specify where the buckets reside. <%=vars.product_name%> always keeps the entries in the buckets you have specified, but may move the buckets around for load balancing.
See [Standard Custom Partitioning](standard_custom_partitioning.html) for
implementation and configuration details.
- **Fixed custom partitioning**. With fixed custom partitioning,
you specify the exact member where each region entry resides.
You assign an entry to a partition and then to a bucket within
that partition.
You name specific members as primary and secondary hosts of each partition.
This gives you complete control over the locations of your primary and any secondary buckets for the region. This can be useful when you want to store specific data on specific physical machines or when you need to keep data close to certain hardware elements.
Fixed partitioning has these requirements and caveats:
- <%=vars.product_name%> cannot rebalance fixed partition region data, because it cannot move the buckets around among the host members. You must carefully consider your expected data loads for the partitions you create.
- With fixed partitioning, the region configuration is different between host members. Each member identifies the named partitions it hosts, and whether it is hosting the primary copy or a secondary copy. You then program a fixed-partition resolver to return the partition id, so the entry is placed on the right members. Only one member can be primary for a particular partition name, and that member cannot be the partition's secondary.
See [Fixed Custom Partitioning](fixed_custom_partitioning.html) for
implementation and configuration details.
## <a id="custom_partitioning_and_data_colocation__section_D2C66951FE38426F9C05050D2B9028D8" class="no-quick-link"></a>Data Colocation Between Regions
With data colocation, <%=vars.product_name%> stores entries that are related across multiple data regions in a single member. <%=vars.product_name%> does this by storing all of the regions' buckets with the same ID together in the same member. During rebalancing operations, <%=vars.product_name%> moves these bucket groups together or not at all.
So, for example, if you have one region with customer contact information and another region with customer orders, you can use colocation to keep all contact information and all orders for a single customer in a single member. This way, any operation done for a single customer uses the cache of only a single member.
This figure shows two regions with data colocation where the data is partitioned by customer type.
<img src="../../images_svg/colocated_partitioned_regions.svg" id="custom_partitioning_and_data_colocation__image_525AC474950F473ABCDE8E372583C5DF" class="image" />
Data colocation requires the same data partitioning mechanism for all of the colocated regions. You can use the default partitioning provided by <%=vars.product_name%> or any of the custom partitioning strategies.
You must use the same high availability settings across your colocated regions.
See [Colocate Data from Different Partitioned Regions](colocating_partitioned_region_data.html) for implementation and configuration details.