---
layout: post
title: 'Apache Ignite 2.11: Stabilization First'
date: '2021-09-20T00:00:00+00:00'
categories: ignite
---
<p><a href="https://ignite.apache.org/">Apache Ignite</a> 2.11 was released on September 17, 2021. It is largely a
stabilization release that pays down technical debt in the internal architecture and fixes bugs: out of more than
200 completed tasks, 120 are bug fixes. It still brings some valuable improvements, so let&#39;s take a quick look at them.</p>
<h3 id="thin-clients">Thin Clients</h3>
<p>Partition awareness is enabled by default in the 2.11 release and allows thin clients to send query requests directly to the
node that owns the queried data. Without partition awareness, an application executes all queries and operations via
a single server node that acts as a proxy for the incoming requests.</p>
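<p>As a minimal sketch, the Java thin client below lists more than one server endpoint so that partition-aware routing has
nodes to choose from. The addresses and the cache name <em>demo-cache</em> are placeholders, Apache Ignite is assumed to be on the
classpath, and the explicit <em>setPartitionAwarenessEnabled(true)</em> call only spells out what 2.11 already does by default.</p>
<pre><code class="lang-java">import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class PartitionAwareClientExample {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration()
            // Placeholder addresses: the server nodes the client may connect to directly.
            .setAddresses("127.0.0.1:10800", "127.0.0.1:10801")
            // Already the default in 2.11; shown only to make the setting visible.
            .setPartitionAwarenessEnabled(true);

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache&lt;Integer, String&gt; cache = client.getOrCreateCache("demo-cache");

            // Key-based operations go to the node that owns the key when the client can determine it.
            cache.put(1, "one");
            System.out.println(cache.get(1));
        }
    }
}
</code></pre>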
<p>Support for <a href="https://ignite.apache.org/docs/latest/thin-clients/java-thin-client#cache-entry-listening">Continuous Queries</a>
has been added to the Java thin client. For the other supported features, see
the <a href="https://cwiki.apache.org/confluence/display/IGNITE/Thin+clients+features">List of Thin Client Features</a>.</p>
<h3 id="cellular-clusters-deployment">Cellular-clusters Deployment</h3>
<p>Apache Ignite internals include the so-called <em>switch</em> process (a part of Partition Map Exchange), which performs
cluster-wide operations atomically and moves the cluster from one consistent state to another. Examples are cache creation or destruction,
node JOIN/LEFT/FAIL events, and snapshot creation. During the switch, all user transactions are parked for a short
period of time, which increases the average latency and decreases the throughput of the whole cluster.</p>
<p>Splitting the cluster into virtual cells of 4-8 nodes may increase the total cluster performance and minimize the
influence of one cell on another when a node fails. This technique also significantly speeds up the recovery of
transactions on cells not affected by failing nodes. The time during which transactions are parked also decreases on unaffected cells, which
in turn lowers the worst-case latency of cluster operations overall.</p>
<p>From now on, you can use the <em>RendezvousAffinityFunction</em> affinity function with <em>ClusterNodeAttributeColocatedBackupFilter</em> to
group nodes into virtual cells. Since the node baseline attributes are used as cell markers, the corresponding
<a href="https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#baseline_node_attributes">BASELINE_NODE_ATTRIBUTES</a> system
view has been added.</p>
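<p>The configuration described above might look roughly like the following sketch. The attribute name <em>CELL</em>, the cache name
<em>demo-cache</em>, the number of backups, and the helper class itself are assumptions chosen for illustration; each node would be
started with the identifier of the cell it belongs to.</p>
<pre><code class="lang-java">import java.util.Collections;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeColocatedBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CellConfigExample {
    /** Builds a node configuration for the given cell; "CELL" is a hypothetical attribute name. */
    public static IgniteConfiguration nodeConfig(String cellId) {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        // Backups are placed only on nodes whose "CELL" attribute matches that of the primary node.
        aff.setAffinityBackupFilter(new ClusterNodeAttributeColocatedBackupFilter("CELL"));

        CacheConfiguration&lt;Integer, String&gt; cacheCfg =
            new CacheConfiguration&lt;Integer, String&gt;("demo-cache")
                .setBackups(2)
                .setAffinity(aff);

        return new IgniteConfiguration()
            // The node attribute that marks which cell this node belongs to.
            .setUserAttributes(Collections.singletonMap("CELL", cellId))
            .setCacheConfiguration(cacheCfg);
    }
}
</code></pre>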
<p>The benchmarks below show the worst-case (max) latency observed on broken and alive cells during node
left/failure/timeout events.</p>
<a href="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303"><img src="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303" alt="723rhosidfgu4787fh9sdhf.png" width="50%"/></a>
<h3 id="new-page-replacement-policies">New Page Replacement Policies</h3>
<p>When Native Persistence is on and the amount of data that Ignite stores on disk exceeds the off-heap memory
allocated for the data region, loading a page from disk into a completely full off-heap region requires evicting another
page from off-heap memory to disk first. This process is called page replacement. Previously, Apache Ignite used the Random-LRU page
replacement algorithm, which has a low maintenance cost but many disadvantages, and it greatly affects performance once
page replacement starts. On some deployments, administrators even force a periodic cluster restart to avoid page
replacement. Two new algorithms are now available:</p>
<ul>
<li>Segmented-LRU Algorithm</li>
<li>CLOCK Algorithm</li>
</ul>
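<p>To give a feel for how CLOCK picks a victim page, here is a toy, self-contained Java sketch of the textbook second-chance
scheme. It is not Ignite&#39;s implementation: the class name <em>ClockCache</em>, the <em>access</em> method, and the choice to load new
pages with a cleared reference bit are assumptions made for illustration.</p>
<pre><code class="lang-java">import java.util.HashMap;
import java.util.Map;

/** Toy CLOCK (second-chance) page cache; page ids are assumed to be non-negative. */
public class ClockCache {
    private final long[] pages;     // page id stored in each frame
    private final boolean[] used;   // whether the frame currently holds a page
    private final boolean[] hit;    // reference ("second chance") bit per frame
    private final Map&lt;Long, Integer&gt; index = new HashMap&lt;&gt;(); // page id to frame number
    private int hand;               // next frame the clock hand inspects

    public ClockCache(int capacity) {
        pages = new long[capacity];
        used = new boolean[capacity];
        hit = new boolean[capacity];
    }

    /** Touches a page and returns the evicted page id, or -1 if nothing was evicted. */
    public long access(long pageId) {
        Integer frame = index.get(pageId);
        if (frame != null) {
            hit[frame] = true;      // already resident: only set the reference bit
            return -1;
        }

        // Advance the hand, clearing reference bits, until a free or unreferenced frame is found.
        while (used[hand] &amp;&amp; hit[hand]) {
            hit[hand] = false;
            hand = (hand + 1) % pages.length;
        }

        long evicted = -1;
        if (used[hand]) {
            evicted = pages[hand];
            index.remove(evicted);
        }

        pages[hand] = pageId;
        used[hand] = true;
        hit[hand] = false;          // simplification: a new page earns its bit on the next access
        index.put(pageId, hand);
        hand = (hand + 1) % pages.length;
        return evicted;
    }
}
</code></pre>
<p>With three frames, accessing pages 1, 2, 3, then 1 again, and then 4 evicts page 2: page 1 is spared because the repeated
access set its reference bit, which the clock hand clears instead of evicting the page.</p>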
<p>The page replacement algorithm is configured with the <em>PageReplacementMode</em> property of <em>DataRegionConfiguration</em>. By default,
the CLOCK algorithm is now used. See
<a href="https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies">Replacement Policies</a> in the documentation
for more details.</p>
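<p>As an illustration, a persistent data region could be switched to Segmented-LRU along the following lines. The region
name and the 512 MB size are placeholders rather than recommendations, and the helper class is hypothetical.</p>
<pre><code class="lang-java">import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PageReplacementMode;

public class PageReplacementConfigExample {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("persistent-region")       // hypothetical region name
            .setPersistenceEnabled(true)        // page replacement applies to persistent regions
            .setMaxSize(512L * 1024 * 1024)     // illustrative off-heap limit of 512 MB
            .setPageReplacementMode(PageReplacementMode.SEGMENTED_LRU);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDataRegionConfigurations(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
</code></pre>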
<h3 id="snapshot-restore-and-check-commands">Snapshot Restore And Check Commands</h3>
<h4 id="check">Check</h4>
<p>All snapshots are fully consistent with respect to concurrent cluster-wide operations as well as ongoing changes in Ignite.
However, in some cases and for your own peace of mind, it may be necessary to check a snapshot for completeness and
data consistency. Apache Ignite now ships with a built-in snapshot consistency check command that lets you
verify internal data consistency, calculate data partition hashes and page checksums, and print the result if a problem
is found. The check command also compares hashes calculated from the keys of primary partitions with those of the corresponding backup
partitions and reports any differences.</p>
<pre><code class="lang-shell"><span class="hljs-comment"># This procedure does not require the cluster to be in the idle state.</span>
control.(sh|bat) --snapshot<span class="hljs-built_in"> check </span>snapshot_name
</code></pre>
<h4 id="restore">Restore</h4>
<p>Previously, a snapshot could only be restored manually, by copying the persistence data files in full from the
snapshot directory to the Apache Ignite <em>work</em> directory. The automatic restore procedure allows you to restore cache groups from
a snapshot on an active cluster by using the Java API or the command line script (the CLI is recommended). Currently, the restore
procedure has several limitations, so please check the documentation pages for details.</p>
<pre><code class="lang-shell"><span class="hljs-comment"># Start restoring all user-created cache groups from the snapshot "snapshot_09062021".</span>
control.(sh|bat) --snapshot restore snapshot_09062021 --start
<span class="hljs-comment"># Start restoring only "cache-group1" and "cache-group2" from the snapshot "snapshot_09062021".</span>
control.(sh|bat) --snapshot restore snapshot_09062021 --start cache-group1,cache-group2
<span class="hljs-comment"># Get the status of the restore operation for "snapshot_09062021".</span>
control.(sh|bat) --snapshot restore snapshot_09062021 --status
<span class="hljs-comment"># Cancel the restore operation for "snapshot_09062021".</span>
control.(sh|bat) --snapshot restore snapshot_09062021 --cancel
</code></pre>
</code></pre>
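<p>For the Java API route, a restore of two cache groups might be sketched as follows. The helper class is hypothetical, the
<em>ignite</em> argument is assumed to be an already started node of an active cluster, and the limitations described in the
documentation apply here as well.</p>
<pre><code class="lang-java">import java.util.Arrays;
import org.apache.ignite.Ignite;

public class SnapshotRestoreExample {
    /** Restores two cache groups from the snapshot and waits for the operation to finish. */
    public static void restore(Ignite ignite) {
        ignite.snapshot()
            .restoreSnapshot("snapshot_09062021", Arrays.asList("cache-group1", "cache-group2"))
            .get(); // blocks until the restore completes; throws if it fails
    }
}
</code></pre>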