---
layout: post
title: 'Apache Ignite 2.11: Stabilization First'
date: '2021-09-20T00:00:00+00:00'
categories: ignite
---
<p>The new <a href="https://ignite.apache.org/">Apache Ignite</a> 2.11 was released on September 17, 2021. To a large extent, it can be
considered a stabilization release that closed a number of technical debts in the internal architecture and fixed many bugs. Out of more than
200 completed tasks, 120 are bug fixes. However, some valuable improvements still made it in, so let's take a quick look at them together.</p>
| <h3 id="thin-clients">Thin Clients</h3>
|
<p>Partition awareness is enabled by default in the 2.11 release and allows thin clients to send query requests directly to the
node that owns the queried data. Without partition awareness, an application executes all queries and operations via
a single server node that acts as a proxy for the incoming requests.</p>
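<p>As a quick illustration, here is a minimal Java thin client sketch. The host names and the cache name are placeholders for this example; the point is that listing several server addresses lets the client map keys to partitions and send each request straight to the owning node, which is now the default behavior.</p>

<pre><code class="lang-java">import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientPartitionAwareness {
    public static void main(String[] args) throws Exception {
        // Several server addresses are listed so the client can route each
        // request to the node owning the key's partition.
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("node1.example.com:10800", "node2.example.com:10800", "node3.example.com:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("myCache");

            // With partition awareness (on by default since 2.11) this goes
            // directly to the primary node for key 1, not through a proxy node.
            cache.put(1, "value-1");
            System.out.println(cache.get(1));
        }
    }
}
</code></pre>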
<p>Support for <a href="https://ignite.apache.org/docs/latest/thin-clients/java-thin-client#cache-entry-listening">Continuous Queries</a>
has been added to the Java thin client. For the other supported features, see
the <a href="https://cwiki.apache.org/confluence/display/IGNITE/Thin+clients+features">List of Thin Client Features</a>.</p>
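<p>A rough sketch of what cache entry listening looks like from the Java thin client is shown below. It reuses the familiar <em>ContinuousQuery</em> class; the address and cache name are made up, and the exact <em>ClientCache</em> query overloads are described on the documentation page linked above.</p>

<pre><code class="lang-java">import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientContinuousQuery {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("notifications");

            ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

            // The local listener is invoked on the client for every matching update.
            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Integer, ? extends String> e : events)
                    System.out.println("Updated: " + e.getKey() + " -> " + e.getValue());
            });

            cache.query(qry);

            // Trigger an update so that the listener fires.
            cache.put(1, "hello");

            Thread.sleep(2_000); // give the notification time to arrive
        }
    }
}
</code></pre>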
| <h3 id="cellular-clusters-deployment">Cellular-clusters Deployment</h3>
|
<p>The Apache Ignite internals include the so-called <em>switch</em> process (a part of Partition Map Exchange) that is used to perform
atomic execution of cluster-wide operations and to move the cluster from one consistent state to another, for example, on cache creation/destroy,
node JOIN/LEFT/FAIL events, snapshot creation, etc. During the switch, all user transactions are parked for a short
period of time, which in turn increases the average latency and decreases the throughput of the overall cluster.</p>
<p>Splitting the cluster into virtual cells of 4-8 nodes may increase the total cluster performance and minimize the
influence of one cell on another in the case of node failure events. Such a technique also significantly increases the recovery speed of
transactions on cells not affected by the failing nodes. The time during which transactions are parked also decreases on the non-affected cells, which
in turn decreases the worst-case latency of cluster operations overall.</p>
<p>From now on, you can use the <em>RendezvousAffinityFunction</em> affinity function with the <em>ClusterNodeAttributeColocatedBackupFilter</em> to
group nodes into virtual cells. Since node baseline attributes are used as cell markers, the corresponding
<a href="https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#baseline_node_attributes">BASELINE_NODE_ATTRIBUTES</a> system
view was added.</p>
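<p>A minimal Java configuration sketch of this setup follows. The attribute name <em>CELL</em>, the cache name, and the number of backups are placeholder choices for the example; every node belonging to the same cell is expected to start with the same attribute value.</p>

<pre><code class="lang-java">import java.util.Collections;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeColocatedBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CellularDeploymentConfig {
    public static IgniteConfiguration configure(String cellName) {
        // Backup copies are only placed on nodes with the same "CELL" attribute
        // value as the primary node, which groups nodes into virtual cells.
        RendezvousAffinityFunction affinity = new RendezvousAffinityFunction();
        affinity.setAffinityBackupFilter(new ClusterNodeAttributeColocatedBackupFilter("CELL"));

        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<Integer, String>("tx-cache")
            .setBackups(1)
            .setAffinity(affinity);

        return new IgniteConfiguration()
            // The attribute becomes a baseline node attribute used as the cell marker.
            .setUserAttributes(Collections.singletonMap("CELL", cellName))
            .setCacheConfiguration(cacheCfg);
    }
}
</code></pre>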
<p>See the benchmark below, which shows the worst (max) latency observed during node left/failure/timeout events on the broken
and the alive cells.</p>
| <a href="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303"><img src="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303" alt="723rhosidfgu4787fh9sdhf.png" width="50%"/></a>
|
| <h3 id="new-page-replacement-policies">New Page Replacement Policies</h3>
|
<p>When Native Persistence is on and the amount of data that Ignite stores on disk is bigger than the off-heap memory
allocated for the data region, loading a page from disk into the already full off-heap memory requires another page to be
evicted from off-heap to disk first. This process is called page replacement. Previously, Apache Ignite used the Random-LRU page
replacement algorithm, which has a low maintenance cost but many disadvantages and greatly affects performance once
page replacement starts. On some deployments, administrators even force a periodic cluster restart to avoid page
replacement. A few new algorithms are available from now on:</p>
<ul>
<li>Segmented-LRU Algorithm</li>
<li>CLOCK Algorithm</li>
</ul>
<p>The page replacement algorithm can be configured via the <em>PageReplacementMode</em> property of <em>DataRegionConfiguration</em>. By default,
the CLOCK algorithm is now used. You can check the
<a href="https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies">Replacement Policies</a> documentation page
for more details.</p>
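<p>For illustration, a minimal sketch of such a configuration in Java is given below. The region name and size are placeholder values; only the <em>PageReplacementMode</em> line is specific to this feature, and it matters only for persistent data regions.</p>

<pre><code class="lang-java">import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PageReplacementMode;

public class PageReplacementConfig {
    public static IgniteConfiguration configure() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("persistent-region")
            .setPersistenceEnabled(true)
            .setMaxSize(4L * 1024 * 1024 * 1024) // 4 GB of off-heap memory for the region
            // CLOCK is the new default; SEGMENTED_LRU and RANDOM_LRU are also available.
            .setPageReplacementMode(PageReplacementMode.SEGMENTED_LRU);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
</code></pre>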
| <h3 id="snapshot-restore-and-check-commands">Snapshot Restore And Check Commands</h3>
|
| <h4 id="check">Check</h4>
|
<p>All snapshots are fully consistent in terms of concurrent cluster-wide operations as well as ongoing data changes in Ignite.
However, in some cases and for your own peace of mind, it may be necessary to check a snapshot for completeness and
data consistency. Apache Ignite is now delivered with a built-in snapshot consistency check command that enables you to
verify internal data consistency, calculate data partition hashes and page checksums, and print out the result if a problem
is found. The check command also compares the hashes calculated over the keys of primary partitions with those of the corresponding
backup partitions and reports any differences.</p>
| <pre><code class="lang-shell"><span class="hljs-comment"># This procedure does not require the cluster to be in the idle state.</span>
|
| control.(sh|bat) --snapshot<span class="hljs-built_in"> check </span>snapshot_name
|
| </code></pre>
|
| <h4 id="restore">Restore</h4>
|
<p>Previously, only a manual snapshot restore procedure was available: fully copying the persistence data files from the
snapshot directory to the Apache Ignite <em>work</em> directory. The new automatic restore procedure allows you to restore cache groups from
a snapshot on an active cluster by using the Java API or the command line script (using the CLI is recommended). Currently, the restore
procedure has several limitations, so please check the documentation pages for details.</p>
| <pre><code class="lang-shell"><span class="hljs-keyword">Start</span> restoring all <span class="hljs-keyword">user</span>-created <span class="hljs-keyword">cache</span> <span class="hljs-keyword">groups</span> <span class="hljs-keyword">from</span> the <span class="hljs-keyword">snapshot</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --start</span>
|
|
|
| # <span class="hljs-keyword">Start</span> restoring <span class="hljs-keyword">only</span> <span class="hljs-string">"cache-group1"</span> <span class="hljs-keyword">and</span> <span class="hljs-string">"cache-group2"</span> <span class="hljs-keyword">from</span> the <span class="hljs-keyword">snapshot</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --start cache-group1,cache-group2</span>
|
|
|
| # <span class="hljs-keyword">Get</span> the <span class="hljs-keyword">status</span> <span class="hljs-keyword">of</span> the <span class="hljs-keyword">restore</span> operation <span class="hljs-keyword">for</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --status</span>
|
|
|
| # <span class="hljs-keyword">Cancel</span> the <span class="hljs-keyword">restore</span> operation <span class="hljs-keyword">for</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --cancel</span>
|
| </code></pre>
|
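<p>The restore can also be started through the Java API mentioned above. Here is a minimal sketch assuming the <em>IgniteSnapshot</em> facade with the restore methods introduced in 2.11; the configuration path and cache group name are placeholders, and the documented limitations (for example, that the restored cache groups must not already be present in the cluster) still apply.</p>

<pre><code class="lang-java">import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class SnapshotRestoreFromJava {
    public static void main(String[] args) {
        // Start (or connect to) a server node; the configuration path is a placeholder.
        Ignite ignite = Ignition.start("config/ignite-node.xml");

        // Restore only "cache-group1" from the snapshot; pass null instead of the
        // collection to restore all user-created cache groups from the snapshot.
        ignite.snapshot()
            .restoreSnapshot("snapshot_09062021", Collections.singleton("cache-group1"))
            .get(); // block until the restore operation completes
    }
}
</code></pre>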