---
layout: post
title: 'Apache Ignite 2.11: Stabilization First'
date: '2021-09-20T00:00:00+00:00'
categories: ignite
---
<p>The new <a href="https://ignite.apache.org/">Apache Ignite</a> 2.11 was released on September 17, 2021. To a large extent, it can be
considered a stabilization release that closed a number of technical debts in the internal architecture and fixed many bugs. Out of more than
200 completed tasks, 120 are bug fixes. However, some valuable improvements still made it in, so let's take a quick look at them together.</p>
| <h3 id="thin-clients">Thin Clients</h3>
|
<p>Partition awareness is enabled by default in the 2.11 release and allows thin clients to send query requests directly to the
node that owns the queried data. Without partition awareness, an application executes all queries and operations via
a single server node that acts as a proxy for the incoming requests.</p>
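<p>As a quick illustration, here is a minimal Java thin client sketch. The host names and the cache name are placeholders for this example; the point is that listing several server addresses lets the client map keys to partitions and send each request straight to the owning node, which is now the default behavior.</p>

<pre><code class="lang-java">import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientPartitionAwareness {
    public static void main(String[] args) throws Exception {
        // Several server addresses are listed so the client can route each
        // request to the node owning the key's partition.
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("node1.example.com:10800", "node2.example.com:10800", "node3.example.com:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("myCache");

            // With partition awareness (on by default since 2.11) this goes
            // directly to the primary node for key 1, not through a proxy node.
            cache.put(1, "value-1");
            System.out.println(cache.get(1));
        }
    }
}
</code></pre>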
<p>Support for <a href="https://ignite.apache.org/docs/latest/thin-clients/java-thin-client#cache-entry-listening">Continuous Queries</a>
has been added to the Java thin client. For the other supported features, see
the <a href="https://cwiki.apache.org/confluence/display/IGNITE/Thin+clients+features">List of Thin Client Features</a>.</p>
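<p>A rough sketch of what cache entry listening looks like from the Java thin client is shown below. It reuses the familiar <em>ContinuousQuery</em> class; the address and cache name are made up, and the exact <em>ClientCache</em> query overloads are described on the documentation page linked above.</p>

<pre><code class="lang-java">import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientContinuousQuery {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("notifications");

            ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

            // The local listener is invoked on the client for every matching update.
            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Integer, ? extends String> e : events)
                    System.out.println("Updated: " + e.getKey() + " -> " + e.getValue());
            });

            cache.query(qry);

            // Trigger an update so that the listener fires.
            cache.put(1, "hello");

            Thread.sleep(2_000); // give the notification time to arrive
        }
    }
}
</code></pre>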
| <h3 id="cellular-clusters-deployment">Cellular-clusters Deployment</h3>
|
<p>The Apache Ignite internals include the so-called <em>switch</em> process (a part of Partition Map Exchange) that is used to perform
atomic execution of cluster-wide operations and to move the cluster from one consistent state to another, for example, on cache creation/destroy,
node JOIN/LEFT/FAIL events, snapshot creation, etc. During the switch, all user transactions are parked for a short
period of time, which in turn increases the average latency and decreases the throughput of the overall cluster.</p>
<p>Splitting the cluster into virtual cells of 4-8 nodes may increase the total cluster performance and minimize the
influence of one cell on another in the case of node failure events. Such a technique also significantly increases the recovery speed of
transactions on cells not affected by the failing nodes. The time during which transactions are parked also decreases on the non-affected cells, which
in turn decreases the worst-case latency of cluster operations overall.</p>
<p>From now on, you can use the <em>RendezvousAffinityFunction</em> affinity function with the <em>ClusterNodeAttributeColocatedBackupFilter</em> to
group nodes into virtual cells. Since node baseline attributes are used as cell markers, the corresponding
<a href="https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#baseline_node_attributes">BASELINE_NODE_ATTRIBUTES</a> system
view was added.</p>
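<p>A minimal Java configuration sketch of this setup follows. The attribute name <em>CELL</em>, the cache name, and the number of backups are placeholder choices for the example; every node belonging to the same cell is expected to start with the same attribute value.</p>

<pre><code class="lang-java">import java.util.Collections;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeColocatedBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CellularDeploymentConfig {
    public static IgniteConfiguration configure(String cellName) {
        // Backup copies are only placed on nodes with the same "CELL" attribute
        // value as the primary node, which groups nodes into virtual cells.
        RendezvousAffinityFunction affinity = new RendezvousAffinityFunction();
        affinity.setAffinityBackupFilter(new ClusterNodeAttributeColocatedBackupFilter("CELL"));

        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<Integer, String>("tx-cache")
            .setBackups(1)
            .setAffinity(affinity);

        return new IgniteConfiguration()
            // The attribute becomes a baseline node attribute used as the cell marker.
            .setUserAttributes(Collections.singletonMap("CELL", cellName))
            .setCacheConfiguration(cacheCfg);
    }
}
</code></pre>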
<p>See the benchmark below, which shows the worst (max) latency observed during node left/failure/timeout events on the broken
and the alive cells.</p>
| <a href="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303"><img src="https://blogs.apache.org/ignite/mediaresource/ec8a7800-01e9-4910-aaa9-0e27ea2d4303" alt="723rhosidfgu4787fh9sdhf.png" width="50%"/></a>
|
| <h3 id="new-page-replacement-policies">New Page Replacement Policies</h3>
|
<p>When Native Persistence is on and the amount of data that Ignite stores on disk is bigger than the off-heap memory
allocated for the data region, loading a page from disk into the already full off-heap memory requires another page to be
evicted from off-heap to disk first. This process is called page replacement. Previously, Apache Ignite used the Random-LRU page
replacement algorithm, which has a low maintenance cost but many disadvantages and greatly affects performance once
page replacement starts. On some deployments, administrators even force a periodic cluster restart to avoid page
replacement. A few new algorithms are available from now on:</p>
<ul>
<li>Segmented-LRU Algorithm</li>
<li>CLOCK Algorithm</li>
</ul>
<p>The page replacement algorithm can be configured via the <em>PageReplacementMode</em> property of <em>DataRegionConfiguration</em>. By default,
the CLOCK algorithm is now used. You can check the
<a href="https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies">Replacement Policies</a> documentation page
for more details.</p>
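<p>For illustration, a minimal sketch of such a configuration in Java is given below. The region name and size are placeholder values; only the <em>PageReplacementMode</em> line is specific to this feature, and it matters only for persistent data regions.</p>

<pre><code class="lang-java">import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PageReplacementMode;

public class PageReplacementConfig {
    public static IgniteConfiguration configure() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("persistent-region")
            .setPersistenceEnabled(true)
            .setMaxSize(4L * 1024 * 1024 * 1024) // 4 GB of off-heap memory for the region
            // CLOCK is the new default; SEGMENTED_LRU and RANDOM_LRU are also available.
            .setPageReplacementMode(PageReplacementMode.SEGMENTED_LRU);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
</code></pre>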
| <h3 id="snapshot-restore-and-check-commands">Snapshot Restore And Check Commands</h3>
|
| <h4 id="check">Check</h4>
|
<p>All snapshots are fully consistent in terms of concurrent cluster-wide operations as well as ongoing data changes in Ignite.
However, in some cases and for your own peace of mind, it may be necessary to check a snapshot for completeness and
data consistency. Apache Ignite is now delivered with a built-in snapshot consistency check command that enables you to
verify internal data consistency, calculate data partition hashes and page checksums, and print out the result if a problem
is found. The check command also compares the hashes calculated over the keys of primary partitions with those of the corresponding
backup partitions and reports any differences.</p>
| <pre><code class="lang-shell"><span class="hljs-comment"># This procedure does not require the cluster to be in the idle state.</span>
|
| control.(sh|bat) --snapshot<span class="hljs-built_in"> check </span>snapshot_name
|
| </code></pre>
|
| <h4 id="restore">Restore</h4>
|
<p>Previously, only a manual snapshot restore procedure was available: fully copying the persistence data files from the
snapshot directory to the Apache Ignite <em>work</em> directory. The new automatic restore procedure allows you to restore cache groups from
a snapshot on an active cluster by using the Java API or the command line script (using the CLI is recommended). Currently, the restore
procedure has several limitations, so please check the documentation pages for details.</p>
| <pre><code class="lang-shell"><span class="hljs-keyword">Start</span> restoring all <span class="hljs-keyword">user</span>-created <span class="hljs-keyword">cache</span> <span class="hljs-keyword">groups</span> <span class="hljs-keyword">from</span> the <span class="hljs-keyword">snapshot</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --start</span>
|
|
|
| # <span class="hljs-keyword">Start</span> restoring <span class="hljs-keyword">only</span> <span class="hljs-string">"cache-group1"</span> <span class="hljs-keyword">and</span> <span class="hljs-string">"cache-group2"</span> <span class="hljs-keyword">from</span> the <span class="hljs-keyword">snapshot</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --start cache-group1,cache-group2</span>
|
|
|
| # <span class="hljs-keyword">Get</span> the <span class="hljs-keyword">status</span> <span class="hljs-keyword">of</span> the <span class="hljs-keyword">restore</span> operation <span class="hljs-keyword">for</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --status</span>
|
|
|
| # <span class="hljs-keyword">Cancel</span> the <span class="hljs-keyword">restore</span> operation <span class="hljs-keyword">for</span> <span class="hljs-string">"snapshot_09062021"</span>.
|
| control.(sh|bat) <span class="hljs-comment">--snapshot restore snapshot_09062021 --cancel</span>
|
| </code></pre>
|
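<p>The restore can also be started through the Java API mentioned above. Here is a minimal sketch assuming the <em>IgniteSnapshot</em> facade with the restore methods introduced in 2.11; the configuration path and cache group name are placeholders, and the documented limitations (for example, that the restored cache groups must not already be present in the cluster) still apply.</p>

<pre><code class="lang-java">import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class SnapshotRestoreFromJava {
    public static void main(String[] args) {
        // Start (or connect to) a server node; the configuration path is a placeholder.
        Ignite ignite = Ignition.start("config/ignite-node.xml");

        // Restore only "cache-group1" from the snapshot; pass null instead of the
        // collection to restore all user-created cache groups from the snapshot.
        ignite.snapshot()
            .restoreSnapshot("snapshot_09062021", Collections.singleton("cache-group1"))
            .get(); // block until the restore operation completes
    }
}
</code></pre>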