_posts/2018-05-31-apache-ignite-2-5-scaling.html - infrastructure-blogs-archive - Git at Google

 ---
 layout: post
 status: PUBLISHED
 published: true
 title: 'Apache Ignite 2.5: Scaling to 1000s Nodes Clusters'
 excerpt: Apache Ignite was always appreciated by its users for two primary things
   it delivers - scalability and performance. And now it grew to the point when the
   community decided to revisit its discovery subsystem that influences how well and
   far the database scales out. The goal was pretty clear - Ignite has to scale to
   1000s of nodes as good as it scales to 100s now. Check what we did to solve the
   challenge.
 id: 9e57e6d1-0acb-44f6-8111-062b486f1ec8
 date: '2018-05-31 22:17:35 -0400'
 categories: ignite
 tags:
 - ignite
 - sql
 - apache
 - database
 - spark
 permalink: ignite/entry/apache-ignite-2-5-scaling
 ---
 <p>Apache Ignite was always appreciated by its users for two primary things it delivers - scalability and performance. Throughout the lifetime many distributed systems tend to do performance optimizations from a release to release while making scalability related improvements just a couple of times. It's not because the scalability is of no interest. Usually, scalability requirements are set and solved once by a distributed system and don't require significant additional interventions by engineers. </p>
 <p>However, Apache Ignite grew to the point when the community decided to revisit its discovery subsystem that influences how well and far Ignite scales out. The goal was pretty clear - Ignite has to scale to 1000s of nodes as good as it scales to 100s now.</p>
 <p>It took many months to get the task implemented. So, please join me in welcoming Apache Ignite 2.5 that now can be scaled easily to 1000s of nodes and goes with other exciting capabilities. Let's check out the most prominent ones.</p>
 <h1>
 Massive Scalability<br />
 </h1>
 <p>There are two components of Ignite that were modified in Ignite 2.5 to improve its scalability capabilities. The first one is related to 1000s nodes clusters while the other is related to the way we train machine learning (ML) models in Ignite. Let's start with the first.</p>
 <h2>Marrying Apache Ignite and ZooKeeper</h2>
 <p>Right, that 1000s nodes scalability goal was solved with the help of Apache ZooKeeper. Why did we turn to it?</p>
 <p>Apache Ignite default TCP/IP Discovery organizes cluster nodes into a ring-topology form that has its advantages and disadvantages. For instance, on topologies with hundreds of cluster nodes, it can take many seconds why a system message traverse through all the nodes. As a result, necessary processing of events such as joining of new nodes or detecting of failed ones can take a while affecting overall cluster responsiveness and performance. That is a big deal if you'd like to run 1000s nodes clusters.</p>
 <p>The new ZooKeeper Discovery uses ZooKeeper as a single point of synchronization where Ignite nodes are exchanging discovery events through it. It solved the issue with long-to-be-processed discovery messages and, as a result, allowed Ignite scaling to large cluster topologies.</p>
 <p><a href="https://apacheignite.readme.io/docs/zookeeper-discovery">
 <div style="text-align: center"><img src="https://blogs.apache.org/ignite/mediaresource/4b80632d-232d-4e4f-bd5e-9d91f0bc550f" width="450"></img></div>
 <p></a></p>
 <p>As a rule of thumb, keep using default <a href="https://apacheignite.readme.io/docs/tcpip-discovery" target="_blank">TCP/IP Discovery</a> if it's unlikely that your Ignite cluster scales beyond 300s nodes and switch to <a href="https://apacheignite.readme.io/docs/zookeeper-discovery" target="_blank">ZooKeeper Discovery</a> if that's the case.</p>
 <h2>Machine Learning: Partition-Based Datasets</h2>
 <p>That's the second prominent feature of Ignite 2.5 that improves the way of how far you can scale your Ignite clusters to train ML models over terabytes or petabytes of data. The <a href="https://apacheignite.readme.io/docs/ml-partition-based-dataset" target="_blank">partition-based datasets</a> moved us closer to the implementation of Zero-ETL concept which implies that Ignite can be used as a single storage where ML models and algorithms are being improved iteratively and online without ETLing data back and forth between Ignite and another storage.</p>
 <p>Read more about the datasets from <a href="https://apacheignite.readme.io/docs/ml-partition-based-dataset" target="_blank">this</a> documentation page.</p>
 <h1>Genetic Algorithms</h1>
 <p>Ignite's ML component is ramping up and in the version 2.5 it accepted a contribution of genetic algorithms (GAs) which help to solve optimization problems by simulating the process of biological evolution. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing and more.</p>
 <p>Refer to excessive articles of my community-mates Turik Campbell and Akmal B. Chaudhri which cover main benefits of GAs:</p>
 <ul>
 <li>
 <a href="https://www.linkedin.com/pulse/travel-like-macgyver-solve-knapsack-problem-ga-grid-turik-campbell/" target="_blank">Travel Like MacGyver: Solve Knapsack Problem with GA Grid</a>
 </li>
 <li>
 <a href="https://www.gridgain.com/resources/blog/genetic-algorithms-apacher-ignitetm" target="_blank">Genetic Algorithms with Apache&reg; Ignite</a>
 </li>
 </ul>
 <h1>Continuous Self-Healing and Consistency Checks</h1>
 <p>It's a known fact that many companies and businesses trusted Ignite its mission-critical deployments and solutions. As a result, sometimes Ignite doesn't even have a right to "misfire" and should be able to handle critical or unpredictable situations automatically or provide facilities to do deal with them manually. </p>
 <p>With Ignite 2.5, we've kicked off the realization of continuous self-healing concept that implies that no matter what happens with Ignite in production it should be able to tolerate unexpected failures and stay up and running. The following was done in 2.5:</p>
 <ul>
 <li>
 <a href="https://apacheignite.readme.io/docs/critical-failures-handling" target="_blank">Critical Failures Handling</a>
 </li>
 <li>
 <a href="https://apacheignite.readme.io/docs/transactions#section-long-running-transactions-termination" target="_blank">Long running transactions monitoring and termination</a>
 </li>
 <li>
 <a href="https://apacheignite.readme.io/docs/consistency-check-facilities" target="_blank">Data Consistency Check Facilities</a>
 </li>
 </ul>
 <h1>SQL: Security and Fast Data Loading</h1>
 <p>The community stays strong and determined in its goal of making Ignite SQL engine undistinguishable from SQL engines of famous and mature SQL database. What's the purpose? We want to make it easy for you to migrate from a relational database to Ignite, so that you can reuse all your skills gained before. Overall, this is what our SQL engine got in 2.5:</p>
 <ul>
 <li>
 Fast data loading with <a href="https://apacheignite-sql.readme.io/docs/copy" target="_blank">COPY</a> command and <a href="https://apacheignite-sql.readme.io/docs/jdbc-driver#section-streaming" target="_blank">streaming mode</a> using SQL APIs.
 </li>
 <li>
 <a href="https://apacheignite.readme.io/docs/transactions#section-long-running-transactions-termination" target="_blank">Long running transactions monitoring and termination</a>
 </li>
 <li>
 Secured Ignite cluster. Use <a href="https://apacheignite-sql.readme.io/docs/ddl" target="_blank">CREATE USER, DROP USER and ALTER USER</a> commands to manage who is allowed to connect to your clusters.
 </li>
 </ul>
 <h1>In-place Execution of Spark DataFrame Queries</h1>
 <p>Apache Spark users can applaud because the <a href="https://issues.apache.org/jira/browse/IGNITE-7077" target="_blank">following ticket</a> got merged in 2.5. In short, it means that from now on Ignite will be able to execute as many DataFrames SQL queries as it can in-place on Ignite servers side avoiding data movement from Ignite to Spark. The performance of your DataFrames queries should boost significantly. Enjoy!</p>
 <h1>DEB and RPM packages</h1>
 <p>Last but not least, if you're a Linux user, now you can install the latest Ignite versions directly from DEB and RPM repositories. Refer to <a href="https://apacheignite.readme.io/docs/getting-started#section-rpm-deb-packages-installation" target="_blank">how-to</a> and share your feedback with us.</p>
 <p>Finally, I have no more paper left to cover other optimizations and improvements. So, go ahead and check out our <a href="https://ignite.apache.org/releases/2.5.0/release_notes.html" target="_blank">release notes</a>.</p>
	---
	layout: post
	status: PUBLISHED
	published: true
	title: 'Apache Ignite 2.5: Scaling to 1000s Nodes Clusters'
	excerpt: Apache Ignite was always appreciated by its users for two primary things
	it delivers - scalability and performance. And now it grew to the point when the
	community decided to revisit its discovery subsystem that influences how well and
	far the database scales out. The goal was pretty clear - Ignite has to scale to
	1000s of nodes as good as it scales to 100s now. Check what we did to solve the
	challenge.
	id: 9e57e6d1-0acb-44f6-8111-062b486f1ec8
	date: '2018-05-31 22:17:35 -0400'
	categories: ignite
	tags:
	- ignite
	- sql
	- apache
	- database
	- spark
	permalink: ignite/entry/apache-ignite-2-5-scaling
	---
	<p>Apache Ignite was always appreciated by its users for two primary things it delivers - scalability and performance. Throughout the lifetime many distributed systems tend to do performance optimizations from a release to release while making scalability related improvements just a couple of times. It's not because the scalability is of no interest. Usually, scalability requirements are set and solved once by a distributed system and don't require significant additional interventions by engineers. </p>
	<p>However, Apache Ignite grew to the point when the community decided to revisit its discovery subsystem that influences how well and far Ignite scales out. The goal was pretty clear - Ignite has to scale to 1000s of nodes as good as it scales to 100s now.</p>
	<p>It took many months to get the task implemented. So, please join me in welcoming Apache Ignite 2.5 that now can be scaled easily to 1000s of nodes and goes with other exciting capabilities. Let's check out the most prominent ones.</p>
	<h1>
	Massive Scalability<br />
	</h1>
	<p>There are two components of Ignite that were modified in Ignite 2.5 to improve its scalability capabilities. The first one is related to 1000s nodes clusters while the other is related to the way we train machine learning (ML) models in Ignite. Let's start with the first.</p>
	<h2>Marrying Apache Ignite and ZooKeeper</h2>
	<p>Right, that 1000s nodes scalability goal was solved with the help of Apache ZooKeeper. Why did we turn to it?</p>
	<p>Apache Ignite default TCP/IP Discovery organizes cluster nodes into a ring-topology form that has its advantages and disadvantages. For instance, on topologies with hundreds of cluster nodes, it can take many seconds why a system message traverse through all the nodes. As a result, necessary processing of events such as joining of new nodes or detecting of failed ones can take a while affecting overall cluster responsiveness and performance. That is a big deal if you'd like to run 1000s nodes clusters.</p>
	<p>The new ZooKeeper Discovery uses ZooKeeper as a single point of synchronization where Ignite nodes are exchanging discovery events through it. It solved the issue with long-to-be-processed discovery messages and, as a result, allowed Ignite scaling to large cluster topologies.</p>
	<p><a href="https://apacheignite.readme.io/docs/zookeeper-discovery">
	<div style="text-align: center"><img src="https://blogs.apache.org/ignite/mediaresource/4b80632d-232d-4e4f-bd5e-9d91f0bc550f" width="450"></img></div>
	<p></a></p>
	<p>As a rule of thumb, keep using default <a href="https://apacheignite.readme.io/docs/tcpip-discovery" target="_blank">TCP/IP Discovery</a> if it's unlikely that your Ignite cluster scales beyond 300s nodes and switch to <a href="https://apacheignite.readme.io/docs/zookeeper-discovery" target="_blank">ZooKeeper Discovery</a> if that's the case.</p>
	<h2>Machine Learning: Partition-Based Datasets</h2>
	<p>That's the second prominent feature of Ignite 2.5 that improves the way of how far you can scale your Ignite clusters to train ML models over terabytes or petabytes of data. The <a href="https://apacheignite.readme.io/docs/ml-partition-based-dataset" target="_blank">partition-based datasets</a> moved us closer to the implementation of Zero-ETL concept which implies that Ignite can be used as a single storage where ML models and algorithms are being improved iteratively and online without ETLing data back and forth between Ignite and another storage.</p>
	<p>Read more about the datasets from <a href="https://apacheignite.readme.io/docs/ml-partition-based-dataset" target="_blank">this</a> documentation page.</p>
	<h1>Genetic Algorithms</h1>
	<p>Ignite's ML component is ramping up and in the version 2.5 it accepted a contribution of genetic algorithms (GAs) which help to solve optimization problems by simulating the process of biological evolution. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing and more.</p>
	<p>Refer to excessive articles of my community-mates Turik Campbell and Akmal B. Chaudhri which cover main benefits of GAs:</p>
	<ul>
	<li>
	<a href="https://www.linkedin.com/pulse/travel-like-macgyver-solve-knapsack-problem-ga-grid-turik-campbell/" target="_blank">Travel Like MacGyver: Solve Knapsack Problem with GA Grid</a>
	</li>
	<li>
	<a href="https://www.gridgain.com/resources/blog/genetic-algorithms-apacher-ignitetm" target="_blank">Genetic Algorithms with Apache® Ignite</a>
	</li>
	</ul>
	<h1>Continuous Self-Healing and Consistency Checks</h1>
	<p>It's a known fact that many companies and businesses trusted Ignite its mission-critical deployments and solutions. As a result, sometimes Ignite doesn't even have a right to "misfire" and should be able to handle critical or unpredictable situations automatically or provide facilities to do deal with them manually. </p>
	<p>With Ignite 2.5, we've kicked off the realization of continuous self-healing concept that implies that no matter what happens with Ignite in production it should be able to tolerate unexpected failures and stay up and running. The following was done in 2.5:</p>
	<ul>
	<li>
	<a href="https://apacheignite.readme.io/docs/critical-failures-handling" target="_blank">Critical Failures Handling</a>
	</li>
	<li>
	<a href="https://apacheignite.readme.io/docs/transactions#section-long-running-transactions-termination" target="_blank">Long running transactions monitoring and termination</a>
	</li>
	<li>
	<a href="https://apacheignite.readme.io/docs/consistency-check-facilities" target="_blank">Data Consistency Check Facilities</a>
	</li>
	</ul>
	<h1>SQL: Security and Fast Data Loading</h1>
	<p>The community stays strong and determined in its goal of making Ignite SQL engine undistinguishable from SQL engines of famous and mature SQL database. What's the purpose? We want to make it easy for you to migrate from a relational database to Ignite, so that you can reuse all your skills gained before. Overall, this is what our SQL engine got in 2.5:</p>
	<ul>
	<li>
	Fast data loading with <a href="https://apacheignite-sql.readme.io/docs/copy" target="_blank">COPY</a> command and <a href="https://apacheignite-sql.readme.io/docs/jdbc-driver#section-streaming" target="_blank">streaming mode</a> using SQL APIs.
	</li>
	<li>
	<a href="https://apacheignite.readme.io/docs/transactions#section-long-running-transactions-termination" target="_blank">Long running transactions monitoring and termination</a>
	</li>
	<li>
	Secured Ignite cluster. Use <a href="https://apacheignite-sql.readme.io/docs/ddl" target="_blank">CREATE USER, DROP USER and ALTER USER</a> commands to manage who is allowed to connect to your clusters.
	</li>
	</ul>
	<h1>In-place Execution of Spark DataFrame Queries</h1>
	<p>Apache Spark users can applaud because the <a href="https://issues.apache.org/jira/browse/IGNITE-7077" target="_blank">following ticket</a> got merged in 2.5. In short, it means that from now on Ignite will be able to execute as many DataFrames SQL queries as it can in-place on Ignite servers side avoiding data movement from Ignite to Spark. The performance of your DataFrames queries should boost significantly. Enjoy!</p>
	<h1>DEB and RPM packages</h1>
	<p>Last but not least, if you're a Linux user, now you can install the latest Ignite versions directly from DEB and RPM repositories. Refer to <a href="https://apacheignite.readme.io/docs/getting-started#section-rpm-deb-packages-installation" target="_blank">how-to</a> and share your feedback with us.</p>
	<p>Finally, I have no more paper left to cover other optimizations and improvements. So, go ahead and check out our <a href="https://ignite.apache.org/releases/2.5.0/release_notes.html" target="_blank">release notes</a>.</p>