
 <!--
  ▄▄▄       ██▓███   ▄▄▄       ▄████▄   ██░ ██ ▓█████     ██▓  ▄████  ███▄    █  ██▓▄▄▄█████▓▓█████
 ▒████▄    ▓██░  ██▒▒████▄    ▒██▀ ▀█  ▓██░ ██▒▓█   ▀    ▓██▒ ██▒ ▀█▒ ██ ▀█   █ ▓██▒▓  ██▒ ▓▒▓█   ▀
 ▒██  ▀█▄  ▓██░ ██▓▒▒██  ▀█▄  ▒▓█    ▄ ▒██▀▀██░▒███      ▒██▒▒██░▄▄▄░▓██  ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
 ░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█  ▄    ░██░░▓█  ██▓▓██▒  ▐▌██▒░██░░ ▓██▓ ░ ▒▓█  ▄
  ▓█   ▓██▒▒██▒ ░  ░ ▓█   ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒   ░██░░▒▓███▀▒▒██░   ▓██░░██░  ▒██▒ ░ ░▒████▒
  ▒▒   ▓▒█░▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ░▒ ▒  ░ ▒ ░░▒░▒░░ ▒░ ░   ░▓   ░▒   ▒ ░ ▒░   ▒ ▒ ░▓    ▒ ░░   ░░ ▒░ ░
   ▒   ▒▒ ░░▒ ░       ▒   ▒▒ ░  ░  ▒    ▒ ░▒░ ░ ░ ░  ░    ▒ ░  ░   ░ ░ ░░   ░ ▒░ ▒ ░    ░     ░ ░  ░
   ░   ▒   ░░         ░   ▒   ░         ░  ░░ ░   ░       ▒ ░░ ░   ░    ░   ░ ░  ▒ ░  ░         ░
       ░  ░               ░  ░░ ░       ░  ░  ░   ░  ░    ░        ░          ░  ░              ░  ░
 -->

 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 <!DOCTYPE html>
 <html lang="en">
 <head>

     <link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/>
     <meta charset="utf-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">

     <meta name="description"
           content="Apache Ignite enables real-time analytics across operational and historical silos for existing
           Apache Hadoop deployments. Ignite serves as an in-memory computing platform designed for low-latency and
           real-time operations while Hadoop continues to be used for long-running OLAP workloads."/>

     <title>Apache Hadoop Performance Acceleration</title>

     <!--#include virtual="/includes/styles.html" -->


 </head>
 <body>
 <!--#include virtual="/includes/header.html" -->
 <article>
     <header>
         <div class="container">

             <h1>Apache Hadoop <strong>Performance Acceleration</strong></h1>
         </div>
     </header>
     <div class="container">
         <p>
             Apache Ignite® enables real-time analytics across Apache™ Hadoop® operational and historical data silos. The
             Ignite in-memory computing platform provides low-latency and high-throughput operations while Hadoop
             continues to be used for long-running OLAP workloads.
         </p>
         <img class="diagram-right img-fluid" src="/images/svg-diagrams/hadoop_acceleration.svg" alt="Apache Hadoop Performance Acceleration" />

         <p>
             As the architecture diagram on the right suggests, you can accelerate Hadoop-based systems by
             deploying Ignite as a separate distributed store that holds the data sets required for your
             low-latency operations or real-time reports.
         </p>

         <p>
             First, depending on the data volume and available memory capacity, you can enable Ignite native
             persistence to store historical data sets on disk while dedicating memory to operational records.
             You can continue to use Hadoop as storage for less frequently used data or for long-running and
             ad-hoc analytical queries.
         </p>

         <p>
             Next, your applications and services should use Ignite native APIs to
             process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as
             map-reduce), and machine learning APIs for various data processing needs.
         </p>
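
         <p>
             The map-reduce style of processing mentioned above can be sketched in plain Java. This is a minimal
             illustration and not Ignite's compute API: a local thread pool stands in for the cluster nodes, and
             the class name <code>MapReduceSketch</code> and its method are hypothetical. Each word of a sentence
             is mapped to its length by a separate task, and the partial results are reduced into a single total.
         </p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a compute grid: a local thread pool plays the role
// of the cluster nodes. This class does not use any Ignite API.
final class MapReduceSketch {
    private MapReduceSketch() {}

    // Map step: one task per word returns that word's length.
    // Reduce step: the caller sums the partial results.
    static int countNonSpaceChars(String sentence, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (String word : sentence.trim().split("\\s+")) {
                Callable<Integer> task = word::length;
                partials.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for partial results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A map task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

         <p>
             For example, <code>MapReduceSketch.countNonSpaceChars("Count characters using a compute task", 4)</code>
             returns 32. In a real deployment, the tasks would run on the nodes that hold the data rather than in
             a local pool.
         </p>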

         <p>
             Finally, consider using the Apache Spark DataFrame API if an application needs to run federated or
             cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively
             supports Hive/Hadoop. Consider cross-database queries only in the limited number of
             scenarios where neither Ignite nor Hadoop contains the entire data set.
         </p>


         <h2>How to Split Data and Operations Between Ignite and Hadoop</h2>
         <p>
             Consider using this approach:
         </p>
         <ul>
             <li>
                 Use Apache Ignite for tasks that require low-latency responses (microseconds,
                 milliseconds, seconds), high-throughput operations (thousands to millions of
                 operations per second), and real-time processing.
             </li>
             <li>
                 Continue using Apache Hadoop for high-latency operations (tens of seconds, minutes, hours) and
                 batch processing.
             </li>
         </ul>
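
         <p>
             The split described above can be expressed as a small routing rule. The sketch below is only an
             illustration: the class <code>WorkloadRouter</code>, the <code>Target</code> enum, and the
             10-second cutoff are assumptions made for this example, and the right boundary depends on your
             own workloads.
         </p>

```java
import java.time.Duration;

// Hypothetical helper: the names and the 10-second cutoff are illustrative
// assumptions, not part of Ignite or Hadoop.
final class WorkloadRouter {
    enum Target { IGNITE, HADOOP }

    // Assumed boundary between "seconds" (Ignite) and "tens of seconds" (Hadoop).
    static final Duration LOW_LATENCY_CUTOFF = Duration.ofSeconds(10);

    private WorkloadRouter() {}

    // Batch jobs always go to Hadoop; everything else is routed by its latency budget.
    static Target route(Duration latencyBudget, boolean batchJob) {
        if (batchJob) {
            return Target.HADOOP;
        }
        return latencyBudget.compareTo(LOW_LATENCY_CUTOFF) <= 0
                ? Target.IGNITE
                : Target.HADOOP;
    }
}
```

         <p>
             With these assumptions, a lookup with a 5-millisecond budget is routed to Ignite, while a
             30-minute report or any batch job is routed to Hadoop.
         </p>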


         <h2>Getting Started Checklist</h2>
         <p>
             Follow the steps below to implement the discussed architecture in practice:
         </p>
         <ul>
             <li>
                 Download and install Apache Ignite on your system.
             </li>
             <li>
                 Select the operations and reports to be executed against Ignite. The best candidates are
                 operations that require low-latency responses, high throughput, and real-time analytics.
             </li>
             <li>
                 Depending on the data volume and available memory space, consider using Ignite native
                 persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid
                 that persists changes to Hadoop or another external database.
             </li>
             <li>
                 Update your applications so that they use Ignite native APIs to process Ignite data and Spark
                 for federated queries.
             </li>
             <li>
                 If you need to replicate changes between Ignite and Hadoop clusters, consider using existing
                 change-data-capture solutions such as Debezium, Kafka, GridGain Data Lake Accelerator, Oracle GoldenGate,
                 or others. If you want Ignite to write changes through to Hadoop directly, implement
                 <a href="https://apacheignite.readme.io/docs/3rd-party-store" target="_blank">Ignite's CacheStore</a>
                 interface.
             </li>
         </ul>
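
         <p>
             The write-through idea in the last step can be sketched in plain Java. This is a simplified
             illustration of the pattern, not Ignite's <code>CacheStore</code> interface: the
             <code>BackingStore</code> interface, the <code>WriteThroughCache</code> class, and the
             map-backed store that stands in for Hadoop are all hypothetical, and the sketch is not
             thread-safe.
         </p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface standing in for an external store such as Hadoop.
// It is NOT Ignite's CacheStore; it only illustrates the pattern.
interface BackingStore<K, V> {
    V load(K key);
    void write(K key, V value);
}

// Single-threaded sketch of a cache with read-through and write-through.
final class WriteThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final BackingStore<K, V> store;

    WriteThroughCache(BackingStore<K, V> store) {
        this.store = store;
    }

    // Write-through: persist to the external store first, then update memory.
    void put(K key, V value) {
        store.write(key, value);
        memory.put(key, value);
    }

    // Read-through: on a miss, load from the external store and keep the value in memory.
    V get(K key) {
        V cached = memory.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = store.load(key);
        if (loaded != null) {
            memory.put(key, loaded);
        }
        return loaded;
    }
}

// Map-backed store used only to demonstrate the cache; a real store would talk to Hadoop.
final class MapBackingStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> rows = new HashMap<>();

    @Override
    public V load(K key) {
        return rows.get(key);
    }

    @Override
    public void write(K key, V value) {
        rows.put(key, value);
    }
}
```

         <p>
             Writing to the external store before updating memory means a failed write never leaves the cache
             holding a value that the store does not have, at the cost of adding the store's latency to every
             update.
         </p>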


         <div class="jumbotron jumbotron-fluid">
             <div class="container">
                 <div class="title display-6">Learn More</div>
                 <hr class="my-4">
                 <div class="row">
                     <div class="col-sm-6">
                         <ul>
                             <li>
                                 <a href="/features/sql.html">
                                     Distributed SQL <i class="fas fa-angle-double-right"></i>
                                 </a>
                             </li>
                             <li>
                                 <a href="/features/collocated-processing.html">
                                     Co-located Processing <i class="fas fa-angle-double-right"></i>
                                 </a>
                             </li>
                             <li><a href="/features/acid-transactions.html">
                                 ACID Transactions <i class="fas fa-angle-double-right"></i>
                             </a></li>
                             <li><a href="/arch/native-persistence.html">
                                 Native Persistence <i class="fas fa-angle-double-right"></i>
                             </a></li>
                         </ul>
                     </div>
                     <div class="col-sm-6">
                         <ul>
                             <li>
                                 <a href="/features/machinelearning.html">
                                     Machine and Deep Learning <i class="fas fa-angle-double-right"></i>
                                 </a>
                             </li>
                             <li>
                                 <a href="/use-cases/in-memory-data-grid.html">
                                     Ignite as an In-Memory Data Grid <i class="fas fa-angle-double-right"></i>
                                 </a>
                             </li>
                             <li><a href="/use-cases/in-memory-database.html">
                                 Ignite as an In-Memory Database <i class="fas fa-angle-double-right"></i>
                             </a></li>
                             <li><a href="/use-cases/digital-integration-hub.html">
                                 Ignite as a Digital Integration Hub <i class="fas fa-angle-double-right"></i>
                             </a></li>
                         </ul>
                     </div>
                 </div>
             </div>
         </div>

     </div>
 </article>

 <!--#include virtual="/includes/footer.html" -->
 <!--#include virtual="/includes/scripts.html" -->
 </body>
 </html>