
 <!--
  ▄▄▄       ██▓███   ▄▄▄       ▄████▄   ██░ ██ ▓█████     ██▓  ▄████  ███▄    █  ██▓▄▄▄█████▓▓█████
 ▒████▄    ▓██░  ██▒▒████▄    ▒██▀ ▀█  ▓██░ ██▒▓█   ▀    ▓██▒ ██▒ ▀█▒ ██ ▀█   █ ▓██▒▓  ██▒ ▓▒▓█   ▀
 ▒██  ▀█▄  ▓██░ ██▓▒▒██  ▀█▄  ▒▓█    ▄ ▒██▀▀██░▒███      ▒██▒▒██░▄▄▄░▓██  ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
 ░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█  ▄    ░██░░▓█  ██▓▓██▒  ▐▌██▒░██░░ ▓██▓ ░ ▒▓█  ▄
  ▓█   ▓██▒▒██▒ ░  ░ ▓█   ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒   ░██░░▒▓███▀▒▒██░   ▓██░░██░  ▒██▒ ░ ░▒████▒
  ▒▒   ▓▒█░▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ░▒ ▒  ░ ▒ ░░▒░▒░░ ▒░ ░   ░▓   ░▒   ▒ ░ ▒░   ▒ ▒ ░▓    ▒ ░░   ░░ ▒░ ░
   ▒   ▒▒ ░░▒ ░       ▒   ▒▒ ░  ░  ▒    ▒ ░▒░ ░ ░ ░  ░    ▒ ░  ░   ░ ░ ░░   ░ ▒░ ▒ ░    ░     ░ ░  ░
   ░   ▒   ░░         ░   ▒   ░         ░  ░░ ░   ░       ▒ ░░ ░   ░    ░   ░ ░  ▒ ░  ░         ░
       ░  ░               ░  ░░ ░       ░  ░  ░   ░  ░    ░        ░          ░  ░              ░  ░
 -->

 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 <!DOCTYPE html>
 <html lang="en">
 <head>
 <link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/>
     <meta charset="utf-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">

     <meta name="description"
           content="Apache Ignite enables real-time analytics across operational and historical silos for existing
           Apache Hadoop deployments. Ignite serves as an in-memory computing platform designated for low-latency and
           real-time operations while Hadoop continues to be used for long-running OLAP workloads."/>

     <title>Apache Hadoop Performance Acceleration With Apache Ignite</title>

     <!--#include virtual="/includes/styles.html" -->

     <!--#include virtual="/includes/sh.html" -->
 </head>
 <body>
 <div id="wrapper">
     <!--#include virtual="/includes/header.html" -->

     <main id="main" role="main" class="container">
         <section id="shared-memory-layer" class="page-section">
             <h1 class="first">Apache Hadoop Performance Acceleration With Apache Ignite</h1>
             <div class="col-sm-12 col-md-12 col-xs-12" style="padding:0 0 10px 0;">
                 <div class="col-sm-6 col-md-6 col-xs-12" style="padding-left:0; padding-right:0">
                     <p>
                         Apache Ignite enables real-time analytics across operational and historical silos for
                         existing Apache Hadoop deployments. It does this by serving as the in-memory computing
                         platform for low-latency and high-throughput operations, while Hadoop continues to
                         handle long-running OLAP workloads.
                     </p>

                     <p>
                         As the architecture diagram to the right suggests, you can accelerate Hadoop-based
                         systems by deploying Ignite as a separate distributed store that keeps the data
                         sets needed for your low-latency operations or real-time reports.
                     </p>

                 </div>

                 <div class="col-sm-6 col-md-6 col-xs-12" style="padding-right:0">
                     <img class="img-responsive" src="/images/hadoop-acceleration.png" width="440"
                          alt="Hadoop acceleration architecture diagram" style="float:right;"/>
                 </div>
             </div>

             <p>
                 Depending on the data volume and available memory capacity, you can enable Ignite native persistence to
                 store historical data sets on disk while dedicating memory to operational records. Continue
                 using Hadoop as storage for less frequently used data and for long-running or ad-hoc analytical queries.
             </p>

             <p>
                 Next, as the architecture suggests, your applications and services should use Ignite native APIs to
                 process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as map-reduce),
                 and machine learning APIs for various data processing needs.
             </p>

             <p>
                 Finally, consider using Apache Spark DataFrames APIs if an application needs to run federated or
                 cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively
                 supports Hive/Hadoop. Cross-database queries should be considered only for a limited number of
                 scenarios when neither Ignite nor Hadoop contains the entire data set.
             </p>

             <div class="page-heading">How to Split Data and Operations Between Ignite and Hadoop</div>
             <p>
                 Consider using this approach:
             </p>
             <ul class="page-list">
                 <li>
                     Use Apache Ignite for tasks that require low-latency responses (microseconds to
                     seconds), high-throughput operations (thousands to millions of
                     operations per second), and real-time processing.
                 </li>
                 <li>
                     Continue using Apache Hadoop for high-latency operations (dozens of seconds, minutes, hours) and
                     batch processing.
                 </li>
             </ul>
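
             <p>
                 The rule of thumb above can be sketched as a small routing function. This is an illustrative
                 sketch only and uses no Ignite or Hadoop API. The names Workload, choose_engine, and
                 IGNITE_MAX_LATENCY_SECONDS are hypothetical, and the 10-second cutoff is an assumption chosen to sit
                 between "seconds" and "dozens of seconds". Throughput is left out to keep the example short.
             </p>

```python
from dataclasses import dataclass

# Assumed cutoff: the text places "seconds" with Ignite and "dozens of seconds"
# with Hadoop, so 10 seconds is a hypothetical boundary, not an official value.
IGNITE_MAX_LATENCY_SECONDS = 10.0


@dataclass(frozen=True)
class Workload:
    name: str
    latency_budget_seconds: float  # slowest acceptable response time
    is_batch: bool = False         # True for scheduled or bulk jobs


def choose_engine(workload: Workload) -> str:
    """Return "ignite" or "hadoop" for a workload using the rule of thumb."""
    if workload.is_batch:
        return "hadoop"
    if workload.latency_budget_seconds > IGNITE_MAX_LATENCY_SECONDS:
        return "hadoop"
    return "ignite"


# Hypothetical workloads, used only to exercise the function.
workloads = [
    Workload("fraud-check", 0.005),
    Workload("live-dashboard", 2.0),
    Workload("ad-hoc-exploration", 600.0),
    Workload("nightly-aggregation", 3600.0, is_batch=True),
]

routing = {w.name: choose_engine(w) for w in workloads}
for name, engine in routing.items():
    print(f"{name}: {engine}")
```

             <p>
                 In a real deployment the cutoff would likely come from measured response times and service-level
                 targets rather than a fixed constant.
             </p>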

             <div class="page-heading">Getting Started Checklist</div>
             <p>
                 Follow these steps to implement the architecture described above:
             </p>
             <ul class="page-list">
                 <li>
                     Download and install Apache Ignite on your system.
                 </li>
                 <li>
                     Select a list of operations/reports to be executed against Ignite. The best candidates are
                     operations that require low-latency response times, high throughput, and real-time analytics.
                 </li>
                 <li>
                     Depending on the data volume and available memory space, consider using Ignite native
                     persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid
                     that persists changes to Hadoop or another external database.
                 </li>
                 <li>
                     Update your applications to ensure they use Ignite native APIs to process Ignite data and Spark
                     for federated queries.
                 </li>
             </ul>
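
             <p>
                 The third step mentions using Ignite as an in-memory cache that persists changes to an external
                 store. The toy model below sketches that write-through idea with plain Python dictionaries. It is
                 not Ignite code: the WriteThroughCache class and the external_db dictionary are hypothetical
                 stand-ins for the cache and for Hadoop or another external database.
             </p>

```python
class WriteThroughCache:
    """Toy in-memory cache that forwards every write to a backing store."""

    def __init__(self, backing_store):
        self._memory = {}
        self._store = backing_store  # stands in for Hadoop or another database

    def put(self, key, value):
        # Write the external store first so memory never holds data the store lacks.
        self._store[key] = value
        self._memory[key] = value

    def get(self, key):
        # Serve from memory when possible; otherwise read through and keep a copy.
        if key not in self._memory:
            if key not in self._store:
                return None
            self._memory[key] = self._store[key]
        return self._memory[key]


external_db = {"order:1": "shipped"}   # pre-existing record in the external store
cache = WriteThroughCache(external_db)

cache.put("order:2", "pending")        # lands in memory and in the external store
print(cache.get("order:1"))            # read-through on first access
print(external_db["order:2"])          # the write reached the external store
```

             <p>
                 With native persistence enabled, Ignite itself keeps the durable copy, so an external store of this
                 kind is only one of the two options the step describes.
             </p>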

             <div class="page-heading">Learn More</div>
             <p>
                 <a href="/arch/memorycentric.html">
                     <b>Memory-Centric Storage <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="/arch/persistence.html">
                     <b>Native Persistence <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="/features/collocatedprocessing.html">
                     <b>Co-located Processing <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="/features/sql.html">
                     <b>Distributed SQL <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="/features/machinelearning.html">
                     <b>Machine and Deep Learning <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="https://apacheignite-fs.readme.io/docs/installation-deployment" target="docs">
                     <b>Ignite and Spark Installation and Deployment <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
             <p>
                 <a href="https://apacheignite-fs.readme.io/docs/ignite-data-frame" target="docs">
                     <b>Ignite DataFrames in Detail <i class="fa fa-angle-double-right"></i></b>
                 </a>
             </p>
         </section>
     </main>

     <!--#include virtual="/includes/footer.html" -->
 </div>
 <!--#include virtual="/includes/scripts.html" -->
 </body>
 </html>