<!--
▄▄▄ ██▓███ ▄▄▄ ▄████▄ ██░ ██ ▓█████ ██▓ ▄████ ███▄ █ ██▓▄▄▄█████▓▓█████
▒████▄ ▓██░ ██▒▒████▄ ▒██▀ ▀█ ▓██░ ██▒▓█ ▀ ▓██▒ ██▒ ▀█▒ ██ ▀█ █ ▓██▒▓ ██▒ ▓▒▓█ ▀
▒██ ▀█▄ ▓██░ ██▓▒▒██ ▀█▄ ▒▓█ ▄ ▒██▀▀██░▒███ ▒██▒▒██░▄▄▄░▓██ ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█ ▄ ░██░░▓█ ██▓▓██▒ ▐▌██▒░██░░ ▓██▓ ░ ▒▓█ ▄
▓█ ▓██▒▒██▒ ░ ░ ▓█ ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒ ░██░░▒▓███▀▒▒██░ ▓██░░██░ ▒██▒ ░ ░▒████▒
▒▒ ▓▒█░▒▓▒░ ░ ░ ▒▒ ▓▒█░░ ░▒ ▒ ░ ▒ ░░▒░▒░░ ▒░ ░ ░▓ ░▒ ▒ ░ ▒░ ▒ ▒ ░▓ ▒ ░░ ░░ ▒░ ░
▒ ▒▒ ░░▒ ░ ▒ ▒▒ ░ ░ ▒ ▒ ░▒░ ░ ░ ░ ░ ▒ ░ ░ ░ ░ ░░ ░ ▒░ ▒ ░ ░ ░ ░ ░
░ ▒ ░░ ░ ▒ ░ ░ ░░ ░ ░ ▒ ░░ ░ ░ ░ ░ ░ ▒ ░ ░ ░
░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
-->
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description"
content="Apache Ignite enables real-time analytics across operational and historical silos for existing
Apache Hadoop deployments. Ignite serves as an in-memory computing platform designed for low-latency and
real-time operations while Hadoop continues to be used for long-running OLAP workloads."/>
<title>Apache Hadoop Performance Acceleration With Apache Ignite</title>
<!--#include virtual="/includes/styles.html" -->
<!--#include virtual="/includes/sh.html" -->
</head>
<body>
<div id="wrapper">
<!--#include virtual="/includes/header.html" -->
<main id="main" role="main" class="container">
<section id="shared-memory-layer" class="page-section">
<h1 class="first">Apache Hadoop Performance Acceleration With Apache Ignite</h1>
<div class="col-sm-12 col-md-12 col-xs-12" style="padding:0 0 10px 0;">
<div class="col-sm-6 col-md-6 col-xs-12" style="padding-left:0; padding-right:0">
<p>
Apache Ignite enables real-time analytics across operational and historical silos for
existing Apache Hadoop deployments. It does this by serving as an in-memory computing
platform designed for low-latency and high-throughput operations while Hadoop continues to
be used for long-running OLAP workloads.
</p>
<p>
As the architecture diagram to the right suggests, you can accelerate Hadoop-based systems
by deploying Ignite as a separate distributed store that holds the data sets needed for
your low-latency operations or real-time reports.
</p>
</div>
<div class="col-sm-6 col-md-6 col-xs-12" style="padding-right:0">
<img class="img-responsive" src="/images/hadoop-acceleration.png" width="440"
alt="Architecture diagram of Apache Ignite deployed alongside Apache Hadoop" style="float:right;"/>
</div>
</div>
<p>
Depending on the data volume and available memory capacity, you can enable Ignite native persistence to
store historical data sets on disk while dedicating memory to operational records. Continue
using Hadoop as storage for less frequently used data and for long-running or ad hoc analytical queries.
</p>
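<p>
The sizing decision above can be sketched as a small helper. This is only an illustration under
stated assumptions: the <code>StoragePlanner</code> class, its <code>Mode</code> enum, and the
simple "does everything fit in RAM" rule are hypothetical and are not part of any Ignite API. A real
estimate would also account for backups, indexes, and other per-entry overhead.
</p>

```java
public class StoragePlanner {
    /** Hypothetical storage modes used only for this illustration. */
    public enum Mode { IN_MEMORY_ONLY, NATIVE_PERSISTENCE }

    /**
     * Picks a storage mode from rough data-set sizes.
     * Assumption: if operational plus historical data fits in cluster RAM,
     * pure in-memory storage is enough; otherwise native persistence keeps
     * the historical part on disk. Backups and index overhead are ignored.
     */
    public static Mode plan(long operationalBytes, long historicalBytes, long clusterRamBytes) {
        if (operationalBytes < 0 || historicalBytes < 0 || clusterRamBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        // addExact guards against silent overflow for very large inputs.
        long total = Math.addExact(operationalBytes, historicalBytes);
        return total <= clusterRamBytes ? Mode.IN_MEMORY_ONLY : Mode.NATIVE_PERSISTENCE;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // 50 GiB operational + 400 GiB historical on a 256 GiB cluster.
        System.out.println(plan(50 * gib, 400 * gib, 256 * gib)); // prints NATIVE_PERSISTENCE
    }
}
```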
<p>
Next, as the architecture suggests, your applications and services should use Ignite native APIs to
process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as MapReduce),
and machine learning APIs for various data processing needs.
</p>
<p>
Finally, consider using the Apache Spark DataFrames API if an application needs to run federated or
cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively
supports Hive/Hadoop. Cross-database queries should be considered only for a limited number of
scenarios in which neither Ignite nor Hadoop contains the entire data set.
</p>
<div class="page-heading">How to split data and operations between Ignite and Hadoop?</div>
<p>
Consider using this approach:
</p>
<ul class="page-list">
<li>
Use Apache Ignite for tasks that require low-latency response times (microseconds,
milliseconds, seconds), high-throughput operations (thousands to millions of
operations per second), and real-time processing.
</li>
<li>
Continue using Apache Hadoop for high-latency operations (dozens of seconds, minutes, hours) and
batch processing.
</li>
</ul>
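<p>
The split above can be expressed as a simple routing rule. The sketch below is illustrative only:
<code>WorkloadRouter</code>, <code>Target</code>, and the 10-second cut-off are assumptions made for
this example (the guidance places "seconds" on the Ignite side and "dozens of seconds" on the Hadoop
side, so any boundary in that range would do). It does not call Ignite or Hadoop.
</p>

```java
import java.time.Duration;

public class WorkloadRouter {
    /** Hypothetical targets; the names are illustrative only. */
    public enum Target { IGNITE, HADOOP }

    // Assumed boundary between "seconds" (Ignite) and "dozens of seconds" (Hadoop).
    public static final Duration LATENCY_CUTOFF = Duration.ofSeconds(10);

    /**
     * Chooses where an operation should run.
     * Batch jobs always go to Hadoop; everything else is routed by the
     * response time the caller can tolerate.
     */
    public static Target route(Duration requiredLatency, boolean batchJob) {
        if (batchJob) {
            return Target.HADOOP;
        }
        return requiredLatency.compareTo(LATENCY_CUTOFF) < 0 ? Target.IGNITE : Target.HADOOP;
    }

    public static void main(String[] args) {
        System.out.println(route(Duration.ofMillis(5), false));   // prints IGNITE
        System.out.println(route(Duration.ofSeconds(45), false)); // prints HADOOP
        System.out.println(route(Duration.ofMinutes(30), true));  // prints HADOOP
    }
}
```

<p>
Keeping the cut-off in a single constant makes it easy to tune once real latency requirements
for each report or operation are known.
</p>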
<div class="page-heading">Getting Started Checklist</div>
<p>
Follow the steps below to implement the discussed architecture in practice:
</p>
<ul class="page-list">
<li>
Download and install Apache Ignite on your system.
</li>
<li>
Select a list of operations/reports to be executed against Ignite. The best candidates are
operations that require low-latency response times, high throughput, and real-time analytics.
</li>
<li>
Depending on the data volume and available memory space, consider using Ignite native
persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid
that persists changes to Hadoop or another external database.
</li>
<li>
Update your applications to ensure they use Ignite native APIs to process Ignite data and Spark
for federated queries.
</li>
</ul>
<div class="page-heading">Learn More</div>
<p>
<a href="/arch/memorycentric.html">
<b>Memory-Centric Storage <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="/arch/persistence.html">
<b>Native Persistence <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="/features/collocatedprocessing.html">
<b>Co-located Processing <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="/features/sql.html">
<b>Distributed SQL <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="/features/machinelearning.html">
<b>Machine and Deep Learning <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="https://apacheignite-fs.readme.io/docs/installation-deployment" target="docs">
<b>Ignite and Spark Installation and Deployment <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
<p>
<a href="https://apacheignite-fs.readme.io/docs/ignite-data-frame" target="docs">
<b>Ignite DataFrames in Detail <i class="fa fa-angle-double-right"></i></b>
</a>
</p>
</section>
</main>
<!--#include virtual="/includes/footer.html" -->
</div>
<!--#include virtual="/includes/scripts.html" -->
</body>
</html>