<!--
▄▄▄ ██▓███ ▄▄▄ ▄████▄ ██░ ██ ▓█████ ██▓ ▄████ ███▄ █ ██▓▄▄▄█████▓▓█████
▒████▄ ▓██░ ██▒▒████▄ ▒██▀ ▀█ ▓██░ ██▒▓█ ▀ ▓██▒ ██▒ ▀█▒ ██ ▀█ █ ▓██▒▓ ██▒ ▓▒▓█ ▀
▒██ ▀█▄ ▓██░ ██▓▒▒██ ▀█▄ ▒▓█ ▄ ▒██▀▀██░▒███ ▒██▒▒██░▄▄▄░▓██ ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█ ▄ ░██░░▓█ ██▓▓██▒ ▐▌██▒░██░░ ▓██▓ ░ ▒▓█ ▄
▓█ ▓██▒▒██▒ ░ ░ ▓█ ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒ ░██░░▒▓███▀▒▒██░ ▓██░░██░ ▒██▒ ░ ░▒████▒
▒▒ ▓▒█░▒▓▒░ ░ ░ ▒▒ ▓▒█░░ ░▒ ▒ ░ ▒ ░░▒░▒░░ ▒░ ░ ░▓ ░▒ ▒ ░ ▒░ ▒ ▒ ░▓ ▒ ░░ ░░ ▒░ ░
▒ ▒▒ ░░▒ ░ ▒ ▒▒ ░ ░ ▒ ▒ ░▒░ ░ ░ ░ ░ ▒ ░ ░ ░ ░ ░░ ░ ▒░ ▒ ░ ░ ░ ░ ░
░ ▒ ░░ ░ ▒ ░ ░ ░░ ░ ░ ▒ ░░ ░ ░ ░ ░ ░ ▒ ░ ░ ░
░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
-->
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description"
content="Apache Ignite enables real-time analytics across operational and historical silos for existing
Apache Hadoop deployments. Ignite serves as an in-memory computing platform designed for low-latency and
real-time operations while Hadoop continues to be used for long-running OLAP workloads."/>
<title>Apache Hadoop Performance Acceleration</title>
<!--#include virtual="/includes/styles.html" -->
</head>
<body>
<!--#include virtual="/includes/header.html" -->
<article>
<header>
<div class="container">
<h1>Apache Hadoop <strong>Performance Acceleration</strong></h1>
</div>
</header>
<div class="container">
<p>
Apache Ignite® enables real-time analytics across Apache™ Hadoop® operational and historical data silos. The
Ignite in-memory computing platform provides low-latency and high-throughput operations while Hadoop
continues to be used for long-running OLAP workloads.
</p>
<img class="diagram-right img-fluid" src="/images/svg-diagrams/hadoop_acceleration.svg" alt="Apache Hadoop Performance Acceleration" />
<p>
As the architecture diagram on the right shows, you can accelerate Hadoop-based systems
by deploying Ignite as a separate distributed store that holds the data
sets required for your low-latency operations or real-time reports.
</p>
<p>
First, depending on the data volume and available memory capacity, you can enable Ignite native persistence
to store historical data sets on disk while dedicating memory to operational records. You can
continue to use Hadoop as storage for less frequently used data or for long-running and ad-hoc
analytical queries.
</p>
<p>
Next, your applications and services should use Ignite native APIs to
process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as map-reduce),
and machine learning APIs for various data processing needs.
</p>
<p>
Finally, consider using the Apache Spark DataFrame API if an application needs to run federated or
cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively
supports Hive/Hadoop. Reserve cross-database queries for the limited set of
scenarios in which neither Ignite nor Hadoop contains the entire data set.
</p>
<h2>How to Split Data and Operations Between Ignite and Hadoop</h2>
<p>
Consider using this approach:
</p>
<ul>
<li>
Use Apache Ignite for tasks that require low-latency responses (microseconds,
milliseconds, seconds), high-throughput operations (thousands to millions of
operations per second), and real-time processing.
</li>
<li>
Continue using Apache Hadoop for high-latency operations (tens of seconds, minutes, hours) and
batch processing.
</li>
</ul>
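<p>
The split above can be sketched as a small routing rule. This is a minimal illustration in plain Python,
not Ignite or Hadoop API code: the <code>Workload</code> type, the <code>choose_platform</code> function, and the
10-second cutoff are all assumptions made for this example, and the right boundary depends on your own
latency requirements.
</p>

```python
from dataclasses import dataclass

# Assumed boundary: the text treats "seconds" as low latency and
# "tens of seconds" as high latency, so 10 seconds is a hypothetical cutoff.
LOW_LATENCY_CUTOFF_SECONDS = 10.0


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one operation or report."""
    name: str
    latency_budget_seconds: float  # how long a caller can wait for a result
    is_batch: bool = False         # True for scheduled or bulk jobs


def choose_platform(workload: Workload) -> str:
    """Return "ignite" or "hadoop" for a workload, following the split above."""
    # Batch processing stays on Hadoop regardless of its latency budget.
    if workload.is_batch:
        return "hadoop"
    # Operations that can wait longer than the cutoff also stay on Hadoop.
    if workload.latency_budget_seconds > LOW_LATENCY_CUTOFF_SECONDS:
        return "hadoop"
    # Everything else is a low-latency, real-time candidate for Ignite.
    return "ignite"
```

<p>
For example, <code>choose_platform(Workload("fraud-check", 0.05))</code> selects Ignite, while a report with a
one-hour budget or any job flagged as batch is left on Hadoop.
</p>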
<h2>Getting Started Checklist</h2>
<p>
Follow the steps below to implement the discussed architecture in practice:
</p>
<ul>
<li>
Download and install Apache Ignite on your system.
</li>
<li>
Select the operations and reports to be executed against Ignite. The best candidates are
operations that require low-latency responses, high throughput, and real-time analytics.
</li>
<li>
Depending on the data volume and available memory space, consider using Ignite native
persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid
that persists changes to Hadoop or another external database.
</li>
<li>
Update your applications to ensure they use Ignite native APIs to process Ignite data and Spark
for federated queries.
</li>
<li>
If you need to replicate changes between Ignite and Hadoop clusters, consider using existing
change-data-capture solutions such as Debezium, Kafka, GridGain Data Lake Accelerator, Oracle GoldenGate,
or others. If you'd like Ignite to write changes through to Hadoop directly, implement
<a href="https://apacheignite.readme.io/docs/3rd-party-store" target="_blank">Ignite's CacheStore</a>
interface.
</li>
</ul>
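<p>
To make the write-through option in the last step concrete, here is a minimal sketch of the pattern in plain
Python. It does not use Ignite: <code>DictBackingStore</code> and <code>WriteThroughCache</code> are hypothetical
names, and the dictionary only stands in for a Hadoop-side store. The <code>load</code>, <code>write</code>, and
<code>delete</code> methods loosely mirror the operations a CacheStore implementation provides.
</p>

```python
class DictBackingStore:
    """Hypothetical stand-in for a Hadoop-side store; a real one would call HDFS or Hive."""

    def __init__(self):
        self.rows = {}

    def load(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


class WriteThroughCache:
    """In-memory cache that forwards every change to a backing store synchronously."""

    def __init__(self, store):
        self._store = store
        self._entries = {}

    def get(self, key):
        # Read-through: on a miss, load from the backing store and keep a copy.
        if key not in self._entries:
            value = self._store.load(key)
            if value is None:
                return None
            self._entries[key] = value
        return self._entries[key]

    def put(self, key, value):
        # Write-through: persist first so the cache never holds data the store lacks.
        self._store.write(key, value)
        self._entries[key] = value

    def remove(self, key):
        # Deletes are forwarded as well, keeping both sides consistent.
        self._store.delete(key)
        self._entries.pop(key, None)
```

<p>
Because every <code>put</code> and <code>remove</code> reaches the backing store before the call returns, the
store always reflects the cache contents. The cost is that each write pays the latency of the slower store.
</p>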
<div class="jumbotron jumbotron-fluid">
<div class="container">
<div class="title display-6">Learn More</div>
<hr class="my-4">
<div class="row">
<div class="col-sm-6">
<ul>
<li>
<a href="/features/sql.html">
Distributed SQL <i class="fas fa-angle-double-right"></i>
</a>
</li>
<li>
<a href="/features/collocated-processing.html">
Co-located Processing <i class="fas fa-angle-double-right"></i>
</a>
</li>
<li><a href="/features/acid-transactions.html">
ACID Transactions <i class="fas fa-angle-double-right"></i>
</a></li>
<li><a href="/arch/native-persistence.html">
Native Persistence <i class="fas fa-angle-double-right"></i>
</a></li>
</ul>
</div>
<div class="col-sm-6">
<ul>
<li>
<a href="/features/machinelearning.html">
Machine and Deep Learning <i class="fas fa-angle-double-right"></i>
</a>
</li>
<li>
<a href="/use-cases/in-memory-data-grid.html">
Ignite as an In-Memory Data Grid <i class="fas fa-angle-double-right"></i>
</a>
</li>
<li><a href="/use-cases/in-memory-database.html">
Ignite as an In-Memory Database <i class="fas fa-angle-double-right"></i>
</a></li>
<li><a href="/use-cases/digital-integration-hub.html">
Ignite as a Digital Integration Hub <i class="fas fa-angle-double-right"></i>
</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>
</article>
<!--#include virtual="/includes/footer.html" -->
<!--#include virtual="/includes/scripts.html" -->
</body>
</html>