<!--
 ▄▄▄       ██▓███   ▄▄▄       ▄████▄   ██░ ██ ▓█████     ██▓  ▄████  ███▄    █  ██▓▄▄▄█████▓▓█████
▒████▄    ▓██░  ██▒▒████▄    ▒██▀ ▀█  ▓██░ ██▒▓█   ▀    ▓██▒ ██▒ ▀█▒ ██ ▀█   █ ▓██▒▓  ██▒ ▓▒▓█   ▀
▒██  ▀█▄  ▓██░ ██▓▒▒██  ▀█▄  ▒▓█    ▄ ▒██▀▀██░▒███      ▒██▒▒██░▄▄▄░▓██  ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█  ▄    ░██░░▓█  ██▓▓██▒  ▐▌██▒░██░░ ▓██▓ ░ ▒▓█  ▄
 ▓█   ▓██▒▒██▒ ░  ░ ▓█   ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒   ░██░░▒▓███▀▒▒██░   ▓██░░██░  ▒██▒ ░ ░▒████▒
 ▒▒   ▓▒█░▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ░▒ ▒  ░ ▒ ░░▒░▒░░ ▒░ ░   ░▓   ░▒   ▒ ░ ▒░   ▒ ▒ ░▓    ▒ ░░   ░░ ▒░ ░
  ▒   ▒▒ ░░▒ ░       ▒   ▒▒ ░  ░  ▒    ▒ ░▒░ ░ ░ ░  ░    ▒ ░  ░   ░ ░ ░░   ░ ▒░ ▒ ░    ░     ░ ░  ░
  ░   ▒   ░░         ░   ▒   ░         ░  ░░ ░   ░       ▒ ░░ ░   ░    ░   ░ ░  ▒ ░  ░         ░
      ░  ░               ░  ░░ ░       ░  ░  ░   ░  ░    ░        ░          ░  ░              ░  ░
-->

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">

    <link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/>
    <!--#include virtual="/includes/scriptshead.html" -->

    <meta name="viewport" content="width=device-width, initial-scale=1.0">

    <meta name="description"
          content="Apache Ignite enables real-time analytics across operational and historical silos for existing
          Apache Hadoop deployments. Ignite serves as an in-memory computing platform designed for low-latency and
          real-time operations while Hadoop continues to be used for long-running OLAP workloads."/>

    <title>Apache Hadoop Performance Acceleration</title>

    <!--#include virtual="/includes/styles.html" -->

    
</head>
<body>
<!--#include virtual="/includes/header.html" -->
<article>
    <header>
        <div class="container">

            <h1>Apache Hadoop <strong>Performance Acceleration</strong></h1>
        </div>
    </header>
    <div class="container">
        <p>
            Apache Ignite® enables real-time analytics across Apache™ Hadoop® operational and historical data silos. The
            Ignite in-memory computing platform provides low-latency and high-throughput operations while Hadoop
            continues to be used for long-running OLAP workloads.
        </p>
        <img class="diagram-right img-fluid" src="/images/svg-diagrams/hadoop_acceleration.svg" alt="Apache Hadoop Performance Acceleration" width="555" height="418" />

        <p>
            As the architecture diagram on the right suggests, you can accelerate Hadoop-based systems by
            deploying Ignite as a separate distributed store that maintains the data sets required for your
            low-latency operations or real-time reports.
        </p>

        <p>
            First, depending on the data volume and available memory capacity, you can enable Ignite native persistence
            to store historical data sets on disk while dedicating memory to operational records. You can
            continue to use Hadoop as storage for less frequently used data or for long-running and ad-hoc
            analytical queries.
        </p>

        <p>
            Next, your applications and services should use Ignite native APIs to
            process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as
            map-reduce), and machine learning APIs for various data processing needs.
        </p>

        <p>
            Finally, consider using the Apache Spark DataFrame API if an application needs to run federated or
            cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively
            supports Hive/Hadoop. Cross-database queries should be considered only for a limited number of
            scenarios when neither Ignite nor Hadoop contains the entire data set.
        </p>


        <h2>How to split data and operations between Ignite and Hadoop?</h2>
        <p>
            Consider using this approach:
        </p>
        <ul>
            <li>
                Use Apache Ignite for tasks that require low-latency responses (microseconds,
                milliseconds, seconds), high-throughput operations (thousands to millions of
                operations per second), and real-time processing.
            </li>
            <li>
                Continue using Apache Hadoop for high-latency operations (dozens of seconds, minutes, hours) and
                batch processing.
            </li>
        </ul>
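
        <p>
            The rule of thumb above can be sketched as a small routing helper. This is only an illustration:
            the <code>WorkloadRouter</code> class, its <code>Target</code> enum, and the 10-second cut-off between
            "seconds" and "dozens of seconds" are assumptions made for this example, not part of Ignite or Hadoop.
        </p>

```java
import java.time.Duration;

/** Illustrative routing rule: picks a target system from an operation's latency budget. */
final class WorkloadRouter {

    /** The two systems described in the list above. */
    enum Target { IGNITE, HADOOP }

    // Assumed cut-off between "seconds" and "dozens of seconds"; tune it for your workload.
    static final Duration LOW_LATENCY_LIMIT = Duration.ofSeconds(10);

    /**
     * Chooses where an operation should run.
     *
     * @param latencyBudget how long the caller can wait for a result
     * @param batchJob      true if the operation is batch processing
     */
    static Target route(Duration latencyBudget, boolean batchJob) {
        // Batch processing stays on Hadoop regardless of the latency budget.
        if (batchJob) {
            return Target.HADOOP;
        }
        // Budgets longer than the assumed limit count as high-latency work.
        boolean exceedsLimit = latencyBudget.compareTo(LOW_LATENCY_LIMIT) > 0;
        return exceedsLimit ? Target.HADOOP : Target.IGNITE;
    }

    public static void main(String[] args) {
        System.out.println(route(Duration.ofMillis(5), false));   // prints IGNITE
        System.out.println(route(Duration.ofMinutes(30), false)); // prints HADOOP
        System.out.println(route(Duration.ofSeconds(2), true));   // prints HADOOP
    }
}
```

        <p>
            In practice the decision is usually made once per report or service at design time rather than per
            request, so a helper like this mainly serves to document the chosen threshold.
        </p>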


        <h2>Getting Started Checklist</h2>
        <p>
            Follow the steps below to implement the discussed architecture in practice:
        </p>
        <ul>
            <li>
                Download and install Apache Ignite on your system.
            </li>
            <li>
                Select the operations and reports to be executed against Ignite. The best candidates are
                operations that require low-latency responses, high throughput, or real-time analytics.
            </li>
            <li>
                Depending on the data volume and available memory space, consider using Ignite native
                persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid
                that persists changes to Hadoop or another external database.
            </li>
            <li>
                Update your applications to ensure they use Ignite native APIs to process Ignite data and Spark
                for federated queries.
            </li>
            <li>
                If you need to replicate changes between Ignite and Hadoop clusters, consider using existing
                change-data-capture solutions such as Debezium, Kafka, GridGain Data Lake Accelerator, or Oracle
                GoldenGate. If you'd like Ignite to write changes through to Hadoop directly, then implement
                <a href="/docs/latest/persistence/external-storage">Ignite's CacheStore</a>
                interface.
            </li>
        </ul>
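
        <p>
            To make the write-through option in the last step more concrete, here is a library-free sketch of the
            pattern. It deliberately does not use Ignite classes: <code>BackingStore</code>, <code>FakeStore</code>,
            and <code>WriteThroughCache</code> are hypothetical names, and the fake store merely stands in for a
            Hadoop-backed table. In a real deployment, the equivalent hook would be an implementation of the
            CacheStore interface linked above.
        </p>

```java
import java.util.Properties;

/** Library-free illustration of the write-through pattern; this is not Ignite's API. */
final class WriteThroughSketch {

    /** Hypothetical stand-in for the external store, such as a Hadoop-backed table. */
    interface BackingStore {
        void write(String key, String value);
        String load(String key);
    }

    /** In-memory fake that exists only to make the sketch runnable. */
    static final class FakeStore implements BackingStore {
        // Properties serves as a simple String-to-String map to keep the sketch short.
        private final Properties rows = new Properties();
        int writeCount = 0;

        @Override
        public void write(String key, String value) {
            rows.setProperty(key, value);
            writeCount++;
        }

        @Override
        public String load(String key) {
            return rows.getProperty(key);
        }
    }

    /** Cache that forwards every update to the backing store before keeping it in memory. */
    static final class WriteThroughCache {
        private final Properties memory = new Properties();
        private final BackingStore store;

        WriteThroughCache(BackingStore store) {
            this.store = store;
        }

        void put(String key, String value) {
            store.write(key, value);        // synchronous write to the external store
            memory.setProperty(key, value); // then update the in-memory copy
        }

        String get(String key) {
            String value = memory.getProperty(key);
            if (value == null) {
                // Cache miss: read through to the external store and keep the result.
                value = store.load(key);
                if (value != null) {
                    memory.setProperty(key, value);
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        FakeStore hadoop = new FakeStore();
        WriteThroughCache cache = new WriteThroughCache(hadoop);
        cache.put("order-1", "shipped");
        System.out.println(hadoop.load("order-1")); // prints shipped
        System.out.println(hadoop.writeCount);      // prints 1
    }
}
```

        <p>
            Because each update reaches the external store before it lands in memory, the two copies stay
            consistent at the cost of slower writes. Replicating changes asynchronously through a
            change-data-capture pipeline makes the opposite trade-off.
        </p>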


        <div class="jumbotron jumbotron-fluid">
            <div class="container">
                <div class="title display-6">Learn More</div>
                <hr class="my-4">
                <div class="row">
                    <div class="col-sm-6">
                        <ul>
                            <li>
                                <a href="/features/sql.html">
                                    Distributed SQL <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li>
                                <a href="/features/collocated-processing.html">
                                    Co-located Processing <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li><a href="/features/acid-transactions.html">
                                ACID Transactions <i class="fas fa-angle-double-right"></i>
                            </a></li>
                            <li><a href="/arch/native-persistence.html">
                                Native Persistence <i class="fas fa-angle-double-right"></i>
                            </a></li>
                        </ul>
                    </div>
                    <div class="col-sm-6">
                        <ul>
                            <li>
                                <a href="/features/machinelearning.html">
                                    Machine and Deep Learning <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li>
                                <a href="/use-cases/in-memory-data-grid.html">
                                    Ignite as an In-Memory Data Grid <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li><a href="/use-cases/in-memory-database.html">
                                Ignite as an In-Memory Database <i class="fas fa-angle-double-right"></i>
                            </a></li>
                            <li><a href="/use-cases/digital-integration-hub.html">
                                Ignite as a Digital Integration Hub <i class="fas fa-angle-double-right"></i>
                            </a></li>
                        </ul>
                    </div>
                </div>
            </div>
        </div>

    </div>
</article>

<!--#include virtual="/includes/footer.html" -->
<!--#include virtual="/includes/scripts.html" -->
</body>
</html>
