| <!-- |
| ▄▄▄ ██▓███ ▄▄▄ ▄████▄ ██░ ██ ▓█████ ██▓ ▄████ ███▄ █ ██▓▄▄▄█████▓▓█████ |
| ▒████▄ ▓██░ ██▒▒████▄ ▒██▀ ▀█ ▓██░ ██▒▓█ ▀ ▓██▒ ██▒ ▀█▒ ██ ▀█ █ ▓██▒▓ ██▒ ▓▒▓█ ▀ |
| ▒██ ▀█▄ ▓██░ ██▓▒▒██ ▀█▄ ▒▓█ ▄ ▒██▀▀██░▒███ ▒██▒▒██░▄▄▄░▓██ ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███ |
| ░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█ ▄ ░██░░▓█ ██▓▓██▒ ▐▌██▒░██░░ ▓██▓ ░ ▒▓█ ▄ |
| ▓█ ▓██▒▒██▒ ░ ░ ▓█ ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒ ░██░░▒▓███▀▒▒██░ ▓██░░██░ ▒██▒ ░ ░▒████▒ |
| ▒▒ ▓▒█░▒▓▒░ ░ ░ ▒▒ ▓▒█░░ ░▒ ▒ ░ ▒ ░░▒░▒░░ ▒░ ░ ░▓ ░▒ ▒ ░ ▒░ ▒ ▒ ░▓ ▒ ░░ ░░ ▒░ ░ |
| ▒ ▒▒ ░░▒ ░ ▒ ▒▒ ░ ░ ▒ ▒ ░▒░ ░ ░ ░ ░ ▒ ░ ░ ░ ░ ░░ ░ ▒░ ▒ ░ ░ ░ ░ ░ |
| ░ ▒ ░░ ░ ▒ ░ ░ ░░ ░ ░ ▒ ░░ ░ ░ ░ ░ ░ ▒ ░ ░ ░ |
| ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ |
| --> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| |
| <link rel="canonical" href="https://ignite.apache.org/use-cases/hadoop-acceleration.html"/> |
| <!--#include virtual="/includes/scriptshead.html" --> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <meta name="description" |
| content="Apache Ignite enables real-time analytics across operational and historical silos for existing |
| Apache Hadoop deployments. Ignite serves as an in-memory computing platform designed for low-latency and |
| real-time operations while Hadoop continues to be used for long-running OLAP workloads."/> |
| |
| <title>Apache Hadoop Performance Acceleration</title> |
| |
| <!--#include virtual="/includes/styles.html" --> |
| |
| |
| </head> |
| <body> |
| <!--#include virtual="/includes/header.html" --> |
| <article> |
| <header> |
| <div class="container"> |
| |
| <h1>Apache Hadoop <strong>Performance Acceleration</strong></h1> |
| </div> |
| </header> |
| <div class="container"> |
| <p> |
| Apache Ignite® enables real-time analytics across Apache™ Hadoop® operational and historical data silos. The |
| Ignite in-memory computing platform provides low-latency and high-throughput operations while Hadoop |
| continues to be used for long-running OLAP workloads. |
| </p> |
| <img class="diagram-right img-fluid" src="/images/svg-diagrams/hadoop_acceleration.svg" alt="Apache Hadoop Performance Acceleration" /> |
| |
| <p> |
| As the architecture diagram on the right suggests, you can accelerate Hadoop-based systems by |
| deploying Ignite as a separate distributed storage layer that holds the data |
| sets required for your low-latency operations or real-time reports. |
| </p> |
| |
| <p> |
| First, depending on the data volume and available memory capacity, you can enable Ignite native persistence |
| to store historical data sets on disk while dedicating memory space to operational records. You can |
| continue to use Hadoop as storage for less frequently used data or for long-running and ad-hoc |
| analytical queries. |
| </p> |
| |
| <p> |
| Next, your applications and services should use Ignite native APIs to |
| process the data residing in the in-memory cluster. Ignite provides SQL, compute (a.k.a. MapReduce), |
| and machine learning APIs for various data processing needs. |
| </p> |
| |
| <p> |
| Finally, consider using the Apache Spark DataFrame API if an application needs to run federated or |
| cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively |
| supports Hive/Hadoop. Cross-database queries should be considered only for a limited number of |
| scenarios when neither Ignite nor Hadoop contains the entire data set. |
| </p> |
| |
| |
| <h2>How do you split data and operations between Ignite and Hadoop?</h2> |
| <p> |
| Consider using this approach: |
| </p> |
| <ul> |
| <li> |
| Use Apache Ignite for tasks that require low-latency responses (microseconds, |
| milliseconds, seconds), high-throughput operations (thousands to millions of |
| operations per second), and real-time processing. |
| </li> |
| <li> |
| Continue using Apache Hadoop for high-latency operations (tens of seconds, minutes, hours) and |
| batch processing. |
| </li> |
| </ul> |
| |
| |
| <h2>Getting Started Checklist</h2> |
| <p> |
| Follow the steps below to implement the discussed architecture in practice: |
| </p> |
| <ul> |
| <li> |
| Download and install Apache Ignite on your system. |
| </li> |
| <li> |
| Select the operations and reports to run against Ignite. The best candidates are |
| operations that require low-latency responses, high throughput, and real-time analytics. |
| </li> |
| <li> |
| Depending on the data volume and available memory space, consider using Ignite native |
| persistence. Alternatively, you can use Ignite as a pure in-memory cache or in-memory data grid |
| that persists changes to Hadoop or another external database. |
| </li> |
| <li> |
| Update your applications to ensure they use Ignite native APIs to process Ignite data and Spark |
| for federated queries. |
| </li> |
| <li> |
| If you need to replicate changes between Ignite and Hadoop clusters, consider using an existing |
| change-data-capture solution such as Debezium, Kafka, GridGain Data Lake Accelerator, or Oracle |
| GoldenGate. If you'd like Ignite to write changes through to Hadoop directly, then implement |
| <a href="https://apacheignite.readme.io/docs/3rd-party-store" target="_blank">Ignite's CacheStore</a> |
| interface. |
| </li> |
| </ul> |
| |
| |
| <div class="jumbotron jumbotron-fluid"> |
| <div class="container"> |
| <div class="title display-6">Learn More</div> |
| <hr class="my-4"> |
| <div class="row"> |
| <div class="col-sm-6"> |
| <ul> |
| <li> |
| <a href="/features/sql.html"> |
| Distributed SQL <i class="fas fa-angle-double-right"></i> |
| </a> |
| </li> |
| <li> |
| <a href="/features/collocated-processing.html"> |
| Co-located Processing <i class="fas fa-angle-double-right"></i> |
| </a> |
| </li> |
| <li><a href="/features/acid-transactions.html"> |
| ACID Transactions <i class="fas fa-angle-double-right"></i> |
| </a></li> |
| <li><a href="/arch/native-persistence.html"> |
| Native Persistence <i class="fas fa-angle-double-right"></i> |
| </a></li> |
| </ul> |
| </div> |
| <div class="col-sm-6"> |
| <ul> |
| <li> |
| <a href="/features/machinelearning.html"> |
| Machine and Deep Learning <i class="fas fa-angle-double-right"></i> |
| </a> |
| </li> |
| <li> |
| <a href="/use-cases/in-memory-data-grid.html"> |
| Ignite as an In-Memory Data Grid <i class="fas fa-angle-double-right"></i> |
| </a> |
| </li> |
| <li><a href="/use-cases/in-memory-database.html"> |
| Ignite as an In-Memory Database <i class="fas fa-angle-double-right"></i> |
| </a></li> |
| <li><a href="/use-cases/digital-integration-hub.html"> |
| Ignite as a Digital Integration Hub <i class="fas fa-angle-double-right"></i> |
| </a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| </div> |
| </article> |
| |
| <!--#include virtual="/includes/footer.html" --> |
| <!--#include virtual="/includes/scripts.html" --> |
| </body> |
| </html> |