<!--
 ▄▄▄       ██▓███   ▄▄▄       ▄████▄   ██░ ██ ▓█████     ██▓  ▄████  ███▄    █  ██▓▄▄▄█████▓▓█████
▒████▄    ▓██░  ██▒▒████▄    ▒██▀ ▀█  ▓██░ ██▒▓█   ▀    ▓██▒ ██▒ ▀█▒ ██ ▀█   █ ▓██▒▓  ██▒ ▓▒▓█   ▀
▒██  ▀█▄  ▓██░ ██▓▒▒██  ▀█▄  ▒▓█    ▄ ▒██▀▀██░▒███      ▒██▒▒██░▄▄▄░▓██  ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█  ▄    ░██░░▓█  ██▓▓██▒  ▐▌██▒░██░░ ▓██▓ ░ ▒▓█  ▄
 ▓█   ▓██▒▒██▒ ░  ░ ▓█   ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒   ░██░░▒▓███▀▒▒██░   ▓██░░██░  ▒██▒ ░ ░▒████▒
 ▒▒   ▓▒█░▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ░▒ ▒  ░ ▒ ░░▒░▒░░ ▒░ ░   ░▓   ░▒   ▒ ░ ▒░   ▒ ▒ ░▓    ▒ ░░   ░░ ▒░ ░
  ▒   ▒▒ ░░▒ ░       ▒   ▒▒ ░  ░  ▒    ▒ ░▒░ ░ ░ ░  ░    ▒ ░  ░   ░ ░ ░░   ░ ▒░ ▒ ░    ░     ░ ░  ░
  ░   ▒   ░░         ░   ▒   ░         ░  ░░ ░   ░       ▒ ░░ ░   ░    ░   ░ ░  ▒ ░  ░         ░
      ░  ░               ░  ░░ ░       ░  ░  ░   ░  ░    ░        ░          ░  ░              ░  ░
-->

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">

    <link rel="canonical" href="https://ignite.apache.org/use-cases/spark-acceleration.html"/>
    <!--#include virtual="/includes/scriptshead.html" -->

    <meta name="viewport" content="width=device-width, initial-scale=1.0">

    <meta name="description"
          content="Apache Ignite integrates with Apache Spark to accelerate the performance of Spark applications
          and APIs by keeping data in a shared in-memory cluster."/>

    <title>Apache Spark Performance Acceleration</title>

    <!--#include virtual="/includes/styles.html" -->

    
</head>
<body>
<!--#include virtual="/includes/header.html" -->
<article>
    <header>
        <div class="container">
            <h1>Apache Spark <strong>Performance Acceleration</strong></h1>
        </div>
    </header>
    <div class="container">
        <p>
            The performance of Apache Spark® applications can be accelerated by keeping data in a shared
            Apache Ignite® in-memory cluster. Spark works with Ignite as a data source, much as it does with Hadoop or a
            relational database. You can start an Ignite cluster, set it as a data source for Spark workers, and
            continue using the Spark RDD or DataFrame APIs. You can gain even more speed by running Ignite SQL or
            compute APIs directly on the Spark dataset. Spark workers that need to share both data and state can
            also use Ignite as a distributed in-memory layer.
        </p>
        <img class="img-fluid diagram-right" alt="Apache Spark Performance Acceleration" src="/images/svg-diagrams/spark_acceleration.svg" width="555" height="600" />


        <p>
            The performance increase is achievable for several reasons. First, Ignite is designed to store data sets
            in memory across a cluster of nodes, which reduces the latency of Spark operations that would otherwise
            pull data from disk-based systems. Second, Ignite tries to minimize data shuffling over the network
            between its store and Spark applications by running certain Spark tasks, produced by the RDD or
            DataFrame APIs, in place on Ignite nodes. This optimization reduces the effect of network latency on the
            performance of Spark calls. Finally, the network impact can be reduced further if native
            Ignite APIs, such as SQL, are called directly from Spark applications. Doing so eliminates
            data shuffling between Spark and Ignite because Ignite SQL queries are always executed on
            Ignite nodes, which return a much smaller final result set to the application layer.
        </p>

        <h2>Ignite Shared RDDs</h2>
        <p>
            Apache Ignite provides an implementation of the Spark RDD, which allows any data and state to be shared
            in memory as RDDs across Spark jobs. The Ignite RDD provides a shared, mutable view of the data stored
            in Ignite caches across different Spark jobs, workers, or applications.
        </p>

        <p>
            The Ignite RDD is implemented as a view over a distributed Ignite table (also known as a cache). It can
            be deployed with an Ignite node either within the Spark job executing process, on a Spark worker, or in
            a separate Ignite cluster. Depending on the chosen deployment mode, the shared state may either
            exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark
            application (standalone mode).
        </p>
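
        <p>
            As an illustration, the Scala sketch below shares key-value pairs between Spark jobs through an
            Ignite RDD. It assumes the <code>ignite-spark</code> module is on the classpath and that at least one
            Ignite server node is already running and discoverable with the default configuration (standalone
            mode). The object name, the application name, and the cache name <code>sharedRDD</code> are
            placeholders chosen for this example.
        </p>
<pre><code class="language-scala">import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

object SharedRddSketch {
  def main(args: Array[String]): Unit = {
    // Local Spark context for experimentation; the app name is a placeholder.
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("shared-rdd-sketch").setMaster("local[2]"))

    // The closure supplies the Ignite configuration whenever the context
    // needs to start an Ignite node (assumption: default discovery settings
    // are enough to reach the running cluster).
    val igniteContext = new IgniteContext(sparkContext, () =&gt; new IgniteConfiguration())

    // A live view over the Ignite cache named "sharedRDD" (hypothetical name).
    val sharedRDD: IgniteRDD[Int, Int] = igniteContext.fromCache[Int, Int]("sharedRDD")

    // Write pairs into the cache; other jobs that open the same cache can read them.
    sharedRDD.savePairs(sparkContext.parallelize(1 to 1000, 4).map(i =&gt; (i, i * i)))

    // Ordinary RDD transformations work on the shared view.
    println(sharedRDD.filter(_._2 &gt; 500).count())

    // Close the context and also stop the Ignite nodes started on the workers.
    igniteContext.close(true)
    sparkContext.stop()
  }
}
</code></pre>
        <p>
            Because the data lives in the Ignite cluster in this mode, a second Spark application that calls
            <code>fromCache</code> with the same cache name should see the pairs written here.
        </p>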

        <h2>Ignite DataFrames</h2>
        <p>
            The Apache Spark DataFrame API introduced the concept of a schema to describe the data,
            allowing Spark to manage the schema and organize the data into a tabular format. To put it simply,
            a DataFrame is a distributed collection of data organized into named columns. It is conceptually
            equivalent to a table in a relational database and allows Spark to leverage the Catalyst query
            optimizer to produce much more efficient query execution plans in comparison to RDDs, which are
            collections of elements partitioned across the nodes of the cluster.
        </p>
        <p>
            Ignite supports the DataFrame API, allowing Spark to write to and read from Ignite through that
            interface. Furthermore, Ignite analyses execution plans produced by Spark's Catalyst engine and can
            execute parts of the plan on Ignite nodes directly, which reduces data shuffling and consequently makes
            your Spark SQL queries perform better.
        </p>
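
        <p>
            A minimal Scala sketch of reading from and writing to Ignite through the DataFrame API follows. It
            assumes a running Ignite cluster that already holds a SQL table named <code>person</code> with
            <code>id</code>, <code>name</code>, and <code>age</code> columns. The configuration file path, the
            table names, and the column names are hypothetical and should be replaced with your own.
        </p>
<pre><code class="language-scala">import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.SparkSession

object IgniteDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ignite-dataframe-sketch")
      .master("local[2]")
      .getOrCreate()

    // Hypothetical path to an Ignite Spring XML configuration file.
    val configPath = "config/ignite-config.xml"

    // Read the Ignite SQL table "person" (assumed to exist) as a DataFrame.
    val persons = spark.read
      .format(FORMAT_IGNITE)
      .option(OPTION_CONFIG_FILE, configPath)
      .option(OPTION_TABLE, "person")
      .load()

    // Query it with Spark SQL.
    persons.createOrReplaceTempView("person")
    val adults = spark.sql("SELECT id, name FROM person WHERE age &gt;= 18")

    // Write the result to a new Ignite table keyed by "id".
    adults.write
      .format(FORMAT_IGNITE)
      .option(OPTION_CONFIG_FILE, configPath)
      .option(OPTION_TABLE, "adult_person")
      .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
      .save()

    spark.stop()
  }
}
</code></pre>
        <p>
            Whether the filter in this query is executed on Ignite nodes or in Spark depends on the Ignite and
            Spark versions in use and on how the integration is configured.
        </p>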


        <div class="jumbotron jumbotron-fluid">
            <div class="container">
                <div class="title display-6">Learn More</div>
                <hr class="my-4">
                <div class="row">
                    <div class="col-sm-6">
                        <ul>
                            <li>
                                <a href="/docs/latest/extensions-and-integrations/ignite-for-spark/installation">
                                    Ignite and Spark Installation and Deployment <i
                                        class="fa fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li>
                                <a href="/docs/latest/extensions-and-integrations/ignite-for-spark/ignitecontext-and-rdd">
                                    Ignite RDDs in Detail <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                        </ul>
                    </div>
                    <div class="col-sm-6">
                        <ul>
                            <li>
                                <a href="/docs/latest/extensions-and-integrations/ignite-for-spark/ignite-dataframe">
                                    Ignite DataFrames in Detail <i class="fas fa-angle-double-right"></i>
                                </a>
                            </li>
                            <li>

                                <a href="/use-cases/digital-integration-hub.html">
                                    Ignite as a Digital Integration Hub <i class="fas fa-angle-double-right"></i>
                                </a>

                            </li>
                        </ul>
                    </div>
                </div>
            </div>
        </div>

    </div>

</article>
<!--#include virtual="/includes/footer.html" -->
<!--#include virtual="/includes/scripts.html" -->
</body>
</html>
