| <!-- |
| ▄▄▄ ██▓███ ▄▄▄ ▄████▄ ██░ ██ ▓█████ ██▓ ▄████ ███▄ █ ██▓▄▄▄█████▓▓█████ |
| ▒████▄ ▓██░ ██▒▒████▄ ▒██▀ ▀█ ▓██░ ██▒▓█ ▀ ▓██▒ ██▒ ▀█▒ ██ ▀█ █ ▓██▒▓ ██▒ ▓▒▓█ ▀ |
| ▒██ ▀█▄ ▓██░ ██▓▒▒██ ▀█▄ ▒▓█ ▄ ▒██▀▀██░▒███ ▒██▒▒██░▄▄▄░▓██ ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███ |
| ░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█ ▄ ░██░░▓█ ██▓▓██▒ ▐▌██▒░██░░ ▓██▓ ░ ▒▓█ ▄ |
| ▓█ ▓██▒▒██▒ ░ ░ ▓█ ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒ ░██░░▒▓███▀▒▒██░ ▓██░░██░ ▒██▒ ░ ░▒████▒ |
| ▒▒ ▓▒█░▒▓▒░ ░ ░ ▒▒ ▓▒█░░ ░▒ ▒ ░ ▒ ░░▒░▒░░ ▒░ ░ ░▓ ░▒ ▒ ░ ▒░ ▒ ▒ ░▓ ▒ ░░ ░░ ▒░ ░ |
| ▒ ▒▒ ░░▒ ░ ▒ ▒▒ ░ ░ ▒ ▒ ░▒░ ░ ░ ░ ░ ▒ ░ ░ ░ ░ ░░ ░ ▒░ ▒ ░ ░ ░ ░ ░ |
| ░ ▒ ░░ ░ ▒ ░ ░ ░░ ░ ░ ▒ ░░ ░ ░ ░ ░ ░ ▒ ░ ░ ░ |
| ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ |
| --> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <!DOCTYPE html> |
| <html> |
| <head> |
| <link rel="canonical" href="https://ignite.apache.org/use-cases/spark/shared-memory-layer.html" /> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>Apache Spark Shared Memory Layer - Apache Ignite</title> |
| <link media="all" rel="stylesheet" href="/css/all.css?v=1538416900"> |
| <link href="https://netdna.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.css" rel="stylesheet"> |
| <link media="all" rel="stylesheet" href="/css/syntaxhighlighter.css"> |
| <link href='https://fonts.googleapis.com/css?family=Open+Sans:400,300,300italic,400italic,600,600italic,700,700italic,800,800italic' rel='stylesheet' type='text/css'> |
| |
| <!--#include virtual="/includes/sh.html" --> |
| </head> |
| <body> |
| <div id="wrapper"> |
| <!--#include virtual="/includes/header.html" --> |
| |
| <main id="main" role="main" class="container"> |
| <section id="shared-memory-layer" class="page-section"> |
| <h1 class="first">Shared Memory Layer for Apache Spark</h1> |
| <div class="col-sm-12 col-md-12 col-xs-12" style="padding:0 0 10px 0;"> |
| <div class="col-sm-6 col-md-6 col-xs-12" style="padding-left:0; padding-right:0"> |
| <p> |
| Apache Ignite is a distributed, memory-centric database and caching platform that |
| Apache Spark users rely on to: |
| </p> |
| <ul class="page-list" style="margin-bottom: 20px;"> |
| <li> |
| Achieve true in-memory performance at scale and avoid data movement from a data source |
| to Spark workers and applications. |
| </li> |
| <li> |
| Boost DataFrame and SQL performance. |
| </li> |
| <li> |
| More easily share state and data among Spark jobs. |
| </li> |
| </ul> |
| </div> |
| |
| <div class="col-sm-6 col-md-6 col-xs-12" style="padding-right:0"> |
| <img class="img-responsive" src="/images/spark_integration.png" width="440" alt="Apache Spark integration with Apache Ignite" style="float:right;"/> |
| </div> |
| </div> |
| |
| <div class="page-heading">Ignite Shared RDDs</div> |
| <p> |
| Apache Ignite provides an implementation of the Spark RDD that allows any data and state to be shared |
| in memory as RDDs across Spark jobs. The Ignite RDD provides a shared, mutable view of the same in-memory |
| data stored in Ignite across different Spark jobs, workers, or applications. Native Spark RDDs, by contrast, |
| cannot be shared across Spark jobs or applications. |
| </p> |
| |
| <p> |
| An IgniteRDD is implemented as a view over a distributed Ignite table, also called a cache. |
| The Ignite node behind it can be deployed within the process executing the Spark job, on a Spark worker, |
| or in a separate Ignite cluster. Depending on the chosen deployment mode, the shared state either |
| exists only for the lifetime of the Spark application (embedded mode) or outlives the Spark |
| application (standalone mode). |
| </p> |
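<p>
As an illustration only, the Scala sketch below shows two Spark jobs sharing state through an IgniteRDD.
It assumes Ignite 2.x with the <code>ignite-spark</code> module on the classpath and an Ignite server node
that is already running and discoverable with a default <code>IgniteConfiguration</code>; the cache name
<code>sharedNumbers</code> and the object <code>SharedRddSketch</code> are hypothetical. Because the entries
live in the Ignite cluster rather than in Spark's own storage, another Spark application that creates its own
<code>IgniteContext</code> and calls <code>fromCache</code> with the same cache name would see the same data.
</p>

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

object SharedRddSketch {
  // Hypothetical cache name chosen for this illustration.
  val CacheName = "sharedNumbers"

  def run(sc: SparkContext): Long = {
    // Standalone mode (the default): the Ignite nodes form a separate cluster, so the
    // cached data outlives this Spark application. Passing `false` as a third argument
    // would select embedded mode, where Ignite runs inside the Spark executors instead.
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    try {
      // An IgniteRDD is a live view over the Ignite cache; no data is copied into Spark here.
      val shared: IgniteRDD[Int, Int] = ic.fromCache[Int, Int](CacheName)

      // Job 1: write 100,000 key-value pairs into the cache through the RDD view.
      shared.savePairs(sc.parallelize(1 to 100000, 10).map(i => (i, i)))

      // Job 2 (here in the same application, but it could be another one that uses
      // the same cache name): read the shared state back and count values above 50,000.
      ic.fromCache[Int, Int](CacheName).filter(_._2 > 50000).count()
    } finally {
      // Detaches this application from Ignite; in standalone mode the cluster and its data stay up.
      ic.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shared-rdd-sketch").setMaster("local[*]"))
    try println(run(sc)) finally sc.stop()
  }
}
```

<p>
On an initially empty cache, <code>run</code> counts the values 50,001 through 100,000, that is, 50,000 entries.
</p>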
| <p> |
| While Spark SQL supports a fairly rich SQL syntax, it does not implement indexing. As a result, |
| Spark queries may take minutes even on moderately sized data sets because every query requires a full |
| data scan. With Ignite, Spark users can configure primary and secondary indexes, which can improve |
| query performance by up to 1000x. |
| </p> |
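<p>
A minimal Scala sketch of an indexed query follows. It assumes Ignite 2.x, a running Ignite cluster reachable
with default settings, and annotation-based index configuration through <code>@QuerySqlField(index = true)</code>;
the <code>Employee</code> class, the <code>employees</code> cache, and the sample rows are hypothetical. The tiny
data set only shows the mechanics: the range predicate on the indexed column is executed by the Ignite SQL engine,
and any actual speed-up depends on the data volume and the query.
</p>

```scala
import org.apache.ignite.cache.query.annotations.QuerySqlField
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

import scala.annotation.meta.field

// Hypothetical value type. @QuerySqlField exposes a field to Ignite SQL;
// index = true also asks Ignite to maintain a secondary index on that field.
case class Employee(
    @(QuerySqlField @field) name: String,
    @(QuerySqlField @field)(index = true) salary: Int)

object IndexedQuerySketch {
  def highEarners(sc: SparkContext): Seq[String] = {
    // Standalone mode: assumes a running Ignite cluster reachable with default settings.
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    try {
      // Registering the key/value types makes the cache queryable as the SQL table "Employee".
      val cacheCfg = new CacheConfiguration[Int, Employee]("employees")
        .setIndexedTypes(classOf[java.lang.Integer], classOf[Employee])

      val employees = ic.fromCache(cacheCfg)
      employees.savePairs(sc.parallelize(Seq(
        1 -> Employee("Ann", 120000),
        2 -> Employee("Bob", 45000),
        3 -> Employee("Cid", 90000))))

      // The range predicate on the indexed salary column is evaluated by the Ignite SQL
      // engine, which can use the index instead of scanning every entry; the result
      // comes back to Spark as a DataFrame.
      employees
        .sql("select name from Employee where salary > ? order by salary desc", 80000)
        .collect()
        .map(_.getString(0))
        .toSeq
    } finally {
      ic.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("indexed-query-sketch").setMaster("local[*]"))
    try println(highEarners(sc).mkString(", ")) finally sc.stop()
  }
}
```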
| |
| <p> |
| <a href="https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd" target="docs"> |
| <b>Ignite RDDs in Detail <i class="fa fa-angle-double-right"></i></b> |
| </a> |
| </p> |
| |
| <div class="page-heading">Ignite DataFrames</div> |
| <p> |
| The Apache Spark DataFrame API introduced the concept of a schema to describe the data, |
| allowing Spark to manage the schema and organize the data into a tabular format. To put it simply, |
| a DataFrame is a distributed collection of data organized into named columns. It is conceptually |
| equivalent to a table in a relational database and allows Spark to leverage the Catalyst query |
| optimizer to produce much more efficient query execution plans in comparison to RDDs, which are |
| just collections of elements partitioned across the nodes of the cluster. |
| </p> |
| <p> |
| Ignite extends the DataFrame API, simplifying development and improving data access times whenever |
| Ignite is used as memory-centric storage for Spark. Benefits include: |
| </p> |
| <ul class="page-list" style="margin-bottom: 20px;"> |
| <li> |
| The ability to share data and state across Spark jobs by writing DataFrames to Ignite and reading them back. |
| </li> |
| <li> |
| Faster Spark SQL queries: Spark query execution plans are optimized with the Ignite SQL engine, which |
| provides advanced indexing and avoids moving data across the network from Ignite to Spark. |
| </li> |
| </ul> |
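<p>
The Scala sketch below illustrates sharing data through an Ignite-backed DataFrame: one job writes a table,
and a later job (possibly in another Spark application) reads it back and filters it. It assumes a recent
Ignite 2.x release with DataFrame support and the settings in <code>IgniteDataFrameSettings</code>; the
configuration file path <code>config/ignite-client.xml</code>, the <code>person</code> table, and its rows are
hypothetical. The read-side filter is a candidate for being pushed down to the Ignite SQL engine rather than
evaluated after transferring every row to Spark.
</p>

```scala
import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.{SaveMode, SparkSession}

object IgniteDataFrameSketch {
  // Hypothetical path to an Ignite Spring XML configuration describing how to reach the cluster.
  val ConfigFile = "config/ignite-client.xml"

  def roundTrip(spark: SparkSession): Long = {
    import spark.implicits._

    // Job 1: write a DataFrame into an Ignite SQL table named "person".
    Seq((1L, "Ann", 34), (2L, "Bob", 41), (3L, "Cid", 29))
      .toDF("id", "name", "age")
      .write
      .format(FORMAT_IGNITE)
      .option(OPTION_CONFIG_FILE, ConfigFile)
      .option(OPTION_TABLE, "person")
      // Primary key columns are needed when Ignite has to create the table.
      .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
      .option(OPTION_CREATE_TABLE_PARAMETERS, "template=replicated")
      // Overwrite drops and recreates the table, so the sketch can be rerun.
      .mode(SaveMode.Overwrite)
      .save()

    // Job 2: read the same table back and keep only people older than 30.
    spark.read
      .format(FORMAT_IGNITE)
      .option(OPTION_CONFIG_FILE, ConfigFile)
      .option(OPTION_TABLE, "person")
      .load()
      .filter($"age" > 30)
      .count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ignite-df-sketch").master("local[*]").getOrCreate()
    try println(roundTrip(spark)) finally spark.stop()
  }
}
```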
| <p> |
| <a href="https://apacheignite-fs.readme.io/docs/ignite-data-frame" target="docs"> |
| <b>Ignite DataFrames in Detail <i class="fa fa-angle-double-right"></i></b> |
| </a> |
| </p> |
| </section> |
| </main> |
| |
| <!--#include virtual="/includes/footer.html" --> |
| </div> |
| <!--#include virtual="/includes/scripts.html" --> |
| </body> |
| </html> |