<!--
▄▄▄ ██▓███ ▄▄▄ ▄████▄ ██░ ██ ▓█████ ██▓ ▄████ ███▄ █ ██▓▄▄▄█████▓▓█████
▒████▄ ▓██░ ██▒▒████▄ ▒██▀ ▀█ ▓██░ ██▒▓█ ▀ ▓██▒ ██▒ ▀█▒ ██ ▀█ █ ▓██▒▓ ██▒ ▓▒▓█ ▀
▒██ ▀█▄ ▓██░ ██▓▒▒██ ▀█▄ ▒▓█ ▄ ▒██▀▀██░▒███ ▒██▒▒██░▄▄▄░▓██ ▀█ ██▒▒██▒▒ ▓██░ ▒░▒███
░██▄▄▄▄██ ▒██▄█▓▒ ▒░██▄▄▄▄██ ▒▓▓▄ ▄██▒░▓█ ░██ ▒▓█ ▄ ░██░░▓█ ██▓▓██▒ ▐▌██▒░██░░ ▓██▓ ░ ▒▓█ ▄
▓█ ▓██▒▒██▒ ░ ░ ▓█ ▓██▒▒ ▓███▀ ░░▓█▒░██▓░▒████▒ ░██░░▒▓███▀▒▒██░ ▓██░░██░ ▒██▒ ░ ░▒████▒
▒▒ ▓▒█░▒▓▒░ ░ ░ ▒▒ ▓▒█░░ ░▒ ▒ ░ ▒ ░░▒░▒░░ ▒░ ░ ░▓ ░▒ ▒ ░ ▒░ ▒ ▒ ░▓ ▒ ░░ ░░ ▒░ ░
▒ ▒▒ ░░▒ ░ ▒ ▒▒ ░ ░ ▒ ▒ ░▒░ ░ ░ ░ ░ ▒ ░ ░ ░ ░ ░░ ░ ▒░ ▒ ░ ░ ░ ░ ░
░ ▒ ░░ ░ ▒ ░ ░ ░░ ░ ░ ▒ ░░ ░ ░ ░ ░ ░ ▒ ░ ░ ░
░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
-->
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="https://ignite.apache.org/features/igniterdd.html" />
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="0" />
<title>Apache Spark Shared RDDs - Apache Ignite</title>
<link media="all" rel="stylesheet" href="/css/all.css?v=1514336028">
<link href="https://netdna.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.css" rel="stylesheet">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,300,300italic,400italic,600,600italic,700,700italic,800,800italic' rel='stylesheet' type='text/css'>
<!--#include virtual="/includes/sh.html" -->
</head>
<body>
<div id="wrapper">
<!--#include virtual="/includes/header.html" -->
<main id="main" role="main" class="container">
<section id="igniterdd" class="page-section">
<h1 class="first">Shared Apache Spark RDDs</h1>
<div class="col-sm-12 col-md-12 col-xs-12" style="padding:0 0 10px 0;">
<div class="col-sm-6 col-md-6 col-xs-12" style="padding-left:0; padding-right:0">
<p>
Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to
share state in memory across multiple Spark jobs, either within the same application
or between different Spark applications.
</p>
<p>
<code>IgniteRDD</code> is implemented as a view over a distributed Ignite cache,
which may be deployed within the Spark job executing process, on a Spark worker,
or in its own cluster.
</p>
<p>
Depending on the preconfigured deployment mode, the shared state may either exist only
during the lifespan of a Spark application (<code>embedded mode</code>), or it may outlive
the Spark application (<code>standalone mode</code>), in which case the state can be shared across
multiple Spark applications.
</p>
</div>
<div class="col-sm-6 col-md-6 col-xs-12" style="padding-right:0">
<img class="img-responsive" src="/images/spark_integration.png" alt="Apache Ignite and Apache Spark integration" width="440" style="float:right;"/>
</div>
</div>
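<p>
The deployment mode described above is chosen when the <code>IgniteContext</code> is created.
The following is a minimal sketch, assuming the <code>ignite-spark</code> module is on the classpath
and that the <code>IgniteContext</code> constructor accepts a <code>standalone</code> flag as it does in
Ignite 2.x. The object name, application name, and local master URL are placeholders chosen for illustration.
</p>
<pre class="brush:java">
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object; names are illustrative only.
object DeploymentModeSketch {
  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("deployment-mode-sketch").setMaster("local[2]"))

    // true  = standalone mode: attach to an Ignite cluster that is started
    //         separately, so the cached state outlives this application.
    // false = embedded mode: Ignite nodes run inside the Spark processes and
    //         the state is gone once the application ends.
    val standalone = true

    // A default IgniteConfiguration is an assumption; a real deployment would
    // normally configure discovery and caches here.
    val igniteContext = new IgniteContext(
      sparkContext, () => new IgniteConfiguration(), standalone)

    // ... create and use shared RDDs through igniteContext ...

    // Keep Ignite running on the workers so the shared state is preserved.
    igniteContext.close(false)
    sparkContext.stop()
  }
}
</pre>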
<div class="code-examples"><br/>
<div class="page-heading">Code Examples:</div>
<!-- Nav tabs -->
<ul id="ignite-rdd-examples" class="nav nav-tabs">
<li class="active"><a href="#rdd-transform" aria-controls="rdd-transform" data-toggle="tab">Transformations</a></li>
<li><a href="#rdd-sql" aria-controls="rdd-sql" data-toggle="tab">SQL Queries</a></li>
</ul>
<!-- Tab panes -->
<div class="tab-content">
<div class="tab-pane active" id="rdd-transform">
<pre class="brush:java">
val sharedRdd = igniteContext.fromCache[Int, Int]("partitioned")
// Store pairs of integers from 1 to 10000 into in-memory cache
// named "partitioned" using 10 parallel store operations.
sharedRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))
</pre>
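<p>
Because the state lives in the Ignite cache, a later job can read it back and apply standard RDD
transformations. This is a sketch only: it assumes the "partitioned" cache was already filled by the
snippet above, that an Ignite server node is reachable with the default configuration, and that the
object and application names are placeholders.
</p>
<pre class="brush:java">
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reader job; names are illustrative only.
object SharedRddReaderSketch {
  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("shared-rdd-reader").setMaster("local[2]"))
    val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())

    // A view over the same cache that the writer job populated.
    val sharedRdd: IgniteRDD[Int, Int] = igniteContext.fromCache[Int, Int]("partitioned")

    // IgniteRDD is an RDD of key-value tuples, so filter and count work as usual.
    val largeValues = sharedRdd.filter { case (_, value) => value > 5000 }
    println(s"Pairs with a value above 5000: ${largeValues.count()}")

    igniteContext.close()
    sparkContext.stop()
  }
}
</pre>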
</div>
<div class="tab-pane" id="rdd-sql">
<pre class="brush:java">
val sharedRdd = igniteContext.fromCache("partitioned")
val result = sharedRdd.sql(
"select _val from Integer where _val &gt; ? and _val &lt; ?", 10, 100)
</pre>
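<p>
For the query above to run, the cache value type has to be visible to Ignite SQL. One way to arrange
this, sketched here under assumptions, is to register the key and value classes as indexed types on the
cache configuration. The object and application names are placeholders, and the configuration only
takes effect if the cache is created by this call rather than already present in the cluster.
</p>
<pre class="brush:java">
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical query job; names are illustrative only.
object SharedRddSqlSketch {
  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("shared-rdd-sql").setMaster("local[2]"))
    val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())

    // Register Integer keys and values so that the "Integer" table is queryable.
    val cacheCfg = new CacheConfiguration[Integer, Integer]("partitioned")
    cacheCfg.setIndexedTypes(classOf[Integer], classOf[Integer])

    val sharedRdd = igniteContext.fromCache(cacheCfg)

    // The query runs inside Ignite; the result comes back as a Spark DataFrame.
    val result = sharedRdd.sql(
      "select _val from Integer where _val &gt; ? and _val &lt; ?", 10, 100)
    result.show()

    igniteContext.close()
    sparkContext.stop()
  }
}
</pre>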
</div>
</div>
</div>
</section>
<section id="key-features" class="page-section">
<h2>IgniteRDD Features</h2>
<table class="formatted" name="IgniteRDD Features">
<thead>
<tr>
<th width="35%" class="left">Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Shared Spark RDDs</td>
<td>
<p>
<code>IgniteRDD</code> implements the native Spark RDD and DataFrame APIs and,
in addition to all the standard RDD functionality, shares the state
of the RDD across Spark jobs, applications, and workers.
</p>
<div class="page-links">
<a href="http://apacheignite-fs.readme.io/docs/ignite-for-spark" target="docs">Docs for this Feature <i class="fa fa-angle-double-right"></i></a>
</div>
</td>
</tr>
<tr>
<td class="left">Faster SQL</td>
<td>
<p>
Spark does not support SQL indexes, while Ignite does. Thanks to its advanced in-memory
indexing capabilities, IgniteRDD can execute SQL queries hundreds of times faster
than native Spark RDDs or DataFrames.
</p>
<div class="page-links">
<a href="http://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-running-sql-queries-against-ignite-cache" target="docs">Docs for this Feature <i class="fa fa-angle-double-right"></i></a>
</div>
</td>
</tr>
</tbody>
</table>
</section>
</main>
<!--#include virtual="/includes/footer.html" -->
</div>
<!--#include virtual="/includes/scripts.html" -->
</body>
</html>