
 extend ../_components/base.pug

 block pagetitle
     title Apache Spark Performance Acceleration - Distributed Cache, In-Memory Computing
     meta(name="description", content="Ignite integrates with Apache Spark to accelerate the performance of Spark applications and APIs by keeping data in a shared in-memory cluster.")
     link(rel="canonical", href="https://ignite.apache.org/use-cases/spark-acceleration.html")

     meta(property="og:title", content="Apache Spark Performance Acceleration - Distributed Cache, In-Memory Computing")
     meta(property="og:type", content="article")
     meta(property="og:url", content="https://ignite.apache.org/use-cases/spark-acceleration.html")
     meta(property="og:image", content="/img/og-pic.png")
     meta(property="og:description", content="Ignite integrates with Apache Spark to accelerate the performance of Spark applications and APIs by keeping data in a shared in-memory cluster.")

 block css
     link(rel="stylesheet", href="../css/native-persistence.css?ver=" + config.version)
     link(rel="stylesheet", href="../css/compute-apis.css?ver=" + config.version)
     link(rel="stylesheet", href="../css/digital-hub.css?ver=" + config.version)
     link(rel="stylesheet", href="../css/spark.css?ver=" + config.version)


 block main
     - global.pageHref = "usecases"
     - config.hdrClassName = "hdr__blue"
     include ../_components/header.pug


     section.innerhero
         .container.innerhero__cont
             .innerhero__main

                 h1.h1.innerhero__h1 Accelerate Apache Spark Applications
                   <br>
                   span.with-apache  With Apache Ignite
                 .innerhero__descr.pt-2.h5.
                     Minimize data shuffling over the network with the Apache<br> Ignite implementation of RDD and DataFrame APIs
                 .innerhero__action
                     a.button.innerhero__button(href="https://ignite.apache.org/docs/latest/index") Start Coding
             img.innerhero__pic.innerhero__pic--spark(src="/img/usecases/spark/hero-image.svg", alt="hero-image")
     // /.innerhero

     section.spark1
         .container
            h2.h5.spark1__h2 How Does Ignite Accelerate Spark Applications?
            .spark1__block
               .spark1__item
                 h3.spark1__h3.fz20 Horizontally scalable and shared in-memory layer
                 p.spark1__text Ignite is designed to store data sets in memory across a cluster of nodes, reducing the latency of Spark operations that would otherwise need to pull data from disk-based systems.
               .spark1__item
                 h3.spark1__h3.fz20 Minimal data shuffling over the network
                 p.spark1__text Ignite aims to minimize data shuffling over the network between its store and Spark applications by running certain Spark tasks, produced by the RDD or DataFrame APIs, in place on Ignite nodes.
               .spark1__item
                 h3.spark1__h3.fz20 Extra performance boost with native Ignite APIs
                 p.spark1__text Call native Ignite APIs, such as SQL, directly from Spark applications to eliminate data shuffling between Spark and Ignite entirely.
            img.spark1__image(src="/img/usecases/spark/image.svg", alt="image")
     // /.spark1

     section.spark2
         .container
             h2.h5.spark2__h2 Ignite Shared RDDs
             .spark2__block
                .spark2__item
                   p.spark2__text Apache Ignite provides an implementation of the Spark RDD, which allows any data and state to be shared in memory as RDDs across Spark jobs.
                   p.spark2__text.pt-2 The Ignite RDD provides a shared, mutable view of the data stored in Ignite caches across different Spark jobs, workers, or applications.
                .spark2__item
                   p.spark2__text The Ignite RDD is implemented as a view over a distributed Ignite table (also known as a cache). It can be deployed with an Ignite node running within the process that executes the Spark job, on a Spark worker, or in a separate Ignite cluster.
                   p.spark2__text.pt-2 Depending on the chosen deployment mode, the shared state may either exist only for the lifetime of a Spark application (embedded mode) or outlive the Spark application (standalone mode).
             h2.h5.spark2__h2 Ignite DataFrames
             .spark2__block
                .spark2__item
                   p.spark2__text The Apache Spark DataFrame API introduced the concept of a schema to describe the data, allowing Spark to manage the schema and organize the data into a tabular format.
                   p.spark2__text.pt-2 Put simply, a DataFrame is a distributed collection of data organized into named columns; conceptually, it is equivalent to a table in a relational database. This structure lets Spark use the Catalyst query optimizer to produce more efficient query execution plans than it can for RDDs, which are collections of elements partitioned across the nodes of the cluster.
                .spark2__item
                   p.spark2__text Ignite supports the DataFrame API, allowing Spark to read from and write to Ignite through that interface.
                   p.spark2__text.pt-1 Furthermore, Ignite analyzes the execution plans produced by Spark's Catalyst engine and can execute parts of a plan directly on Ignite nodes, which reduces data shuffling and consequently improves Spark SQL performance.


     section.native-bottom.container
         .native-bottom__grid
             article.nativebotblock
                 .h4.nativebotblock__title
                     img(src="/img/features/native-rocket.svg", alt="").nativebotblock__icon
                     span Ready to Start?
                 p.nativebotblock__text Discover our quick start guide and build your first application in 5-10 minutes
                 a.nativebotblock__link.arrowlink(href="https://ignite.apache.org/docs/latest/", target="_blank") Quick Start Guide
             article.nativebotblock.nativebotblock--learn
                 .h4.nativebotblock__title
                     img(src="/img/features/native-docs.svg", alt="").nativebotblock__icon
                     span Want to Learn More?
                 p.nativebotblock__text Using Hadoop with Spark? See how Ignite accelerates Hadoop-based deployments
                 a.nativebotblock__link.arrowlink(href="/use-cases/hadoop-acceleration.html") Apache Hadoop Acceleration Article