---
layout: global
title: Home
custom_title: Apache Spark™ - Unified Analytics Engine for Big Data
description: Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
type: page
navigation:
  weight: 1
  show: true
---

<p class="lead">
  Run workloads up to 100x faster than Hadoop MapReduce.
</p>

<p>
  Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
</p>
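<p>
  As a small, illustrative sketch (the session name and query below are placeholders, not part of this page), calling <code>explain()</code> on a DataFrame query prints the physical plan that the engine will execute:
</p>

```python
from pyspark.sql import SparkSession

# Minimal sketch: inspect the plan produced by the optimizer and physical planner.
spark = SparkSession.builder.appName("plan-demo").getOrCreate()

df = spark.range(1_000_000).filter("id % 2 = 0").groupBy().count()
df.explain()  # prints the physical execution plan for this query
spark.stop()
```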
<p class="lead">
  Write applications quickly in Java, Scala, Python, R, and SQL.
</p>

<p>
  Spark offers over 80 high-level operators that make it easy to build parallel apps.
  And you can use it <em>interactively</em>
  from the Scala, Python, R, and SQL shells.
</p>
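<p>
  For example, in the interactive PySpark shell (where a <code>spark</code> session is predefined), a couple of high-level DataFrame operators express a whole query; the file name and columns below are hypothetical:
</p>

```python
# In the PySpark shell, `spark` is already defined.
# "examples/people.json" and its "age"/"name" columns are placeholders.
df = spark.read.json("examples/people.json")
df.where("age > 21").select("name").show()
```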
<p class="lead">
  Combine SQL, streaming, and complex analytics.
</p>

<p>
  Spark powers a stack of libraries including
  <a href="{{site.baseurl}}/sql/">SQL and DataFrames</a>, <a href="{{site.baseurl}}/mllib/">MLlib</a> for machine learning,
  <a href="{{site.baseurl}}/graphx/">GraphX</a>, and <a href="{{site.baseurl}}/streaming/">Spark Streaming</a>.
  You can combine these libraries seamlessly in the same application.
</p>
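<p>
  A minimal sketch of that combination, assuming a hypothetical Parquet file with numeric columns <code>x</code> and <code>y</code>: a SQL/DataFrame query feeds directly into an MLlib pipeline stage in the same application.
</p>

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("combined-libraries").getOrCreate()

# Hypothetical input: a Parquet file with numeric columns "x" and "y".
points = spark.read.parquet("points.parquet").where("x IS NOT NULL AND y IS NOT NULL")

# Pass the DataFrame result straight into MLlib.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(points)
model = KMeans(k=3, featuresCol="features").fit(features)
model.transform(features).groupBy("prediction").count().show()
spark.stop()
```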
<p class="lead">
  Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.
</p>

<p>
  You can run Spark using its <a href="{{site.baseurl}}/docs/latest/spark-standalone.html">standalone cluster mode</a>, 
  on <a href="https://github.com/amplab/spark-ec2">EC2</a>, 
  on <a href="https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">Hadoop YARN</a>, 
  on <a href="https://mesos.apache.org">Mesos</a>, or 
  on <a href="https://kubernetes.io/">Kubernetes</a>.
  Access data in <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS</a>, 
  <a href="https://www.alluxio.org/">Alluxio</a>,
  <a href="https://cassandra.apache.org">Apache Cassandra</a>, 
  <a href="https://hbase.apache.org">Apache HBase</a>,
  <a href="https://hive.apache.org">Apache Hive</a>, 
  and hundreds of other data sources.
</p>
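<p>
  As a sketch of what this looks like in code, the master URL, namenode host, paths, and column name below are placeholders for whatever your own cluster and storage use:
</p>

```python
from pyspark.sql import SparkSession

# Sketch only: replace the master URL and HDFS path with your cluster's values.
spark = (SparkSession.builder
         .appName("cluster-demo")
         .master("spark://master-host:7077")  # standalone; use "yarn" or a k8s:// URL instead
         .getOrCreate())

events = spark.read.parquet("hdfs://namenode:9000/data/events")  # hypothetical dataset
events.groupBy("event_type").count().show()                      # "event_type" is a placeholder column
spark.stop()
```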
<p>
  Spark is used by a wide range of organizations to process large datasets.
  You can find many example use cases on the
  <a href="{{site.baseurl}}/powered-by.html">Powered By</a> page.
</p>

<p>
  There are many ways to reach the community:
</p>
<ul class="list-narrow">
  <li>Use the <a href="{{site.baseurl}}/community.html#mailing-lists">mailing lists</a> to ask questions.</li>
  <li>In-person events include numerous <a href="{{site.baseurl}}/community.html#events">meetup groups and conferences</a>.</li>
  <li>We use <a href="https://issues.apache.org/jira/browse/SPARK">JIRA</a> for issue tracking.</li>
</ul>
<p>
  Apache Spark is built by a wide set of developers from over 300 companies.
  Since 2009, more than 1200 developers have contributed to Spark!
</p>

<p>
  The project's
  <a href="{{site.baseurl}}/committers.html">committers</a>
  come from more than 25 organizations.
</p>

<p>
  If you'd like to participate in Spark, or contribute to the libraries on top of it, learn
  <a href="{{site.baseurl}}/contributing.html">how to contribute</a>.
</p>
<p>Learning Apache Spark is easy whether you come from a Java, Scala, Python, R, or SQL background:</p>
<ul class="list-narrow">
  <li><a href="{{site.baseurl}}/downloads.html">Download</a> the latest release: you can run Spark locally on your laptop, as in the sketch after this list.</li>
  <li>Read the <a href="{{site.baseurl}}/docs/latest/quick-start.html">quick start guide</a>.</li>
  <li>Learn how to <a href="{{site.baseurl}}/docs/latest/#launching-on-a-cluster">deploy</a> Spark on a cluster.</li>
</ul>
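<p>
  For a taste of running Spark locally, here is a minimal sketch, assuming PySpark is installed (for example via <code>pip install pyspark</code>); the data is made up for illustration:
</p>

```python
from pyspark.sql import SparkSession

# Run entirely on your laptop: local[*] uses all available cores.
spark = SparkSession.builder.master("local[*]").appName("quick-start").getOrCreate()

spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"]) \
     .where("age > 40").show()
spark.stop()
```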