| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <title> |
| Third-Party Projects | Apache Spark |
| |
| </title> |
| |
| |
| |
| |
| |
| <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" |
| integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous"> |
| <link rel="preconnect" href="https://fonts.googleapis.com"> |
| <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> |
| <link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700;1,400;1,500;1,700&Courier+Prime:wght@400;700&display=swap" rel="stylesheet"> |
| <link href="/css/custom.css" rel="stylesheet"> |
| <!-- Code highlighter CSS --> |
| <link href="/css/pygments-default.css" rel="stylesheet"> |
| <link rel="icon" href="/favicon.ico" type="image/x-icon"> |
| </head> |
| <body class="global"> |
| <nav class="navbar navbar-expand-lg navbar-dark p-0 px-4" style="background: #1D6890;"> |
| <a class="navbar-brand" href="/"> |
| <img src="/images/spark-logo-rev.svg" alt="" width="141" height="72"> |
| </a> |
| <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarContent" |
| aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation"> |
| <span class="navbar-toggler-icon"></span> |
| </button> |
| <div class="collapse navbar-collapse col-md-12 col-lg-auto pt-4" id="navbarContent"> |
| |
| <ul class="navbar-nav me-auto"> |
| <li class="nav-item"> |
| <a class="nav-link active" aria-current="page" href="/downloads.html">Download</a> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="libraries" role="button" data-bs-toggle="dropdown" |
| aria-expanded="false"> |
| Libraries |
| </a> |
| <ul class="dropdown-menu" aria-labelledby="libraries"> |
| <li><a class="dropdown-item" href="/sql/">SQL and DataFrames</a></li> |
| <li><a class="dropdown-item" href="/streaming/">Spark Streaming</a></li> |
| <li><a class="dropdown-item" href="/mllib/">MLlib (machine learning)</a></li> |
| <li><a class="dropdown-item" href="/graphx/">GraphX (graph)</a></li> |
| <li> |
| <hr class="dropdown-divider"> |
| </li> |
| <li><a class="dropdown-item" href="/third-party-projects.html">Third-Party Projects</a></li> |
| </ul> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="documentation" role="button" data-bs-toggle="dropdown" |
| aria-expanded="false"> |
| Documentation |
| </a> |
| <ul class="dropdown-menu" aria-labelledby="documentation"> |
| <li><a class="dropdown-item" href="/docs/latest/">Latest Release (Spark 3.3.0)</a></li> |
| <li><a class="dropdown-item" href="/documentation.html">Older Versions and Other Resources</a></li> |
| <li><a class="dropdown-item" href="/faq.html">Frequently Asked Questions</a></li> |
| </ul> |
| </li> |
| <li class="nav-item"> |
| <a class="nav-link active" aria-current="page" href="/examples.html">Examples</a> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="community" role="button" data-bs-toggle="dropdown" |
| aria-expanded="false"> |
| Community |
| </a> |
| <ul class="dropdown-menu" aria-labelledby="community"> |
| <li><a class="dropdown-item" href="/community.html">Mailing Lists & Resources</a></li> |
| <li><a class="dropdown-item" href="/contributing.html">Contributing to Spark</a></li> |
| <li><a class="dropdown-item" href="/improvement-proposals.html">Improvement Proposals (SPIP)</a> |
| </li> |
| <li><a class="dropdown-item" href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a> |
| </li> |
| <li><a class="dropdown-item" href="/powered-by.html">Powered By</a></li> |
| <li><a class="dropdown-item" href="/committers.html">Project Committers</a></li> |
| <li><a class="dropdown-item" href="/history.html">Project History</a></li> |
| </ul> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="developers" role="button" data-bs-toggle="dropdown" |
| aria-expanded="false"> |
| Developers |
| </a> |
| <ul class="dropdown-menu" aria-labelledby="developers"> |
| <li><a class="dropdown-item" href="/developer-tools.html">Useful Developer Tools</a></li> |
| <li><a class="dropdown-item" href="/versioning-policy.html">Versioning Policy</a></li> |
| <li><a class="dropdown-item" href="/release-process.html">Release Process</a></li> |
| <li><a class="dropdown-item" href="/security.html">Security</a></li> |
| </ul> |
| </li> |
| </ul> |
| <ul class="navbar-nav ml-auto"> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="apacheFoundation" role="button" |
| data-bs-toggle="dropdown" aria-expanded="false"> |
| Apache Software Foundation |
| </a> |
| <ul class="dropdown-menu" aria-labelledby="apacheFoundation"> |
| <li><a class="dropdown-item" href="https://www.apache.org/">Apache Homepage</a></li> |
| <li><a class="dropdown-item" href="https://www.apache.org/licenses/">License</a></li> |
| <li><a class="dropdown-item" |
| href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> |
| <li><a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| <li><a class="dropdown-item" href="https://www.apache.org/security/">Security</a></li> |
| <li><a class="dropdown-item" href="https://www.apache.org/events/current-event">Event</a></li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </nav> |
| |
| <div class="container"> |
| <div class="row mt-4"> |
| <div class="col-12 col-md-9"> |
| <p>This page tracks external software projects that supplement Apache Spark and add to its ecosystem.</p> |
| |
| <p>To add a project, open a pull request against the <a href="https://github.com/apache/spark-website">spark-website</a> |
| repository. Add an entry to |
| <a href="https://github.com/apache/spark-website/blob/asf-site/third-party-projects.md">this markdown file</a>, |
| then run <code class="language-plaintext highlighter-rouge">jekyll build</code> to regenerate the HTML. Include |
| both the markdown change and the generated HTML in your pull request. See the README in that repository for more information.</p> |
| |
| <p>Note that all project and product names should follow <a href="/trademarks.html">trademark guidelines</a>.</p> |
| |
| <h2>spark-packages.org</h2> |
| |
| <p><a href="https://spark-packages.org/">spark-packages.org</a> is an external, |
| community-managed list of third-party libraries, add-ons, and applications that work with |
| Apache Spark. You can add a package as long as you have a GitHub repository.</p> |
| |
| <h2>Infrastructure projects</h2> |
| |
| <ul> |
| <li><a href="https://github.com/spark-jobserver/spark-jobserver">REST Job Server for Apache Spark</a> - |
| REST interface for managing and submitting Spark jobs on the same cluster.</li> |
| <li><a href="http://mlbase.org/">MLbase</a> - Machine Learning research project on top of Spark</li> |
| <li><a href="https://mesos.apache.org/">Apache Mesos</a> - Cluster management system that supports |
| running Spark</li> |
| <li><a href="https://www.alluxio.org/">Alluxio</a> (née Tachyon) - Memory-speed virtual distributed |
| storage system that supports running Spark</li> |
| <li><a href="https://github.com/filodb/FiloDB">FiloDB</a> - A Spark-integrated analytical/columnar |
| database with an in-memory option, capable of sub-second concurrent queries</li> |
| <li><a href="http://zeppelin-project.org/">Zeppelin</a> - Multi-purpose notebook which supports 20+ language backends, |
| including Apache Spark</li> |
| <li><a href="https://github.com/EclairJS/eclairjs-node">EclairJS</a> - Enables Node.js developers to code |
| against Spark, and data scientists to use JavaScript in Jupyter notebooks.</li> |
| <li><a href="https://github.com/Hydrospheredata/mist">Mist</a> - Serverless proxy for Spark clusters (Spark middleware)</li> |
| <li><a href="https://github.com/GoogleCloudPlatform/spark-on-k8s-operator">K8S Operator for Apache Spark</a> - Kubernetes operator for specifying and managing the lifecycle of Apache Spark applications on Kubernetes.</li> |
| <li><a href="https://developer.ibm.com/storage/products/ibm-spectrum-conductor-spark/">IBM Spectrum Conductor</a> - Cluster management software that integrates with Spark and modern computing frameworks.</li> |
| <li><a href="https://delta.io">Delta Lake</a> - Storage layer that provides ACID transactions and scalable metadata handling for Apache Spark workloads.</li> |
| <li><a href="https://mlflow.org">MLflow</a> - Open source platform to manage the machine learning lifecycle, including deploying models from diverse machine learning libraries on Apache Spark.</li> |
| <li><a href="https://github.com/databricks/koalas">Koalas</a> - Data frame API on Apache Spark that more closely follows Python’s pandas.</li> |
| <li><a href="https://datafu.apache.org/docs/spark/getting-started.html">Apache DataFu</a> - A collection of utilities and user-defined functions for working with large-scale data in Apache Spark, as well as for making Scala-Python interoperability easier.</li> |
| </ul> |
| |
| <h2>Applications using Spark</h2> |
| |
| <ul> |
| <li><a href="https://mahout.apache.org/">Apache Mahout</a> - Previously on Hadoop MapReduce, |
| Mahout has switched to using Spark as the backend</li> |
| <li><a href="https://wiki.apache.org/mrql/">Apache MRQL</a> - A query processing and optimization |
| system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark</li> |
| <li><a href="https://github.com/sameeragarwal/blinkdb">BlinkDB</a> - a massively parallel, approximate query engine built |
| on top of Shark and Spark</li> |
| <li><a href="https://github.com/adobe-research/spindle">Spindle</a> - Spark/Parquet-based web |
| analytics query engine</li> |
| <li><a href="https://github.com/thunderain-project/thunderain">Thunderain</a> - A framework |
| for combining stream processing with historical data (think Lambda architecture)</li> |
| <li><a href="https://github.com/OryxProject/oryx">Oryx</a> - Lambda architecture on Apache Spark and |
| Apache Kafka for real-time, large-scale machine learning</li> |
| <li><a href="https://github.com/bigdatagenomics/adam">ADAM</a> - A framework and CLI for loading, |
| transforming, and analyzing genomic data using Apache Spark</li> |
| <li><a href="https://github.com/salesforce/TransmogrifAI">TransmogrifAI</a> - AutoML library for building modular, reusable, strongly typed machine learning workflows on Spark with minimal hand tuning</li> |
| <li><a href="https://github.com/JohnSnowLabs/spark-nlp">Natural Language Processing for Apache Spark</a> - A library to provide simple, performant, and accurate NLP annotations for machine learning pipelines</li> |
| <li><a href="http://rumbledb.org">Rumble for Apache Spark</a> - A JSONiq engine for querying, with a functional language, large, nested, and heterogeneous JSON datasets that do not fit in DataFrames.</li> |
| </ul> |
| |
| <h2>Performance, monitoring, and debugging tools for Spark</h2> |
| |
| <ul> |
| <li><a href="https://github.com/g1thubhub/phil_stopwatch">Performance and debugging library</a> - A library to analyze Spark and PySpark applications for improving performance and finding the cause of failures</li> |
| <li><a href="https://www.datamechanics.co/delight">Data Mechanics Delight</a> - Delight is a free, hosted, cross-platform Spark UI alternative backed by an open-source Spark agent. It features new metrics and visualizations to simplify Spark monitoring and performance tuning.</li> |
| </ul> |
| |
| <h2>Additional language bindings</h2> |
| |
| <h3>C# / .NET</h3> |
| |
| <ul> |
| <li><a href="https://github.com/Microsoft/Mobius">Mobius</a>: C# and F# language bindings and extensions to Apache Spark</li> |
| </ul> |
| |
| <h3>Clojure</h3> |
| |
| <ul> |
| <li><a href="https://github.com/TheClimateCorporation/clj-spark">clj-spark</a></li> |
| <li><a href="https://github.com/zero-one-group/geni">Geni</a> - A Clojure dataframe library that runs on Apache Spark with a focus on optimizing the REPL experience.</li> |
| </ul> |
| |
| <h3>Groovy</h3> |
| |
| <ul> |
| <li><a href="https://github.com/bunions1/groovy-spark-example">groovy-spark-example</a></li> |
| </ul> |
| |
| <h3>Julia</h3> |
| |
| <ul> |
| <li><a href="https://github.com/dfdx/Spark.jl">Spark.jl</a></li> |
| </ul> |
| |
| <h3>Kotlin</h3> |
| |
| <ul> |
| <li><a href="https://github.com/JetBrains/kotlin-spark-api">Kotlin for Apache Spark</a></li> |
| </ul> |
| |
| </div> |
| <div class="col-12 col-md-3"> |
| <div class="news" style="margin-bottom: 20px;"> |
| <h5>Latest News</h5> |
| <ul class="list-unstyled"> |
| |
| <li><a href="/news/spark-3-2-2-released.html">Spark 3.2.2 released</a> |
| <span class="small">(Jul 17, 2022)</span></li> |
| |
| <li><a href="/news/spark-3-3-0-released.html">Spark 3.3.0 released</a> |
| <span class="small">(Jun 16, 2022)</span></li> |
| |
| <li><a href="/news/sigmod-system-award.html">SIGMOD Systems Award for Apache Spark</a> |
| <span class="small">(May 13, 2022)</span></li> |
| |
| <li><a href="/news/3-1-3-released.html">Spark 3.1.3 released</a> |
| <span class="small">(Feb 18, 2022)</span></li> |
| |
| </ul> |
| <p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p> |
| </div> |
| <div style="text-align:center; margin-bottom: 20px;"> |
| <a href="https://www.apache.org/events/current-event.html"> |
| <img src="https://www.apache.org/events/current-event-234x60.png" style="max-width: 100%;"/> |
| </a> |
| </div> |
| <div class="hidden-xs hidden-sm"> |
| <a href="/downloads.html" class="btn btn-cta btn-lg d-grid" style="margin-bottom: 30px;"> |
| Download Spark |
| </a> |
| <p style="font-size: 16px; font-weight: 500; color: #555;"> |
| Built-in Libraries: |
| </p> |
| <ul class="list-none"> |
| <li><a href="/sql/">SQL and DataFrames</a></li> |
| <li><a href="/streaming/">Spark Streaming</a></li> |
| <li><a href="/mllib/">MLlib (machine learning)</a></li> |
| <li><a href="/graphx/">GraphX (graph)</a></li> |
| </ul> |
| <a href="/third-party-projects.html">Third-Party Projects</a> |
| </div> |
| </div> |
| </div> |
| |
| |
| |
| <footer class="small"> |
| <hr> |
| Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered |
| trademarks or trademarks of The Apache Software Foundation in the United States and other countries. |
| See guidance on use of Apache Spark <a href="/trademarks.html">trademarks</a>. |
| All other marks mentioned may be trademarks or registered trademarks of their respective owners. |
| Copyright © 2018 The Apache Software Foundation, Licensed under the |
| <a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>. |
| </footer> |
| </div> |
| |
| <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js" |
| integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" |
| crossorigin="anonymous"></script> |
| <script src="https://code.jquery.com/jquery.js"></script> |
| <script src="/js/lang-tabs.js"></script> |
| <script src="/js/downloads.js"></script> |
| </body> |
| </html> |