<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>PySpark Documentation &#8212; PySpark 3.2.1 documentation</title>
<link href="_static/css/theme.css" rel="stylesheet">
<link href="_static/css/index.ff1ffe594081f20da1ef19478df9384b.css" rel="stylesheet">
<link rel="stylesheet"
href="_static/vendor/fontawesome/5.13.0/css/all.min.css">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">
<link rel="stylesheet" href="_static/css/blank.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="_static/css/pyspark.css" />
<link rel="preload" as="script" href="_static/js/index.be7d3bbb2ef33a8344ce.js">
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/copybutton.js"></script>
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
<script async="async" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/x-mathjax-config">MathJax.Hub.Config({"tex2jax": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true, "ignoreClass": "tex2jax_ignore|mathjax_ignore|document", "processClass": "tex2jax_process|mathjax_process|math|output_area"}})</script>
<link rel="canonical" href="https://spark.apache.org/docs/latest/api/python/index.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Getting Started" href="getting_started/index.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="None">
<!-- Google Analytics -->
</head>
<body data-spy="scroll" data-target="#bd-toc-nav" data-offset="80">
<div class="container-fluid" id="banner"></div>
<nav class="navbar navbar-light navbar-expand-lg bg-light fixed-top bd-navbar" id="navbar-main"><div class="container-xl">
<div id="navbar-start">
<a class="navbar-brand" href="#">
<img src="_static/spark-logo-reverse.png" class="logo" alt="logo">
</a>
</div>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbar-collapsible" aria-controls="navbar-collapsible" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar-collapsible" class="col-lg-9 collapse navbar-collapse">
<div id="navbar-center" class="mr-auto">
<div class="navbar-center-item">
<ul id="navbar-main-elements" class="navbar-nav">
<li class="toctree-l1 nav-item">
<a class="reference internal nav-link" href="getting_started/index.html">
Getting Started
</a>
</li>
<li class="toctree-l1 nav-item">
<a class="reference internal nav-link" href="user_guide/index.html">
User Guide
</a>
</li>
<li class="toctree-l1 nav-item">
<a class="reference internal nav-link" href="reference/index.html">
API Reference
</a>
</li>
<li class="toctree-l1 nav-item">
<a class="reference internal nav-link" href="development/index.html">
Development
</a>
</li>
<li class="toctree-l1 nav-item">
<a class="reference internal nav-link" href="migration_guide/index.html">
Migration Guide
</a>
</li>
</ul>
</div>
</div>
<div id="navbar-end">
<div class="navbar-end-item">
<ul id="navbar-icon-links" class="navbar-nav" aria-label="Icon Links">
</ul>
</div>
</div>
</div>
</div>
</nav>
<div class="container-xl">
<div class="row">
<!-- Only show if we have sidebars configured, else just a small margin -->
<div class="col-12 col-md-3 bd-sidebar"><form class="bd-search d-flex align-items-center" action="search.html" method="get">
<i class="icon fas fa-search"></i>
<input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form><nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
<div class="bd-toc-item active">
</div>
</nav>
</div>
<div class="d-none d-xl-block col-xl-2 bd-toc">
<div class="toc-item">
<nav id="bd-toc-nav">
</nav>
</div>
<div class="toc-item">
</div>
</div>
<main class="col-12 col-md-9 col-xl-7 py-md-5 pl-md-5 pr-md-4 bd-content" role="main">
<div>
<section id="pyspark-documentation">
<h1>PySpark Documentation<a class="headerlink" href="#pyspark-documentation" title="Permalink to this headline"></a></h1>
<p><a class="reference external" href="https://mybinder.org/v2/gh/apache/spark/v3.2.1?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb">Live Notebook</a> | <a class="reference external" href="https://github.com/apache/spark">GitHub</a> | <a class="reference external" href="https://issues.apache.org/jira/projects/SPARK/issues">Issues</a> | <a class="reference external" href="https://github.com/apache/spark/tree/v3.2.1/examples/src/main/python">Examples</a> | <a class="reference external" href="https://spark.apache.org/community.html">Community</a></p>
<p>PySpark is an interface for Apache Spark in Python. It not only allows you to write
Spark applications using Python APIs, but also provides the PySpark shell for
interactively analyzing your data in a distributed environment. PySpark supports most
of Spark’s features, such as Spark SQL, DataFrame, Streaming, MLlib
(Machine Learning), and Spark Core.</p>
<img alt="PySpark Components" src="_images/pyspark-components.png" />
<p><strong>Spark SQL and DataFrame</strong></p>
<p>Spark SQL is a Spark module for structured data processing. It provides
a programming abstraction called DataFrame and can also act as a distributed
SQL query engine.</p>
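<p>As a minimal sketch of the two faces of Spark SQL, the function below runs the same filter once through the DataFrame API and once as a SQL query. The function name, the sample rows, and the <code>local[1]</code> master are illustrative assumptions, not part of PySpark itself.</p>

```python
def adults_via_dataframe_and_sql():
    """Return adult names twice: via the DataFrame API and via a SQL query."""
    # Imported lazily so that merely defining this function does not need Spark.
    from pyspark.sql import SparkSession

    # Assumption: "local[1]" runs Spark inside this process; a cluster URL would differ.
    spark = (
        SparkSession.builder.master("local[1]").appName("sql-sketch").getOrCreate()
    )
    try:
        # Hypothetical sample data, used only for illustration.
        people = spark.createDataFrame(
            [("Alice", 34), ("Bob", 17), ("Cara", 52)], ["name", "age"]
        )

        # 1) DataFrame API: a column expression passed to filter().
        via_api = people.filter(people["age"] >= 18).select("name")

        # 2) SQL: register the DataFrame as a temporary view and query it.
        people.createOrReplaceTempView("people")
        via_sql = spark.sql("SELECT name FROM people WHERE age >= 18")

        api_names = sorted(row["name"] for row in via_api.collect())
        sql_names = sorted(row["name"] for row in via_sql.collect())
        return api_names, sql_names
    finally:
        # Stops the session this sketch started (or reused).
        spark.stop()
```

<p>Calling <code>adults_via_dataframe_and_sql()</code> on a machine with PySpark and Java installed should return the same two names from both code paths.</p>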
<p><strong>pandas API on Spark</strong></p>
<p>pandas API on Spark lets you scale out your pandas workloads.
With this package, you can:</p>
<ul class="simple">
<li><p>Be immediately productive with Spark, with no learning curve, if you are already familiar with pandas.</p></li>
<li><p>Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).</p></li>
<li><p>Switch between the pandas API and PySpark API contexts easily, without any overhead.</p></li>
</ul>
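<p>The single-codebase idea in the list above can be sketched as follows: one helper written against the pandas API is applied to a plain pandas DataFrame and to a pandas-on-Spark DataFrame. The helper names, column names, and sample values are assumptions made for illustration; the sketch also assumes PySpark 3.2 or later, where <code>pyspark.pandas</code> is available.</p>

```python
def mean_price_by_category(df):
    """Works on a pandas DataFrame and on a pandas-on-Spark DataFrame alike."""
    return df.groupby("category")["price"].mean().sort_index()


def compare_backends():
    """Run the same helper on both backends and return the results as dicts."""
    # Imported lazily so that defining these functions needs neither library.
    import pandas as pd
    import pyspark.pandas as ps  # starts or reuses a default Spark session

    # Hypothetical sample data.
    data = {"category": ["a", "b", "a", "b"], "price": [1.0, 2.0, 3.0, 6.0]}

    local = mean_price_by_category(pd.DataFrame(data))

    psdf = ps.DataFrame(data)
    distributed = mean_price_by_category(psdf).to_pandas()

    # Switching contexts: a pandas-on-Spark DataFrame converts to a
    # regular PySpark DataFrame with to_spark().
    spark_row_count = psdf.to_spark().count()

    return local.to_dict(), distributed.to_dict(), spark_row_count
```

<p>With both libraries installed, the two dictionaries returned by <code>compare_backends()</code> should agree, since both backends compute the same per-category means.</p>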
<p><strong>Streaming</strong></p>
<p>The streaming feature in Apache Spark runs on top of Spark and enables powerful
interactive and analytical applications across both streaming and historical data,
while inheriting Spark’s ease of use and fault-tolerance characteristics.</p>
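<p>To illustrate how one piece of logic can serve both historical and streaming data, the sketch below applies the same word-count transformation to a batch read and to a Structured Streaming read of the same directory. The function names, the temporary directory, the sample text, the <code>word_counts</code> query name, and the run-once trigger are all assumptions chosen so that the example terminates.</p>

```python
import tempfile
from pathlib import Path


def count_words(lines):
    """Word count over a DataFrame with a single string column named 'value'."""
    from pyspark.sql import functions as F

    words = lines.select(F.explode(F.split(F.col("value"), " ")).alias("word"))
    return words.groupBy("word").count()


def batch_and_stream_counts():
    """Apply count_words to the same files as a batch job and as a stream."""
    from pyspark.sql import SparkSession

    # Assumption: a local, in-process Spark session.
    spark = (
        SparkSession.builder.master("local[1]").appName("stream-sketch").getOrCreate()
    )
    try:
        with tempfile.TemporaryDirectory() as tmp:
            # Hypothetical input file.
            Path(tmp, "part-0.txt").write_text("spark streaming spark\n")

            # Historical data: an ordinary batch read.
            batch = count_words(spark.read.text(tmp))
            batch_counts = {r["word"]: r["count"] for r in batch.collect()}

            # Streaming data: same transformation, results kept in an
            # in-memory table; trigger(once=True) processes what is
            # available and then stops the query.
            query = (
                count_words(spark.readStream.text(tmp))
                .writeStream.format("memory")
                .queryName("word_counts")
                .outputMode("complete")
                .trigger(once=True)
                .start()
            )
            query.awaitTermination()
            rows = spark.table("word_counts").collect()
            stream_counts = {r["word"]: r["count"] for r in rows}
        return batch_counts, stream_counts
    finally:
        spark.stop()
```

<p>A long-running application would typically drop the run-once trigger and write to a durable sink rather than the in-memory table used here.</p>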
<p><strong>MLlib</strong></p>
<p>Built on top of Spark, MLlib is a scalable machine learning library that provides
a uniform set of high-level APIs that help users create and tune practical machine
learning pipelines.</p>
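<p>A pipeline in this sense chains feature transformers and an estimator into one object that is fitted and applied as a unit. The sketch below, with a hypothetical function name, made-up training rows, and an arbitrary <code>maxIter</code> value, assembles two numeric columns into a feature vector and fits a logistic regression on them. Hyperparameter tuning is left out to keep it short.</p>

```python
def fit_tiny_pipeline():
    """Fit a two-stage pipeline and return its predictions on the training rows."""
    # Imported lazily so that defining this function does not need Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    # Assumption: a local, in-process Spark session.
    spark = (
        SparkSession.builder.master("local[1]").appName("mllib-sketch").getOrCreate()
    )
    try:
        # Hypothetical training data: two features and a binary label.
        train = spark.createDataFrame(
            [(0.0, 1.0, 0.0), (0.1, 0.9, 0.0), (1.0, 0.0, 1.0), (0.9, 0.2, 1.0)],
            ["x1", "x2", "label"],
        )

        pipeline = Pipeline(
            stages=[
                # Stage 1: pack the raw columns into a single vector column.
                VectorAssembler(inputCols=["x1", "x2"], outputCol="features"),
                # Stage 2: estimator reading "features" and "label" by default.
                LogisticRegression(maxIter=10),
            ]
        )

        model = pipeline.fit(train)  # fits every stage in order
        scored = model.transform(train).select("prediction").collect()
        return [row["prediction"] for row in scored]
    finally:
        spark.stop()
```

<p>The fitted <code>model</code> is itself a transformer, so the same object can score new DataFrames that have the <code>x1</code> and <code>x2</code> columns.</p>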
<p><strong>Spark Core</strong></p>
<p>Spark Core is the underlying general execution engine for the Spark platform, on which all
other functionality is built. It provides the RDD (Resilient Distributed Dataset) abstraction
and in-memory computing capabilities.</p>
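<p>As a small sketch of the RDD abstraction and in-memory reuse, the function below distributes a Python list, squares each element, caches the result, and runs two actions over it. The function name, the partition count of 2, and the <code>local[1]</code> master are illustrative assumptions.</p>

```python
from operator import add


def sum_of_squares(numbers):
    """Return (sum of squares, element count) computed through an RDD."""
    # Imported lazily so that defining this function does not need Spark.
    from pyspark.sql import SparkSession

    # Assumption: a local, in-process Spark session.
    spark = (
        SparkSession.builder.master("local[1]").appName("rdd-sketch").getOrCreate()
    )
    try:
        sc = spark.sparkContext

        # Distribute the list over 2 partitions and describe a transformation.
        squares = sc.parallelize(numbers, 2).map(lambda x: x * x)

        # Ask Spark to keep the computed partitions in memory for reuse.
        squares.cache()

        total = squares.reduce(add)  # first action: computes and caches
        count = squares.count()      # second action: can reuse cached data
        return total, count
    finally:
        spark.stop()
```

<p>For example, <code>sum_of_squares([1, 2, 3, 4])</code> computes 1 + 4 + 9 + 16 across the partitions and also counts the four elements.</p>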
<div class="toctree-wrapper compound">
</div>
</section>
</div>
<!-- Previous / next buttons -->
<div class='prev-next-area'>
<a class='right-next' id="next-link" href="getting_started/index.html" title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">Getting Started</p>
</div>
<i class="fas fa-angle-right"></i>
</a>
</div>
</main>
</div>
</div>
<script src="_static/js/index.be7d3bbb2ef33a8344ce.js"></script>
<footer class="footer mt-5 mt-md-0">
<div class="container">
<div class="footer-item">
<p class="copyright">
&copy; Copyright .<br>
</p>
</div>
<div class="footer-item">
<p class="sphinx-version">
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 3.0.4.<br>
</p>
</div>
</div>
</footer>
</body>
</html>