| |
| <!DOCTYPE html> |
| |
| <html> |
| <head> |
| <meta charset="utf-8" /> |
| <title>Contributing to PySpark — PySpark 3.1.1 documentation</title> |
| |
| <link rel="stylesheet" href="../_static/css/index.73d71520a4ca3b99cfee5594769eaaae.css"> |
| |
| |
| <link rel="stylesheet" |
| href="../_static/vendor/fontawesome/5.13.0/css/all.min.css"> |
| <link rel="preload" as="font" type="font/woff2" crossorigin |
| href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2"> |
| <link rel="preload" as="font" type="font/woff2" crossorigin |
| href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2"> |
| |
| |
| |
| <link rel="stylesheet" |
| href="../_static/vendor/open-sans_all/1.44.1/index.css"> |
| <link rel="stylesheet" |
| href="../_static/vendor/lato_latin-ext/1.44.1/index.css"> |
| |
| |
| <link rel="stylesheet" href="../_static/basic.css" type="text/css" /> |
| <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> |
| <link rel="stylesheet" type="text/css" href="../_static/css/pyspark.css" /> |
| |
| <link rel="preload" as="script" href="../_static/js/index.3da636dd464baa7582d2.js"> |
| |
| <script id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script> |
| <script src="../_static/jquery.js"></script> |
| <script src="../_static/underscore.js"></script> |
| <script src="../_static/doctools.js"></script> |
| <script src="../_static/language_data.js"></script> |
| <script src="../_static/copybutton.js"></script> |
| <script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script> |
| <script async="async" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script> |
| <script type="text/x-mathjax-config">MathJax.Hub.Config({"tex2jax": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true, "ignoreClass": "document", "processClass": "math|output_area"}})</script> |
| <link rel="canonical" href="https://spark.apache.org/docs/latest/api/python/development/contributing.html" /> |
| <link rel="search" title="Search" href="../search.html" /> |
| <link rel="next" title="Testing PySpark" href="testing.html" /> |
| <link rel="prev" title="Development" href="index.html" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <meta name="docsearch:language" content="en" /> |
| <!-- Matomo --> |
| <script type="text/javascript"> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| _paq.push(["disableCookies"]); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '40']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| </head> |
| <body data-spy="scroll" data-target="#bd-toc-nav" data-offset="80"> |
| |
| <nav class="navbar navbar-light navbar-expand-lg bg-light fixed-top bd-navbar" id="navbar-main"> |
| <div class="container-xl"> |
| |
| <a class="navbar-brand" href="../index.html"> |
| |
| <img src="../_static/spark-logo-reverse.png" class="logo" alt="logo" /> |
| |
| </a> |
| <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbar-menu" aria-controls="navbar-menu" aria-expanded="false" aria-label="Toggle navigation"> |
| <span class="navbar-toggler-icon"></span> |
| </button> |
| |
| <div id="navbar-menu" class="col-lg-9 collapse navbar-collapse"> |
| <ul id="navbar-main-elements" class="navbar-nav mr-auto"> |
| |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="../getting_started/index.html">Getting Started</a> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="../user_guide/index.html">User Guide</a> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="../reference/index.html">API Reference</a> |
| </li> |
| |
| <li class="nav-item active"> |
| <a class="nav-link" href="index.html">Development</a> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="../migration_guide/index.html">Migration Guide</a> |
| </li> |
| |
| |
| </ul> |
| |
| |
| |
| |
| <ul class="navbar-nav"> |
| |
| |
| </ul> |
| </div> |
| </div> |
| </nav> |
| |
| |
| <div class="container-xl"> |
| <div class="row"> |
| |
| <div class="col-12 col-md-3 bd-sidebar"><form class="bd-search d-flex align-items-center" action="../search.html" method="get"> |
| <i class="icon fas fa-search"></i> |
| <input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" > |
| </form> |
| <nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation"> |
| |
| <div class="bd-toc-item active"> |
| |
| |
| <ul class="nav bd-sidenav"> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class="active"> |
| <a href="">Contributing to PySpark</a> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="testing.html">Testing PySpark</a> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="debugging.html">Debugging PySpark</a> |
| </li> |
| |
| |
| |
| <li class=""> |
| <a href="setting_ide.html">Setting up IDEs</a> |
| </li> |
| |
| |
| |
| |
| |
| |
| </ul> |
| |
| </nav> |
| </div> |
| |
| |
| |
| <div class="d-none d-xl-block col-xl-2 bd-toc"> |
| |
| <div class="tocsection onthispage pt-5 pb-3"> |
| <i class="fas fa-list"></i> On this page |
| </div> |
| |
| <nav id="bd-toc-nav"> |
| <ul class="nav section-nav flex-column"> |
| |
| <li class="nav-item toc-entry toc-h2"> |
| <a href="#contributing-by-testing-releases" class="nav-link">Contributing by Testing Releases</a> |
| </li> |
| |
| <li class="nav-item toc-entry toc-h2"> |
| <a href="#contributing-documentation-changes" class="nav-link">Contributing Documentation Changes</a> |
| </li> |
| |
| <li class="nav-item toc-entry toc-h2"> |
| <a href="#preparing-to-contribute-code-changes" class="nav-link">Preparing to Contribute Code Changes</a> |
| </li> |
| |
| <li class="nav-item toc-entry toc-h2"> |
| <a href="#contributing-and-maintaining-type-hints" class="nav-link">Contributing and Maintaining Type Hints</a> |
| </li> |
| |
| <li class="nav-item toc-entry toc-h2"> |
| <a href="#code-and-docstring-guide" class="nav-link">Code and Docstring Guide</a> |
| </li> |
| |
| </ul> |
| </nav> |
| |
| |
| |
| </div> |
| |
| |
| |
| <main class="col-12 col-md-9 col-xl-7 py-md-5 pl-md-5 pr-md-4 bd-content" role="main"> |
| |
| <div> |
| |
| <div class="section" id="contributing-to-pyspark"> |
| <h1>Contributing to PySpark<a class="headerlink" href="#contributing-to-pyspark" title="Permalink to this headline">¶</a></h1> |
| <p>There are many types of contribution, for example, helping other users, testing releases, reviewing changes, |
| documentation contribution, bug reporting, JIRA maintenance, code changes, etc. |
| These are documented at <a class="reference external" href="http://spark.apache.org/contributing.html">the general guidelines</a>. |
| This page focuses on PySpark and includes additional details specifically for PySpark.</p> |
| <div class="section" id="contributing-by-testing-releases"> |
| <h2>Contributing by Testing Releases<a class="headerlink" href="#contributing-by-testing-releases" title="Permalink to this headline">¶</a></h2> |
| <p>Before the official release, PySpark release candidates are shared on the <a class="reference external" href="http://apache-spark-developers-list.1001551.n3.nabble.com/">dev@spark.apache.org</a> mailing list for a vote. |
| These release candidates can be easily installed via pip. For example, for Spark 3.0.0 RC1, you can install it as below:</p> |
| <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz |
| </pre></div> |
| </div> |
| <p>The link for release files such as <code class="docutils literal notranslate"><span class="pre">https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin</span></code> can be found in the vote thread.</p> |
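<p>As a minimal sketch, the URL passed to pip can be assembled from a version and a release candidate number. The helper name <code class="docutils literal notranslate"><span class="pre">rc_pip_url</span></code> is hypothetical, and the directory layout is only inferred from the Spark 3.0.0 RC1 example above, so always confirm the actual link in the vote thread:</p>

```python
def rc_pip_url(version: str, rc: int) -> str:
    """Build a pip-installable URL for a PySpark release candidate.

    Assumption: the layout matches the Spark 3.0.0 RC1 example; the
    authoritative link is the one posted in the vote thread.
    """
    base = "https://dist.apache.org/repos/dist/dev/spark"
    return f"{base}/v{version}-rc{rc}-bin/pyspark-{version}.tar.gz"


# Print the command a tester would run for Spark 3.0.0 RC1.
print("pip install " + rc_pip_url("3.0.0", 1))
```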
| <p>Testing and verifying users’ existing workloads against release candidates is one of the vital contributions to PySpark. |
| It helps catch breakages in users’ existing workloads before the official release. |
| When there is an issue such as a regression, correctness problem or performance degradation serious enough to drop the release candidate, |
| the release candidate is usually dropped and the community focuses on fixing the issue so that the fix is included in the next release candidate.</p> |
| </div> |
| <div class="section" id="contributing-documentation-changes"> |
| <h2>Contributing Documentation Changes<a class="headerlink" href="#contributing-documentation-changes" title="Permalink to this headline">¶</a></h2> |
| <p>The release documentation is located under Spark’s <a class="reference external" href="https://github.com/apache/spark/tree/master/docs">docs</a> directory. |
| <a class="reference external" href="https://github.com/apache/spark/blob/master/docs/README.md">README.md</a> describes the required dependencies and steps |
| to generate the documentation. Usually, PySpark documentation is tested with the command below |
| under the <a class="reference external" href="https://github.com/apache/spark/tree/master/docs">docs</a> directory:</p> |
| <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">SKIP_SCALADOC</span><span class="o">=</span><span class="m">1</span> <span class="nv">SKIP_RDOC</span><span class="o">=</span><span class="m">1</span> <span class="nv">SKIP_SQLDOC</span><span class="o">=</span><span class="m">1</span> jekyll serve --watch |
| </pre></div> |
| </div> |
| <p>PySpark uses Sphinx to generate its release documentation. Therefore, if you want to build only the PySpark documentation, |
| you can build it under the <a class="reference external" href="https://github.com/apache/spark/tree/master/python">python/docs</a> directory by running:</p> |
| <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make html |
| </pre></div> |
| </div> |
| <p>It generates the corresponding HTML files under <code class="docutils literal notranslate"><span class="pre">python/docs/build/html</span></code>.</p> |
| <p>Lastly, please make sure that new APIs are documented by manually adding methods and/or classes to the corresponding RST files |
| under <code class="docutils literal notranslate"><span class="pre">python/docs/source/reference</span></code>. Otherwise, they will not appear in the PySpark documentation.</p> |
| </div> |
| <div class="section" id="preparing-to-contribute-code-changes"> |
| <h2>Preparing to Contribute Code Changes<a class="headerlink" href="#preparing-to-contribute-code-changes" title="Permalink to this headline">¶</a></h2> |
| <p>Before starting to work on code in PySpark, it is recommended to read <a class="reference external" href="http://spark.apache.org/contributing.html">the general guidelines</a>. |
| There are a couple of additional notes to keep in mind when contributing code to PySpark:</p> |
| <ul class="simple"> |
| <li><p>Be Pythonic.</p></li> |
| <li><p>APIs generally match the Scala and Java sides.</p></li> |
| <li><p>PySpark-specific APIs can still be considered as long as they are Pythonic and do not conflict with other existing APIs, for example, decorator usage of UDFs.</p></li> |
| <li><p>If you extend or modify a public API, please adjust the corresponding type hints. See <a class="reference internal" href="#contributing-and-maintaining-type-hints">Contributing and Maintaining Type Hints</a> for details.</p></li> |
| </ul> |
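<p>To illustrate what a Pythonic, PySpark-specific API can look like, here is a rough sketch of a decorator that works both bare and with arguments, in the spirit of the UDF decorator usage mentioned above. The name <code class="docutils literal notranslate"><span class="pre">my_udf</span></code> and the <code class="docutils literal notranslate"><span class="pre">returnType</span></code> attribute it sets are hypothetical stand-ins in plain Python, not PySpark's implementation:</p>

```python
import functools
from typing import Any, Callable, Optional


def my_udf(f: Optional[Callable[..., Any]] = None, *, returnType: str = "string"):
    """Hypothetical stand-in for a UDF-style decorator (not PySpark's code)."""

    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        # Record the declared return type on the wrapped function.
        wrapper.returnType = returnType
        return wrapper

    # Bare usage (@my_udf) receives the function directly;
    # parameterized usage (@my_udf(returnType=...)) receives f=None first.
    return wrap if f is None else wrap(f)


@my_udf
def upper(s: str) -> str:
    return s.upper()


@my_udf(returnType="int")
def length(s: str) -> int:
    return len(s)
```

<p>Both forms produce an ordinary callable, so <code class="docutils literal notranslate"><span class="pre">upper("spark")</span></code> and <code class="docutils literal notranslate"><span class="pre">length("spark")</span></code> can still be called directly in this sketch.</p>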
| </div> |
| <div class="section" id="contributing-and-maintaining-type-hints"> |
| <h2>Contributing and Maintaining Type Hints<a class="headerlink" href="#contributing-and-maintaining-type-hints" title="Permalink to this headline">¶</a></h2> |
| <p>PySpark type hints are provided using stub files, placed in the same directory as the annotated module, with the exception of <code class="docutils literal notranslate"><span class="pre">#</span> <span class="pre">type:</span> <span class="pre">ignore</span></code> comments in modules which don’t have their own stubs (tests, examples and non-public API). |
| As a rule of thumb, only the public API is annotated.</p> |
| <p>Annotations should, when possible:</p> |
| <ul> |
| <li><p>Reflect expectations of the underlying JVM API, to help avoid type-related failures outside the Python interpreter.</p></li> |
| <li><p>In case of conflict between too broad (<code class="docutils literal notranslate"><span class="pre">Any</span></code>) and too narrow argument annotations, prefer the latter, as long as it covers most of the typical use cases.</p></li> |
| <li><p>Indicate nonsensical combinations of arguments using <code class="docutils literal notranslate"><span class="pre">@overload</span></code> annotations. For example, to indicate that <code class="docutils literal notranslate"><span class="pre">*Col</span></code> and <code class="docutils literal notranslate"><span class="pre">*Cols</span></code> arguments are mutually exclusive:</p> |
| <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@overload</span> |
| <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span> |
| <span class="bp">self</span><span class="p">,</span> |
| <span class="o">*</span><span class="p">,</span> |
| <span class="n">threshold</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="o">...</span><span class="p">,</span> |
| <span class="n">inputCol</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="o">...</span><span class="p">,</span> |
| <span class="n">outputCol</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="o">...</span> |
| <span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="o">...</span> |
| <span class="nd">@overload</span> |
| <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span> |
| <span class="bp">self</span><span class="p">,</span> |
| <span class="o">*</span><span class="p">,</span> |
| <span class="n">thresholds</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]]</span> <span class="o">=</span> <span class="o">...</span><span class="p">,</span> |
| <span class="n">inputCols</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> <span class="o">=</span> <span class="o">...</span><span class="p">,</span> |
| <span class="n">outputCols</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> <span class="o">=</span> <span class="o">...</span> |
| <span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="o">...</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>Be compatible with the current stable MyPy release.</p></li> |
| </ul> |
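<p>The stub above only declares the overloads. As a hedged illustration of how such overloads pair with a single runtime implementation in a regular module, consider the sketch below. The class <code class="docutils literal notranslate"><span class="pre">Thresholder</span></code> is hypothetical and is not part of PySpark; the runtime check is an assumption added for the example:</p>

```python
from typing import List, Optional, overload


class Thresholder:
    """Hypothetical class used only to illustrate @overload; not a PySpark API."""

    @overload
    def __init__(self, *, threshold: float = ..., inputCol: Optional[str] = ...) -> None: ...
    @overload
    def __init__(
        self, *, thresholds: Optional[List[float]] = ..., inputCols: Optional[List[str]] = ...
    ) -> None: ...

    def __init__(self, *, threshold=None, inputCol=None, thresholds=None, inputCols=None):
        # The overloads describe the two valid call shapes to a type checker;
        # this runtime check rejects mixtures of the two shapes.
        single = threshold is not None or inputCol is not None
        multi = thresholds is not None or inputCols is not None
        if single and multi:
            raise ValueError("single-column and multi-column arguments are mutually exclusive")
        self.threshold = threshold
        self.inputCol = inputCol
        self.thresholds = thresholds
        self.inputCols = inputCols


# Matches the first overload.
t = Thresholder(threshold=0.5, inputCol="features")
```

<p>A call such as <code class="docutils literal notranslate"><span class="pre">Thresholder(threshold=0.5,</span> <span class="pre">inputCols=["a"])</span></code> matches neither overload, so a type checker should flag it, and in this sketch it also raises <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> at runtime.</p>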
| <p>Complex supporting type definitions should be placed in dedicated <code class="docutils literal notranslate"><span class="pre">_typing.pyi</span></code> stubs. See for example <a class="reference external" href="https://github.com/apache/spark/blob/master/python/pyspark/sql/_typing.pyi">pyspark.sql._typing.pyi</a>.</p> |
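<p>As a small sketch of what "supporting type definitions" can mean, the aliases below give a name to a union that would otherwise be repeated in many signatures. The alias names <code class="docutils literal notranslate"><span class="pre">Scalar</span></code> and <code class="docutils literal notranslate"><span class="pre">ScalarRow</span></code> and the function <code class="docutils literal notranslate"><span class="pre">describe</span></code> are hypothetical and are not copied from any PySpark stub:</p>

```python
from typing import List, Tuple, Union

# Hypothetical aliases of the kind a dedicated typing stub might hold.
Scalar = Union[bool, float, int, str]
ScalarRow = Union[List[Scalar], Tuple[Scalar, ...]]


def describe(row: ScalarRow) -> List[str]:
    """Return the Python type name of each value in a row."""
    return [type(value).__name__ for value in row]


print(describe([1, "a", 2.5]))
```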
| <p>Annotations can be validated using the <code class="docutils literal notranslate"><span class="pre">dev/lint-python</span></code> script or by invoking mypy directly:</p> |
| <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>mypy --config python/mypy.ini python/pyspark |
| </pre></div> |
| </div> |
| </div> |
| <div class="section" id="code-and-docstring-guide"> |
| <h2>Code and Docstring Guide<a class="headerlink" href="#code-and-docstring-guide" title="Permalink to this headline">¶</a></h2> |
| <p>Please follow the style of the existing codebase as is, which is virtually PEP 8 with one exception: lines can be up |
| to 100 characters in length, not 79. |
| For the docstring style, PySpark follows <a class="reference external" href="https://numpydoc.readthedocs.io/en/latest/format.html">NumPy documentation style</a>.</p> |
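<p>For orientation, a function documented in the NumPy docstring style might look like the sketch below. The function <code class="docutils literal notranslate"><span class="pre">clip_value</span></code> is a hypothetical example and is not part of PySpark:</p>

```python
def clip_value(value: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """
    Clip a value to the closed interval [lower, upper].

    Parameters
    ----------
    value : float
        The number to clip.
    lower : float, optional
        Lower bound of the interval. Default is 0.0.
    upper : float, optional
        Upper bound of the interval. Default is 1.0.

    Returns
    -------
    float
        ``value`` limited to the interval.

    Examples
    --------
    >>> clip_value(1.5)
    1.0
    >>> clip_value(-2.0, lower=-1.0)
    -1.0
    """
    return max(lower, min(value, upper))
```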
| <p>Note that the naming of methods and variables in PySpark is similar to the case of the <code class="docutils literal notranslate"><span class="pre">threading</span></code> library in Python itself, where |
| the APIs were inspired by Java. PySpark also follows <cite>camelCase</cite> for exposed APIs that match Scala and Java. |
| One exception is <code class="docutils literal notranslate"><span class="pre">functions.py</span></code>, which uses <cite>snake_case</cite> in order to make the APIs SQL (and Python) friendly.</p> |
| <p>PySpark leverages linters such as <a class="reference external" href="https://pycodestyle.pycqa.org/en/latest/">pycodestyle</a> and <a class="reference external" href="https://flake8.pycqa.org/en/latest/">flake8</a>, which <code class="docutils literal notranslate"><span class="pre">dev/lint-python</span></code> runs. Therefore, make sure to run that script to double-check your changes.</p> |
| </div> |
| </div> |
| |
| |
| </div> |
| |
| |
| <div class='prev-next-bottom'> |
| |
| <a class='left-prev' id="prev-link" href="index.html" title="previous page">Development</a> |
| <a class='right-next' id="next-link" href="testing.html" title="next page">Testing PySpark</a> |
| |
| </div> |
| |
| </main> |
| |
| |
| </div> |
| </div> |
| |
| |
| <script src="../_static/js/index.3da636dd464baa7582d2.js"></script> |
| |
| |
| <footer class="footer mt-5 mt-md-0"> |
| <div class="container"> |
| <p> |
| © Copyright .<br/> |
| Created using <a href="http://sphinx-doc.org/">Sphinx</a> 3.0.4.<br/> |
| </p> |
| </div> |
| </footer> |
| </body> |
| </html> |