| <p>Welcome to the Spark documentation!</p> |
| |
| <p>This readme will walk you through navigating and building the Spark documentation, which is included |
| here with the Spark source code. You can also find documentation specific to release versions of |
| Spark at https://spark.apache.org/documentation.html.</p> |
| |
| <p>Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the |
| documentation yourself. Why build it yourself? So that you have the docs that correspond to |
| whichever version of Spark you currently have checked out of revision control.</p> |
| |
| <h2 id="prerequisites">Prerequisites</h2> |
| |
| <p>The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java, |
| Python, R and SQL.</p> |
| |
| <p>You need to have <a href="https://www.ruby-lang.org/en/documentation/installation/">Ruby</a> and |
| <a href="https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python">Python</a> |
| installed. Make sure the <code class="language-plaintext highlighter-rouge">bundle</code> command is available, if not install the Gem containing it:</p> |
| |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>gem <span class="nb">install </span>bundler |
| </code></pre></div></div> |
| |
| <p>After this all the required ruby dependencies can be installed from the <code class="language-plaintext highlighter-rouge">docs/</code> directory via the Bundler:</p> |
| |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd </span>docs |
| <span class="nv">$ </span>bundle <span class="nb">install</span> |
| </code></pre></div></div> |
| |
| <p>Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0.</p> |
| |
| <h3 id="sql-and-python-api-documentation-optional">SQL and Python API Documentation (Optional)</h3> |
| |
| <p>To generate SQL and Python API docs, you’ll need to install these libraries:</p> |
| |
| <!-- |
| TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes. |
| See also https://github.com/sphinx-doc/sphinx/issues/7551. |
| |
| TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx. |
| See also https://issues.apache.org/jira/browse/SPARK-35375. |
| --> |
| <p>Run the following command from $SPARK_HOME:</p> |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pip <span class="nb">install</span> <span class="nt">--upgrade</span> <span class="nt">-r</span> dev/requirements.txt |
| </code></pre></div></div> |
| |
| <h3 id="r-api-documentation-optional">R API Documentation (Optional)</h3> |
| |
| <p>If you’d like to generate R API documentation, you’ll need to <a href="https://pandoc.org/installing.html">install Pandoc</a> |
| and install these libraries:</p> |
| |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>Rscript <span class="nt">-e</span> <span class="s1">'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'</span> |
| <span class="nv">$ </span><span class="nb">sudo </span>Rscript <span class="nt">-e</span> <span class="s1">'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")'</span> |
| <span class="nv">$ </span><span class="nb">sudo </span>Rscript <span class="nt">-e</span> <span class="s2">"devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"</span> |
| <span class="nv">$ </span><span class="nb">sudo </span>Rscript <span class="nt">-e</span> <span class="s2">"devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"</span> |
| </code></pre></div></div> |
| |
| <p>Note: Other versions of roxygen2 might work in SparkR documentation generation but <code class="language-plaintext highlighter-rouge">RoxygenNote</code> field in <code class="language-plaintext highlighter-rouge">$SPARK_HOME/R/pkg/DESCRIPTION</code> is 7.1.2, which is updated if the version is mismatched.</p> |
| |
| <h2 id="generating-the-documentation-html">Generating the Documentation HTML</h2> |
| |
| <p>We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as |
| the github wiki, as the definitive documentation) to enable the documentation to evolve along with |
| the source code and be captured by revision control (currently git). This way the code automatically |
| includes the version of the documentation that is relevant regardless of which version or release |
| you have checked out or downloaded.</p> |
| |
| <p>In this directory you will find text files formatted using Markdown, with an “.md” suffix. You can |
| read those text files directly if you want. Start with <code class="language-plaintext highlighter-rouge">index.md</code>.</p> |
| |
| <p>Execute <code class="language-plaintext highlighter-rouge">SKIP_API=1 bundle exec jekyll build</code> from the <code class="language-plaintext highlighter-rouge">docs/</code> directory to compile the site. Compiling the site with |
| Jekyll will create a directory called <code class="language-plaintext highlighter-rouge">_site</code> containing <code class="language-plaintext highlighter-rouge">index.html</code> as well as the rest of the |
| compiled files.</p> |
| |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd </span>docs |
| <span class="c"># Skip generating API docs (which takes a while)</span> |
| <span class="nv">$ SKIP_API</span><span class="o">=</span>1 bundle <span class="nb">exec </span>jekyll build |
| </code></pre></div></div> |
| |
| <p>You can also generate the default Jekyll build with API Docs as follows:</p> |
| |
| <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>bundle <span class="nb">exec </span>jekyll build |
| |
| <span class="c"># Serve content locally on port 4000</span> |
| <span class="nv">$ </span>bundle <span class="nb">exec </span>jekyll serve <span class="nt">--watch</span> |
| |
| <span class="c"># Build the site with extra features used on the live page</span> |
| <span class="nv">$ PRODUCTION</span><span class="o">=</span>1 bundle <span class="nb">exec </span>jekyll build |
| </code></pre></div></div> |
| |
| <h2 id="api-docs-scaladoc-javadoc-sphinx-roxygen2-mkdocs">API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)</h2> |
| |
| <p>You can build just the Spark scaladoc and javadoc by running <code class="language-plaintext highlighter-rouge">./build/sbt unidoc</code> from the <code class="language-plaintext highlighter-rouge">$SPARK_HOME</code> directory.</p> |
| |
| <p>Similarly, you can build just the PySpark docs by running <code class="language-plaintext highlighter-rouge">make html</code> from the |
| <code class="language-plaintext highlighter-rouge">$SPARK_HOME/python/docs</code> directory. Documentation is only generated for classes that are listed as |
| public in <code class="language-plaintext highlighter-rouge">__init__.py</code>. The SparkR docs can be built by running <code class="language-plaintext highlighter-rouge">$SPARK_HOME/R/create-docs.sh</code>, and |
| the SQL docs can be built by running <code class="language-plaintext highlighter-rouge">$SPARK_HOME/sql/create-docs.sh</code> |
| after <a href="https://github.com/apache/spark#building-spark">building Spark</a> first.</p> |
| |
| <p>When you run <code class="language-plaintext highlighter-rouge">bundle exec jekyll build</code> in the <code class="language-plaintext highlighter-rouge">docs</code> directory, it will also copy over the scaladoc and javadoc for the various |
| Spark subprojects into the <code class="language-plaintext highlighter-rouge">docs</code> directory (and then also into the <code class="language-plaintext highlighter-rouge">_site</code> directory). We use a |
| jekyll plugin to run <code class="language-plaintext highlighter-rouge">./build/sbt unidoc</code> before building the site so if you haven’t run it (recently) it |
| may take some time as it generates all of the scaladoc and javadoc using <a href="https://github.com/sbt/sbt-unidoc">Unidoc</a>. |
| The jekyll plugin also generates the PySpark docs using <a href="http://sphinx-doc.org/">Sphinx</a>, SparkR docs |
| using <a href="https://cran.r-project.org/web/packages/roxygen2/index.html">roxygen2</a> and SQL docs |
| using <a href="https://www.mkdocs.org/">MkDocs</a>.</p> |
| |
| <p>NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run <code class="language-plaintext highlighter-rouge">SKIP_API=1 |
| bundle exec jekyll build</code>. In addition, <code class="language-plaintext highlighter-rouge">SKIP_SCALADOC=1</code>, <code class="language-plaintext highlighter-rouge">SKIP_PYTHONDOC=1</code>, <code class="language-plaintext highlighter-rouge">SKIP_RDOC=1</code> and <code class="language-plaintext highlighter-rouge">SKIP_SQLDOC=1</code> can be used |
| to skip a single step of the corresponding language. <code class="language-plaintext highlighter-rouge">SKIP_SCALADOC</code> indicates skipping both the Scala and Java docs.</p> |