| <p>Welcome to the Spark documentation!</p> |
| |
| <p>This readme will walk you through navigating and building the Spark documentation, which is included |
| here with the Spark source code. You can also find documentation specific to release versions of |
| Spark at https://spark.apache.org/documentation.html.</p> |
| |
| <p>Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the |
| documentation yourself. Why build it yourself? So that you have the docs that correspond to |
| whichever version of Spark you currently have checked out of revision control.</p> |
| |
| <h2 id="prerequisites">Prerequisites</h2> |
| |
| <p>The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java, |
| Python, R and SQL.</p> |
| |
| <p>You need to have <a href="https://www.ruby-lang.org/en/documentation/installation/">Ruby</a> and |
| <a href="https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python">Python</a> |
| installed. Also install the following libraries:</p> |
| |
| <pre><code class="language-sh">$ sudo gem install jekyll jekyll-redirect-from pygments.rb |
| $ sudo pip install Pygments |
| # Following is needed only for generating API docs |
| $ sudo pip install sphinx pypandoc mkdocs |
| $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "rmarkdown"), repos="https://cloud.r-project.org/")' |
| $ sudo Rscript -e 'devtools::install_version("roxygen2", version = "5.0.1", repos="https://cloud.r-project.org/")' |
| $ sudo Rscript -e 'devtools::install_version("testthat", version = "1.0.2", repos="https://cloud.r-project.org/")' |
| </code></pre> |
| |
| <p>Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0.</p> |
| |
| <p>Note: Other versions of roxygen2 might work in SparkR documentation generation but <code>RoxygenNote</code> field in <code>$SPARK_HOME/R/pkg/DESCRIPTION</code> is 5.0.1, which is updated if the version is mismatched.</p> |
| |
| <h2 id="generating-the-documentation-html">Generating the Documentation HTML</h2> |
| |
| <p>We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as |
| the github wiki, as the definitive documentation) to enable the documentation to evolve along with |
| the source code and be captured by revision control (currently git). This way the code automatically |
| includes the version of the documentation that is relevant regardless of which version or release |
| you have checked out or downloaded.</p> |
| |
| <p>In this directory you will find text files formatted using Markdown, with an “.md” suffix. You can |
| read those text files directly if you want. Start with <code>index.md</code>.</p> |
| |
| <p>Execute <code>jekyll build</code> from the <code>docs/</code> directory to compile the site. Compiling the site with |
| Jekyll will create a directory called <code>_site</code> containing <code>index.html</code> as well as the rest of the |
| compiled files.</p> |
| |
| <pre><code class="language-sh">$ cd docs |
| $ jekyll build |
| </code></pre> |
| |
| <p>You can modify the default Jekyll build as follows:</p> |
| |
| <pre><code class="language-sh"># Skip generating API docs (which takes a while) |
| $ SKIP_API=1 jekyll build |
| |
| # Serve content locally on port 4000 |
| $ jekyll serve --watch |
| |
| # Build the site with extra features used on the live page |
| $ PRODUCTION=1 jekyll build |
| </code></pre> |
| |
| <h2 id="api-docs-scaladoc-javadoc-sphinx-roxygen2-mkdocs">API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)</h2> |
| |
| <p>You can build just the Spark scaladoc and javadoc by running <code>./build/sbt unidoc</code> from the <code>$SPARK_HOME</code> directory.</p> |
| |
| <p>Similarly, you can build just the PySpark docs by running <code>make html</code> from the |
| <code>$SPARK_HOME/python/docs</code> directory. Documentation is only generated for classes that are listed as |
| public in <code>__init__.py</code>. The SparkR docs can be built by running <code>$SPARK_HOME/R/create-docs.sh</code>, and |
| the SQL docs can be built by running <code>$SPARK_HOME/sql/create-docs.sh</code> |
| after <a href="https://github.com/apache/spark#building-spark">building Spark</a> first.</p> |
| |
| <p>When you run <code>jekyll build</code> in the <code>docs</code> directory, it will also copy over the scaladoc and javadoc for the various |
| Spark subprojects into the <code>docs</code> directory (and then also into the <code>_site</code> directory). We use a |
| jekyll plugin to run <code>./build/sbt unidoc</code> before building the site so if you haven’t run it (recently) it |
| may take some time as it generates all of the scaladoc and javadoc using <a href="https://github.com/sbt/sbt-unidoc">Unidoc</a>. |
| The jekyll plugin also generates the PySpark docs using <a href="http://sphinx-doc.org/">Sphinx</a>, SparkR docs |
| using <a href="https://cran.r-project.org/web/packages/roxygen2/index.html">roxygen2</a> and SQL docs |
| using <a href="https://www.mkdocs.org/">MkDocs</a>.</p> |
| |
| <p>NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run <code>SKIP_API=1 |
| jekyll build</code>. In addition, <code>SKIP_SCALADOC=1</code>, <code>SKIP_PYTHONDOC=1</code>, <code>SKIP_RDOC=1</code> and <code>SKIP_SQLDOC=1</code> can be used |
| to skip a single step of the corresponding language. <code>SKIP_SCALADOC</code> indicates skipping both the Scala and Java docs.</p> |