GH-325: [Website] Update website deployment script to use latest LTS version of Node.js and Webpack 5.75.0 (#326)

# Overview

This pull request modifies the `apache/arrow-site` website deployment
workflow (`.github/workflows/deploy.yml`) to use the latest LTS version
of Node.js and Webpack 5.75.0 to work around the build issue described
in #325.

# Qualification

To qualify these changes, I:

1. Submitted these changes to the `main` branch of the
`mathworks/arrow-site` fork in order to trigger the `gh-pages`
deployment workflow. I then selected `gh-pages` as the GitHub Pages
deployment branch and verified that the site was deployed as expected to
https://mathworks.github.io/arrow-site/. For an example of a successful
workflow run, see:
https://github.com/mathworks/arrow-site/actions/runs/4313253336/jobs/7524824999.
2. Inspected the GitHub Actions workflow steps to confirm that there
were no errors.
 
# Future Directions

1. While qualifying with the [fork deployment
workflow](https://github.com/apache/arrow-site#deployment), I realized
that I needed to [manually change the GitHub Pages deployment
branch](https://docs.github.com/en/pages/quickstart) from `asf-site` to
`gh-pages` in the "Pages" settings of the `mathworks/arrow-site` fork.
This wasn't immediately obvious, and it [isn't listed explicitly as a
required step in the
README.md](https://github.com/apache/arrow-site#deployment) of
`apache/arrow-site`. It would be helpful to add an explicit note about this
step. I've captured this as #327 and addressed it with PR #328.
2. As described in the "Workarounds" section of the description of
apache/arrow-site#325, there is still more we could choose to do to
address the root cause of these build failures (the deprecation of the
`md4` hash algorithm in Node 18). This would include setting the
`output.hashFunction` to `xxhash64` for Webpack.
3. We could move the workflow into a container to make it easier to
reproduce the website build process on a local machine (see the
discussion in the comments on this pull request).

# Notes

1. Thank you @sgilmore10 for your help with this pull request!
2. Thank you to @avantgardnerio for your suggestion to move the
deployment workflow inside of an `ubuntu:latest` container!

Closes apache/arrow-site#325.

---------

Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
4 files changed
tree: 6dc652ce631a090148e0c087bdb9305c5fbd071a
  1. .github/
  2. _data/
  3. _docs/
  4. _includes/
  5. _layouts/
  6. _posts/
  7. _release/
  8. _webpack/
  9. assets/
  10. css/
  11. img/
  12. scripts/
  13. .asf.yaml
  14. .gitignore
  15. .nvmrc
  16. .ruby-version
  17. _config.yml
  18. arrow.rdf
  19. blog.html
  20. committers.md
  21. community.md
  22. faq.md
  23. Gemfile
  24. index.html
  25. install.md
  26. LICENSE.txt
  27. NOTICE.txt
  28. overview.md
  29. package-lock.json
  30. package.json
  31. powered_by.md
  32. Rakefile
  33. README.md
  34. release-announcement-template.md
  35. robots.txt
  36. security.md
  37. use_cases.md
  38. visual_identity.md
  39. webpack.config.js
README.md

Apache Arrow Website

Overview

Jekyll is used to generate HTML files from the Markdown + templates in this repository. The built version of the site is kept on the asf-site branch, which gets deployed to https://arrow.apache.org.

Adding Content

To add a blog post, create a new markdown file in the _posts directory, following the model of existing posts. In the front matter, you should specify an “author”. This should be your Apache ID if you have one, or it can just be your name. To add additional metadata about yourself (GitHub ID, website), add yourself to _data/contributors.yml. This object is keyed by apacheId, so use that as the author in your post. (It doesn't matter if the ID actually exists in the ASF; all metadata is local to this project.)

Prerequisites

With a recent version of Ruby (i.e. one that does not have an End-Of-Life (EOL) status) installed, run the following commands to install Jekyll.

gem install bundler
bundle install

We also need Node.js, because webpack is used to manage the JavaScript and CSS libraries that the site depends on.

Webpack and those JavaScript and CSS libraries are installed automatically by the commands below that preview or build the site, so Node.js is the only thing we need to install here.

Previewing the site

Run the following and open http://localhost:4000/ to preview the generated site locally:

bundle exec rake

Deployment

apache/arrow-site

On a commit to the main branch of apache/arrow-site, the rendered static site will be published to the asf-site branch using GitHub Actions.

Forks

When implementing changes to the website on a fork, the GitHub Actions workflow behaves differently.

On a commit to the main branch, the rendered static site will be published to a branch named gh-pages (rather than asf-site). If it doesn't already exist, a gh-pages branch will be automatically created by the GitHub Actions workflow when it succeeds.

The gh-pages branch is intended to be used with GitHub Pages. Deploying changes on the gh-pages branch to GitHub Pages is a useful way to preview changes to the website. It can also be a helpful way to share changes that are still in progress with others, since they can easily view them by navigating to the GitHub Pages URL in their web browser.

For the changes on the gh-pages branch to be deployed to GitHub Pages, the Source branch for GitHub Pages deployment must be set to gh-pages in the repository Settings of your fork (by default, the Source branch should be set to asf-site). Instructions on how to configure the Source branch can be found in the GitHub Pages documentation.

FYI: We can also generate the site for https://arrow.apache.org/ into _site/ locally with the following command:

JEKYLL_ENV=production bundle exec rake generate

Updating Code Documentation

To update the documentation, you can run the script ./dev/gen_apidocs.sh in the apache/arrow repository. This script will run the code documentation tools in a fixed environment.

C (GLib)

First, build Apache Arrow C++ and Apache Arrow GLib. This assumes that you have checked out your forks of arrow and arrow-site alongside each other in your file system.

mkdir -p ../cpp/build
cd ../cpp/build
cmake .. -DCMAKE_BUILD_TYPE=debug
make
cd ../../c_glib
./autogen.sh
./configure \
  --with-arrow-cpp-build-dir=$PWD/../cpp/build \
  --with-arrow-cpp-build-type=debug \
  --enable-gtk-doc
LD_LIBRARY_PATH=$PWD/../cpp/build/debug make GTK_DOC_V_XREF=": "
rsync -r doc/reference/html/ ../../arrow-site/asf-site/docs/c_glib/

JavaScript

cd ../js
npm run doc
rsync -r doc/ ../../arrow-site/asf-site/docs/js

Then add/commit/push from the asf-site/ git checkout.

Using Docker

If you don't wish to change or install Ruby and Node.js locally, you can use Docker to build and preview the site with commands like:

docker run -v `pwd`:/arrow-site -p 4000:4000 -it ruby bash
cd arrow-site
apt-get update
apt-get install -y npm
gem install bundler
bundle install
# Serve using local container address
bundle exec rake HOST=0.0.0.0

Then open http://localhost:4000 locally.