blob: eaf5b064e5efc79c73ffe546e3192cdc728f2a7a [file] [log] [blame]
= Proposal: Website 2020
This will hopefully serve as the documentation for the website once/if executed.
High-level plan
* Kill all use and trace of the Apache CMS
* Publish html directly to git
* Allow for several sources to publish html
The result will be several sources, that can be run and managed
independently, feeding content into the git repo housing our live html
website.
This is a pragmatic perspective that sets us up to get a best-of-breed
outcome acknowledging trends in all our website endevors:
* All tools we've used have been heavily extended
* Content takes a hit each tool change
* All tools have limitations (strenghts/weaknesses)
* Filling gaps involves extensions (bullet one)
* Tools last on average 2-5 years
* Many types of content actually exist: javadoc, release notes, download pages
* We will always be in a hybrid situation
Think of it as "microservices for content" and avoiding a monolith.
Ideally this sets us up to acknowledge and embrace evolving our
website tech without many of the above disadvantages. If we have a
clean CSS and simple menu, we should be able to take HTML from
anywhere.
When we want to add a new content source we do not need to figure out
how to get it to work "through" the existing generator or redo
everything that already works, we simply have it generate content
directly to html directly to our site git.
As long as we maintain a common CSS and look and feel, we're good.
== DJ
That's one point of view.
I agree with a lot of it, but I'm worried...
Perhaps I can characterize it as thinking that more options are better.
However, nothing stopped anyone from migrating all the content from CMS to an external .md based system, or from .md to .adoc.
But no one did it.
Instead, people piled system on top of system, duplicating content in a partially intelligable state, and reducing the amount of organization and coherence at every step.
I think reasonable priorities are:
* as few systems involved in the website as possible
* avoid extensions and writing our own systems, we don't have time to maintain them let alone document them.
* organize the input to the website generation process.
* discourage adding more systems.
My most profound teacher said to pay attention to the balance between structure and behavior.
We need simple and fairly rigid structure here so behavior can go towards improving the documentation rather than wondering how or why.
I'm very worried about having more than one process push content to the git "website" repo.
Right now, we effectively have that process with the svn repo, and there's some content that seems to have been added and abandoned.
No one will tell me about it, including the person who put it there.
As a result, a requirement in my mind is that publishing the website needs to start with deleting everything previously there.
The next step is adding the entire current content.
Also, "publishing straight to git" actually means committing to a local repository (with tooling this could be bare).
This gives a preview opportunity, followed by pushing to the apache repo.
'''
== Kill all use and trace of the Apache CMS
TODO
=== DJ
I think that my antora preview is 99% there.
Having a second opinion would be valuable, but that could be a time consuming endeavor.
'''
== Publish html directly to git
Apache allows a project to designate a git repository as their
"website." All files in that repository are published as-is to the
internet as the project's website. HTML must be committed to this
repository as it does not offer any generation of any kind.
TODO: what is the process for getting one of these repos?
TODO: can we get Infra to do a svn-git migration of our current flat-html?
=== DJ
IMO having the git repo as anything but "staging" will make the process 100% headed for disaster.
What is this html content?
'''
== Allow for several sources to publish html
In the new architecture each content generator publishes rendered html
directly to the site git.
=== DJ
Perhaps....
I think that something more like a maven build is more likely to be useful.
If there needs to be more than one content generator, then there should be some sort of process to build the entire site and push it to git.
Without this, no one will be able to understand how the site is built.
For Antora, it's easy to set up a gitlab ci pipeline that builds a site and publishies it on netlify.
I don't know if anything similar is possible for github, but I thnk it's worth investigating.
My idea is something like any commit results in updating a preview site.
If you like it you can push to "master", so asf shows it.
'''
The following is a rough outline of the types of content:
* Versioned documentation for a software distribution
* Community/Developer documentation
* Website front-page and "marketing" pages such as major features, benefits, etc
* Examples
* Javadoc
* Release notes and download pages
* Contributors page
=== DJ
Another possible division would be
* asciidoc content
** human written
** generated
* non-asciidoc content
** human written
** generated
A perpendicular categorization is:
* Stuff that fits well in Antoras's information architecture
* Stuff that is outside the scope of Antrora's infomration architecture.
While it's easy to regard Antora as an asciidoc processor, IMO its more important contribution is providing a consistent and easy to understand information architecture framework.
'''
=== Versioned documentation for a software distribution
All of our "product documentation" efforts to date have been in some
way wiki-like in nature. They allow any kind of content to take any
shape and do not encourage structure.
As a result our content is all miscellaneous odds and ends that do not
fit together in any significant chapters or flow. Said another way
we're all "blog" and no "book."
The proposal for this is to use Antora tied to an effort to create a
documentation outline that encourages contribution on-rails. Gaps in
the documentation should be obvious, which hopefully encourages
contribution
==== DJ
I suggest we heavily use Antora's structuring abilities, using components, modules, and topics as much as poosible.
I suggest that we drop any auto-generated TOC or navigation pages and think carefully about organization.
'''
=== Community/Developer documentation
Learning how our community works and how to contribute (be a
developer) is also an experience that really needs to be on-rails.
The proposal for this is to use Antora tied to an effort to create a
deliberately smaller outline of how to get involved.
This content should be very focused on "developer onboarding",
something all open source projects must nail to grow.
==== DJ
My idea is that the "common" component ought to be this, but right now it contains "all the old stuff".
Some of this indeed appears to be community focused, but a lot is earlier versions of what's also in the versioned component.
'''
=== Website front-page and "marketing" pages, features, etc
When people come to the website they must get a human-perfect
orientation that gives them the most important information in
highlighted form with the least clicking.
There is no proven structure for gaining someone's immediate
attention and not losing them. They need to know "why TomEE",
ideally with some pictures or video. There also needs to be
a very small handful of pages to highlight features and further
pull people in.
The proposal for this is to use the existing Jbake setup as it is
free-form and enforces no structure. These pages must be enabled to
continuously discard/reinvent (revolve vs evolve) and keep trying
different ways to get people's attention.
==== DJ
I don't know enough about what you have in mind for this content to know what to think.
There's the front page, which I think is not generated.
Otherwise, what is this content?
I'd think if people write it, it will be in asciidoc, and can be rendered with Antora.
'''
=== Examples
The examples section of the website are arguably the only truly
successful part of the site in its current form. Both the Front-page
and product-documentation parts of the site fall short of
accomplishing what they should do.
The current library of examples is 180 and growing as the #1 place
where new contributors find success contributing to TomEE. After
improvements made in Dec 2018, contributions over the next 12 months
doubled bringing in over 40 contributors all the examples.
The proposal for this is to continue the existing Jbake setup as it
has proven to be very successful for this application and more
enhancements are planned, such as:
* Adding contributors faces to each example page
* Automatically linking code to related online javadoc
* Automatically suggesting related examples
==== DJ
I'm biased, but how is the jboake setup better than the Antora setup, which I've spent about 0 time on yet?
Running everything through Antora will assure a uniform appearance and unified navigation.
'''
=== Javadoc
The current "tomee-site-generator" will clone 34 repositories and
branches across TomEE, Jakarta EE and MicroProfile to generate clean
javadoc trees of each one.
The Javadoc tree for TomEE is created taking all modules and combining
them into one tree so people get a single, fully-linked javadoc tree
and do not need to be burdened by several small modules.
The Javadoc tree for Jakarta EE is created in the same spirit,
grabbing the correct release branch of each API and version in Jakarta
EE 8 and combining it together into one fully-linked "jakartaee-8.0"
tree spanning the full platform.
The Javadoc tree for MicroProfile is created in the same spirit,
grabbing the correct release branch of each API and version in
MicroProfile 2.0 and combining it together into one fully-linked
"microprofile-2.0" tree spanning the full MicroProfile umbrella spec.
Several motivations exist to grabbing the Jakarta EE and MicroProfile
javadoc and publishing it on the TomEE site.
* Oracle will no longer publish "javaee" docs. There is no plan
current in the Jakarta EE side of the fence to publish unified
javadoc. There is an industry gap we can fill that will generate
website traffic to TomEE.
* MicroProfile does not current publish fully-combined javadoc.
There is a gab currently. We can fill this as well to provide
value to the industry and generate traffic to TomEE.
* A future plan for our examples is to link code to javadoc. Linking
to javadoc on our own site has the advantage that they never leave
the site and links are guaranteed stable.
* Reverse linking. The javadoc itself can have links to the relevant
examples that show how that class is used. This can be done having
an index of each example, what api classes it uses and then
inserting multiple `@see` links in the source prior to javadoc
generation.
The proposal is to decouple this code from the current
`tomee-site-generator` code as it is a separate concern, does take a
very long time to generate, and following the spirit of this overall
proposal should be fully independent and not be mixed in with anything
JBake-related.
==== DJ
I applaud this.
Once the javadoc is generated, which I would expect only really needs to happen for a release, there's the question of how to get it into the site.
An advantage of including it in the Antora processed content is that links into it will be checked by Antora while building the site.
However, at the moment including pregenerated javadoc is not built into Antora although it's an experiment I plan to make soon, and definitely in Antora's future.
'''
=== Release notes and download pages
The release notes and download page data at one point came entirely
from https://svn.apache.org/repos/asf/tomee/sandbox/release-tools/
When this process was working at its best, release notes and download
page entries were generated automatically as part of the release
process.
Release cadence slowed and these tools decayed due to lack of
knowledge transfer in their existence and how to maintain them.
As we increase our release cadence we have renewed need to automate
the release overhead of updating download pages and creating release
notes.
The proposal is to move this code from svn "sandbox" to a proper git
repo and employ automation techniques to cause download pages and
release notes to be automatically updated. This time not by a tool
run by the person doing the release, but by a CI job based on the same
technique we will need to automate publishing of docs or examples when
they are updated.
The automated job will run on a timer and simply check dist.apache.org
for a new release. It can also be manually triggered and re-run at any
time via the corresponding CI job.
==== DJ
I wondered where those came from, I'll have to look into this.
Another workflow would be to have the tool generate asciidoc and commit it to a repository that is an Antora site source, so site builds will automatically include it and it will have consistent appearance.
'''
=== Contributors page
We have had several attempts at maintaining a contributors page, none
of them successful.
Manual attempts only reflected some individuals. Automated attempts
were too clever and have broken over time.
The proposal is to create code to run via a CI job triggered via a git
webhook that simply screen-scrapes this page when the TomEE repo is
updated:
* https://github.com/apache/tomee/graphs/contributors
This will allow us to ensure all 98 and growing contributors are
listed and the page is updated when the contributor list changes as
PRs are merged.
In the future we can potentially do more to encourage contributors by
highlighting them on the TomEE website.
==== DJ
A better contributors page would be great!
Screen scraping that github page strikes me as exceedingly fragile.
Linking to it might be an option!
I suspect easier than screen-scraping would be extracting the info from git ourselves.
'''