 = Proposal: Website 2020


 This will hopefully serve as the documentation for the website once/if executed.

 High-level plan

 * Kill all use and trace of the Apache CMS
 * Publish html directly to git
 * Allow for several sources to publish html

 The result will be several sources that can be run and managed
 independently, feeding content into the git repo housing our live html
 website.

 This is a pragmatic perspective that sets us up to get a best-of-breed
 outcome acknowledging trends in all our website endeavors:

 * All tools we've used have been heavily extended
 * Content takes a hit with each tool change
 * All tools have limitations (strengths/weaknesses)
 * Filling gaps involves extensions (bullet one)
 * Tools last on average 2-5 years
 * Many types of content actually exist: javadoc, release notes, download pages
 * We will always be in a hybrid situation

 Think of it as "microservices for content" and avoiding a monolith.

 Ideally this sets us up to acknowledge and embrace evolving our
 website tech without many of the above disadvantages.  If we have a
 clean CSS and simple menu, we should be able to take HTML from
 anywhere.

 When we want to add a new content source, we do not need to figure out
 how to get it to work "through" the existing generator or redo
 everything that already works.  We simply have it generate HTML
 directly into our site git repo.

 As long as we maintain a common CSS and look and feel, we're good.

 == DJ
 That's one point of view.
 I agree with a lot of it, but I'm worried...
 Perhaps I can characterize it as thinking that more options are better.
 However, nothing stopped anyone from migrating all the content from CMS to an external .md based system, or from .md to .adoc.
 But no one did it.
 Instead, people piled system on top of system, duplicating content in a partially intelligible state, and reducing the amount of organization and coherence at every step.
 I think reasonable priorities are:

 * as few systems involved in the website as possible
 * avoid extensions and writing our own systems; we don't have time to maintain them, let alone document them.
 * organize the input to the website generation process.
 * discourage adding more systems.

 My most profound teacher said to pay attention to the balance between structure and behavior.
 We need simple and fairly rigid structure here so behavior can go towards improving the documentation rather than wondering how or why.

 I'm very worried about having more than one process push content to the git "website" repo.
 Right now, we effectively have that process with the svn repo, and there's some content that seems to have been added and abandoned.
 No one will tell me about it, including the person who put it there.

 As a result, a requirement in my mind is that publishing the website needs to start with deleting everything previously there.
 The next step is adding the entire current content.

 Also, "publishing straight to git" actually means committing to a local repository (with tooling this could be bare).
 This gives a preview opportunity, followed by pushing to the apache repo.
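
A minimal sketch of that "delete everything, then add the entire current content" step, written in Python purely for illustration. The function name `replace_site_content` and the directory layout are assumptions, not part of any existing tooling. It only touches a local working copy, so the result can be previewed before anything is pushed.

```python
import shutil
from pathlib import Path


def replace_site_content(generated_dir: Path, site_repo: Path) -> list[str]:
    """Delete everything in site_repo except .git, then copy generated_dir in.

    Returns the sorted relative paths of the files that are now published.
    Both directory arguments are hypothetical; adjust to the real layout.
    """
    # Snapshot the listing first so we never delete while iterating.
    for entry in list(site_repo.iterdir()):
        if entry.name == ".git":
            continue  # keep the repository metadata
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # dirs_exist_ok lets us copy into the already-existing repo directory.
    shutil.copytree(generated_dir, site_repo, dirs_exist_ok=True)

    return sorted(
        p.relative_to(site_repo).as_posix()
        for p in site_repo.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(site_repo).parts
    )
```

After this runs, `git add -A` and `git commit` in the local clone record both the additions and the deletions, and `git push` publishes them once the preview looks right. Starting from an empty tree means abandoned files cannot survive a publish.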

 '''

 == Kill all use and trace of the Apache CMS
 TODO

 === DJ

 I think that my antora preview is 99% there.
 Having a second opinion would be valuable, but that could be a time-consuming endeavor.

 '''

 == Publish html directly to git

 Apache allows a project to designate a git repository as their
 "website."  All files in that repository are published as-is to the
 internet as the project's website.  HTML must be committed to this
 repository as it does not offer any generation of any kind.

 TODO: what is the process for getting one of these repos?

 TODO: can we get Infra to do a svn-git migration of our current flat-html?

 === DJ

 IMO treating the git repo as anything but "staging" puts the process 100% on the road to disaster.
 What is this html content?

 '''

 == Allow for several sources to publish html

 In the new architecture each content generator publishes rendered html
 directly to the site git.

 === DJ
 Perhaps....
 I think that something more like a maven build is more likely to be useful.
 If there needs to be more than one content generator, then there should be some sort of process to build the entire site and push it to git.
 Without this, no one will be able to understand how the site is built.
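
One way to picture such a single build process is a small assembly step that gathers the output of every generator into one staging tree and records which generator owns each file. The sketch below is an illustration under assumptions: the function name `assemble_site` and the generator names passed to it are hypothetical, and a Maven build or CI pipeline could play the same role.

```python
import shutil
from pathlib import Path


def assemble_site(sources: dict[str, Path], staging: Path) -> dict[str, str]:
    """Copy each generator's output into staging; return {relative path: owner}.

    Raises ValueError if two generators produce the same file, so that every
    published path has exactly one known owner.
    """
    owners: dict[str, str] = {}
    for name, out_dir in sources.items():
        for src in sorted(out_dir.rglob("*")):
            if not src.is_file():
                continue
            rel = src.relative_to(out_dir).as_posix()
            if rel in owners:
                raise ValueError(
                    f"{rel} is produced by both {owners[rel]} and {name}"
                )
            owners[rel] = name
            dest = staging / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
    return owners
```

The returned mapping doubles as documentation of how the site is built: for any published file, it names the generator that produced it.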

 For Antora, it's easy to set up a GitLab CI pipeline that builds a site and publishes it on Netlify.
 I don't know if anything similar is possible for GitHub, but I think it's worth investigating.
 My idea is that any commit results in updating a preview site.
 If you like it, you can push to "master" so the ASF shows it.

 '''

 The following is a rough outline of the types of content:

 * Versioned documentation for a software distribution
 * Community/Developer documentation
 * Website front-page and "marketing" pages such as major features, benefits, etc
 * Examples
 * Javadoc
 * Release notes and download pages
 * Contributors page

 === DJ
 Another possible division would be

 * asciidoc content
 ** human written
 ** generated
 * non-asciidoc content
 ** human written
 ** generated

 A perpendicular categorization is:

 * Stuff that fits well in Antora's information architecture
 * Stuff that is outside the scope of Antora's information architecture

 While it's easy to regard Antora as an asciidoc processor, IMO its more important contribution is providing a consistent and easy to understand information architecture framework.


 '''

 === Versioned documentation for a software distribution

 All of our "product documentation" efforts to date have been in some
 way wiki-like in nature.  They allow any kind of content to take any
 shape and do not encourage structure.

 As a result our content is all miscellaneous odds and ends that do not
 fit together in any significant chapters or flow.  Said another way
 we're all "blog" and no "book."

 The proposal for this is to use Antora tied to an effort to create a
 documentation outline that encourages contribution on-rails. Gaps in
 the documentation should be obvious, which hopefully encourages
 contribution.

 ==== DJ
 I suggest we heavily use Antora's structuring abilities, using components, modules, and topics as much as possible.
 I suggest that we drop any auto-generated TOC or navigation pages and think carefully about organization.

 '''

 === Community/Developer documentation

 Learning how our community works and how to contribute (be a
 developer) is also an experience that really needs to be on-rails.

 The proposal for this is to use Antora tied to an effort to create a
 deliberately smaller outline of how to get involved.

 This content should be very focused on "developer onboarding",
 something all open source projects must nail to grow.

 ==== DJ
 My idea is that the "common" component ought to be this, but right now it contains "all the old stuff".
 Some of this indeed appears to be community focused, but a lot is earlier versions of what's also in the versioned component.

 '''

 === Website front-page and "marketing" pages, features, etc

 When people come to the website they must get a human-perfect
 orientation that gives them the most important information in
 highlighted form with the least clicking.

 There is no proven structure for gaining someone's immediate
 attention and not losing them.  They need to know "why TomEE",
 ideally with some pictures or video.  There also needs to be
 a very small handful of pages to highlight features and further
 pull people in.

 The proposal for this is to use the existing JBake setup as it is
 free-form and enforces no structure.  These pages must be enabled to
 continuously discard/reinvent (revolve vs evolve) and keep trying
 different ways to get people's attention.

 ==== DJ
 I don't know enough about what you have in mind for this content to know what to think.
 There's the front page, which I think is not generated.
 Otherwise, what is this content?
 I'd think if people write it, it will be in asciidoc, and can be rendered with Antora.

 '''

 === Examples

 The examples section of the website is arguably the only truly
 successful part of the site in its current form.  Both the Front-page
 and product-documentation parts of the site fall short of
 accomplishing what they should do.

 The current library holds 180 examples and is growing; it is the #1 place
 where new contributors find success contributing to TomEE.  After
 improvements made in Dec 2018, contributions over the next 12 months
 doubled, bringing in over 40 contributors, all through the examples.

 The proposal for this is to continue the existing JBake setup as it
 has proven to be very successful for this application and more
 enhancements are planned, such as:

 * Adding contributors' faces to each example page
 * Automatically linking code to related online javadoc
 * Automatically suggesting related examples

 ==== DJ
 I'm biased, but how is the JBake setup better than the Antora setup, which I've spent about 0 time on so far?
 Running everything through Antora will ensure a uniform appearance and unified navigation.

 '''

 === Javadoc

 The current "tomee-site-generator" will clone 34 repositories and
 branches across TomEE, Jakarta EE and MicroProfile to generate clean
 javadoc trees of each one.

 The Javadoc tree for TomEE is created taking all modules and combining
 them into one tree so people get a single, fully-linked javadoc tree
 and do not need to be burdened by several small modules.
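
One plausible way to get a single, fully-linked tree from many modules is to hand every module's source root to one `javadoc` invocation. The sketch below only builds that command line. It assumes a Maven-style `src/main/java` layout and is not a description of what `tomee-site-generator` actually does; the helper names are hypothetical.

```python
import os
from pathlib import Path


def find_source_roots(checkout_root: Path) -> list[Path]:
    """Return every Maven-style src/main/java directory under checkout_root."""
    return sorted(p for p in checkout_root.rglob("src/main/java") if p.is_dir())


def javadoc_command(
    source_roots: list[Path], packages: list[str], out_dir: Path
) -> list[str]:
    """Build one javadoc invocation that documents all modules together.

    -sourcepath takes the platform path separator; -subpackages takes a
    colon-separated list of root packages to document recursively.
    """
    return [
        "javadoc",
        "-d", str(out_dir),
        "-sourcepath", os.pathsep.join(str(root) for root in source_roots),
        "-subpackages", ":".join(packages),
    ]
```

Because all classes are documented in one run, cross-module references resolve to links inside the same tree instead of dangling or pointing to a separate per-module site.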

 The Javadoc tree for Jakarta EE is created in the same spirit,
 grabbing the correct release branch of each API and version in Jakarta
 EE 8 and combining it together into one fully-linked "jakartaee-8.0"
 tree spanning the full platform.

 The Javadoc tree for MicroProfile is created in the same spirit,
 grabbing the correct release branch of each API and version in
 MicroProfile 2.0 and combining it together into one fully-linked
 "microprofile-2.0" tree spanning the full MicroProfile umbrella spec.

 Several motivations exist for grabbing the Jakarta EE and MicroProfile
 javadoc and publishing it on the TomEE site.

 * Oracle will no longer publish "javaee" docs.  There is currently no
    plan on the Jakarta EE side of the fence to publish unified
    javadoc. There is an industry gap we can fill that will generate
    website traffic to TomEE.
 * MicroProfile does not currently publish fully-combined javadoc.
    There is a gap here too.  We can fill this as well to provide
    value to the industry and generate traffic to TomEE.
 * A future plan for our examples is to link code to javadoc.  Linking
    to javadoc on our own site has the advantage that they never leave
    the site and links are guaranteed stable.
 * Reverse linking.  The javadoc itself can have links to the relevant
    examples that show how that class is used.  This can be done by
    building an index of each example and the API classes it uses, then
    inserting multiple `@see` links in the source prior to javadoc
    generation.
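
The reverse-linking idea in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not existing code: it treats plain `import` statements as the signal that an example uses a class (static and wildcard imports are skipped), and the function names, example names, and URL layout are all hypothetical.

```python
import re
from collections import defaultdict

# Matches "import a.b.C;" but not static or wildcard imports.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)


def classes_used(java_source: str, api_prefixes: tuple[str, ...]) -> set[str]:
    """Return imported class names that start with one of api_prefixes."""
    return {
        name
        for name in IMPORT_RE.findall(java_source)
        if name.startswith(api_prefixes)
    }


def reverse_index(
    examples: dict[str, str], api_prefixes: tuple[str, ...]
) -> dict[str, list[str]]:
    """Map each API class to the sorted names of the examples importing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for example_name, source in examples.items():
        for cls in classes_used(source, api_prefixes):
            index[cls].add(example_name)
    return {cls: sorted(names) for cls, names in sorted(index.items())}


def see_tags(example_names: list[str], base_url: str) -> list[str]:
    """Render one javadoc @see line per example, ready to splice into a comment."""
    return [
        f' * @see <a href="{base_url}/{name}.html">{name}</a>'
        for name in example_names
    ]
```

The generated lines would be inserted into each API class's javadoc comment in a scratch copy of the sources before running javadoc, so the upstream API sources stay untouched.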

 The proposal is to decouple this code from the current
 `tomee-site-generator` code as it is a separate concern, takes a
 very long time to generate, and, following the spirit of this overall
 proposal, should be fully independent and not be mixed in with anything
 JBake-related.

 ==== DJ
 I applaud this.
 Once the javadoc is generated, which I would expect only really needs to happen for a release, there's the question of how to get it into the site.
 An advantage of including it in the Antora processed content is that links into it will be checked by Antora while building the site.
 However, including pregenerated javadoc is not built into Antora at the moment, although it's an experiment I plan to make soon, and it is definitely in Antora's future.

 '''

 === Release notes and download pages

 The release notes and download page data at one point came entirely
 from https://svn.apache.org/repos/asf/tomee/sandbox/release-tools/

 When this process was working at its best, release notes and download
 page entries were generated automatically as part of the release
 process.

 Release cadence slowed and these tools decayed due to lack of
 knowledge transfer about their existence and how to maintain them.

 As we increase our release cadence we have a renewed need to automate
 the release overhead of updating download pages and creating release
 notes.

 The proposal is to move this code from svn "sandbox" to a proper git
 repo and employ automation techniques to cause download pages and
 release notes to be automatically updated.  This time not by a tool
 run by the person doing the release, but by a CI job based on the same
 technique we will need to automate publishing of docs or examples when
 they are updated.

 The automated job will run on a timer and simply check dist.apache.org
 for a new release.  It can also be manually triggered and re-run at any
 time via the corresponding CI job.
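
The "check for a new release" step could look roughly like the sketch below. It assumes the release area is served as an HTML directory listing with one `tomee-<version>/` directory per release. That layout, the `tomee-` prefix, and the function names are assumptions for illustration only.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a directory listing."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def listed_releases(listing_html: str, prefix: str = "tomee-") -> set[str]:
    """Return the versions found as '<prefix><version>/' directory links."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    releases = set()
    for href in parser.hrefs:
        name = href.rstrip("/")
        if href.endswith("/") and name.startswith(prefix):
            releases.add(name[len(prefix):])
    return releases


def new_releases(listing_html: str, already_published: set[str]) -> list[str]:
    """Versions present in the listing but not yet on the download page.

    The result is sorted as plain strings, not by version number.
    """
    return sorted(listed_releases(listing_html) - already_published)
```

The timer job would fetch the listing (for example with `urllib.request.urlopen`), call `new_releases` with the versions already on the download page, and regenerate pages only when the result is non-empty. Because the check is a pure comparison, re-running the job manually is harmless.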

 ==== DJ
 I wondered where those came from; I'll have to look into this.
 Another workflow would be to have the tool generate asciidoc and commit it to a repository that is an Antora site source, so site builds will automatically include it and it will have consistent appearance.

 '''

 === Contributors page

 We have had several attempts at maintaining a contributors page, none
 of them successful.

 Manual attempts only reflected some individuals.  Automated attempts
 were too clever and have broken over time.

 The proposal is to create code to run via a CI job triggered via a git
 webhook that simply screen-scrapes this page when the TomEE repo is
 updated:

 * https://github.com/apache/tomee/graphs/contributors

 This will allow us to ensure all 98 and growing contributors are
 listed and the page is updated when the contributor list changes as
 PRs are merged.

 In the future we can potentially do more to encourage contributors by
 highlighting them on the TomEE website.

 ==== DJ
 A better contributors page would be great!
 Screen-scraping that GitHub page strikes me as exceedingly fragile.
 Linking to it might be an option!
 I suspect extracting the info from git ourselves would be easier than screen-scraping.
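
A minimal sketch of extracting the contributor list from git itself, using `git shortlog -sne`. The function names are hypothetical, and the approach counts commit authors in a local clone, which will not match GitHub's contributor graph exactly.

```python
import re
import subprocess

# One line of `git shortlog -sne` looks like: "   152<TAB>Jane Doe <jane@example.org>"
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\t(.+?)\s+<([^>]*)>$")


def parse_shortlog(output: str) -> list[tuple[str, str, int]]:
    """Parse shortlog output into (name, email, commit_count) tuples."""
    contributors = []
    for line in output.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            count, name, email = match.groups()
            contributors.append((name, email, int(count)))
    return contributors


def contributors_from_repo(repo_dir: str) -> list[tuple[str, str, int]]:
    """Run git shortlog in repo_dir and parse the result.

    HEAD is passed explicitly because, without a revision, git shortlog
    reads from standard input when it is not attached to a terminal.
    """
    result = subprocess.run(
        ["git", "shortlog", "-sne", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)
```

People who committed under several names or addresses show up more than once; a `.mailmap` file in the repository is the usual way to merge those identities.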

 '''