This website is built using Pelican. Configure the build using the pelicanconf.py settings.
```python
# Theme
THEME = './theme/apache'
```
See theme template for details about this site's theme.
The Pelican environment is enhanced with plugins. Our environment has its own copy of the ASF plugins, while the `pelican-build.py` script provides `pelican-gfm`.
```python
# Pelican Plugins
# pelican-gfm is installed in the buildbot as part of build_pelican.py. It is an ASF Infra custom plugin.
# other plugins are discoverable and can be installed via pip by mentioning them in requirements.txt
# You can find plugins here: https://github.com/pelican-plugins
# Plugins that are custom for this site are found in PLUGIN_PATHS.
PLUGIN_PATHS = ['./theme/plugins']
PLUGINS = ['asfgenid', 'asfdata', 'pelican-gfm', 'asfreader']
```
- The `asfdata.py` plugin builds a metadata model that is shared with every page.
- The `pelican-gfm` plugin reads `.md`, `.markdown`, `.mkd`, and `.mdown` files and converts the GFM Markdown into HTML.
- The `asfreader.py` plugin reads `.ezmd` files, injects data, translates EZT, and converts the GFM Markdown into HTML.
- The `asfgenid.py` plugin performs a number of enhancements to the HTML.

See process for the step each plugin performs on each Pelican signal. See plugins for the Python code.
Pages and static content are stored in the same tree. Generated content is output with the same relative path, except with an `.html` extension. These are the necessary settings:
```python
PATH = 'content'

# Save pages using full directory preservation
PAGE_PATHS = ['.']

# Path with no extension (raw string, so '\.' is a regex escape rather than an invalid string escape)
PATH_METADATA = r'(?P<path_no_ext>.*)\..*'

# We are not slugifying any pages
ARTICLE_URL = ARTICLE_SAVE_AS = PAGE_URL = PAGE_SAVE_AS = '{path_no_ext}.html'

# We want to serve our static files mixed with content
STATIC_PATHS = ['.']

# we want any html to be served as-is
READERS = {'html': None}

# ignore README.md files in the content tree and the interviews and include folders
IGNORE_FILES = ['README.md', 'interviews', 'include']
```
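To illustrate how these settings map a source file to its output file, here is a minimal sketch that applies the `PATH_METADATA` pattern and the `'{path_no_ext}.html'` template to a path relative to `content`. It only approximates what Pelican does internally; `output_path` and the sample paths are hypothetical.

```python
import re

# The PATH_METADATA pattern from the settings above.
PATH_METADATA = re.compile(r'(?P<path_no_ext>.*)\..*')
# The template shared by ARTICLE_URL, ARTICLE_SAVE_AS, PAGE_URL and PAGE_SAVE_AS.
SAVE_AS = '{path_no_ext}.html'


def output_path(relative_source_path):
    """Hypothetical helper: map a path under 'content' to its generated HTML path."""
    match = PATH_METADATA.match(relative_source_path)
    if match is None:
        # no '.' anywhere in the path, so path_no_ext cannot be captured
        raise ValueError(f'no extension: {relative_source_path}')
    return SAVE_AS.format(**match.groupdict())


print(output_path('foundation/index.ezmd'))  # foundation/index.html
```

Because the first `.*` is greedy, only the last extension is removed, so a name such as `a/b.c.md` becomes `a/b.c.html`.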
Pelican uses signals as it goes through the process of reading and generating content. Pages are processed in no particular order. Our plugins provide the following activity:
| Pelican Signal | Step | GFM Content | EZMD Content | Description |
|---|---|---|---|---|
| Initialization | Data Model | | | Read data sources |
| Reader | Class | GFMReader | ASFReader(GFMReader) | Pelican Reader class |
| | Read | read_source | super.read_source | read page source and metadata |
| | Model Metadata | | add_data | add asf data to the model and expand any [{ reference }] |
| | Translate | | ezt | ezt template translation |
| | Render GFM | render | super.render | render GFM/HTML into HTML |
| Content | Generate ID | generate_id | generate_id | Perform ASF specific HTML enhancements |
| Generator | Template | translate | translate | Create output HTML by pushing the generated content and metadata through the theme's templates |
See local builds for how to install Pelican ASF on your system.
A shared metadata model is used by ezmd templates to generate content. There are three types of data:
When referenced | Data Type |
---|---|
EZMD Reader, Content, Generator | Constants - either integer or string values |
EZMD Reader | Sequences - arrays of objects with attributes where an attribute may be another sequence |
EZMD Reader | Dictionaries - key-value maps where the value may be another dictionary |
The constants are also available to the `asfgenid.py` plugin and the theme's templates.

There are examples of how to inject shared metadata below. See metadata model for how `asfdata.py` works to populate the shared metadata.
The `read_source` method is used to open a file and convert it into a metadata dictionary and text.
Example:
```
Title: ASF Export Classifications and Source Links
license: https://www.apache.org/licenses/LICENSE-2.0
asf_headings: False

#### ASF Project
...
```
The first three lines specify three metadata key-value pairs. A blank line follows, and the rest is the text.
Code from `pelican-gfm`, with some parts elided:
```python
def read_source(self, source_path):
    "Read metadata and content from the source."
    ...
    # Fetch the source content, with a few appropriate tweaks
    with pelican.utils.pelican_open(source_path) as text:

        # Extract the metadata from the header of the text
        lines = text.splitlines()
        for i in range(len(lines)):
            line = lines[i]
            match = GFMReader.RE_METADATA.match(line)
            if match:
                name = match.group(1).strip().lower()
                ...
                metadata[name] = value
            elif not line.strip():
                # blank line
                continue
            else:
                # reached actual content
                break
        ...
        # Reassemble content, minus the metadata
        text = '\n'.join(lines[i:])

        return text, metadata
```
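The same header-splitting logic can be tried without Pelican. This is a minimal, self-contained sketch rather than the plugin's code: the `RE_METADATA` pattern is an assumption (the real `GFMReader.RE_METADATA` is not shown above), the elided steps such as date handling are left out, and `split_header` is a hypothetical name.

```python
import re

# Assumed "Name: value" header pattern; GFMReader.RE_METADATA may differ.
RE_METADATA = re.compile(r'^([A-Za-z][-\w]*)\s*:\s*(.*)$')


def split_header(text):
    """Hypothetical helper: split 'Key: value' header lines from the body text."""
    metadata = {}
    lines = text.splitlines()
    body_start = len(lines)  # if every line is metadata or blank, the body is empty
    for i, line in enumerate(lines):
        match = RE_METADATA.match(line)
        if match:
            # keys are lowercased, as in read_source
            metadata[match.group(1).strip().lower()] = match.group(2).strip()
        elif not line.strip():
            continue  # blank lines are skipped
        else:
            body_start = i  # first line that is neither metadata nor blank
            break
    return '\n'.join(lines[body_start:]), metadata
```

One deliberate difference: when the loop never reaches content, `read_source` keeps `lines[i:]` (the last header line), while this sketch returns an empty body.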
In `asfreader.py` we extend EZT syntax with `[{ ... }]` references, which are replaced with metadata values before EZT translation. This allows a more natural and direct representation than EZT sequences. Examples:
```
|           |           |             |
|-----------|-----------|-------------|
| [{ board[0].name }] | [{ board[1].name }] | [{ board[2].name }] |
| [{ board[3].name }] | [{ board[4].name }] | [{ board[5].name }] |
| [{ board[6].name }] | [{ board[7].name }] | [{ board[8].name }] |
```
```
| Office | Individual |
|-----------|-------------|
| Board Chair | [{ ci[boardchair][roster] }] |
| Vice Chair | [{ ci[vicechair][roster] }] |
| President | [{ ci[president][roster] }] |
| Exec. V.P | [{ ci[execvp][roster] }] |
| [[]Treasurer](https://treasurer.apache.org/) | [{ ci[treasurer][roster] }] |
| Assistant Treasurer | [{ ci[assistanttreasurer][roster] }] |
| Secretary | [{ ci[secretary][roster] }] |
| Assistant Secretary | [{ ci[assistantsecretary][roster] }] |
| V.P., [[]Legal Affairs](/legal/) | [{ ci[legal][chair] }] |
| Assistant V.P., [[]Legal Affairs](/legal/) | [{ ci[assistantvplegalaffairs][roster] }] |
```
```
- All volunteer community
- [{ code_lines }]+ lines of code in stewardship
- [{ code_changed }]+ lines of code changed
- [{ code_commits }]+ code commits
- [{ asf_members }] individual ASF Members
- [{ asf_committers }]+ Apache Committers
- [{ asf_contributors }]+ code contributors
- [{ asf_people }]+ people involved in our communities
```
The `asfreader.py` plugin is responsible for reading the source, adding metadata, EZT translation, and rendering GFM. Its `add_data` method performs the metadata step:
```python
def add_data(self, text, metadata):
    "Mix in ASF data as metadata"

    asf_metadata = self.settings.get('ASF_DATA', { }).get('metadata')
    if asf_metadata:
        metadata.update(asf_metadata)
        # insert any direct references
        m = 1
        while m:
            m = METADATA_RE.search(text)
            if m:
                this_data = m.group(1).strip()
                format_string = '{{{0}}}'.format(this_data)
                try:
                    new_string = format_string.format(**metadata)
                    print(f'{{{{{m.group(1)}}}}} -> {new_string}')
                except Exception:
                    # the data expression was not found
                    new_string = format_string
                    print(f'{{{{{m.group(1)}}}}} is not found')
                text = re.sub(METADATA_RE, new_string, text, count=1)
    return text, metadata
```
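The heart of `add_data` is Python's `str.format`: a reference such as `[{ ci[boardchair][roster] }]` becomes the format string `{ci[boardchair][roster]}`, and `format(**metadata)` resolves the attribute and index lookups. The sketch below isolates that idea. The `METADATA_RE` pattern is an assumption (the plugin's own pattern is not shown), `expand_references` is a hypothetical name, and the sample data is made up.

```python
import re

# Assumed shape of a "[{ expression }]" reference; asfreader.py defines its own METADATA_RE.
METADATA_RE = re.compile(r'\[\{\s*(.+?)\s*\}\]')


def expand_references(text, metadata):
    """Hypothetical helper: replace each [{ expr }] with '{expr}'.format(**metadata)."""

    def substitute(match):
        format_string = '{' + match.group(1) + '}'
        try:
            # str.format resolves attribute (.name) and index ([0], [key]) lookups
            return format_string.format(**metadata)
        except (KeyError, IndexError, AttributeError, TypeError, ValueError):
            # unknown reference: leave '{expr}' in the text, as add_data does
            return format_string

    # a replacement function keeps re.sub from treating backslashes in values as escapes
    return METADATA_RE.sub(substitute, text)


# Made-up sample data.
sample = {'asf_members': 42, 'ci': {'boardchair': {'roster': 'Jane Doe'}}}
print(expand_references('- [{ asf_members }] individual ASF Members', sample))
print(expand_references('| Board Chair | [{ ci[boardchair][roster] }] |', sample))
```

Unlike the `while` loop above, this makes a single pass, so a substituted value that itself contains `[{ ... }]` is not expanded again.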
`.ezmd` page files are EZT templates that produce Markdown and HTML output. See EZT Syntax for the directives.
Project list:
```
| Office | Individual |
|-----------|-------------|[for projects]
| V.P., [if-any projects.site][[][end]Apache [projects.display_name][if-any projects.site]]([projects.site])[end] | [projects.chair] |[end]
```
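To show what the `[for]` and `[if-any]` directives above produce, here is the same logic written in plain Python. It illustrates the output only, not how EZT evaluates templates: `project_rows` and the sample data are hypothetical, the data model's sequence items are modeled as objects with attributes via `SimpleNamespace`, and EZT's `[if-any]` test is approximated by Python truthiness.

```python
from types import SimpleNamespace


def project_rows(projects):
    """Hypothetical plain-Python equivalent of the [for projects] template above."""
    rows = ['| Office | Individual |', '|-----------|-------------|']
    for project in projects:
        name = f'Apache {project.display_name}'
        if project.site:
            # [if-any projects.site] wraps the name in a Markdown link
            name = f'[{name}]({project.site})'
        rows.append(f'| V.P., {name} | {project.chair} |')
    return '\n'.join(rows)


# Made-up sample data; on the site the sequence comes from asfdata.py.
projects = [
    SimpleNamespace(display_name='Example', site='https://example.org/', chair='Jane Doe'),
    SimpleNamespace(display_name='Sample', site='', chair='John Doe'),
]
print(project_rows(projects))
```

The `[[]` in the template emits a literal `[`, which is why the linked and unlinked rows differ only in the brackets and the trailing `(site)`.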
Featured projects:
```
[for featured_projs]<li [if-index featured_projs first]class="active"[end]>
  <a href="#[featured_projs.key_id]" data-toggle="tab">[featured_projs.display_name]</a>
</li>[end]
```
Insert a file as-is into the output:
```
Title: Apache Download Mirrors

[insertfile "include/closer.ezt"]
```
Code from `asfreader.py`:
```python
# prepare text as an ezt template
# compress_whitespace=0 is required as blank lines and indentation have meaning in markdown
template = ezt.Template(compress_whitespace=0)
reader = ASFTemplateReader(source_path, text)
template.parse(reader, base_format=ezt.FORMAT_HTML)
assert template

# generate content from ezt template with metadata
fp = io.StringIO()
template.generate(fp, metadata)
```
Content is in GitHub Flavored Markdown (GFM).
The site uses a version of cmark-gfm by GitHub through the pelican-gfm plugin created by Apache Infra.
Detailed Specification with many examples
Some differences from `markdown.pl`, used in the Apache CMS:

- `style`, `pre`, and `script`.
- Disallowed HTML: the tagfilter extension disables certain HTML. The `asfgenid` plugin re-enables `script`, `style`, and `iframe` HTML.
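For reference, the GFM specification's tagfilter extension replaces the leading `<` of nine tag names (`title`, `textarea`, `style`, `xmp`, `iframe`, `noembed`, `noframes`, `script`, `plaintext`) with `&lt;`. The sketch below imitates that rule, and its reversal for the three tags `asfgenid` re-enables, with regular expressions. The function names are hypothetical, and neither cmark-gfm's C code nor the actual `asfgenid` fix-up necessarily works this way.

```python
import re

# Tag names filtered by the GFM tagfilter extension (per the GFM spec).
DISALLOWED = ('title', 'textarea', 'style', 'xmp', 'iframe',
              'noembed', 'noframes', 'script', 'plaintext')
# Tags that asfgenid re-enables.
REENABLED = ('script', 'style', 'iframe')


def _tag_pattern(opening, names):
    # opening text, optional '/', a tag name, then whitespace, '/', '>' or end of text
    return re.compile(opening + r'(/?(?:' + '|'.join(names) + r'))(?=[\s/>]|$)',
                      re.IGNORECASE)


FILTER_RE = _tag_pattern('<', DISALLOWED)
REENABLE_RE = _tag_pattern('&lt;', REENABLED)


def tagfilter(html):
    """Hypothetical: escape the leading '<' of disallowed tags."""
    return FILTER_RE.sub(r'&lt;\1', html)


def reenable(html):
    """Hypothetical: undo the escaping for script, style and iframe only."""
    return REENABLE_RE.sub(r'<\1', html)


escaped = tagfilter('<script>x</script><title>t</title>')
print(escaped)            # &lt;script>x&lt;/script>&lt;title>t&lt;/title>
print(reenable(escaped))  # <script>x</script>&lt;title>t&lt;/title>
```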
The `pelican-gfm` plugin reads the content file and renders it to HTML.
From `asfreader.py`:
```python
# Render the markdown into HTML
content = super().render(fp.getvalue().encode('utf-8')).decode('utf-8')
assert content
```
From `pelican-gfm`:
```python
def render(self, text):
    "Use cmark-gfm to render the Markdown into an HTML fragment."

    parser = F_cmark_parser_new(OPTS)
    assert parser
    for name in EXTENSIONS:
        ext = F_cmark_find_syntax_extension(name.encode('utf-8'))
        assert ext
        rv = F_cmark_parser_attach_syntax_extension(parser, ext)
        assert rv
    exts = F_cmark_parser_get_syntax_extensions(parser)
    F_cmark_parser_feed(parser, text, len(text))
    doc = F_cmark_parser_finish(parser)
    assert doc

    output = F_cmark_render_html(doc, OPTS, exts)

    F_cmark_parser_free(parser)
    F_cmark_node_free(doc)

    return output
```
We use the `asfgenid` plugin to modify the generated content in ways that mimic the Markdown extensions of the Apache CMS. Many of these ASF-specific enhancements are controlled by the `ASF_GENID` dictionary in the Pelican settings.
| ASF_GENID key | default | process | page override |
|---|---|---|---|
| - | - | fix up some HTML tags that the GFM tagfilter extension marks as unsafe | |
| - | - | convert HTML into Beautiful Soup | |
| metadata | True | {{ metadata }} include data in the HTML | |
| - | True | inventory of all ID attributes; duplicates are invalid | |
| elements | True | find all {#id} and {.class} texts and assign attributes | |
| headings | True | assign IDs to all headings without IDs already present or assigned with {#id} text | asf_headings |
| headings_re | r'^h[1-6]' | regex for finding headings that require IDs | |
| tables | True | tables with a class attribute are assigned class=table | |
| toc | True | generate a table of contents if [TOC] is found. If this is set to False, the toc.py plugin may be used. | |
| toc_headers | r'h[1-6]' | headings to include in the [TOC] | |
| - | - | convert Beautiful Soup back into HTML | |
```python
# Configure the asfgenid plugin
ASF_GENID = {
    'metadata': True,
    'elements': True,
    'headings': True,
    'headings_re': r'^h[1-4]',
    'permalinks': True,
    'toc': True,
    'toc_headers': r"h[1-4]",
    'tables': True,
    'debug': False
}
```
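The two regex settings decide which elements are processed. As a minimal sketch, assuming Beautiful Soup tests a compiled pattern against each tag name with `re.search`, this shows which tags this site's values select; `selected` is a hypothetical helper.

```python
import re

# Values from the ASF_GENID settings above.
HEADINGS_RE = r'^h[1-4]'
TOC_HEADERS = r"h[1-4]"


def selected(tag_names, pattern):
    """Hypothetical helper: the tag names a pattern matches with re.search."""
    compiled = re.compile(pattern)
    return [name for name in tag_names if compiled.search(name)]


tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table']
print(selected(tags, HEADINGS_RE))  # ['h1', 'h2', 'h3', 'h4']
print(selected(tags, TOC_HEADERS))  # ['h1', 'h2', 'h3', 'h4']
```

Compared with the defaults `r'^h[1-6]'` and `r'h[1-6]'`, these values restrict generated IDs and table-of-contents entries to `h1` through `h4`.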
Set the heading ID and permalink to `#what`:

```
## What is the Apache Software Foundation? {#what}

The Apache Software Foundation (ASF) is a non-profit 501(c)(3) corporation, incorporated in Delaware, USA, in June of 1999. The ASF is a natural outgrowth of The Apache Group, which formed in 1995 to develop the Apache HTTP Server.
```
Assign the `float-right` class to an image:

```
![Logo](images/logo.svg) {.float-right}
```
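`asfgenid` looks for these `{#id}` and `{.class}` markers in the text of the rendered HTML through Beautiful Soup. Reduced to plain strings, the matching step might look like the sketch below; the pattern and `split_attr` are assumptions, not the plugin's code.

```python
import re

# Assumed marker syntax: "{#id}" or "{.class}" at the end of the text.
ATTR_RE = re.compile(r'\s*\{([#.])([-\w]+)\}\s*$')


def split_attr(text):
    """Hypothetical: return (text without marker, (attribute, value)) or (text, None)."""
    match = ATTR_RE.search(text)
    if match is None:
        return text, None
    attribute = 'id' if match.group(1) == '#' else 'class'
    return text[:match.start()], (attribute, match.group(2))


print(split_attr('What is the Apache Software Foundation? {#what}'))
print(split_attr('![Logo](images/logo.svg) {.float-right}'))
```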
An HTML fragment can serve a similar purpose:

```html
<div class="pull-right" style="float:right; border-style:dotted; width:200px; padding:5px; margin:5px">

SEE INSTEAD: [Trademark Resources Site Map][resources].

</div>
```
The code in `asfgenid.py` uses Beautiful Soup 4 to manipulate the rendered HTML. Here is an example:
```python
# from Apache CMS markdown/extensions/headerid.py - slugify in the same way as the Apache CMS
def slugify(value, separator):
    """ Slugify a string, to make it URL friendly. """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = re.sub('[^\\w\\s-]', '', value.decode('ascii')).strip().lower()
    return re.sub('[%s\\s]+' % separator, separator, value)

...

# append a permalink
def permalink(soup, mod_element):
    new_tag = soup.new_tag('a', href='#' + mod_element['id'])
    new_tag['class'] = 'headerlink'
    new_tag['title'] = 'Permalink'
    new_tag.string = LINK_CHAR
    mod_element.append(new_tag)

...

# generate id for a heading
def headingid_transform(ids, soup, tag, permalinks, perma_set):
    new_string = tag.string
    if not new_string:
        # roll up strings if no immediate string
        new_string = tag.find_all(
            text=lambda t: not isinstance(t, Comment),
            recursive=True)
        new_string = ''.join(new_string)

    # don't have an id create it from text
    new_id = slugify(new_string, '-')
    tag['id'] = unique(new_id, ids)
    if permalinks:
        permalink(soup, tag)
        # inform if there is a duplicate permalink
        unique(tag['id'], perma_set)

...

    # step 6 - find all headings w/o ids already present or assigned with {#id} text
    if asf_headings == 'True':
        if asf_genid['debug']:
            print(f'headings: {content.relative_source_path}')
        # Find heading tags
        HEADING_RE = re.compile(asf_genid['headings_re'])
        for tag in soup.findAll(HEADING_RE, id=False):
            headingid_transform(ids, soup, tag, asf_genid['permalinks'], permalinks)
```
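To try the ID generation outside Pelican, here is `slugify` from above together with a `unique` helper. The body of `asfgenid`'s `unique` is not shown, so this version is an assumption, modeled on the counter-suffix approach of Python-Markdown's table-of-contents extension: a repeated slug gets `_1`, `_2`, and so on.

```python
import re
import unicodedata

IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')


def slugify(value, separator):
    """Slugify a string, to make it URL friendly (as shown above)."""
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = re.sub('[^\\w\\s-]', '', value.decode('ascii')).strip().lower()
    return re.sub('[%s\\s]+' % separator, separator, value)


def unique(new_id, ids):
    """Assumed helper: make new_id unique within the set ids by appending _1, _2, ..."""
    while new_id in ids or not new_id:
        match = IDCOUNT_RE.match(new_id)
        if match:
            new_id = '%s_%d' % (match.group(1), int(match.group(2)) + 1)
        else:
            new_id = '%s_%d' % (new_id, 1)
    ids.add(new_id)
    return new_id


ids = set()
for heading in ['What is the Apache Software Foundation?', 'Résumé', 'Résumé']:
    print(unique(slugify(heading, '-'), ids))
# what-is-the-apache-software-foundation
# resume
# resume_1
```

The NFKD normalization followed by an ASCII encode with `'ignore'` is what turns `Résumé` into `resume`: the accent is split off as a combining character and then dropped.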