Parser for unified. Parses markdown to mdast syntax trees. Used in the remark processor but can be used on its own as well. Can be extended to change how markdown is parsed.
npm:
npm install remark-parse
var unified = require('unified')
var createStream = require('unified-stream')
var markdown = require('remark-parse')
var html = require('remark-html')

var processor = unified()
  .use(markdown, {commonmark: true})
  .use(html)

process.stdin.pipe(createStream(processor)).pipe(process.stdout)
processor.use(parse[, options])
Configure the processor to read markdown as input and process mdast syntax trees.
options
Options are passed directly, or passed later through processor.data().
options.gfm
hello ~~hi~~ world
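GFM mode (boolean, default: true) turns on:
Fenced code blocks
Autolinking of URLs
Deletions (strikethrough), as in the example above
Task lists
Tables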
options.commonmark
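This is a paragraph
    and this is also part of the preceding paragraph.
CommonMark mode (boolean, default: false) allows:
Parentheses (( and )) around link and image titles
A closing parenthesis ()) as an ordered list marker
CommonMark mode disallows:
Indented code directly following a paragraph (as in the example above)
ATX headings (# Hash headings) without spacing after opening hashes or before closing hashes
Setext headings (Underline headings\n---) when following a paragraph
White space in link and image URLs in auto-links (links in angle brackets, < and >)
Lazy blockquote continuation, lines not preceded by a greater-than character (>), for lists, code, and thematicBreak
options.footnotes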
Something something[^or something?].

And something else[^1].

[^1]: This reference footnote contains a paragraph...

    * ...and a list
Footnotes mode (boolean, default: false) enables reference footnotes and inline footnotes. Both are wrapped in square brackets and preceded by a caret (^), and can be referenced from inside other footnotes.
options.blocks
<block>foo
</block>
Blocks (Array.<string>, default: list of block HTML elements) lets users define block-level HTML elements.
options.pedantic
Check out some_file_name.txt
Pedantic mode (boolean, default: false) turns on:
Emphasis (_alpha_) and importance (__bravo__) with underscores in words
Unordered lists with different markers (*, -, +)
If commonmark is also turned on, ordered lists with different markers (., ))
parse.Parser
Access to the parser, if you need it.
Most often, using transformers to manipulate a syntax tree produces the desired output. Sometimes, mainly when introducing new syntactic entities with a certain level of precedence, interfacing with the parser is necessary.
If the remark-parse plugin is used, it adds a Parser constructor to the processor. Other plugins can add tokenizers to the parser’s prototype to change how markdown is parsed.
The below plugin adds a tokenizer for at-mentions.
module.exports = mentions

function mentions() {
  var Parser = this.Parser
  var tokenizers = Parser.prototype.inlineTokenizers
  var methods = Parser.prototype.inlineMethods

  // Add an inline tokenizer (defined in the following example).
  tokenizers.mention = tokenizeMention

  // Run it just before `text`.
  methods.splice(methods.indexOf('text'), 0, 'mention')
}
Parser#blockTokenizers
An object mapping tokenizer names to tokenizers. These tokenizers (for example: fencedCode, table, and paragraph) eat from the start of a value to a line ending.
See #blockMethods below for a list of methods that are included by default.
Parser#blockMethods
Array of blockTokenizers names (string) specifying the order in which they run.
newline
indentedCode
fencedCode
blockquote
atxHeading
thematicBreak
list
setextHeading
html
footnote
definition
table
paragraph
Parser#inlineTokenizers
An object mapping tokenizer names to tokenizers. These tokenizers (for example: url, reference, and emphasis) eat from the start of a value. To increase performance, they depend on locators.
See #inlineMethods below for a list of methods that are included by default.
Parser#inlineMethods
Array of inlineTokenizers names (string) specifying the order in which they run.
escape
autoLink
url
html
link
reference
strong
emphasis
deletion
code
break
text
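As a small sketch of how this ordering is typically changed, the snippet below inserts a tokenizer name just before text. It works on a copy of the default list above rather than on the real Parser.prototype.inlineMethods (an assumption made so the snippet runs without remark-parse installed), and mention is the hypothetical tokenizer name from the plugin example.

```javascript
// Copy of the default inline method names listed above. This is a stand-in
// for Parser.prototype.inlineMethods so the snippet runs on its own.
var methods = [
  'escape', 'autoLink', 'url', 'html', 'link', 'reference',
  'strong', 'emphasis', 'deletion', 'code', 'break', 'text'
]

// Register the (hypothetical) 'mention' tokenizer name right before 'text',
// so it gets a chance to run before plain text is consumed.
methods.splice(methods.indexOf('text'), 0, 'mention')

console.log(methods.slice(-3)) // [ 'break', 'mention', 'text' ]
```

Because text is the fallback, a name added after it would likely never get a chance to run, which is why the plugin example splices before it.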
function tokenizer(eat, value, silent)
tokenizeMention.notInLink = true
tokenizeMention.locator = locateMention

function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value)

  if (match) {
    if (silent) {
      return true
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    })
  }
}
The parser knows two types of tokenizers: block level and inline level. Block level tokenizers are the same as inline level tokenizers, with the exception that the latter must have a locator.
Tokenizers test whether a document starts with a certain syntactic entity. In silent mode, they return whether that test passes. In normal mode, they consume that token, a process which is called “eating”. Locators enable tokenizers to function faster by providing information on where the next entity may occur.
Node? = tokenizer(eat, value)
boolean? = tokenizer(eat, value, silent)
Parameters:
eat (Function) - Eat, when applicable, an entity
value (string) - Value which may start an entity
silent (boolean, optional) - Whether to detect or consume
Properties:
locator (Function) - Required for inline tokenizers
onlyAtStart (boolean) - Whether nodes can only be found at the beginning of the document
notInBlock (boolean) - Whether nodes cannot be in blockquotes, lists, or footnote definitions
notInList (boolean) - Whether nodes cannot be in lists
notInLink (boolean) - Whether nodes cannot be in links
Returns:
In silent mode, whether a node can be found at the start of value. In normal mode, a node if it can be found at the start of value.
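To make the difference between silent and normal mode concrete, here is a minimal sketch that calls the mention tokenizer by hand. fakeEat is a hypothetical stand-in for the parser’s eat (it only records what was eaten and hands the node back), so this shows the calling convention, not how remark-parse drives tokenizers internally.

```javascript
// The mention tokenizer from the example above.
function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value)

  if (match) {
    if (silent) {
      return true
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    })
  }
}

// Hypothetical stand-in for the parser's `eat`: records the eaten text and
// returns an `add` function that returns the node unchanged.
var eaten = []

function fakeEat(subvalue) {
  eaten.push(subvalue)

  return function add(node) {
    return node
  }
}

// Silent mode: only detect, nothing is eaten.
var detected = tokenizeMention(fakeEat, '@wooorm rocks', true)

// Normal mode: the matched text is eaten and a node is returned.
var node = tokenizeMention(fakeEat, '@wooorm rocks')

// No entity at the start of the value: nothing is returned.
var nothing = tokenizeMention(fakeEat, 'no mention here')

console.log(detected, node.url, nothing)
```

Note that the tokenizer only looks at the start of value. Finding where in a longer string a mention might begin is the locator’s job.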
tokenizer.locator(value, fromIndex)
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex)
}
Locators are required for inline tokenizers to keep the process performant. They enable inline tokenizers to function faster by providing information on where the next entity may occur. Locators may be wrong: it’s OK if there isn’t actually a node at the index they return, but they must not skip any nodes.
Parameters:
value (string) - Value which may contain an entity
fromIndex (number) - Position to start searching at
Returns:
Index at which an entity may start, and -1 otherwise.
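The following sketch illustrates what “may be wrong, but must not skip” means in practice. candidatePositions is a hypothetical helper written for this example (it is not part of remark-parse); it simply asks the locator for every index it would report in a string.

```javascript
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex)
}

// Hypothetical helper, not part of remark-parse: collect every index the
// locator reports by repeatedly searching from just past the last hit.
function candidatePositions(value, locator) {
  var positions = []
  var index = locator(value, 0)

  while (index !== -1) {
    positions.push(index)
    index = locator(value, index + 1)
  }

  return positions
}

var value = 'hi @a and b@ c'
var positions = candidatePositions(value, locateMention)

// Keep only the candidates where a mention (`@` followed by word
// characters) really starts.
var confirmed = positions.filter(function (index) {
  return /^@(\w+)/.test(value.slice(index))
})

console.log(positions, confirmed)
```

Here the locator reports both `@` characters. The second one is a false positive, which is acceptable because the tokenizer’s own test rejects it. Skipping the first one would not be acceptable, since the mention would then never be tokenized.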
eat(subvalue)
var add = eat('foo')
Eat subvalue, which is a string at the start of the tokenized value (it’s tracked to ensure the correct value is eaten).
Parameters:
subvalue (string) - Value to eat.
Returns:
add.
add(node[, parent])
var add = eat('foo')
add({type: 'text', value: 'foo'})
Add positional information to node and add it to parent.
Parameters:
node (Node) - Node to patch position on and insert
parent (Node, optional) - Place to add node to in the syntax tree. Defaults to the currently processed node
Returns:
The given node.
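To show how eat and add fit together, here is a deliberately simplified stand-in. createEat is a hypothetical function written for this example, not remark-parse’s implementation: it tracks only a character offset (the real parser tracks full positions) and throws when the eaten text is not at the start of the remaining value.

```javascript
// Simplified stand-in for the parser's `eat`/`add` pair. This is NOT how
// remark-parse implements them; it only tracks a character offset.
function createEat(value, parent) {
  var offset = 0

  return function eat(subvalue) {
    // The eaten text must sit at the start of what is left of `value`.
    if (value.slice(offset, offset + subvalue.length) !== subvalue) {
      throw new Error('Incorrectly eaten value: ' + subvalue)
    }

    var start = offset
    offset += subvalue.length
    var end = offset

    return function add(node, target) {
      var container = target || parent

      // Patch (simplified) positional information and insert the node.
      node.position = {start: {offset: start}, end: {offset: end}}
      container.children.push(node)

      return node
    }
  }
}

var root = {type: 'root', children: []}
var eat = createEat('foo bar', root)

var first = eat('foo')({type: 'text', value: 'foo'})
var second = eat(' bar')({type: 'text', value: ' bar'})

console.log(first.position.end.offset, second.position.end.offset) // 3 7
```

Splitting the work in two steps means the position is fixed at the moment of eating, while the node can be built and attached afterwards.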
add.test()
Get the positional information which would be patched on node by add.
add.reset(node[, parent])
Like add, but resets the internal location. Useful for example in lists, where the same content is first eaten for a list, and later for list items.
Parameters:
node (Node) - Node to patch position on and insert
parent (Node, optional) - Place to add node to in the syntax tree. Defaults to the currently processed node
Returns:
The given node.
In rare situations, you may want to turn off a tokenizer to avoid parsing that syntactic feature. This can be done by replacing the tokenizer in your Parser’s blockTokenizers (or blockMethods) or inlineTokenizers (or inlineMethods).
The following example turns off indented code blocks:
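var remarkParse = require('remark-parse')

// Replace the indented code tokenizer with a stub that never eats anything.
remarkParse.Parser.prototype.blockTokenizers.indentedCode = indentedCode

function indentedCode() {
  return true
}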
Preferably, just use this plugin.
MIT © Titus Wormer