| <!doctype html> |
| <meta charset=utf-8> |
| <title>HTML Tidy for HTML5 (experimental)</title> |
| <style type="text/css"> |
| html { |
| background: #DDE5D9 url(data:image/gif;base64,R0lGODlhBAAEAIAAANra2v///yH5BAAAAAAALAAAAAAEAAQAAAIFTGB4xlcAOw==) repeat 0 0; |
| font-family: "Lucida Sans Unicode", "Lucida Sans", verdana, arial, helvetica; |
| } |
| body { |
| border: solid 1px #CED4CA; |
| background-color: #FFF; |
| padding: 4px 40px 40px 40px; |
| margin: 20px 20px 20px 20px; |
| padding-right: 20%; |
| } |
| h1, h2 { |
| color: #0B5B9D; |
| } |
| h1 { |
| font-size: 39px; |
| font-weight: normal; |
| vertical-align: top; |
| margin-bottom: 0px; |
| } |
| a { |
| text-decoration: none; |
| color: #0B5B9D; |
| padding: 2px; |
| } |
| |
| a:hover { |
| text-decoration: none; |
| background-color: #0B5B9D; |
| color: white; |
| } |
| a:active { |
| text-decoration: none; |
| background-color: white; |
| color: black; |
| } |
| #toc { |
| position: fixed; |
| top: 10px; |
| right: 10px; |
| border: 2px solid #0B5B9D; |
| background: rgba(255,255,255,.9); |
| padding: 15px; |
| z-index: 999; |
| max-height: 400px; |
| overflow: auto; |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| #toc-button { |
| position: fixed; |
| top: 10px; |
| right: 10px; |
| background: transparent; |
| padding: 15px; |
| z-index: 999; |
| max-height: 400px; |
| overflow: auto; |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| #toc .button, |
| #toc-button .button, |
| #quickref-button .button { |
| float: right; |
| width: 59px; |
| text-align: center; |
| margin: 0 0 5px 5px; |
| padding: 5px; |
| border: 1px #008 solid; |
| color:#00f; |
| background-color:#ccf; |
| cursor: pointer; |
| } |
| #toc ol { |
| margin: 0; |
| padding: 0; |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| #toc li { |
| list-style: decimal outside; |
| margin-left: 20px; |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| #toc li a { |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| .hide { |
| display: none; |
| } |
| .show { |
| display: block; |
| } |
| #quickref-button { |
| position: fixed; |
| top: 40px; |
| right: 10px; |
| background: transparent; |
| padding: 15px; |
| z-index: 999; |
| max-height: 400px; |
| overflow: auto; |
| font-size: 11px; |
| font-family: Verdana, sans-serif; |
| } |
| code { color: green; font-weight: bold; } |
| pre { color: green; font-weight: bold; font-family: monospace} |
| em { font-style: italic; color: rgb(0, 0, 153) } |
| :link { color: rgb(0, 0, 153) } |
| :visited { color: rgb(153, 0, 153) } |
| </style> |
| |
| <h1 id=intro>HTML Tidy for HTML5 (experimental)</h1> |
| <p>This page documents the experimental HTML5 fork of HTML Tidy available |
| at |
| <a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>. |
| |
| <p>File bug reports and enhancement requests at |
| <a href="https://github.com/w3c/tidy-html5/issues">https://github.com/w3c/tidy-html5/issues</a>.</p> |
| |
| <p>The W3C public mailing list for HTML Tidy discussion is |
| <b>html-tidy@w3.org</b> (<a href= "http://lists.w3.org/Archives/Public/html-tidy/">list archive</a>). |
| |
| <p>For more information on HTML5:</p> |
| <ul> |
| <li> |
| <a href="http://dev.w3.org/html5/spec-author-view">HTML: Edition for Web Authors</a> (the latest HTML specification) |
| <li> |
| <a href="http://dev.w3.org/html5/markup/">HTML: The Markup Language</a> (an HTML language reference) |
| </ul> |
| <p> |
| Validate your HTML documents using the |
| <a href="http://validator.w3.org/nu/">W3C Nu Markup Validator</a>. |
| |
| <h2 id=what-tidy-does>What Tidy does</h2> |
| <p>Tidy corrects and cleans up HTML content by fixing markup errors. |
| Here are a few examples: |
| <ul> |
| <li><b>Mismatched end tags:</b> |
| <pre> |
| <h2>subheading</h3> |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <h2>subheading</h2> |
| </pre></li> |
| <li><b>Misnested tags:</b> |
| <pre> |
| <p>here is a para <b>bold <i>bold italic</b> bold?</i> normal? |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <p>here is a para <b>bold <i>bold italic</i> bold?</b> normal? |
| </pre></li> |
| <li><b>Missing end tags:</b> |
| <pre> |
| <h1>heading |
| <h2>subheading</h2> |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <h1>heading</h1> |
| <h2>subheading</h2> |
| </pre> |
| …and |
| <pre> |
| <h1><i>italic heading</h1> |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <h1><i>italic heading</i></h1> |
| </pre></li> |
| <li><b>Mixed-up tags</b> |
| <pre> |
| <i><h1>heading</h1></i> |
| <p>new paragraph <b>bold text |
| <p>some more bold text |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <h1><i>heading</i></h1> |
| <p>new paragraph <b>bold text</b> |
| <p><b>some more bold text</b> |
| </pre></li> |
| <li><b>Tag in the wrong place:</b> |
| <pre> |
| <h1><hr>heading</h1> |
| <h2>sub<hr>heading</h2> |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <hr> |
| <h1>heading</h1> |
| <h2>sub</h2> |
| <hr> |
| <h2>heading</h2> |
| </pre></li> |
| <li><b>Missing "/" in end tags:</b> |
| <pre> |
| <a href="#refs">References<a> |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <a href="#refs">References</a> |
| </pre></li> |
| <li><b>List markup with missing tags:</b> |
| <pre> |
| <body> |
| <li>1st list item |
| <li>2nd list item |
| </pre> |
| <p>…is converted to:</p> |
| <pre> |
| <body> |
| <ul> |
| <li>1st list item</li> |
| <li>2nd list item</li> |
| </ul> |
| </pre></li> |
| <li><b>Missing quotation marks around attribute values</b> |
| <p>Tidy inserts quotation marks around all attribute values for you. It |
| can also detect when you have forgotten the closing quotation mark, |
| although this is something you will have to fix yourself.</p> |
| </li> |
| <li><b>Unknown/proprietary attributes</b> |
| <p>Tidy has a comprehensive knowledge of the attributes defined in HTML5. |
| That often allows you to spot where you have mis-typed an attribute. |
| </li> |
| <li><b>Tags lacking a terminating ">"</b> |
| <p>This is something you then have to fix yourself as Tidy cannot |
| determine where the ">" was meant to be inserted.</p> |
| </li> |
| </ul> |
| |
| <h2 id="help">How to run Tidy from the command line</h2> |
| <p>This is the syntax for invoking Tidy from the command line: |
| <pre> |
| <code>tidy <em>[[options] filename]*</em></code> |
| </pre> |
| <p> |
| Tidy defaults to reading from standard input, so if you run Tidy without |
| specifying the <code><em>filename</em></code> argument, it will just sit |
| there waiting for input to read. |
| And Tidy defaults to writing to standard output. So you can pipe output |
| from Tidy to other programs, as well as pipe output from other programs to |
| Tidy. You can page through the output from Tidy by piping it to a pager:</p> |
| <pre> |
| tidy file.html | less |
| </pre> |
| <p> |
| To have Tidy write its output to a file instead, either use the |
| <code>-o <em>filename</em></code> or <code>-output <em>filename</em></code> |
| option, or redirect standard output to the file; for example: |
| <pre> |
| tidy -o output.html index.html |
| tidy index.html > output.html |
| </pre> |
| <p>Both of those run tidy on the file <b>index.html</b> and write the |
| output to the file <b>output.html</b>, while writing any error messages to |
| standard error. |
| <p> |
| Tidy defaults to writing its error messages to standard error (that is, to |
| the console where you’re running Tidy). To page through the error messages, |
| along with the output, redirect standard error to standard output, and pipe |
| it to your pager: |
| <pre> |
| tidy index.html 2>&1 | less |
| </pre> |
| <p> |
| To have Tidy write the errors to a file instead, either use the |
| <code>-f <em>filename</em></code> or <code>-file <em>filename</em></code> |
| option, or redirect standard error to a file:</p> |
| <pre> |
| tidy -o output.html -f errs.txt index.html |
| tidy index.html > output.html 2> errs.txt |
| </pre> |
| <p>Both of those run tidy on the file <b>index.html</b> and write the |
| output to the file <b>output.html</b>, while writing any error messages to |
| the file <b>errs.txt</b>. |
| <p> |
| Writing the error messages to a file is especially useful if the file you |
| are checking has many errors; reading them from a file instead of the |
| console or pager can make it easier to review them. |
| <p>You can use the or <code>-m</code> or <code>-modify</code> option to |
| modify (in-place) the contents of the input file you are checking; that is, |
| to overwrite those contents with the output from Tidy. Example: |
| <pre> |
| tidy -f errs.txt -m index.html |
| </pre> |
| <p>That runs tidy on the file <b>index.html</b>, modifying it in place |
| and writing the error messages to the file <b>errs.txt</b>. |
| <p> |
| <b>Caution:</b> If you use the -m option, you should first save a copy of your file. |
| <h2 id=options>Options and configuration settings</h2> |
| <p>To get a list of available options, use:</p> |
| <pre> |
| tidy -help |
| </pre> |
| <p>To get a list of all configuration settings, use:</p> |
| <pre> |
| tidy -help-config |
| </pre> |
| <p>To read the help output a page at time, pipe it to a pager: |
| <pre> |
| tidy -help | less |
| tidy -help-config | less |
| </pre> |
| <p>Single-letter options other than -f may be combined; for example: |
| <pre> |
| tidy -f errs.txt -imu foo.html |
| </pre> |
| |
| <h2 id="config">Using a config file</h2> |
| <p>The most convenient way to configure Tidy is by using separate |
| config file. |
| Assuming you have created a |
| Tidy config file named <b>config.txt</b> (the name doesn't matter), you can |
| instruct Tidy to use it via the command line option |
| <code>-config config.txt</code>; for example: |
| <pre> |
| tidy -config config.txt file1.html file2.html |
| </pre> |
| <p>Alternatively, you can name the default config file via the |
| environment variable named <b>HTML_TIDY</b>, the value of which is |
| the absolute path for the config file. |
| <p>You can also set config options on the command line by preceding |
| the name of the option immediately (no intervening space) with the string "<code>--</code>"; |
| for example:</p> |
| <pre> |
| tidy --break-before-br true --show-warnings false |
| </pre> |
| <p>You can find documentation for full set of configuration options |
| on the |
| <a href= "quickref.html">Quick Reference</a> |
| page. |
| |
| <h2 id=sample-config>Sample config file</h2> |
| <p>The following is an example of a Tidy config file.</p> |
| <pre> |
| // sample config file for HTML tidy |
| indent: auto |
| indent-spaces: 2 |
| wrap: 72 |
| markup: yes |
| output-xml: no |
| input-xml: no |
| show-warnings: yes |
| numeric-entities: yes |
| quote-marks: yes |
| quote-nbsp: yes |
| quote-ampersand: no |
| break-before-br: no |
| uppercase-tags: no |
| uppercase-attributes: no |
| char-encoding: latin1 |
| new-inline-tags: cfif, cfelse, math, mroot, |
| mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover, |
| munder, mover, mmultiscripts, msup, msub, mtext, |
| mprescripts, mtable, mtr, mtd, mth |
| new-blocklevel-tags: cfoutput, cfquery |
| new-empty-tags: cfelse |
| </pre> |
| |
| <h2 id="new-config">New configuration options</h2> |
| <p>The experimental HTML5-aware fork of Tidy adds the following new |
| configuration options: |
| <ul> |
| <li><a href="http://w3c.github.com/tidy-html5/quickref.html#coerce-endtags">coerce-endtags</a> |
| <li><a href="http://w3c.github.com/tidy-html5/quickref.html#drop-empty-elements">drop-empty-elements</a> |
| <li><a href="http://w3c.github.com/tidy-html5/quickref.html#merge-emphasis">merge-emphasis</a> |
| <li><a href="http://w3c.github.com/tidy-html5/quickref.html#omit-optional-tags">omit-optional-tags</a> |
| <li><a href="http://w3c.github.com/tidy-html5/quickref.html#show-info">show-info</a> |
| </ul> |
| <p>In addition, it also adds a new <code>html5</code> value for the |
| <a href="http://w3c.github.com/tidy-html5/quickref.html#doctype">doctype</a> |
| configuration option. |
| |
| <h2 id=indenting>Indenting output for readability</h2> |
| <p>Indenting the source markup of an HTML document makes the markup easier |
| to read. Tidy can indent the markup for an HTML document while recognizing |
| elements whose contents should not be indented. In the example below, Tidy |
| indents the output while preserving the formatting of the <pre> |
| element:</p> |
| <p>Input:</p> |
| <pre> |
| <html> |
| <head> |
| <title>Test document</title> |
| </head> |
| <body> |
| <p>This example shows how Tidy can indent output while preserving |
| formatting of particular elements.</p> |
| |
| <pre>This is |
| <em>genuine |
| preformatted</em> |
| text |
| </pre> |
| </body> |
| </html> |
| |
| </pre> |
| <p>Output:</p> |
| <pre> |
| <html> |
| <head> |
| <title>Test document</title> |
| </head> |
| |
| <body> |
| <p>This example shows how Tidy can indent output while preserving |
| formatting of particular elements.</p> |
| <pre> |
| This is |
| <em>genuine |
| preformatted</em> |
| text |
| </pre> |
| </body> |
| </html> |
| </pre> |
| <p>Tidy’s indenting behavior is not perfect and can sometimes cause your |
| output to be rendered by browsers in a different way than the input. |
| You can avoid unexpected indenting-related rendering problems by setting |
| <code>indent: no</code> or <code>indent: auto</code> in a config file.</p> |
| |
| <h2 id=preserve-indenting>Preserving original indenting not possible</h2> |
| <p>Tidy is not capable of preserving the original indenting of the markup |
| from the input it receives. That’s because Tidy starts by building a clean |
| parse tree from the input, and that parse tree doesn’t contain any |
| information about the original indenting. Tidy then pretty-prints the parse |
| tree using the current config settings. Trying to preserve the original |
| indenting from the input would interact badly with the repair operations |
| needed to build a clean parse tree, and would considerably complicate the |
| code.</p> |
| |
| <h2 id=encodings>Encodings and character references</h2> |
| <p> |
| Tidy defaults to assuming you want output to be encoded in UTF-8. |
| But Tidy offers you a choice of other character encodings: US ASCII, ISO |
| Latin-1, and the ISO 2022 family of 7 bit encodings. |
| <p> |
| Tidy doesn't yet recognize the use of the HTML <meta> element for |
| specifying the character encoding.</p> |
| <p> |
| The full set of HTML character references are defined. Cleaned-up output |
| uses named character references for characters when appropriate. Otherwise, |
| characters outside the normal range are output as numeric character |
| references. |
| |
| <h2 id=accessibility>Accessibility</h2> |
| <p>Tidy offers advice on potential accessibility problems for people using |
| non-graphical browsers. |
| |
| <h2 id=presentational-markup>Cleaning up presentational markup</h2> |
| <p>Some tools generate HTML with presentational elements such as <font>, |
| <nobr>, and <center>. |
| Tidy's <code>-clean</code> option will replace those elements with CSS style |
| properties. |
| <p>Some HTML documents rely on the presentational effects of <p> start |
| tags that are not followed by any content. Tidy deletes such <p> tags |
| (as well as any headings that don’t have content). So do not use <p> |
| tags simply for adding vertical whitespace; instead use CSS, or the |
| <br> element. However, note that Tidy won’t discard <p> tags that |
| are followed by any nonbreaking space (that is, the &nbsp; named |
| character reference). |
| |
| <h2 id=new-tags>Teaching Tidy about new tags</h2> |
| <p>You can teach Tidy about new tags by declaring them in the |
| configuration file, the syntax is:</p> |
| <pre> |
| new-inline-tags: <em>tag1, tag2, tag3</em> |
| new-empty-tags: <em>tag1, tag2, tag3</em> |
| new-blocklevel-tags: <em>tag1, tag2, tag3</em> |
| new-pre-tags: <em>tag1, tag2, tag3</em> |
| </pre> |
| <p>The same tag can be defined as empty and as inline or as empty |
| and as block.</p> |
| <p>These declarations can be combined to define a new empty |
| inline or empty block element. But you are not advised to declare |
| tags as being both inline and block.</p> |
| <p>Note that the new tags can only appear where Tidy expects inline |
| or block-level tags respectively. That means you can’t place |
| new tags within the document head or other contexts with restricted |
| content models. |
| |
| <h2 id=php-asp-jste>Ignoring PHP, ASP, and JSTE instructions</h2> |
| <p>Tidy will gracefully ignore many cases of PHP, ASP, and JSTE |
| instructions within element content and as replacements for attributes, |
| and preserve them as-is in output; for example:</p> |
| <pre> |
| <option <% if rsSchool.Fields("ID").Value |
| = session("sessSchoolID") |
| then Response.Write("selected") %> |
| value='<%=rsSchool.Fields("ID").Value%>'> |
| <%=rsSchool.Fields("Name").Value%> |
| (<%=rsSchool.Fields("ID").Value%>) |
| </option> |
| </pre> |
| <p>But note that Tidy may report missing attributes when those are “hidden” |
| within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to |
| create a start tag, but place the end tag explicitly in the HTML markup, Tidy |
| won’t be able to match them up, and will delete the end tag. So in that |
| case you are advised to make the start tag explicit and to use PHP, ASP, or |
| JSTE code for just the attributes; for example:</p> |
| <pre> |
| <a href="<%=random.site()%>">do you feel lucky?</a> |
| </pre> |
| <p> |
| Tidy can also get things wrong if the PHP, ASP, or JSTE code includes |
| quotation marks; for example: |
| </p> |
| <pre> |
| value="<%=rsSchool.Fields("ID").Value%>" |
| </pre> |
| <p>Tidy will see the quotation mark preceding <i>ID</i> as ending the |
| attribute value, and proceed to complain about what follows. |
| <p>Tidy allows you to control whether line wrapping on spaces within |
| PHP, ASP, and JSTE |
| instructions is enabled; see the <b>wrap-php</b>, <b>wrap-asp</b>, |
| and <b>wrap-jste</b> config options.</p> |
| |
| <h2 id=xml>Correcting well-formedness errors in XML markup</h2> |
| <p>Tidy can help you to correct well-formedness errors in XML markup. Tidy |
| doesn't yet recognize all XML features, though; for example, it doesn't |
| understand CDATA sections or DTD subsets.</p> |
| |
| <h2 id="scripts">Using Tidy from scripts</h2> |
| <p>If you want to run Tidy from a Perl or other scripting language |
| you may find it of value to inspect the result returned by Tidy |
| when it exits: 0 if everything is fine, 1 if there were warnings |
| and 2 if there were errors. This is an example using Perl:</p> |
| <pre> |
| if (close(TIDY) == 0) { |
| my $exitcode = $? >> 8; |
| if ($exitcode == 1) { |
| printf STDERR "tidy issued warning messages\n"; |
| } elsif ($exitcode == 2) { |
| printf STDERR "tidy issued error messages\n"; |
| } else { |
| die "tidy exited with code: $exitcode\n"; |
| } |
| } else { |
| printf STDERR "tidy detected no errors\n"; |
| } |
| </pre> |
| |
| <h2 id="implementation">Source code</h2> |
| <p>The source code for the experimental HTML5 fork of Tidy can be found at |
| <a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>. |
| |
| <h2 id=build-cli>Building the tidy command-line tool</h2> |
| <p>For Linux/BSD/OSX platforms, you can build and install the |
| <code>tidy</code> command-line tool from the source code using the |
| following steps.</p> |
| |
| <ol> |
| <li><code>make -C build/gmake/</code></li> |
| <li><code>make install -C build/gmake/</code></li> |
| </ol> |
| |
| <p>Note that you will either need to run <code>make install</code> as root, |
| or with <code>sudo make install</code>.</p> |
| |
| <h2 id=build-library>Building the libtidy shared library</h2> |
| <p>For Linux/BSD/OSX platforms, you can build and install the |
| <code>tidylib</code> shared library (for integrating Tidy into other |
| applications) from the source code using the following steps.</p> |
| |
| <ol> |
| <li><code>sh build/gnuauto/setup.sh && ./configure && make</code></li> |
| <li><code>make install</code></li> |
| </ol> |
| |
| <p>Note that you will either need to run <code>make install</code> as root, |
| or with <code>sudo make install</code>.</p> |
| |
| <h2 id=acks>Acknowledgements</h2> |
| <p>Dave Raggett has a list of |
| <a href="http://www.w3.org/People/Raggett/tidy/#acks">Acknowledgements</a> |
| for people who made suggestions or reported bugs for the |
| original version of Tidy. |
| |
| <div id=toc-button style=""> |
| <a class=button onclick=" |
| document.getElementById('toc').className = 'show'; |
| document.getElementById('toc-button').className = 'hide'; |
| document.getElementById('quickref-button').className = 'hide';" |
| >Show TOC</a> |
| </div> |
| <div id=toc class=hide> |
| <a class=button onclick=" |
| document.getElementById('toc').className = 'hide'; |
| document.getElementById('toc-button').className = 'show'; |
| document.getElementById('quickref-button').className = 'show';" |
| >Close</a> |
| <ol> |
| <li><a href="#what-tidy-does">What Tidy does</a> |
| <li><a href="#help">How to run Tidy from the command line</a> |
| <li><a href="#options">Options and configuration settings</a> |
| <li><a href="#config">Using a config file</a> |
| <li><a href="#sample-config">Sample config file</a> |
| <li><a href="#new-config">New configuration options</a> |
| <li><a href="#indenting">Indenting output for readability</a> |
| <li><a href="#preserve-indenting">Preserving original indenting not possible</a> |
| <li><a href="#encodings">Encodings and character references</a> |
| <li><a href="#accessibility">Accessibility</a> |
| <li><a href="#presentational-markup">Cleaning up presentational markup</a> |
| <li><a href="#new-tags">Teaching Tidy about new tags</a> |
| <li><a href="#php-asp-jste">Ignoring PHP, ASP, and JSTE instructions</a> |
| <li><a href="#xml">Correcting well-formedness errors in XML markup</a> |
| <li><a href="#scripts">Using Tidy from scripts</a> |
| <li><a href="#implementation">Source code</a> |
| <li><a href="#build-cli">Building the tidy command-line tool</a> |
| <li><a href="#build-library">Building the tidylib shared library</a> |
| <li><a href="#acks">Acknowledgements</a> |
| </ol> |
| </div> |
| |
| <div id=quickref-button style=""> |
| <a class=button href="quickref.html">QuickRef</a> |
| </div> |
| <script> |
| document.addEventListener("mouseup", function() { |
| if (document.getElementById('toc').className = 'show') { |
| document.getElementById('toc').className = 'hide'; |
| document.getElementById('toc-button').className = 'show'; |
| document.getElementById('quickref-button').className = 'show'; |
| } |
| }, false) |
| </script> |