| <refentry xmlns="http://docbook.org/ns/docbook" |
| xmlns:xlink="http://www.w3.org/1999/xlink" |
| xmlns:xi="http://www.w3.org/2001/XInclude" |
| xmlns:src="http://nwalsh.com/xmlns/litprog/fragment" |
| xmlns:xsl="http://www.w3.org/1999/XSL/Transform" |
| version="5.0" xml:id="make.index.markup"> |
| <refmeta> |
| <refentrytitle>make.index.markup</refentrytitle> |
| <refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo> |
| </refmeta> |
| <refnamediv> |
| <refname>make.index.markup</refname> |
| <refpurpose>Generate XML index markup in the index?</refpurpose> |
| </refnamediv> |
| |
| <refsynopsisdiv> |
| <src:fragment xml:id="make.index.markup.frag"> |
| <xsl:param name="make.index.markup" select="0"/> |
| </src:fragment> |
| </refsynopsisdiv> |
| |
| <refsection><info><title>Description</title></info> |
| |
| <para>This parameter enables a very neat trick for getting properly |
| merged, collated back-of-the-book indexes. G. Ken Holman suggested |
| this trick at Extreme Markup Languages 2002 and I'm indebted to him |
| for it.</para> |
| |
| <para>Jeni Tennison's excellent code in |
| <filename>autoidx.xsl</filename> does a great job of merging and |
| sorting <tag>indexterm</tag>s in the document and building a |
| back-of-the-book index. However, there's one thing that it cannot |
| reasonably be expected to do: merge page numbers into ranges. (I would |
| not have thought that it could collate and suppress duplicate page |
| numbers, but in fact it appears to manage that task somehow.)</para> |
| |
| <para>Ken's trick is to produce a document in which the index at the |
| back of the book is <quote>displayed</quote> as literal XML markup. Because the index |
| is generated by the FO processor, all of the page numbers have been resolved. |
| In practice, this means that instead of |
| an index at the back of the book that looks like this:</para> |
| |
| <blockquote> |
| <formalpara><info><title>A</title></info> |
| <para>ap1, 1, 2, 3</para> |
| </formalpara> |
| </blockquote> |
| |
| <para>you get one that looks like this:</para> |
| |
| <blockquote> |
| <programlisting>&lt;indexdiv&gt;A&lt;/indexdiv&gt; |
| &lt;indexentry&gt; |
| &lt;primaryie&gt;ap1&lt;/primaryie&gt;, |
| &lt;phrase role="pageno"&gt;1&lt;/phrase&gt;, |
| &lt;phrase role="pageno"&gt;2&lt;/phrase&gt;, |
| &lt;phrase role="pageno"&gt;3&lt;/phrase&gt; |
| &lt;/indexentry&gt;</programlisting> |
| </blockquote> |
| |
| <para>After building a PDF file with this sort of odd-looking index, you can |
| extract the text from the PDF file and the result is a proper index expressed in |
| XML.</para> |
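| |
| <para>As a minimal sketch of how such a PDF might be produced, the parameter |
| can be turned on in an FO customization layer. The import location below is |
| an assumption; point it at wherever your copy of the FO stylesheet actually |
| lives.</para> |
| |
| <programlisting>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" |
|                 version="1.0"&gt; |
|   &lt;!-- Assumed location of the stock FO stylesheet; adjust as needed. --&gt; |
|   &lt;xsl:import href="path/to/docbook-xsl/fo/docbook.xsl"/&gt; |
|   &lt;!-- The default is 0; a value of 1 turns the literal index markup on. --&gt; |
|   &lt;xsl:param name="make.index.markup" select="1"/&gt; |
| &lt;/xsl:stylesheet&gt;</programlisting> |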
| |
| <para>Now you have data that's amenable to processing, and a simple Perl script |
| (such as <filename>fo/pdf2index</filename>) can |
| merge consecutive page numbers into ranges and generate a proper index.</para> |
| |
| <para>Finally, reformat your original document using this literal index instead of |
| an automatically generated one and <quote>bingo</quote>!</para> |
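| |
| <para>For illustration only, the literal index that replaces the generated one |
| might look something like the following once the page numbers from the example |
| above have been merged into a range. The exact markup that |
| <filename>fo/pdf2index</filename> emits is not shown here, so treat the |
| structure below as a hypothetical hand-written equivalent.</para> |
| |
| <programlisting>&lt;index&gt; |
|   &lt;!-- Hypothetical result: pages 1, 2, 3 collapsed to the range 1-3. --&gt; |
|   &lt;indexdiv&gt;&lt;title&gt;A&lt;/title&gt; |
|     &lt;indexentry&gt; |
|       &lt;primaryie&gt;ap1, 1-3&lt;/primaryie&gt; |
|     &lt;/indexentry&gt; |
|   &lt;/indexdiv&gt; |
| &lt;/index&gt;</programlisting> |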
| |
| </refsection> |
| </refentry> |