==================================
Design document for PDF generation
==================================

:Author: kn

This is the design document for the PDF generation in the eZ Components
Document component. PDF documents should be created from Docbook documents,
which can be generated from every markup format supported by the Document
component.

The requirements addressed by this design are specified in the
pdf_requirements.txt document.

Layout directives
=================

The PDF document will be created from the Docbook document object. The PDF
document class will already contain basic formatting rules for a meaningful
default layout. Additional rules may be passed to the document in various
ways.

In general, a CSS-like approach will be used to encode layout information.
This allows both easily readable addressing of nodes in an XML tree, as
already known from CSS, and human-readable formatting options.

For now, a limited subset of CSS will be used to address elements inside the
Docbook XML tree. The grammar for those rules will be::

    Address ::= Element ( Rule )*
    Rule ::= '>'? Element
    Element ::= ElementName ( '.' ClassName | '#' ElementId )?

    ClassName ::= [A-Za-z_-]+
    ElementName ::= XMLName* | '*'
    ElementId ::= XMLName

    * XMLName refers to http://www.w3.org/TR/REC-xml/#NT-Name
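
A minimal PHP sketch of a parser for this grammar follows. It is an illustration under stated assumptions, not the component's implementation: the function name ``parseAddress()`` is hypothetical, XMLName is approximated by the ASCII pattern ``[A-Za-z_][A-Za-z0-9_-]*`` (excluding '.', which would otherwise clash with the class separator), the class or ID suffix is treated as optional, as the example addresses in this document require, and an empty element name is treated like '*'.

```php
<?php
// Hypothetical parser for the address grammar. Assumptions: XMLName is
// approximated by [A-Za-z_][A-Za-z0-9_-]* and the class/ID suffix is optional.
// Each step records its combinator relative to the previous step:
// ' ' (descendant) or '>' (direct child).
function parseAddress(string $address): array
{
    $element = '/^(?P<name>\*|[A-Za-z_][A-Za-z0-9_-]*|)'
             . '(?:\.(?P<class>[A-Za-z_-]+)|#(?P<id>[A-Za-z_][A-Za-z0-9_-]*))?/';
    $steps      = [];
    $rest       = trim($address);
    $combinator = ' '; // the first step may match at any depth

    while (true) {
        if (!preg_match($element, $rest, $m) || $m[0] === '') {
            throw new InvalidArgumentException("Invalid address near '$rest'");
        }
        $steps[] = [
            'combinator' => $combinator,
            // Assumption: an empty element name (".intro") is treated like '*'.
            'name'       => $m['name'] === '' ? '*' : $m['name'],
            'class'      => $m['class'] ?? '',
            'id'         => $m['id'] ?? '',
        ];

        $rest    = substr($rest, strlen($m[0]));
        $trimmed = ltrim($rest);
        if ($trimmed === '') {
            return $steps;
        }
        if ($trimmed[0] === '>') {
            $combinator = '>';
            $rest       = ltrim(substr($trimmed, 1));
        } elseif ($trimmed !== $rest) {
            $combinator = ' '; // whitespace separates descendant steps
            $rest       = $trimmed;
        } else {
            throw new InvalidArgumentException("Unexpected input '$rest'");
        }
    }
}

// Usage
$steps = parseAddress('article > section.intro title');
// $steps[1]: ['combinator' => '>', 'name' => 'section', 'class' => 'intro', 'id' => '']
```

Each step keeps the combinator linking it to the previous step, which is what a matcher evaluating the address right to left against an element's ancestors would need.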

The semantics of this simple subset of addressing directives are the same as
in CSS: whitespace selects descendants at any depth, while '>' selects direct
children only. A second-level title could, for example, be addressed by::

    section title

The formatting options are also mostly the same as in CSS, but again only a
subset of the CSS definitions is used, extended by some additional formatting
options relevant especially for PDF rendering. Which formatting options are
used depends on the renderer; unknown formatting options may issue errors or
warnings.

The styles property of the PDF document wrapper class will implement Iterator
and ArrayAccess to access the layout directives, as the following example
shows::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );

    $pdf->styles['article > section title']['font-size'] = '1.6em';
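
The following sketch shows one way a styles container could support the nested write in the example above. The class name ``PdfStylesSketch`` is hypothetical, and it uses ``IteratorAggregate`` instead of implementing ``Iterator`` directly, which is enough for ``foreach``. The key detail is that ``offsetGet()`` returns by reference; otherwise PHP would only modify a temporary copy and emit an "Indirect modification of overloaded element" notice.

```php
<?php
// Hypothetical styles container (the role of ezcDocumentPdfStyles in this
// design). Directives are stored as address => [option => value].
class PdfStylesSketch implements ArrayAccess, IteratorAggregate
{
    private $directives = [];

    public function offsetExists($address): bool
    {
        return isset($this->directives[$address]);
    }

    // Returning by reference makes nested writes such as
    // $styles['section title']['font-size'] = '1.6em' modify the stored
    // array instead of a temporary copy. Unknown addresses are created.
    #[\ReturnTypeWillChange]
    public function &offsetGet($address)
    {
        if (!isset($this->directives[$address])) {
            $this->directives[$address] = [];
        }
        return $this->directives[$address];
    }

    public function offsetSet($address, $options): void
    {
        if (!is_array($options)) {
            throw new InvalidArgumentException('Formatting options must be an array.');
        }
        $this->directives[$address] = $options;
    }

    public function offsetUnset($address): void
    {
        unset($this->directives[$address]);
    }

    // Iterates directives in the order their addresses were first added.
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->directives);
    }
}

// Usage
$styles = new PdfStylesSketch();
$styles['article > section title']['font-size'] = '1.6em';
$styles['para'] = ['font-family' => 'Bitstream Vera Sans'];
```

Keying directives by address has one consequence for the ordering rule: PHP arrays keep the original position of a key when it is reassigned, so updating an existing address does not move it to the end. If the order of specification must decide precedence, storing a list of (address, options) pairs would avoid this.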

Directives specified later will always overwrite earlier directives, for each
formatting option specified in the later directive. Unlike in CSS, this
overwriting will NOT depend on the specificity of the node address.
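
The override rule can be illustrated with a small sketch that computes the effective options of an element from its ancestor path. The function names are hypothetical, and addresses are simplified to element names, '*', whitespace and '>' (classes, IDs and inheritance of options to child elements are not modelled). Matching directives are merged strictly in the order given, so the last one wins per option, whatever its specificity.

```php
<?php
// Hypothetical sketch of the override rule. Simplifications (assumptions):
// addresses contain only element names, '*', whitespace and '>'; classes,
// IDs and inheritance of options to child elements are not modelled.
function parseSimpleAddress(string $address): array
{
    // "article>section  title" => tokens article, >, section, title
    $normalized = preg_replace('/\s*>\s*/', ' > ', trim($address));
    $steps = [];
    $child = false;
    foreach (preg_split('/\s+/', $normalized) as $token) {
        if ($token === '>') {
            $child = true; // the next step must be a direct child
            continue;
        }
        $steps[] = ['name' => $token, 'child' => $child];
        $child = false;
    }
    return $steps;
}

// Do step $s and all steps left of it match, with step $s at $path[$p]?
function matchStep(array $steps, int $s, array $path, int $p): bool
{
    $name = $steps[$s]['name'];
    if ($name !== '*' && $name !== $path[$p]) {
        return false;
    }
    if ($s === 0) {
        return true;
    }
    if ($steps[$s]['child']) {
        return $p > 0 && matchStep($steps, $s - 1, $path, $p - 1);
    }
    // Descendant: try every ancestor; backtracking matters for addresses
    // mixing ' ' and '>'.
    for ($q = $p - 1; $q >= 0; $q--) {
        if (matchStep($steps, $s - 1, $path, $q)) {
            return true;
        }
    }
    return false;
}

// $path lists element names from the root down to the element itself.
function matchesAddress(string $address, array $path): bool
{
    $steps = parseSimpleAddress($address);
    return $path !== [] && matchStep($steps, count($steps) - 1, $path, count($path) - 1);
}

// Applies matching directives in the given order: for each option the last
// matching directive wins, independent of how specific its address is.
function computeStyle(array $directives, array $path): array
{
    $style = [];
    foreach ($directives as [$address, $options]) {
        if (matchesAddress($address, $path)) {
            $style = array_merge($style, $options);
        }
    }
    return $style;
}

// Usage: in CSS the first, more specific address would win for 'color'.
$directives = [
    ['article > section > title', ['color' => 'red', 'font-size' => '1.2em']],
    ['title', ['color' => 'black']],
];
$titleStyle = computeStyle($directives, ['article', 'section', 'title']);
// $titleStyle === ['color' => 'black', 'font-size' => '1.2em']
```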

Importing and exporting layout directives
-----------------------------------------

The layout directives can be exported to and imported from files, so that
users of the component can store a custom PDF layout. The storage format will
again closely resemble a simplified variant of CSS::

    File ::= Directive+
    Directive ::= Address '{' Formatting* '}'
    Formatting ::= Name ':' '"' Value '"' ';'
    Name ::= [A-Za-z-]+
    Value ::= [^"]+

C-style comments, like ``/* ... */`` and ``// ...``, are allowed anywhere in
the definition file.

Importing styles, for example, may be accomplished by::

    $pdf->styles->load( 'styles.pcss' );
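
For loading and storing, a reader and writer for this format could look like the following sketch. The function names are hypothetical and error handling is minimal. Comments are stripped by a single scan that matches quoted values first, so that a value such as a URL containing ``//`` is not mistaken for a comment; values containing ``"`` cannot be stored, which matches the ``Value`` rule above.

```php
<?php
// Hypothetical reader/writer for the style file format.

// Removes /* ... */ and // ... comments. Quoted values are matched first and
// kept, so "http://..." inside a value is not treated as a comment.
function stripComments(string $source): string
{
    return preg_replace_callback(
        '~"[^"]*"|/\*.*?\*/|//[^\n]*~s',
        function (array $m): string {
            return $m[0][0] === '"' ? $m[0] : ' ';
        },
        $source
    );
}

// Returns a list of [address, [name => value, ...]] pairs in file order.
function parseStyles(string $source): array
{
    $source    = stripComments($source);
    $option    = '\s*[A-Za-z-]+\s*:\s*"[^"]+"\s*;';
    $directive = '~\G\s*([^{}"]+?)\s*\{((?:' . $option . ')*)\s*\}~';

    $directives = [];
    $offset     = 0;
    while (preg_match($directive, $source, $m, 0, $offset)) {
        preg_match_all('~([A-Za-z-]+)\s*:\s*"([^"]+)"~', $m[2], $options, PREG_SET_ORDER);
        $formatting = [];
        foreach ($options as $o) {
            $formatting[$o[1]] = $o[2];
        }
        $directives[] = [$m[1], $formatting];
        $offset += strlen($m[0]);
    }
    if (trim(substr($source, $offset)) !== '') {
        throw new RuntimeException("Syntax error at offset $offset");
    }
    return $directives;
}

// Writes directives back in the same format.
function serializeStyles(array $directives): string
{
    $out = '';
    foreach ($directives as [$address, $options]) {
        $out .= $address . " {\n";
        foreach ($options as $name => $value) {
            $out .= sprintf("    %s: \"%s\";\n", $name, $value);
        }
        $out .= "}\n\n";
    }
    return $out;
}
```

Returning an ordered list rather than a map keyed by address keeps the file order, which the override rule depends on.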

List of formatting options
--------------------------

Some formatting options are processed just as they are defined in CSS, others
are custom options. The options reused from CSS are:

- background-color
- background-image
- background-position
- background-repeat
- border-color
- border-width
- border-bottom-color
- border-bottom-width
- border-left-color
- border-left-width
- border-right-color
- border-right-width
- border-top-color
- border-top-width
- color
- direction
- font-family
- font-size
- font-style
- font-variant
- font-weight
- line-height
- list-style
- list-style-position
- list-style-type
- margin
- margin-bottom
- margin-left
- margin-right
- margin-top
- orphans
- padding
- padding-bottom
- padding-left
- padding-right
- padding-top
- page-break-after
- page-break-before
- text-align
- text-decoration
- text-indent
- white-space
- widows
- word-spacing

Custom properties are:

text-columns
    Number of text columns in one section.
text-column-spacing
    The margin between multiple text columns on one page.
page-size
    Size of the pages.
page-orientation
    Orientation of the pages.

Not all options can be applied to each element. The renderer might complain
about invalid options, depending on the configured error level.
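
One way to implement the configurable reaction to unknown options is sketched below. The class name, the three error-level constants and the reduced option list in the usage are assumptions; warnings are collected in an array here, where a real implementation might use ``trigger_error()`` instead.

```php
<?php
// Hypothetical validator for formatting options. The class name, the error
// level constants and collecting warnings in an array are assumptions.
class PdfOptionValidatorSketch
{
    const IGNORE    = 0;
    const WARNING   = 1;
    const EXCEPTION = 2;

    public $warnings = [];
    private $known;
    private $errorLevel;

    public function __construct(array $knownOptions, int $errorLevel = self::WARNING)
    {
        $this->known      = array_flip($knownOptions); // name => index, for isset()
        $this->errorLevel = $errorLevel;
    }

    // Returns the options that can be applied; unknown ones are dropped and,
    // depending on the error level, reported or turned into an exception.
    public function filter(string $address, array $options): array
    {
        $valid = [];
        foreach ($options as $name => $value) {
            if (isset($this->known[$name])) {
                $valid[$name] = $value;
                continue;
            }
            $message = "Unknown formatting option '$name' for '$address'.";
            if ($this->errorLevel === self::EXCEPTION) {
                throw new DomainException($message);
            }
            if ($this->errorLevel === self::WARNING) {
                $this->warnings[] = $message;
            }
        }
        return $valid;
    }
}

// Usage with a few of the options listed above.
$validator = new PdfOptionValidatorSketch(['font-size', 'font-weight', 'page-size', 'text-columns']);
$applied   = $validator->filter('article', ['page-size' => 'A4', 'font-colour' => 'red']);
// $applied === ['page-size' => 'A4']; one warning about 'font-colour'
```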

Special layout elements
=======================

Footers & Headers
-----------------

Footers and headers are special layout elements, which can be rendered
manually by the user of the component. They can be considered as small
sub-documents, but their renderer receives additional information about the
current page they are rendered on.

They can be set like this::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );

    $pdf->footer = new myDocumentPdfPart();

Each of those parts can render itself and calculate its bounding box. There
might be extensions of the basic ezcDocumentPdfPart class, which render small
Docbook sub-documents into a header, or just take a string and replace
placeholders with page-dependent contents.

Possible implementations would be:

ezcDocumentPdfDocbookPart
    Receives a Docbook document and renders it using a defined style in the
    header or footer of the current page. Placeholders in the text,
    represented for example by entities, might be replaced.
ezcDocumentPdfStringPart
    Receives a simple string, in which simple placeholders are replaced.
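
A sketch of the part abstraction with a string part follows, assuming the renderer passes information about the current page as an associative array. The class names and the array shape are hypothetical; the placeholder names follow the header example later in this document (``%title``, ``%author``, ``%pageNum``, ``%pageCount``). Bounding box calculation is omitted.

```php
<?php
// Hypothetical sketch of header/footer parts. Assumption: the renderer passes
// information about the current page as an associative array.
abstract class PdfPartSketch
{
    // Returns the text to place in the header or footer of the given page.
    abstract public function render(array $pageInfo): string;
}

// Counterpart of ezcDocumentPdfStringPart: replaces %name placeholders.
class PdfStringPartSketch extends PdfPartSketch
{
    private $template;

    public function __construct(string $template)
    {
        $this->template = $template;
    }

    public function render(array $pageInfo): string
    {
        $replacements = [];
        foreach ($pageInfo as $name => $value) {
            $replacements['%' . $name] = (string) $value;
        }
        // strtr() tries the longest keys first and does not re-scan replaced
        // text, so overlapping names and '%' inside values are safe.
        return strtr($this->template, $replacements);
    }
}

// Usage with the header template from the example in this document.
$header = new PdfStringPartSketch('%title by %author - %pageNum / %pageCount');
$text   = $header->render([
    'title' => 'Introduction', 'author' => 'kn', 'pageNum' => 3, 'pageCount' => 12,
]);
// $text === 'Introduction by kn - 3 / 12'
```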

Other elements
--------------

There are various possible full-page elements, which might be rendered before
or after the actual contents. Examples are:

- Cover page
- Bibliography
- Back page

To combine those with the content into one PDF document, you can create a PDF
set, which is then rendered into one file::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );

    $set = new ezcDocumentPdfSet();
    $set->parts = array(
        new ezcDocumentPdfPdfPart( 'title.pdf' ),
        $customTableOfContents,
        $pdf,
        $bibliography,
    );
    $set->render( 'my.pdf' );

Some of the documents aggregated in one set may of course again be documents
created from Docbook documents. Each element in the set may contain custom
layout directives.

For the inclusion of other document parts into a PDF set, you are expected to
extend the PDF base class and implement your custom functionality there.
This could mean generating indexes or a bibliography from the content.
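
How a set could let a custom part derive its content from the parts before it is sketched below, using an index as the example. Pages are modelled as plain strings and all class names are hypothetical; a real implementation would work on rendered PDF pages and would more likely collect index terms from the Docbook source (for example ``indexterm`` elements) than search page text.

```php
<?php
// Hypothetical sketch of a PDF set. Pages are modelled as plain strings;
// each part receives the pages rendered before it and returns its own.
abstract class PdfSetPartSketch
{
    abstract public function render(array $previousPages): array;
}

// A part with fixed content, e.g. a cover page or the main document.
class StaticPartSketch extends PdfSetPartSketch
{
    private $pages;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    public function render(array $previousPages): array
    {
        return $this->pages;
    }
}

// A custom part generated from the content: lists the (1-based) pages on
// which each given term occurs.
class IndexPartSketch extends PdfSetPartSketch
{
    private $terms;

    public function __construct(array $terms)
    {
        $this->terms = $terms;
    }

    public function render(array $previousPages): array
    {
        $lines = [];
        foreach ($this->terms as $term) {
            $pageNumbers = [];
            foreach ($previousPages as $i => $text) {
                if (stripos($text, $term) !== false) {
                    $pageNumbers[] = $i + 1;
                }
            }
            if ($pageNumbers !== []) {
                $lines[] = $term . ': ' . implode(', ', $pageNumbers);
            }
        }
        return [implode("\n", $lines)];
    }
}

class PdfSetSketch
{
    public $parts = [];

    // Renders all parts in order and returns the concatenated pages.
    public function render(): array
    {
        $pages = [];
        foreach ($this->parts as $part) {
            $pages = array_merge($pages, $part->render($pages));
        }
        return $pages;
    }
}

// Usage
$set = new PdfSetSketch();
$set->parts = [
    new StaticPartSketch(['Cover']),
    new StaticPartSketch(['Docbook basics', 'Rendering PDF from Docbook']),
    new IndexPartSketch(['Docbook', 'PDF', 'XSLT']),
];
$pages = $set->render();
// $pages[3] === "Docbook: 2, 3\nPDF: 3"
```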

Drivers
=======

The actual PDF renderer calls methods on a driver, which abstracts the quirks
of the respective PDF library. There will be drivers for at least:

- pecl/libharu
- TCPDF
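
A sketch of such a driver abstraction follows. The interface, its method names and the fixed character width used for positioning are assumptions, not the planned API. Besides real drivers wrapping libharu or TCPDF, a driver that just records its calls is useful for testing the renderer without any PDF library.

```php
<?php
// Hypothetical driver interface: the renderer only talks to this, so
// libharu- or TCPDF-backed implementations could be swapped in.
interface PdfDriverSketch
{
    public function createPage(float $width, float $height): void;
    public function setTextFormatting(string $font, float $size): void;
    public function drawWord(float $x, float $y, string $word): void;
    public function save(): string;
}

// Test driver: records every call instead of producing a PDF.
class TracePdfDriverSketch implements PdfDriverSketch
{
    private $calls = [];

    public function createPage(float $width, float $height): void
    {
        $this->calls[] = sprintf('createPage(%.1F, %.1F)', $width, $height);
    }

    public function setTextFormatting(string $font, float $size): void
    {
        $this->calls[] = sprintf('setTextFormatting(%s, %.1F)', $font, $size);
    }

    public function drawWord(float $x, float $y, string $word): void
    {
        $this->calls[] = sprintf('drawWord(%.1F, %.1F, %s)', $x, $y, $word);
    }

    public function save(): string
    {
        return implode("\n", $this->calls);
    }
}

// A renderer fragment depending only on the interface. Assumption: a fixed
// character width; real code would ask the driver for text metrics.
function renderWords(PdfDriverSketch $driver, array $words, float $x, float $y, float $charWidth): void
{
    foreach ($words as $word) {
        $driver->drawWord($x, $y, $word);
        $x += (strlen($word) + 1) * $charWidth; // word plus one space
    }
}

// Usage
$driver = new TracePdfDriverSketch();
$driver->createPage(210, 297);
$driver->setTextFormatting('sans-serif', 12);
renderWords($driver, ['Hello', 'PDF'], 20, 30, 2.5);
```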

Renderer
========

The renderer will be responsible for the actual typesetting. It will receive a
Docbook document, apply the given layout directives and derive the
appropriate calls to the driver from them.

The renderer optionally receives a set of helper objects, which perform
specific parts of the typesetting, like:

Hyphenator
    Class implementing hyphenation for a specific language. We might provide a
    default implementation, which reads standard hyphenation files.
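
A default hyphenator reading standard hyphenation files would most likely implement Frank Liang's pattern algorithm, which TeX hyphenation pattern files are built for. The sketch below assumes patterns in that format (for example ``hy3ph``), passed as a whitespace-separated string, and handles single-byte letters only; German umlauts would need multibyte handling. The class name is hypothetical.

```php
<?php
// Hypothetical hyphenator using Liang's pattern algorithm. Assumptions:
// TeX-style patterns ("hy3ph"), single-byte lowercase letters, and minimum
// fragment lengths of 2 (left) and 3 (right).
class PatternHyphenatorSketch
{
    private $patterns = [];
    private $leftMin;
    private $rightMin;

    public function __construct(string $patternSource, int $leftMin = 2, int $rightMin = 3)
    {
        foreach (preg_split('/\s+/', $patternSource, -1, PREG_SPLIT_NO_EMPTY) as $pattern) {
            $letters = '';
            $points  = [0]; // $points[i]: priority of a break before letter i
            foreach (str_split($pattern) as $char) {
                if (ctype_digit($char)) {
                    $points[count($points) - 1] = (int) $char;
                } else {
                    $letters .= $char;
                    $points[] = 0;
                }
            }
            $this->patterns[] = [$letters, $points];
        }
        $this->leftMin  = $leftMin;
        $this->rightMin = $rightMin;
    }

    // Returns the word split at the allowed hyphenation points.
    public function hyphenate(string $word): array
    {
        $padded = '.' . strtolower($word) . '.'; // '.' marks the word boundaries
        $length = strlen($padded);
        $values = array_fill(0, $length + 1, 0);
        foreach ($this->patterns as [$letters, $points]) {
            $size = strlen($letters);
            for ($start = 0; $start + $size <= $length; $start++) {
                if (substr($padded, $start, $size) === $letters) {
                    // Keep the highest priority seen for each gap.
                    foreach ($points as $offset => $value) {
                        $values[$start + $offset] = max($values[$start + $offset], $value);
                    }
                }
            }
        }
        // An odd value before padded character $i + 1 allows a break after
        // $i letters of the original word.
        $parts = [];
        $last  = 0;
        for ($i = $this->leftMin; $i <= strlen($word) - $this->rightMin; $i++) {
            if ($values[$i + 1] % 2 === 1) {
                $parts[] = substr($word, $last, $i - $last);
                $last    = $i;
            }
        }
        $parts[] = substr($word, $last);
        return $parts;
    }
}

// Usage with the classic example patterns for "hyphenation".
$hyphenator = new PatternHyphenatorSketch('hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n');
$parts = $hyphenator->hyphenate('hyphenation'); // ['hy', 'phen', 'ation']
```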

The renderer state will be shared using an object representing the page
currently being processed, which contains information about the areas already
covered and therefore about the space still available.

Using such a state object, the state can easily be shared between different
renderers for different aspects of the rendering process. This should allow
us to write simpler rendering classes, which should be easier to maintain than
one big renderer class whose methods take care of all aspects.
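
The dispatching idea can be sketched with PHP's DOM extension: a main renderer walks the Docbook tree and hands each element with a registered aspect renderer to it, while all renderers update one shared page state object. All class names, the element mapping and the fixed heights are assumptions; elements without their own renderer are only descended into.

```php
<?php
// Hypothetical shared page state: only the used height and a log of what
// was rendered, for demonstration.
class PageStateSketch
{
    public $usedHeight = 0.0;
    public $log = [];
}

interface AspectRendererSketch
{
    public function render(DOMElement $element, PageStateSketch $page): void;
}

class ParagraphRendererSketch implements AspectRendererSketch
{
    public function render(DOMElement $element, PageStateSketch $page): void
    {
        $page->log[] = 'para at ' . $page->usedHeight;
        $page->usedHeight += 12.0; // assumption: every paragraph is one 12pt line
    }
}

class TitleRendererSketch implements AspectRendererSketch
{
    public function render(DOMElement $element, PageStateSketch $page): void
    {
        $page->log[] = 'title "' . trim($element->textContent) . '" at ' . $page->usedHeight;
        $page->usedHeight += 20.0; // assumption: fixed title height
    }
}

class MainRendererSketch
{
    private $renderers = [];

    public function register(string $elementName, AspectRendererSketch $renderer): void
    {
        $this->renderers[$elementName] = $renderer;
    }

    public function render(DOMElement $element, PageStateSketch $page): void
    {
        if (isset($this->renderers[$element->localName])) {
            $this->renderers[$element->localName]->render($element, $page);
            return;
        }
        // Structural elements without a renderer: descend into children.
        foreach ($element->childNodes as $child) {
            if ($child instanceof DOMElement) {
                $this->render($child, $page);
            }
        }
    }
}

// Usage
$doc = new DOMDocument();
$doc->loadXML('<article><title>Intro</title><section><para>One</para><para>Two</para></section></article>');
$renderer = new MainRendererSketch();
$renderer->register('para', new ParagraphRendererSketch());
$renderer->register('title', new TitleRendererSketch());
$page = new PageStateSketch();
$renderer->render($doc->documentElement, $page);
```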

Because this page state object knows about the free space on the current
page, it allows, for example, floating text around images spanning multiple
paragraphs. All renderers for the different aspects can reuse this knowledge
and base their rendering on it. The space already covered on a page will most
probably be represented by a list of bounding boxes.
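
A sketch of the bounding box representation and of the query a paragraph renderer might use to flow text around a floating image follows. The class and method names are hypothetical; coordinates are assumed to be in millimetres with y growing downwards, and boxes are axis-aligned ``[x, y, width, height]`` arrays.

```php
<?php
// Hypothetical page state (the role of ezcDocumentPdfPage in this design):
// covered space is a list of axis-aligned bounding boxes.
class PageSketch
{
    private $width;
    private $covered = []; // each box: [x, y, width, height]

    public function __construct(float $width)
    {
        $this->width = $width;
    }

    public function cover(float $x, float $y, float $width, float $height): void
    {
        $this->covered[] = [$x, $y, $width, $height];
    }

    // Free horizontal intervals [from, to] for a line spanning $y .. $y + $height.
    public function freeIntervals(float $y, float $height): array
    {
        $blocked = [];
        foreach ($this->covered as [$bx, $by, $bw, $bh]) {
            if ($by < $y + $height && $y < $by + $bh) { // vertical overlap
                $blocked[] = [$bx, $bx + $bw];
            }
        }
        usort($blocked, function ($a, $b) { return $a[0] <=> $b[0]; });

        // Walk the blocked ranges left to right and collect the gaps.
        $free   = [];
        $cursor = 0.0;
        foreach ($blocked as [$from, $to]) {
            if ($from > $cursor) {
                $free[] = [$cursor, $from];
            }
            $cursor = max($cursor, $to);
        }
        if ($cursor < $this->width) {
            $free[] = [$cursor, $this->width];
        }
        return $free;
    }
}

// Usage
$page = new PageSketch(170);
$page->cover(0, 0, 170, 10);    // a heading across the full width
$page->cover(100, 15, 70, 40);  // an image floated to the right
$besideImage = $page->freeIntervals(20, 5); // [[0.0, 100.0]]
```

A line next to the image only gets the space left of it, while a line below the image (for example at y = 60) gets the full width again.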

Which renderer classes can be separated will become clear during
implementation, but examples could be:

ezcDocumentPdfParagraphRenderer
    Takes care of rendering the Docbook inline markup inside one paragraph.
    Respects orphans and widows and might be required to split paragraphs.

ezcDocumentPdfTableRenderer
    Renders tables. It might be useful to split this up further into table
    row and table cell renderers.
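
For the paragraph renderer, the orphans and widows options (both with an initial value of 2 in CSS) decide how a paragraph is split across pages. The helper below is a hypothetical sketch that only counts lines; it assumes line breaking has already happened and ignores paragraphs longer than a full page.

```php
<?php
// Hypothetical helper: how many lines of a paragraph stay on the current
// page. "orphans" is the minimum number of lines left at the bottom of a
// page, "widows" the minimum carried to the top of the next page.
// A result of 0 means the whole paragraph moves to the next page.
function linesOnCurrentPage(int $totalLines, int $availableLines, int $orphans = 2, int $widows = 2): int
{
    if ($totalLines <= $availableLines) {
        return $totalLines; // no split necessary
    }
    $split = max(0, $availableLines);
    if ($totalLines - $split < $widows) {
        $split = $totalLines - $widows; // push lines down to avoid widows
    }
    if ($split < $orphans) {
        return 0; // too few lines would stay behind
    }
    return $split;
}

// Usage: a 10-line paragraph with 9 free lines keeps 8 on this page, so
// that 2 lines (not a single widow) move to the next page.
$onThisPage = linesOnCurrentPage(10, 9); // 8
```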

Additional renderer features
----------------------------

If the driver class in use implements the respective interfaces, the renderer
will also offer to sign PDF documents or to add write protection (or similar)
to the PDF document.
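
Detecting such optional capabilities could be as simple as an ``instanceof`` check against additional driver interfaces, as in the sketch below. All interface and class names are hypothetical, and signing is only simulated.

```php
<?php
// Hypothetical capability interfaces: every driver implements the base
// interface; optional features are separate interfaces.
interface PdfBaseDriverSketch
{
    public function output(): string;
}

interface PdfSignatureDriverSketch
{
    public function sign(string $certificateFile): void;
}

class PlainDriverSketch implements PdfBaseDriverSketch
{
    public function output(): string
    {
        return 'pdf';
    }
}

class SigningDriverSketch implements PdfBaseDriverSketch, PdfSignatureDriverSketch
{
    private $signedWith = null;

    public function sign(string $certificateFile): void
    {
        $this->signedWith = $certificateFile; // simulated signing
    }

    public function output(): string
    {
        return $this->signedWith === null ? 'pdf' : 'pdf signed with ' . $this->signedWith;
    }
}

// The renderer offers signing only if the driver supports it.
class FeatureAwareRendererSketch
{
    private $driver;

    public function __construct(PdfBaseDriverSketch $driver)
    {
        $this->driver = $driver;
    }

    public function canSign(): bool
    {
        return $this->driver instanceof PdfSignatureDriverSketch;
    }

    public function sign(string $certificateFile): void
    {
        if (!($this->driver instanceof PdfSignatureDriverSketch)) {
            throw new LogicException(get_class($this->driver) . ' cannot sign documents.');
        }
        $this->driver->sign($certificateFile);
    }

    public function render(): string
    {
        return $this->driver->output();
    }
}

// Usage
$signing = new FeatureAwareRendererSketch(new SigningDriverSketch());
if ($signing->canSign()) {
    $signing->sign('cert.pem');
}
```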

Example
=======

A full example for the creation of a PDF document from an HTML page could
look like::

    <?php
    $html = new ezcDocumentXhtml();
    $html->loadFile( 'http://ezcomponents.org/introduction' );

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $html->getAsDocbook() );

    // Load some custom layout directives
    $pdf->styles->load( 'my_styles.pcss' );
    $pdf->styles['article']['text-columns'] = 3;

    // Set a custom header
    $pdf->header = new ezcDocumentPdfStringPart(
        '%title by %author - %pageNum / %pageCount'
    );

    // Set a custom paragraph renderer
    $pdf->renderer->paragraph = new myPdfParagraphRenderer();

    // Use the hyphenator with a German dictionary
    $pdf->renderer->hyphenator = new myDictionaryHyphenator(
        '/path/to/german.dict'
    );

    // Store the generated PDF
    file_put_contents( 'my.pdf', $pdf );
    ?>

A file containing the layout directives could look like::

    article {
        page-size: "A4";
    }

    para {
        font-family: "Bitstream Vera Sans";
        font-size: "1em";
    }

    article > title {
        font-weight: "bold";
    }

    section title {
        font-weight: "normal";
    }

Classes
=======

The classes implemented for the PDF generation are:

ezcDocumentPdf
    Base class, representing the PDF generation. Aggregates the style
    information, the Docbook source document, the renderer and page parts
    like footer and header.

ezcDocumentPdfSet
    Class aggregating multiple ezcDocumentPdf objects, to create one single
    PDF document from multiple parts, like a cover page, the actual content, a
    bibliography, etc.

ezcDocumentPdfStyles
    Class containing the PDF layout directives; also implements loading and
    storing of those layout directives.

ezcDocumentPdfPart
    Abstract base class for page parts, like headers and footers. Renders the
    respective part and will be extended by multiple concrete
    implementations, which offer convenient rendering methods.

ezcDocumentPdfRenderer
    Basic renderer class, which aggregates renderers for distinct page
    elements, like paragraphs and tables, and dispatches the rendering to
    them. Also maintains the ezcDocumentPdfPage state object, which contains
    information about already covered parts of the pages.

ezcDocumentPdfParagraphRenderer
    Example of the concrete aspect-specific renderer classes, which only
    implement the rendering of small parts of a document, like single
    paragraphs, tables, or table cell contents.

ezcDocumentPdfPage
    State object describing the current state of a single page in the PDF
    document, like the still available space.

ezcDocumentPdfHyphenator
    Abstract base class for hyphenation implementations for more accurate word
    wrapping.
