| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
| <!-- |
| #************************************************************** |
| # |
| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, |
| # software distributed under the License is distributed on an |
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| # KIND, either express or implied. See the License for the |
| # specific language governing permissions and limitations |
| # under the License. |
| # |
| #************************************************************** |
| --> |
| <html> |
| <head> |
| <title>org.openoffice.xmerge.converter.xml.sxw.aportisdoc package</title> |
| </head> |
| |
| <body bgcolor="white"> |
| |
| <p>Provides the tools for doing the conversion of StarWriter XML to |
| and from AportisDoc format.</p> |
| |
| <p>It follows the {@link org.openoffice.xmerge} framework for the conversion process.</p> |
| |
| <p>Since it converts to/from a Palm application format, these converters |
| follow the <a href="../../../../converter/palm/package-summary.html#streamformat"> |
| <code>PalmDB</code> stream format</a> for writing out to the Palm sync client or |
| reading in from the Palm sync client.</p> |
| |
| <p>Note that <code>PluginFactoryImpl</code> also provides a |
| <code>DocumentMerger</code> object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}. |
| This functionality is inherited from its superclass |
| {@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory |
| SxwPluginFactory}.</p> |
| |
| <h2>AportisDoc pdb format - Doc</h2> |
| |
| <p>The AportisDoc pdb format is widely used by different Palm applications, |
| e.g. QuickWord, AportisDoc Reader, MiniWrite, etc. Note that some |
| of these applications put tweaks into the format. The converters will only |
| support the default AportisDoc format, plus some very minor tweaks to accommodate |
| other applications.</p> |
| |
| <p>The text content of the format is plain text, i.e. there are no styles |
| or structures. There is no notion of lists, list items, paragraphs, |
| headings, etc. The format does have support for bookmarks.</p> |
| |
| <p>For most Doc applications, the default character encoding is |
| the extended ASCII character set ISO-8859-1, whereas StarWriter XML is |
| encoded in UTF-8. Since UTF-8 covers far more characters than ISO-8859-1, |
| converting UTF-8 strings into extended ASCII can lose characters that |
| have no ISO-8859-1 mapping.</p> |
| |
| <p>Using JAXP, XML files are parsed and read in as Java <code>String</code>s, |
| which are Unicode, so there is no loss of character mapping from UTF-8 |
| to Java <code>String</code>s. The possible loss occurs when |
| converting Java <code>String</code>s to extended ASCII bytes: Java characters that |
| cannot be represented in extended ASCII are converted into the ASCII |
| character '?' (0x3F) by the <code>String.getBytes(encoding)</code> |
| API.</p> |
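| |
| <p>The lossy step can be illustrated with a minimal sketch. The class |
| <code>Latin1LossDemo</code> and its method are hypothetical and not part of |
| this package; the sketch uses the <code>getBytes(Charset)</code> overload, |
| which always substitutes the charset's replacement byte for unmappable |
| characters.</p> |
| |
```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the xmerge API.
final class Latin1LossDemo {

    /** Encodes text as ISO-8859-1; unmappable characters become '?' (0x3F). */
    static byte[] toExtendedAscii(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) exists in ISO-8859-1; U+20AC (euro sign) does not.
        for (byte b : toExtendedAscii("caf\u00e9 \u20ac")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```
| |
| <p>Running <code>main</code> prints <code>63 61 66 E9 20 3F</code>: the accented |
| letter survives as a single byte, while the euro sign is replaced by '?'.</p> |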
| |
| <h2>SXW to DOC Conversion</h2> |
| |
| <p>The <code>DocumentSerializerImpl</code> class implements the |
| <code>org.openoffice.xmerge.DocumentSerializer</code> interface. |
| This class specifically provides the conversion process from a given |
| <code>SxwDocument</code> object to DOC formatted records, which are |
| then passed back to the client via the <code>ConvertData</code> object.</p> |
| |
| <p>The following XML tags are handled. [Note that some may not be implemented yet.]</p> |
| <ul> |
| <li> |
| <p>Paragraphs <tt><text:p></tt> and Headings <tt><text:h></tt></p> |
| |
| <p>Heading elements are classified the same as paragraph |
| elements since both have the same possible elements inside. |
| Their main difference is that they refer to different types |
| of style information, which is outside of their element tags. |
| Since there are no styles in the DOC format, headings are |
| converted the same way as paragraphs.</p> |
| |
| <p>For paragraph elements, only the essential text nodes are converted |
| and transferred, namely the text nodes directly contained within the |
| paragraph node. A paragraph element may also contain a number of |
| other elements; these are explained in their own context.</p> |
| |
| <p>At the end of the paragraph, an EOL character is added by |
| the converter to provide a separation for each paragraph, |
| since the Doc format does not have a notion of a paragraph.</p> |
| </li> |
| <li> |
| <p>White spaces <tt><text:s></tt> and Tabs <tt><text:tab-stop></tt></p> |
| |
| <p>In SXW, normally 2 or more white-space characters are collapsed into |
| a single space character. In order to make sure that the document |
| content really contains those white-space characters, there are special |
| elements assigned to them.</p> |
| |
| <p>The space element specifies the number of spaces it represents. |
| Thus, converting it just means emitting the number of spaces |
| that the element specifies.</p> |
| |
| <p>The tab-stop element is trickier. In a |
| StarWriter document, tab-stops are specified by column position: |
| a tab is not a fixed number of spaces, but rather an advance to a specific |
| column. Say regular tab-stops are set at every 5th column. |
| Hitting a tab at column 4 moves the cursor to column 5, and hitting |
| a tab at column 1 moves the cursor to column 5 as well. SmartDoc and AportisDoc |
| applications also go by columns for the ASCII tab character. The only problem |
| is that StarWriter lets one specify custom tab-stop positions, whereas |
| most of these Doc applications do not appear to support that. |
| The solution is simply to convert to the ASCII tab |
| character and ignore any custom tab-stop positioning.</p> |
| </li> |
| <li> |
| <p>Line breaks <tt><text:line-break></tt></p> |
| |
| <p>To represent line breaks, it is simplest to just emit an ASCII LF |
| character. The side effect is that the end of a paragraph |
| is also marked by an ASCII LF character. Thus, for the DOC to SXW conversion, |
| line breaks are not distinguishable from the end of a |
| paragraph.</p> |
| </li> |
| <li> |
| <p>Text spans <tt><text:span></tt></p> |
| |
| <p>Text spans contain text that has different style attributes |
| from the paragraph's. Text spans can be nested within another |
| text span. Since spans are purely for style tagging, the converter only needs |
| to convert and transfer the text nodes within them.</p> |
| </li> |
| <li> |
| <p>Hyperlinks <tt><text:a></tt></p> |
| |
| <p>Convert and transfer the text portion.</p> |
| </li> |
| <li> |
| <p>Bookmarks <tt><text:bookmark></tt> <tt><text:bookmark-start></tt> |
| <tt><text:bookmark-end></tt> [Not implemented yet]</p> |
| |
| <p>In SXW, bookmark elements are embedded inside paragraph elements. |
| Bookmarks can mark either a text position or a text range. <tt><text:bookmark></tt> |
| marks a position, while the pair <tt><text:bookmark-start></tt> and |
| <tt><text:bookmark-end></tt> marks a text range. The DOC format only |
| supports bookmarking a text position. Thus, for the conversion, |
| <tt><text:bookmark></tt> and <tt><text:bookmark-start></tt> will both mark |
| a text position.</p> |
| </li> |
| <li> |
| <p>Change Tracking <tt><text:tracked-changes></tt> |
| <tt><text:change*></tt> [Not implemented yet]</p> |
| |
| <p>Change tracking elements are not yet supported by the current |
| OpenOffice.org XML filters, so this needs to be watched. The text |
| within these elements has to be interpreted properly during the |
| conversion process.</p> |
| </li> |
| <li> |
| <p>Lists <tt><text:unordered-list></tt> and |
| <tt><text:ordered-list></tt></p> |
| |
| <p>A list can only contain one optional <tt><text:list-header></tt> |
| and one or more <tt><text:list-item></tt> elements.</p> |
| |
| <p>A <tt><text:list-header></tt> contains one or more paragraph |
| elements. Since there are no styles, the conversion process does not |
| do anything special for list headers; paragraphs |
| within list headers are converted as explained above.</p> |
| |
| <p>A <tt><text:list-item></tt> may contain one or more paragraphs, |
| headings, lists, etc. Since the Doc format does not support any list |
| structure, there is no special handling for this element. |
| Elements within it are converted according to their |
| element type. Thus, a list containing paragraphs results in just |
| plain paragraphs. Sublists are not identifiable, but the paragraphs in |
| sublists still appear.</p> |
| </li> |
| <li> |
| <p><tt><text:section></tt></p> |
| |
| <p>The purpose of this element is not yet understood; it needs further investigation.</p> |
| </li> |
| </ul> |
| <p>There may be other tags that will still need to be addressed for this conversion.</p> |
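| |
| <p>The tag handling described above can be sketched roughly as follows. This is |
| an illustration only, not the code in <code>DocumentSerializerImpl</code>: the |
| class <code>SxwToPlainText</code> and its methods are hypothetical, the parser is |
| deliberately left non-namespace-aware so that element names can be matched as |
| literal strings, and the <code>text:c</code> attribute is assumed to carry the |
| space count of the space element. Bookmarks and change tracking are not |
| handled.</p> |
| |
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch for illustration; not the actual DocumentSerializerImpl.
final class SxwToPlainText {

    /** Parses an XML fragment and returns its plain-text rendering. */
    static String convert(String xml) {
        try {
            // The default factory is not namespace-aware, so names stay "text:p" etc.
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, out, false);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot convert input", e);
        }
    }

    private static void walk(Node node, StringBuilder out, boolean inParagraph) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Only text inside a paragraph or heading is transferred.
            if (inParagraph) {
                out.append(node.getNodeValue());
            }
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        switch (node.getNodeName()) {
            case "text:p":
            case "text:h":
                walkChildren(node, out, true);
                out.append('\n'); // EOL separates paragraphs
                break;
            case "text:s": {
                // Assumption: a missing count attribute means a single space.
                String count = ((Element) node).getAttribute("text:c");
                int n = count.isEmpty() ? 1 : Integer.parseInt(count);
                for (int i = 0; i < n; i++) {
                    out.append(' ');
                }
                break;
            }
            case "text:tab-stop":
                out.append('\t'); // custom tab-stop positions are ignored
                break;
            case "text:line-break":
                out.append('\n');
                break;
            default:
                // Spans, hyperlinks, lists, list items: keep only the content.
                walkChildren(node, out, inParagraph);
        }
    }

    private static void walkChildren(Node node, StringBuilder out, boolean inParagraph) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out, inParagraph);
        }
    }

    public static void main(String[] args) {
        System.out.print(convert(
                "<text:p xmlns:text=\"http://openoffice.org/2000/text\">"
                + "a<text:s text:c=\"2\"/>b<text:tab-stop/>c</text:p>"));
    }
}
```
| |
| <p>Because unknown elements simply fall through to their children, list |
| structure disappears while the paragraphs inside list items still appear, |
| matching the behaviour described for lists.</p> |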
| |
| <p>Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl} |
| for details of the implementation. It uses the <code>DocEncoder</code> class to do the encoding |
| part.</p> |
| |
| <h2>DOC to SXW Conversion</h2> |
| |
| <p>The <code>DocumentDeserializerImpl</code> class implements the |
| <code>org.openoffice.xmerge.DocumentDeserializer</code> interface. It is |
| passed the device document in the form of a <code>ConvertData</code> object. |
| It will then create a <code>SxwDocument</code> object from the conversion of |
| the DOC formatted records.</p> |
| |
| <p>The text content of the Doc format will be transferred as text. Paragraph |
| elements will be formed based on the existence of an ASCII LF character. There |
| will be at least one paragraph element.</p> |
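| |
| <p>The paragraph splitting can be sketched as follows. The class |
| <code>DocTextToParagraphs</code> is hypothetical and returns plain strings |
| rather than building an <code>SxwDocument</code>; it assumes the decoded text |
| ends each paragraph with an LF, so one trailing LF is dropped before |
| splitting.</p> |
| |
```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; not the actual DocumentDeserializerImpl.
final class DocTextToParagraphs {

    /** Splits decoded Doc text into paragraph contents at each ASCII LF. */
    static List<String> split(String text) {
        // Assumption: a trailing LF terminates the last paragraph rather than
        // starting a new, empty one.
        if (text.endsWith("\n")) {
            text = text.substring(0, text.length() - 1);
        }
        // Limit -1 keeps empty paragraphs; empty input still yields one entry.
        return Arrays.asList(text.split("\n", -1));
    }

    public static void main(String[] args) {
        for (String paragraph : split("Title\nfirst\n\nlast\n")) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```
| |
| <p>Each returned string would become the content of one paragraph element; |
| even an empty input yields a single empty paragraph.</p> |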
| |
| <p>Bookmarks in the Doc format will be converted to the bookmark element |
| <tt><text:bookmark></tt> [Not implemented yet].</p> |
| |
| |
| <h2>Merging changes</h2> |
| |
| <p>As mentioned above, the <code>DocumentMerger</code> object produced by |
| <code>PluginFactoryImpl</code> is <code>DocumentMergerImpl</code>. |
| Refer to the javadocs of that class for its merging specifications. |
| </p> |
| |
| <h2>TODO list</h2> |
| |
| <ol> |
| <li>Investigate Palm devices with different character encodings.</li> |
| <li>Investigate other StarWriter XML tags.</li> |
| </ol> |
| |
| </body> |
| </html> |