| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| ==================================================================== |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| ==================================================================== |
| --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd"> |
| |
| <document> |
| <header> |
| <title>POI-HDGF - Java API To Access Microsoft Visio Format Files</title> |
| <subtitle>Overview</subtitle> |
| <authors> |
| <person name="Nick Burch" email="nick at apache dot org"/> |
| </authors> |
| </header> |
| |
| <body> |
| <section> |
| <title>Overview</title> |
| |
| <p>HDGF is the POI Project's pure Java implementation of the Visio file format.</p> |
| <p>Currently, HDGF provides a low-level, read-only api for |
| accessing Visio documents. It also provides a |
| <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/extractor/">way</link> |
| to extract the textual content from a file. |
| </p> |
| <p>At this time, there is no <em>usermodel</em> api or similar, |
| only low level access to the streams, chunks and chunk commands. |
| Users are advised to check the unit tests to see how everything |
| works. They are also well advised to read the documentation |
| supplied with |
| <link href="http://www.gnome.ru/projects/vsdump_en.html">vsdump</link> |
| to get a feel for how Visio files are structured.</p> |
| <p>To get a feel for the contents of a file, and to track down |
| where data of interest is stored, HDGF comes with |
| <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/dev/">VSDDumper</link> |
| to print out the contents of the file. Users should also make |
| use of |
| <link href="http://www.gnome.ru/projects/vsdump_en.html">vsdump</link> |
| to probe the structure of files.</p> |
| <note> |
| This code currently lives the |
| <link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link> |
| of the POI SVN repository. |
| Ensure that you have the scratchpad jar or the scratchpad |
| build area in your |
| classpath before experimenting with this code. |
| </note> |
| |
| <section> |
| <title>Steps required for write support</title> |
| <p>Currently, HDGF is only able to read visio files, it is |
| not able to write them back out again. We believe the |
| following are the steps that would need to be taken to |
| implement it.</p> |
| <ol> |
| <li>Re-write the decompression support in LZW4HDGF to be |
| less opaque, and also under the ASL.</li> |
| <li>Add compression support to the new LZw4HDGF.</li> |
| <li>Have HDGF just write back the raw bytes it read in, and |
| have a test to ensure the file is un-changed.</li> |
| <li>Have HDGF generate the bytes to write out from the |
| Stream stores, using the compressed data as appropriate, |
| without re-compressing. Plus test to ensure file is |
| un-changed.</li> |
| <li>Have HDGF generate the bytes to write out from the |
| Stream stores, re-compressing any streams that were |
| decompressed. Plus test to ensure file is un-changed.</li> |
| <li>Have HDGF re-generate the offsets in pointers for the |
| locations of the streams. Plus test to ensure file is |
| un-changed.</li> |
| <li>Have HDGF re-generate the bytes for all the chunks, from |
| the chunk commands. Tests to ensure the chunks are |
| serialized properly, and then that the file is un-changed</li> |
| <li>Alter the data of one command, but keep it the same |
| length, and check visio can open the file when written |
| out.</li> |
| <li>Alter the data of one command, to a new length, and |
| check that visio can open the file when written out.</li> |
| </ol> |
| </section> |
| </section> |
| </body> |
| </document> |