<?xml version="1.0" encoding="UTF-8"?>
<!--
====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
<title>POI-HDGF - Java API To Access Microsoft Visio Format Files</title>
<subtitle>Overview</subtitle>
<authors>
<person name="Nick Burch" email="nick at apache dot org"/>
</authors>
</header>
<body>
<section>
<title>Overview</title>
<p>HDGF is the POI Project's pure Java implementation of the Visio file format.</p>
<p>Currently, HDGF provides a low-level, read-only API for
accessing Visio documents. It also provides a
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/extractor/">way</link>
to extract the textual content from a file.
</p>
<p>At this time, there is no <em>usermodel</em> API or similar,
only low-level access to the streams, chunks and chunk commands.
Users are advised to check the unit tests to see how everything
works, and to read the documentation
supplied with
<link href="http://www.gnome.ru/projects/vsdump_en.html">vsdump</link>
to get a feel for how Visio files are structured.</p>
<p>To get a feel for the contents of a file, and to track down
where data of interest is stored, HDGF comes with
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/dev/">VSDDumper</link>
to print out the contents of the file. Users should also make
use of
<link href="http://www.gnome.ru/projects/vsdump_en.html">vsdump</link>
to probe the structure of files.</p>
<note>
This code currently lives in the
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
of the POI SVN repository.
Ensure that you have the scratchpad jar or the scratchpad
build area in your
classpath before experimenting with this code.
</note>
<section>
<title>Steps required for write support</title>
<p>Currently, HDGF can only read Visio files; it cannot
write them back out again. We believe the
following steps would need to be taken to
implement write support.</p>
<ol>
<li>Rewrite the decompression support in LZW4HDGF to be
less opaque, and also under the ASL.</li>
<li>Add compression support to the new LZW4HDGF.</li>
<li>Have HDGF just write back the raw bytes it read in, and
have a test to ensure the file is unchanged.</li>
<li>Have HDGF generate the bytes to write out from the
Stream stores, using the compressed data as appropriate,
without re-compressing. Add a test to ensure the file is
unchanged.</li>
<li>Have HDGF generate the bytes to write out from the
Stream stores, re-compressing any streams that were
decompressed. Add a test to ensure the file is unchanged.</li>
<li>Have HDGF re-generate the offsets in pointers for the
locations of the streams. Add a test to ensure the file is
unchanged.</li>
<li>Have HDGF re-generate the bytes for all the chunks, from
the chunk commands. Add tests to ensure the chunks are
serialized properly, and then that the file is unchanged.</li>
<li>Alter the data of one command, but keep it the same
length, and check that Visio can open the file when written
out.</li>
<li>Alter the data of one command, to a new length, and
check that Visio can open the file when written out.</li>
</ol>
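<p>The byte-for-byte check called for in several of the steps above
can be sketched in plain Java. This is only an illustration under
stated assumptions: HDGF has no write support, so
<code>RoundTripCheck</code> and its methods are hypothetical names,
and the <code>writer</code> argument stands in for whatever write
path is eventually implemented. The sketch reports the offset of the
first differing byte, which may help locate the stream or pointer
that was serialized incorrectly.</p>
<source><![CDATA[
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.UnaryOperator;

/** Hypothetical helper for round-trip tests; not part of POI. */
final class RoundTripCheck {

    /**
     * Returns the offset of the first byte that differs between the two
     * arrays, or -1 if they are identical. If one array is a prefix of
     * the other, the length of the shorter array is returned.
     */
    static int firstDifference(byte[] original, byte[] written) {
        int common = Math.min(original.length, written.length);
        for (int i = 0; i < common; i++) {
            if (original[i] != written[i]) {
                return i;
            }
        }
        return original.length == written.length ? -1 : common;
    }

    /**
     * Passes a copy of the original bytes through the writer under test
     * (a stand-in for a future HDGF write path) and compares the result.
     */
    static int check(byte[] original, UnaryOperator<byte[]> writer) {
        byte[] written = writer.apply(original.clone());
        return firstDifference(original, written);
    }

    public static void main(String[] args) throws IOException {
        // Use the file given on the command line, or a small temporary
        // file of placeholder bytes (not real Visio data) otherwise.
        Path input;
        if (args.length > 0) {
            input = Paths.get(args[0]);
        } else {
            input = Files.createTempFile("roundtrip", ".bin");
            Files.write(input, new byte[] {0x56, 0x69, 0x73, 0x69, 0x6f});
            input.toFile().deleteOnExit();
        }
        byte[] original = Files.readAllBytes(input);

        // Simplest case from the list: the writer returns the raw bytes
        // it was given, so the comparison should find no difference.
        int offset = check(original, bytes -> bytes);
        System.out.println(offset == -1
                ? "File is unchanged"
                : "First difference at offset " + offset);
    }
}
```
]]></source>
<p>As each later step replaces more of the raw pass-through with
regenerated bytes, the same check could be rerun against the new
writer without changing the comparison itself.</p>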
</section>
</section>
</body>
</document>