<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document>
<properties>
<title>Commons Compress TAR package</title>
<author email="dev@commons.apache.org">Commons Documentation Team</author>
</properties>
<body>
<section name="The TAR package">
<p>In addition to the information stored
in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code>
stores various attributes including information about the
original owner and permissions.</p>
<p>There are several different tar formats, and the TAR package
of Compress 1.4 for the most part provides only the functionality
common to the existing variants.</p>
<p>The original format (often called "ustar") didn't support
file names longer than 100 characters or files bigger than 8 GiB,
and the tar package will by default fail if you try to write an
entry that exceeds those limits.</p>
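<p>The two limits can be expressed as a simple pre-flight check.
The following is a minimal sketch using only the JDK; the class
and method names are hypothetical and not part of Commons
Compress, and the thresholds simply follow the limits as stated
above.</p>

```java
import java.nio.charset.StandardCharsets;

final class UstarLimits {
    // Width of the traditional name field, as stated in the text above.
    static final int MAX_NAME_BYTES = 100;
    // Largest size below the 8 GiB boundary mentioned in the text.
    static final long MAX_SIZE = 8L * 1024 * 1024 * 1024 - 1;

    // Hypothetical helper: true if name and size stay within both limits.
    static boolean fitsTraditionalHeader(String name, long size) {
        int nameBytes = name.getBytes(StandardCharsets.US_ASCII).length;
        if (nameBytes > MAX_NAME_BYTES) {
            return false; // name too long for the traditional header
        }
        return MAX_SIZE >= size; // size must stay below 8 GiB
    }
}
```

<p>An entry for which such a check fails is the kind of entry
that needs one of the long file name or big number modes
described in the following subsections.</p>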
<p>The tar package supports neither the full POSIX tar standard
nor the more modern GNU extensions of said standard.</p>
<subsection name="Long File Names">
<p>The <code>longFileMode</code> option of
<code>TarArchiveOutputStream</code> controls how files with
names longer than 100 characters are handled. The possible
choices are:</p>
<ul>
<li><code>LONGFILE_ERROR</code>: throw an exception if such a
file is added. This is the default.</li>
<li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li>
<li><code>LONGFILE_GNU</code>: use a GNU tar variant, now
referred to as "oldgnu", for storing such names. If you choose
the GNU tar option, the archive cannot be extracted using
many other tar implementations such as those of OpenBSD,
Solaris or MacOS X.</li>
<li><code>LONGFILE_POSIX</code>: use a PAX <a
href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
header</a> as defined by POSIX 1003.1. Most modern tar
implementations are able to extract such archives. <em>since
Commons Compress 1.4</em></li>
</ul>
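<p>To illustrate what a PAX extended header carries, the sketch
below builds a single record in the POSIX form
<code>length keyword=value</code> followed by a newline, where
the decimal length counts the whole record including the length
digits themselves. The class and method names are hypothetical;
this is not the code Commons Compress uses, only an illustration
of the record layout.</p>

```java
import java.nio.charset.StandardCharsets;

final class PaxRecordSketch {
    // Builds one record such as "9 path=a\n" (hypothetical helper).
    static String record(String keyword, String value) {
        String body = " " + keyword + "=" + value + "\n";
        int bodyLen = body.getBytes(StandardCharsets.UTF_8).length;
        int total = bodyLen + 1; // first guess: a one-digit length prefix
        // The prefix counts its own digits, so repeat until the value is stable.
        while (String.valueOf(total).length() + bodyLen != total) {
            total = String.valueOf(total).length() + bodyLen;
        }
        return total + body;
    }
}
```

<p>A long file name would be stored under the <code>path</code>
keyword, for example
<code>PaxRecordSketch.record("path", longName)</code>.</p>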
<p><code>TarArchiveInputStream</code> recognizes the GNU
tar as well as the POSIX extensions (starting with Commons
Compress 1.2) for long file names and reads the longer names
transparently.</p>
</subsection>
<subsection name="Big Numeric Values">
<p>The <code>bigNumberMode</code> option of
<code>TarArchiveOutputStream</code> controls how entries larger
than 8 GiB, or with other numeric values too big to be
encoded in traditional header fields, are handled. The
possible choices are:</p>
<ul>
<li><code>BIGNUMBER_ERROR</code>: throw an exception if such an
entry is added. This is the default.</li>
<li><code>BIGNUMBER_STAR</code>: use a variant first
introduced by J&#xf6;rg Schilling's <a
href="http://developer.berlios.de/projects/star">star</a>
and later adopted by GNU and BSD tar. This method is not
supported by all implementations.</li>
<li><code>BIGNUMBER_POSIX</code>: use a PAX <a
href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
header</a> as defined by POSIX 1003.1. Most modern tar
implementations are able to extract such archives.</li>
</ul>
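<p>The star-style variant is commonly described as a binary
(base-256) encoding: the high bit of the field's first byte is
set as a marker and the remaining bytes hold the number in
big-endian order. The sketch below assumes a 12-byte size field
and non-negative values; the names are hypothetical and the code
is an illustration of the idea rather than the library's
implementation.</p>

```java
final class Base256Sketch {
    // Assumed width of the size field in the tar header.
    static final int FIELD_LEN = 12;

    // Encodes a non-negative value as marker byte plus big-endian bytes.
    static byte[] encode(long value) {
        if (0 > value) {
            throw new IllegalArgumentException("negative values are outside this sketch");
        }
        byte[] field = new byte[FIELD_LEN];
        long rest = value;
        for (int i = FIELD_LEN - 1; i > 0; i--) {
            field[i] = (byte) rest; // keep the lowest eight bits
            rest >>>= 8;
        }
        field[0] = (byte) 0x80; // marker: this field is binary, not octal
        return field;
    }

    // Decodes a field produced by encode(), skipping the marker byte.
    static long decode(byte[] field) {
        long value = 0;
        for (int i = 1; i != field.length; i++) {
            value = value * 256 + Byte.toUnsignedInt(field[i]);
        }
        return value;
    }
}
```

<p>A size of exactly 8 GiB, which no longer fits the traditional
octal field, round-trips through <code>encode</code> and
<code>decode</code> unchanged.</p>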
<p>Starting with Commons Compress 1.4
<code>TarArchiveInputStream</code> recognizes the star as
well as the POSIX extensions for big numeric values and reads them
transparently.</p>
</subsection>
<subsection name="File Name Encoding">
<p>The original ustar format only supports 7-bit ASCII file
names; later implementations use the platform's default
encoding to encode file names. The POSIX standard recommends
using PAX extension headers for non-ASCII file names
instead.</p>
<p>Commons Compress 1.1 to 1.3 assumed file names would be
encoded using ISO-8859-1. Starting with Commons Compress 1.4
you can specify the encoding to expect when reading as a
parameter to <code>TarArchiveInputStream</code>, and the
encoding to use when writing as a parameter to
<code>TarArchiveOutputStream</code>. Both now default to the
platform's default encoding.</p>
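<p>Why the encoding matters can be seen with plain JDK classes.
The sketch below, whose class and method names are hypothetical,
encodes a name with one charset and decodes the resulting bytes
with another, which is what happens when a reader assumes the
wrong encoding.</p>

```java
import java.nio.charset.Charset;

final class NameEncodingSketch {
    // Encodes with the writer's charset, then decodes with the reader's.
    static String reinterpret(String name, Charset written, Charset assumed) {
        byte[] raw = name.getBytes(written);
        return new String(raw, assumed);
    }
}
```

<p>Pure ASCII names survive a mix-up of UTF-8 and ISO-8859-1
unchanged, while a name containing an umlaut written as UTF-8
comes back as two unrelated characters when read as
ISO-8859-1.</p>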
<p>Since Commons Compress 1.4 another optional parameter of
<code>TarArchiveOutputStream</code>,
<code>addPaxHeadersForNonAsciiNames</code>, controls whether PAX
extension headers will be written for non-ASCII file names.
By default they are not written, in order to save space.
<code>TarArchiveInputStream</code> will read them
transparently if present.</p>
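<p>Whether a given name falls under this option can be checked
with the JDK alone. The following sketch uses a hypothetical
helper name and only illustrates the ASCII test; it makes no
claim about how Commons Compress performs the check
internally.</p>

```java
import java.nio.charset.StandardCharsets;

final class NonAsciiNameSketch {
    // True if the name contains at least one character outside 7-bit ASCII.
    static boolean isNonAscii(String name) {
        return !StandardCharsets.US_ASCII.newEncoder().canEncode(name);
    }
}
```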
</subsection>
<subsection name="Sparse files">
<p><code>TarArchiveInputStream</code> will recognize sparse
file entries stored using the "oldgnu" format
(<code>-&#x2d;sparse-version=0.0</code> in GNU tar) but is not
able to extract them correctly. <a href="#Unsupported
Features"><code>canReadEntryData</code></a> will return false
on such entries. The other variants of sparse files cannot
currently be detected at all.</p>
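<p>For orientation, GNU tar marks "oldgnu" sparse entries with the
type flag <code>S</code>, stored at byte offset 156 of the
512-byte header. The sketch below checks just that byte; the
class and method names are hypothetical, and it is an
illustration of the header layout rather than the detection
logic of <code>TarArchiveInputStream</code>.</p>

```java
final class SparseFlagSketch {
    // Offset of the type flag byte within a tar header block.
    static final int TYPEFLAG_OFFSET = 156;
    // Type flag GNU tar uses for "oldgnu" sparse entries.
    static final byte GNU_SPARSE = (byte) 'S';

    // True if the raw header block carries the GNU sparse type flag.
    static boolean isOldGnuSparse(byte[] header) {
        if (TYPEFLAG_OFFSET >= header.length) {
            return false; // block too short to contain a type flag
        }
        return header[TYPEFLAG_OFFSET] == GNU_SPARSE;
    }
}
```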
</subsection>
<subsection name="Consuming Archives Completely">
<p>The end of a tar archive is signalled by two consecutive
records of all zeros. Unfortunately not all tar
implementations adhere to this and some only write one record
to end the archive. Commons Compress will always write two
records but stops reading an archive as soon as it finds one
record of all zeros.</p>
<p>Prior to version 1.5 this could leave the second EOF record
inside the stream when <code>getNextEntry</code> or
<code>getNextTarEntry</code> returned <code>null</code>.
Starting with version 1.5 <code>TarArchiveInputStream</code>
will try to read a second record as well if present,
effectively consuming the archive completely.</p>
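<p>The idea of consuming an optional second EOF record can be
sketched with plain JDK streams. The code below assumes 512-byte
records and a stream that supports <code>mark</code> and
<code>reset</code>; the class and method names are hypothetical
and this is one possible approach, not necessarily the one
Commons Compress takes.</p>

```java
import java.io.IOException;
import java.io.InputStream;

final class TarEndSketch {
    // Assumed tar record size in bytes.
    static final int RECORD_SIZE = 512;

    // True if every byte of the record is zero.
    static boolean isZeroRecord(byte[] record) {
        for (byte b : record) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    // Call after the first all-zero record has been read. Consumes the next
    // record only if it is a complete all-zero record; otherwise the stream
    // position is restored. Returns true if a second record was consumed.
    static boolean consumeSecondEofRecord(InputStream in) throws IOException {
        if (!in.markSupported()) {
            return false; // cannot peek safely, so leave the stream untouched
        }
        in.mark(RECORD_SIZE);
        byte[] record = new byte[RECORD_SIZE];
        int read = in.readNBytes(record, 0, RECORD_SIZE);
        boolean secondEof = read == RECORD_SIZE ? isZeroRecord(record) : false;
        if (!secondEof) {
            in.reset(); // not an EOF record: put the bytes back
        }
        return secondEof;
    }
}
```

<p>Restoring the position when the next record is not all zeros
keeps any data that follows the archive available to the
caller.</p>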
</subsection>
</section>
</body>
</document>