1<?xml version="1.0"?>
2<!--
3
4   Licensed to the Apache Software Foundation (ASF) under one or more
5   contributor license agreements.  See the NOTICE file distributed with
6   this work for additional information regarding copyright ownership.
7   The ASF licenses this file to You under the Apache License, Version 2.0
8   (the "License"); you may not use this file except in compliance with
9   the License.  You may obtain a copy of the License at
10
11       http://www.apache.org/licenses/LICENSE-2.0
12
13   Unless required by applicable law or agreed to in writing, software
14   distributed under the License is distributed on an "AS IS" BASIS,
15   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16   See the License for the specific language governing permissions and
17   limitations under the License.
18
19-->
20<document>
21  <properties>
22    <title>Commons Compress User Guide</title>
23    <author email="dev@commons.apache.org">Commons Documentation Team</author>
24  </properties>
25  <body>
26    <section name="General Notes">
27
28      <subsection name="Archivers and Compressors">
29        <p>Commons Compress calls all formats that compress a single
30        stream of data compressor formats while all formats that
31        collect multiple entries inside a single (potentially
32        compressed) archive are archiver formats.</p>
33
        <p>The compressor formats supported are gzip, bzip2, xz, lzma,
        Pack200, DEFLATE, Brotli, DEFLATE64, ZStandard and Z.  The
        archiver formats are 7z, ar, arj, cpio, dump, tar and zip.
        Pack200 is a special case as it can only compress JAR
        files.</p>
38
        <p>We currently only provide read support for arj,
        dump, Brotli, DEFLATE64 and Z.  The arj implementation can
        only read uncompressed archives.  7z can read archives that
        use many of the compression and encryption algorithms
        supported by 7z but doesn't support encryption when writing
        archives.</p>
44      </subsection>
45
46      <subsection name="Buffering">
47        <p>The stream classes all wrap around streams provided by the
48          calling code and they work on them directly without any
49          additional buffering.  On the other hand most of them will
50          benefit from buffering so it is highly recommended that
51          users wrap their stream
52          in <code>Buffered<em>(In|Out)</em>putStream</code>s before
53          using the Commons Compress API.</p>
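
        <p>A minimal sketch of the recommended wrapping, using only
          JDK classes.  The class and method names
          (<code>BufferedStreams</code>,
          <code>newBufferedInput</code>,
          <code>newBufferedOutput</code>) are made up for this
          example and are not part of Commons Compress.  The streams
          returned would then be handed to the Commons Compress
          stream of your choice.</p>
        <source><![CDATA[
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Commons Compress.
final class BufferedStreams {

    /** Opens a file for reading and adds a buffer on top of the raw stream. */
    static InputStream newBufferedInput(Path file) throws IOException {
        return new BufferedInputStream(Files.newInputStream(file));
    }

    /** Opens a file for writing and adds a buffer on top of the raw stream. */
    static OutputStream newBufferedOutput(Path file) throws IOException {
        return new BufferedOutputStream(Files.newOutputStream(file));
    }
}
]]></source>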
54
55      </subsection>
56
57      <subsection name="Factories">
58
59        <p>Compress provides factory methods to create input/output
60          streams based on the names of the compressor or archiver
61          format as well as factory methods that try to guess the
62          format of an input stream.</p>
63
64        <p>To create a compressor writing to a given output by using
65          the algorithm name:</p>
66        <source><![CDATA[
67CompressorOutputStream gzippedOut = new CompressorStreamFactory()
68    .createCompressorOutputStream(CompressorStreamFactory.GZIP, myOutputStream);
69]]></source>
70
71        <p>Make the factory guess the input format for a given
72        archiver stream:</p>
73        <source><![CDATA[
74ArchiveInputStream input = new ArchiveStreamFactory()
75    .createArchiveInputStream(originalInput);
76]]></source>
77
78        <p>Make the factory guess the input format for a given
79        compressor stream:</p>
80        <source><![CDATA[
81CompressorInputStream input = new CompressorStreamFactory()
82    .createCompressorInputStream(originalInput);
83]]></source>
84
        <p>Note that there is no way to detect the lzma or Brotli
        formats, so for them only the two-arg version of
        <code>createCompressorInputStream</code> can be used.  Prior
        to Compress 1.9 the .Z format was not auto-detected
        either.</p>
90
91      </subsection>
92
93      <subsection name="Restricting Memory Usage">
94        <p>Starting with Compress 1.14
95        <code>CompressorStreamFactory</code> has an optional
96        constructor argument that can be used to set an upper limit of
97        memory that may be used while decompressing or compressing a
98        stream. As of 1.14 this setting only affects decompressing Z,
99        XZ and LZMA compressed streams.</p>
100        <p>For the Snappy and LZ4 formats the amount of memory used
101        during compression is directly proportional to the window
102        size.</p>
103      </subsection>
104
105      <subsection name="Statistics">
106        <p>Starting with Compress 1.17 most of the
107        <code>CompressorInputStream</code> implementations as well as
108        <code>ZipArchiveInputStream</code> and all streams returned by
109        <code>ZipFile.getInputStream</code> implement the
110        <code>InputStreamStatistics</code>
111        interface. <code>SevenZFile</code> provides statistics for the
112        current entry via the
113        <code>getStatisticsForCurrentEntry</code> method. This
114        interface can be used to track progress while extracting a
115        stream or to detect potential <a
116        href="https://en.wikipedia.org/wiki/Zip_bomb">zip bombs</a>
        when the compression ratio becomes suspiciously large.</p>
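
        <p>As a rough sketch of such a check, the ratio can be
        computed from the two counters the interface exposes,
        <code>getCompressedCount</code> and
        <code>getUncompressedCount</code>.  The class
        <code>RatioGuard</code> and its threshold are hypothetical
        and not part of Commons Compress.  A sensible limit depends
        on the kind of data you expect.</p>
        <source><![CDATA[
// Hypothetical helper, not part of Commons Compress.
final class RatioGuard {

    private final double maxRatio;

    RatioGuard(double maxRatio) {
        this.maxRatio = maxRatio;
    }

    /**
     * Returns true if the ratio of uncompressed to compressed bytes
     * exceeds the configured limit. The arguments are meant to be the
     * values of getCompressedCount() and getUncompressedCount().
     */
    boolean isSuspicious(long compressedCount, long uncompressedCount) {
        if (compressedCount <= 0) {
            // nothing has been consumed yet, so there is no ratio to judge
            return false;
        }
        return (double) uncompressedCount / compressedCount > maxRatio;
    }
}
]]></source>

        <p>You would call <code>isSuspicious</code> periodically
        while copying data and abort the extraction once it returns
        <code>true</code>.</p>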
118      </subsection>
119
120    </section>
121    <section name="Archivers">
122
123      <subsection name="Unsupported Features">
124        <p>Many of the supported formats have developed different
125        dialects and extensions and some formats allow for features
126        (not yet) supported by Commons Compress.</p>
127
128        <p>The <code>ArchiveInputStream</code> class provides a method
129        <code>canReadEntryData</code> that will return false if
130        Commons Compress can detect that an archive uses a feature
131        that is not supported by the current implementation.  If it
132        returns false you should not try to read the entry but skip
133        over it.</p>
134
135      </subsection>
136
137      <subsection name="Entry Names">
138        <p>All archive formats provide meta data about the individual
139        archive entries via instances of <code>ArchiveEntry</code> (or
140        rather subclasses of it). When reading from an archive the
        information provided by the <code>getName</code> method is the
142        raw name as stored inside of the archive. There is no
143        guarantee the name represents a relative file name or even a
144        valid file name on your target operating system at all. You
145        should double check the outcome when you try to create file
146        names from entry names.</p>
147      </subsection>
148
149      <subsection name="Common Extraction Logic">
150        <p>Apart from 7z all formats provide a subclass of
        <code>ArchiveInputStream</code> that can be used to read an
        archive. For 7z <code>SevenZFile</code> provides a similar API
153        that does not represent a stream as our implementation
154        requires random access to the input and cannot be used for
155        general streams. The ZIP implementation can benefit a lot from
156        random access as well, see the <a
157        href="zip.html#ZipArchiveInputStream_vs_ZipFile">zip
158        page</a> for details.</p>
159
160        <p>Assuming you want to extract an archive to a target
161        directory you'd call <code>getNextEntry</code>, verify the
162        entry can be read, construct a sane file name from the entry's
        name, create a <code>File</code> and write all contents to
        it - here <code>IOUtils.copy</code> may come in handy. You do so
165        for every entry until <code>getNextEntry</code> returns
166        <code>null</code>.</p>
167
168        <p>A skeleton might look like:</p>
169
170        <source><![CDATA[
171File targetDir = ...
172try (ArchiveInputStream i = ... create the stream for your format, use buffering...) {
173    ArchiveEntry entry = null;
174    while ((entry = i.getNextEntry()) != null) {
175        if (!i.canReadEntryData(entry)) {
176            // log something?
177            continue;
178        }
179        String name = fileName(targetDir, entry);
180        File f = new File(name);
181        if (entry.isDirectory()) {
182            if (!f.isDirectory() && !f.mkdirs()) {
183                throw new IOException("failed to create directory " + f);
184            }
185        } else {
186            File parent = f.getParentFile();
187            if (!parent.isDirectory() && !parent.mkdirs()) {
188                throw new IOException("failed to create directory " + parent);
189            }
190            try (OutputStream o = Files.newOutputStream(f.toPath())) {
191                IOUtils.copy(i, o);
192            }
193        }
194    }
195}
196]]></source>
197
198        <p>where the hypothetical <code>fileName</code> method is
199        written by you and provides the absolute name for the file
200        that is going to be written on disk. Here you should perform
201        checks that ensure the resulting file name actually is a valid
202        file name on your operating system or belongs to a file inside
203        of <code>targetDir</code> when using the entry's name as
204        input.</p>
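
        <p>One possible shape of such a check is sketched below
        using only JDK classes.  It is an illustration under
        assumptions, not the one correct implementation: the class
        name <code>EntryNames</code> is made up, and unlike the
        skeleton above the method takes the entry's name (the result
        of <code>entry.getName()</code>) rather than the entry
        itself.  It only guards against names that would end up
        outside of <code>targetDir</code>.  Symbolic links or
        platform specific restrictions are not considered.</p>
        <source><![CDATA[
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper, not part of Commons Compress.
final class EntryNames {

    /** Resolves entryName against targetDir and rejects names that escape it. */
    static String fileName(File targetDir, String entryName) throws IOException {
        Path base = targetDir.toPath().toAbsolutePath().normalize();
        // resolve() throws the unchecked InvalidPathException for names
        // that cannot be converted to a path on this platform
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            // absolute names and names using ".." end up here
            throw new IOException("entry is outside of the target directory: " + entryName);
        }
        return resolved.toString();
    }
}
]]></source>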
205
206        <p>If you want to combine an archive format with a compression
207        format - like when reading a "tar.gz" file - you wrap the
        <code>ArchiveInputStream</code> around a
        <code>CompressorInputStream</code>, for example:</p>
210
211        <source><![CDATA[
212try (InputStream fi = Files.newInputStream(Paths.get("my.tar.gz"));
213     InputStream bi = new BufferedInputStream(fi);
214     InputStream gzi = new GzipCompressorInputStream(bi);
215     ArchiveInputStream o = new TarArchiveInputStream(gzi)) {
216}
217]]></source>
218
219      </subsection>
220
221      <subsection name="Common Archival Logic">
222        <p>Apart from 7z all formats that support writing provide a
223        subclass of <code>ArchiveOutputStream</code> that can be used
224        to create an archive. For 7z <code>SevenZOutputFile</code>
225        provides a similar API that does not represent a stream as our
226        implementation requires random access to the output and cannot
227        be used for general streams. The
228        <code>ZipArchiveOutputStream</code> class will benefit from
229        random access as well but can be used for non-seekable streams
230        - but not all features will be available and the archive size
231        might be slightly bigger, see <a
232        href="zip.html#ZipArchiveOutputStream">the zip page</a> for
233        details.</p>
234
235        <p>Assuming you want to add a collection of files to an
236        archive, you can first use <code>createArchiveEntry</code> for
237        each file. In general this will set a few flags (usually the
238        last modified time, the size and the information whether this
239        is a file or directory) based on the <code>File</code>
240        instance. Alternatively you can create the
241        <code>ArchiveEntry</code> subclass corresponding to your
242        format directly. Often you may want to set additional flags
243        like file permissions or owner information before adding the
244        entry to the archive.</p>
245
246        <p>Next you use <code>putArchiveEntry</code> in order to add
247        the entry and then start using <code>write</code> to add the
        content of the entry - here <code>IOUtils.copy</code> may
        come in handy. Finally you invoke
250        <code>closeArchiveEntry</code> once you've written all content
251        and before you add the next entry.</p>
252
253        <p>Once all entries have been added you'd invoke
254        <code>finish</code> and finally <code>close</code> the
255        stream.</p>
256
257        <p>A skeleton might look like:</p>
258
259        <source><![CDATA[
260Collection<File> filesToArchive = ...
261try (ArchiveOutputStream o = ... create the stream for your format ...) {
262    for (File f : filesToArchive) {
263        // maybe skip directories for formats like AR that don't store directories
264        ArchiveEntry entry = o.createArchiveEntry(f, entryName(f));
265        // potentially add more flags to entry
266        o.putArchiveEntry(entry);
267        if (f.isFile()) {
268            try (InputStream i = Files.newInputStream(f.toPath())) {
269                IOUtils.copy(i, o);
270            }
271        }
272        o.closeArchiveEntry();
273    }
    o.finish();
275}
276]]></source>
277
278        <p>where the hypothetical <code>entryName</code> method is
279        written by you and provides the name for the entry as it is
280        going to be written to the archive.</p>
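
        <p>A sketch of what such a method could look like, assuming
        all files live below a common base directory and the entry
        name should be the path relative to it.  The class name
        <code>ArchiveNames</code> and the extra
        <code>baseDir</code> parameter are assumptions made for this
        example.  The forward slash is used as separator because
        that is what formats such as ZIP and tar use.</p>
        <source><![CDATA[
import java.io.File;
import java.nio.file.Path;

// Hypothetical helper, not part of Commons Compress.
final class ArchiveNames {

    /** Returns the path of f relative to baseDir, using '/' as separator. */
    static String entryName(File baseDir, File f) {
        Path base = baseDir.toPath().toAbsolutePath().normalize();
        Path file = f.toPath().toAbsolutePath().normalize();
        if (!file.startsWith(base)) {
            throw new IllegalArgumentException(f + " is not inside of " + baseDir);
        }
        // replace the platform's separator so names look the same everywhere
        return base.relativize(file).toString().replace(File.separatorChar, '/');
    }
}
]]></source>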
281
282        <p>If you want to combine an archive format with a compression
283        format - like when creating a "tar.gz" file - you wrap the
284        <code>ArchiveOutputStream</code> around a
285        <code>CompressorOutputStream</code> for example:</p>
286
287        <source><![CDATA[
288try (OutputStream fo = Files.newOutputStream(Paths.get("my.tar.gz"));
289     OutputStream gzo = new GzipCompressorOutputStream(fo);
290     ArchiveOutputStream o = new TarArchiveOutputStream(gzo)) {
291}
292]]></source>
293
294      </subsection>
295
296      <subsection name="7z">
297
298        <p>Note that Commons Compress currently only supports a subset
299        of compression and encryption algorithms used for 7z archives.
300        For writing only uncompressed entries, LZMA, LZMA2, BZIP2 and
301        Deflate are supported - in addition to those reading supports
302        AES-256/SHA-256 and DEFLATE64.</p>
303
304        <p>Multipart archives are not supported at all.</p>
305
306        <p>7z archives can use multiple compression and encryption
307        methods as well as filters combined as a pipeline of methods
308        for its entries.  Prior to Compress 1.8 you could only specify
309        a single method when creating archives - reading archives
310        using more than one method has been possible before.  Starting
311        with Compress 1.8 it is possible to configure the full
312        pipeline using the <code>setContentMethods</code> method of
313        <code>SevenZOutputFile</code>.  Methods are specified in the
314        order they appear inside the pipeline when creating the
315        archive, you can also specify certain parameters for some of
316        the methods - see the Javadocs of
317        <code>SevenZMethodConfiguration</code> for details.</p>
318
319        <p>When reading entries from an archive the
320        <code>getContentMethods</code> method of
321        <code>SevenZArchiveEntry</code> will properly represent the
322        compression/encryption/filter methods but may fail to
323        determine the configuration options used.  As of Compress 1.8
324        only the dictionary size used for LZMA2 can be read.</p>
325
326        <p>Currently solid compression - compressing multiple files
        as a single block to benefit from patterns repeating across
328        files - is only supported when reading archives.  This also
329        means compression ratio will likely be worse when using
330        Commons Compress compared to the native 7z executable.</p>
331
332        <p>Reading or writing requires a
333        <code>SeekableByteChannel</code> that will be obtained
334        transparently when reading from or writing to a file. The
335        class
336        <code>org.apache.commons.compress.utils.SeekableInMemoryByteChannel</code>
337        allows you to read from or write to an in-memory archive.</p>
338
339        <p>Adding an entry to a 7z archive:</p>
340<source><![CDATA[
341SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
342SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(fileToArchive, name);
343sevenZOutput.putArchiveEntry(entry);
344sevenZOutput.write(contentOfEntry);
345sevenZOutput.closeArchiveEntry();
346]]></source>
347
348        <p>Uncompressing a given 7z archive (you would
349          certainly add exception handling and make sure all streams
350          get closed properly):</p>
351<source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = sevenZFile.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}
358]]></source>
359
360          <p>Uncompressing a given in-memory 7z archive:</p>
361          <source><![CDATA[
362byte[] inputData; // 7z archive contents
363SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
364SevenZFile sevenZFile = new SevenZFile(inMemoryByteChannel);
365SevenZArchiveEntry entry = sevenZFile.getNextEntry();
366sevenZFile.read();  // read current entry's data
367]]></source>
368
369          <h4><a name="Encrypted 7z Archives"></a>Encrypted 7z Archives</h4>
370
371          <p>Currently Compress supports reading but not writing of
372          encrypted archives. When reading an encrypted archive a
373          password has to be provided to one of
374          <code>SevenZFile</code>'s constructors. If you try to read
375          an encrypted archive without specifying a password a
376          <code>PasswordRequiredException</code> (a subclass of
377          <code>IOException</code>) will be thrown.</p>
378
379          <p>When specifying the password as a <code>byte[]</code> one
380          common mistake is to use the wrong encoding when creating
381          the <code>byte[]</code> from a <code>String</code>. The
382          <code>SevenZFile</code> class expects the bytes to
          correspond to the UTF-16LE encoding of the password. An
384          example of reading an encrypted archive is</p>
385
386<source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".getBytes(StandardCharsets.UTF_16LE));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = sevenZFile.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}
393]]></source>
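
          <p>To illustrate the encoding pitfall: the password
          "secret" takes six bytes in UTF-8 but twelve bytes in
          UTF-16LE, so bytes created with the wrong charset cannot
          match.  The sketch below converts a <code>char[]</code>
          to UTF-16LE bytes using only JDK classes.  The class name
          <code>SevenZPasswords</code> is made up for this example
          and is not part of Commons Compress.</p>
<source><![CDATA[
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
final class SevenZPasswords {

    /** Encodes a password as UTF-16LE without creating an intermediate String. */
    static byte[] toUtf16Le(char[] password) {
        ByteBuffer encoded = StandardCharsets.UTF_16LE.encode(CharBuffer.wrap(password));
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        return bytes;
    }
}
]]></source>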
394
395        <p>Starting with Compress 1.17 new constructors have been
396        added that accept the password as <code>char[]</code> rather
397        than a <code>byte[]</code>. We recommend you use these in
398        order to avoid the problem above.</p>
399
400<source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".toCharArray());
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = sevenZFile.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}
407]]></source>
408
409      </subsection>
410
411      <subsection name="ar">
412
413        <p>In addition to the information stored
          in <code>ArchiveEntry</code> an <code>ArArchiveEntry</code>
415          stores information about the owner user and group as well as
416          Unix permissions.</p>
417
418        <p>Adding an entry to an ar archive:</p>
419<source><![CDATA[
420ArArchiveEntry entry = new ArArchiveEntry(name, size);
421arOutput.putArchiveEntry(entry);
422arOutput.write(contentOfEntry);
423arOutput.closeArchiveEntry();
424]]></source>
425
426        <p>Reading entries from an ar archive:</p>
427<source><![CDATA[
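ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = arInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}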
433]]></source>
434
        <p>Traditionally the AR format doesn't allow file names longer
          than 16 characters.  There are two variants that circumvent
          this limitation in different ways, the GNU/SVR4 and the BSD
          variant.  Commons Compress 1.0 to 1.2 can only read archives
          using the GNU/SVR4 variant; support for the BSD variant was
          added in Commons Compress 1.3.  Commons Compress 1.3
          also optionally supports writing archives with file names
          longer than 16 characters using the BSD dialect; writing
          the GNU/SVR4 dialect is not supported.</p>
444
445        <table>
446          <thead>
447            <tr>
448              <th>Version of Apache Commons Compress</th>
449              <th>Support for Traditional AR Format</th>
              <th>Support for GNU/SVR4 Dialect</th>
451              <th>Support for BSD Dialect</th>
452            </tr>
453          </thead>
454          <tbody>
455            <tr>
456              <td>1.0 to 1.2</td>
457              <td>read/write</td>
458              <td>read</td>
459              <td>-</td>
460            </tr>
461            <tr>
462              <td>1.3 and later</td>
463              <td>read/write</td>
464              <td>read</td>
465              <td>read/write</td>
466            </tr>
467          </tbody>
468        </table>
469
470        <p>It is not possible to detect the end of an AR archive in a
471        reliable way so <code>ArArchiveInputStream</code> will read
472        until it reaches the end of the stream or fails to parse the
473        stream's content as AR entries.</p>
474
475      </subsection>
476
477      <subsection name="arj">
478
479        <p>Note that Commons Compress doesn't support compressed,
480        encrypted or multi-volume ARJ archives, yet.</p>
481
482        <p>Uncompressing a given arj archive (you would
483          certainly add exception handling and make sure all streams
484          get closed properly):</p>
485<source><![CDATA[
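ArjArchiveEntry entry = arjInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = arjInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}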
491]]></source>
492      </subsection>
493
494      <subsection name="cpio">
495
496        <p>In addition to the information stored
497          in <code>ArchiveEntry</code> a <code>CpioArchiveEntry</code>
498          stores various attributes including information about the
499          original owner and permissions.</p>
500
501        <p>The cpio package supports the "new portable" as well as the
502          "old" format of CPIO archives in their binary, ASCII and
503          "with CRC" variants.</p>
504
505        <p>Adding an entry to a cpio archive:</p>
506<source><![CDATA[
507CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
508cpioOutput.putArchiveEntry(entry);
509cpioOutput.write(contentOfEntry);
510cpioOutput.closeArchiveEntry();
511]]></source>
512
        <p>Reading entries from a cpio archive:</p>
514<source><![CDATA[
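CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = cpioInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}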
520]]></source>
521
522        <p>Traditionally CPIO archives are written in blocks of 512
523        bytes - the block size is a configuration parameter of the
        <code>Cpio*Stream</code>'s constructors.  Starting with version
525        1.5 <code>CpioArchiveInputStream</code> will consume the
526        padding written to fill the current block when the end of the
527        archive is reached.  Unfortunately many CPIO implementations
528        use larger block sizes so there may be more zero-byte padding
529        left inside the original input stream after the archive has
530        been consumed completely.</p>
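
        <p>If your code keeps reading from the original stream after
        the archive has been consumed, it may have to skip such
        left-over padding itself.  The following sketch shows one way
        to do that with JDK classes only.  The class
        <code>ZeroPadding</code> is hypothetical, and the approach
        assumes that whatever follows the archive does not itself
        start with zero bytes.  For this to work the
        <code>PushbackInputStream</code> has to be the stream that
        was handed to <code>CpioArchiveInputStream</code> in the
        first place.</p>
<source><![CDATA[
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Commons Compress.
final class ZeroPadding {

    /** Consumes zero bytes until a non-zero byte or the end of the stream is seen. */
    static long skipZeros(PushbackInputStream in) throws IOException {
        long skipped = 0;
        int b;
        while ((b = in.read()) == 0) {
            skipped++;
        }
        if (b != -1) {
            in.unread(b); // put the first non-zero byte back
        }
        return skipped;
    }
}
]]></source>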
531
532      </subsection>
533
534      <subsection name="jar">
535        <p>In general, JAR archives are ZIP files, so the JAR package
536          supports all options provided by the <a href="#zip">ZIP</a> package.</p>
537
538        <p>To be interoperable JAR archives should always be created
539          using the UTF-8 encoding for file names (which is the
540          default).</p>
541
542        <p>Archives created using <code>JarArchiveOutputStream</code>
543          will implicitly add a <code>JarMarker</code> extra field to
544          the very first archive entry of the archive which will make
          Solaris recognize them as Java archives and allow them to
546          be used as executables.</p>
547
548        <p>Note that <code>ArchiveStreamFactory</code> doesn't
549          distinguish ZIP archives from JAR archives, so if you use
550          the one-argument <code>createArchiveInputStream</code>
551          method on a JAR archive, it will still return the more
552          generic <code>ZipArchiveInputStream</code>.</p>
553
554        <p>The <code>JarArchiveEntry</code> class contains fields for
555          certificates and attributes that are planned to be supported
556          in the future but are not supported as of Compress 1.0.</p>
557
558        <p>Adding an entry to a jar archive:</p>
559<source><![CDATA[
JarArchiveEntry entry = new JarArchiveEntry(name);
entry.setSize(size);
562jarOutput.putArchiveEntry(entry);
563jarOutput.write(contentOfEntry);
564jarOutput.closeArchiveEntry();
565]]></source>
566
        <p>Reading entries from a jar archive:</p>
568<source><![CDATA[
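JarArchiveEntry entry = jarInput.getNextJarEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = jarInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}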
574]]></source>
575      </subsection>
576
577      <subsection name="dump">
578
579        <p>In addition to the information stored
580          in <code>ArchiveEntry</code> a <code>DumpArchiveEntry</code>
581          stores various attributes including information about the
582          original owner and permissions.</p>
583
584        <p>As of Commons Compress 1.3 only dump archives using the
585          new-fs format - this is the most common variant - are
586          supported.  Right now this library supports uncompressed and
          ZLIB compressed archives and cannot write archives at
588          all.</p>
589
        <p>Reading entries from a dump archive:</p>
591<source><![CDATA[
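DumpArchiveEntry entry = dumpInput.getNextDumpEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = dumpInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}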
597]]></source>
598
599        <p>Prior to version 1.5 <code>DumpArchiveInputStream</code>
600        would close the original input once it had read the last
601        record.  Starting with version 1.5 it will not close the
602        stream implicitly.</p>
603
604      </subsection>
605
606      <subsection name="tar">
607
608        <p>The TAR package has a <a href="tar.html">dedicated
609            documentation page</a>.</p>
610
611        <p>Adding an entry to a tar archive:</p>
612<source><![CDATA[
613TarArchiveEntry entry = new TarArchiveEntry(name);
614entry.setSize(size);
615tarOutput.putArchiveEntry(entry);
616tarOutput.write(contentOfEntry);
617tarOutput.closeArchiveEntry();
618]]></source>
619
        <p>Reading entries from a tar archive:</p>
621<source><![CDATA[
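TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the entry fits into an array
int offset = 0;
while (offset < content.length) {
    int n = tarInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}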
627]]></source>
628      </subsection>
629
630      <subsection name="zip">
631        <p>The ZIP package has a <a href="zip.html">dedicated
632            documentation page</a>.</p>
633
634        <p>Adding an entry to a zip archive:</p>
635<source><![CDATA[
636ZipArchiveEntry entry = new ZipArchiveEntry(name);
637entry.setSize(size);
638zipOutput.putArchiveEntry(entry);
639zipOutput.write(contentOfEntry);
640zipOutput.closeArchiveEntry();
641]]></source>
642
643        <p><code>ZipArchiveOutputStream</code> can use some internal
644          optimizations exploiting <code>SeekableByteChannel</code> if it
645          knows it is writing to a seekable output rather than a non-seekable
646          stream.  If you are writing to a file, you should use the
647          constructor that accepts a <code>File</code> or
648          <code>SeekableByteChannel</code> argument rather
649          than the one using an <code>OutputStream</code> or the
650          factory method in <code>ArchiveStreamFactory</code>.</p>
651
        <p>Reading entries from a zip archive:</p>
653<source><![CDATA[
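ZipArchiveEntry entry = zipInput.getNextZipEntry();
byte[] content = new byte[(int) entry.getSize()]; // assumes the size is known and fits into an array
int offset = 0;
while (offset < content.length) {
    int n = zipInput.read(content, offset, content.length - offset);
    if (n < 0) {
        break; // data ended before entry.getSize() bytes were read
    }
    offset += n;
}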
659]]></source>
660
        <p>Reading entries from a zip archive using the
662          recommended <code>ZipFile</code> class:</p>
663<source><![CDATA[
ZipArchiveEntry entry = zipFile.getEntry(name);
try (InputStream content = zipFile.getInputStream(entry)) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = content.read(buffer)) != -1) {
        // process the first n bytes of buffer
    }
}
671]]></source>
672
673          <p>Reading entries from an in-memory zip archive using
674              <code>SeekableInMemoryByteChannel</code> and <code>ZipFile</code> class:</p>
675<source><![CDATA[
676byte[] inputData; // zip archive contents
677SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
678ZipFile zipFile = new ZipFile(inMemoryByteChannel);
679ZipArchiveEntry archiveEntry = zipFile.getEntry("entryName");
680InputStream inputStream = zipFile.getInputStream(archiveEntry);
inputStream.read(); // read data from the input stream
682]]></source>
683
          <p>Creating a zip file with multiple threads:</p>

          <p>A simple implementation to create a zip file might look like this:</p>
687
688<source>
689public class ScatterSample {
690
691  ParallelScatterZipCreator scatterZipCreator = new ParallelScatterZipCreator();
692  ScatterZipOutputStream dirs = ScatterZipOutputStream.fileBased(File.createTempFile("scatter-dirs", "tmp"));
693
694  public ScatterSample() throws IOException {
695  }
696
697  public void addEntry(ZipArchiveEntry zipArchiveEntry, InputStreamSupplier streamSupplier) throws IOException {
698     if (zipArchiveEntry.isDirectory() &amp;&amp; !zipArchiveEntry.isUnixSymlink())
699        dirs.addArchiveEntry(ZipArchiveEntryRequest.createZipArchiveEntryRequest(zipArchiveEntry, streamSupplier));
700     else
701        scatterZipCreator.addArchiveEntry( zipArchiveEntry, streamSupplier);
702  }
703
704  public void writeTo(ZipArchiveOutputStream zipArchiveOutputStream)
705  throws IOException, ExecutionException, InterruptedException {
706     dirs.writeTo(zipArchiveOutputStream);
707     dirs.close();
708     scatterZipCreator.writeTo(zipArchiveOutputStream);
709  }
710}
711</source>
712      </subsection>
713
714    </section>
715    <section name="Compressors">
716
717      <subsection name="Concatenated Streams">
718        <p>For the bzip2, gzip and xz formats as well as the framed
719        lz4 format a single compressed file
720        may actually consist of several streams that will be
721        concatenated by the command line utilities when decompressing
722        them.  Starting with Commons Compress 1.4 the
723        <code>*CompressorInputStream</code>s for these formats support
724        concatenating streams as well, but they won't do so by
725        default.  You must use the two-arg constructor and explicitly
726        enable the support.</p>
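
        <p>To make the notion of concatenated streams concrete, the
        sketch below builds such data for gzip.  It deliberately
        uses the JDK's own <code>java.util.zip</code> classes rather
        than Commons Compress so that it is self-contained.  The
        class name <code>ConcatenatedGzip</code> is made up for this
        example.  With Commons Compress you would read the same data
        by passing <code>true</code> as the second constructor
        argument, for example <code>new
        GzipCompressorInputStream(in, true)</code>.</p>
<source><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only, uses the JDK's gzip classes instead of Commons Compress.
final class ConcatenatedGzip {

    /** Compresses text as one complete gzip stream. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Appends two independent gzip streams, like "cat a.gz b.gz > c.gz" would. */
    static byte[] concat(byte[] first, byte[] second) {
        byte[] all = new byte[first.length + second.length];
        System.arraycopy(first, 0, all, 0, first.length);
        System.arraycopy(second, 0, all, first.length, second.length);
        return all;
    }

    /** Decompresses everything the JDK's GZIPInputStream returns for data. */
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }
}
]]></source>

        <p>Decompressing the result of <code>concat</code> yields
        the content of both streams, one after the other, when the
        reader supports concatenated streams.</p>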
727      </subsection>
728
729      <subsection name="Brotli">
730
731        <p>The implementation of this package is provided by the
732          <a href="https://github.com/google/brotli">Google Brotli dec</a> library.</p>
733
734        <p>Uncompressing a given Brotli compressed file (you would
735          certainly add exception handling and make sure all streams
736          get closed properly):</p>
737<source><![CDATA[
738InputStream fin = Files.newInputStream(Paths.get("archive.tar.br"));
739BufferedInputStream in = new BufferedInputStream(fin);
740OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
741BrotliCompressorInputStream brIn = new BrotliCompressorInputStream(in);
742final byte[] buffer = new byte[buffersize];
743int n = 0;
744while (-1 != (n = brIn.read(buffer))) {
745    out.write(buffer, 0, n);
746}
747out.close();
748brIn.close();
749]]></source>
750      </subsection>
751
752      <subsection name="bzip2">
753
        <p>Note that <code>BZip2CompressorOutputStream</code> keeps
          hold of some big data structures in memory.  While it is
          recommended for <em>any</em> stream that you close it as soon as
          you no longer need it, this is even more important
          for <code>BZip2CompressorOutputStream</code>.</p>
759
760        <p>Uncompressing a given bzip2 compressed file (you would
761          certainly add exception handling and make sure all streams
762          get closed properly):</p>
763<source><![CDATA[
764InputStream fin = Files.newInputStream(Paths.get("archive.tar.bz2"));
765BufferedInputStream in = new BufferedInputStream(fin);
766OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
767BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
768final byte[] buffer = new byte[buffersize];
769int n = 0;
770while (-1 != (n = bzIn.read(buffer))) {
771    out.write(buffer, 0, n);
772}
773out.close();
774bzIn.close();
775]]></source>
776
777        <p>Compressing a given file using bzip2 (you would
778          certainly add exception handling and make sure all streams
779          get closed properly):</p>
<source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.bz2"));
BufferedOutputStream out = new BufferedOutputStream(fout);
BZip2CompressorOutputStream bzOut = new BZip2CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    bzOut.write(buffer, 0, n);
}
bzOut.close();
in.close();
]]></source>
793
794      </subsection>
795
796      <subsection name="DEFLATE">
797
798        <p>The implementation of the DEFLATE/INFLATE code used by this
799        package is provided by the <code>java.util.zip</code> package
800        of the Java class library.</p>
801
802        <p>Uncompressing a given DEFLATE compressed file (you would
803          certainly add exception handling and make sure all streams
804          get closed properly):</p>
805<source><![CDATA[
806InputStream fin = Files.newInputStream(Paths.get("some-file"));
807BufferedInputStream in = new BufferedInputStream(fin);
808OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
809DeflateCompressorInputStream defIn = new DeflateCompressorInputStream(in);
810final byte[] buffer = new byte[buffersize];
811int n = 0;
812while (-1 != (n = defIn.read(buffer))) {
813    out.write(buffer, 0, n);
814}
815out.close();
816defIn.close();
817]]></source>
818
819        <p>Compressing a given file using DEFLATE (you would
820          certainly add exception handling and make sure all streams
821          get closed properly):</p>
822<source><![CDATA[
823InputStream in = Files.newInputStream(Paths.get("archive.tar"));
824OutputStream fout = Files.newOutputStream(Paths.get("some-file"));
825BufferedOutputStream out = new BufferedOutputStream(fout);
826DeflateCompressorOutputStream defOut = new DeflateCompressorOutputStream(out);
827final byte[] buffer = new byte[buffersize];
828int n = 0;
829while (-1 != (n = in.read(buffer))) {
830    defOut.write(buffer, 0, n);
831}
832defOut.close();
833in.close();
834]]></source>
835
836      </subsection>
837
838      <subsection name="DEFLATE64">
839
840        <p>Uncompressing a given DEFLATE64 compressed file (you would
841          certainly add exception handling and make sure all streams
842          get closed properly):</p>
843<source><![CDATA[
844InputStream fin = Files.newInputStream(Paths.get("some-file"));
845BufferedInputStream in = new BufferedInputStream(fin);
846OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
847Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
848final byte[] buffer = new byte[buffersize];
849int n = 0;
850while (-1 != (n = defIn.read(buffer))) {
851    out.write(buffer, 0, n);
852}
853out.close();
854defIn.close();
855]]></source>
856
857      </subsection>
858
859      <subsection name="gzip">
860
861        <p>The implementation of the DEFLATE/INFLATE code used by this
862        package is provided by the <code>java.util.zip</code> package
863        of the Java class library.</p>
864
865        <p>Uncompressing a given gzip compressed file (you would
866          certainly add exception handling and make sure all streams
867          get closed properly):</p>
868<source><![CDATA[
869InputStream fin = Files.newInputStream(Paths.get("archive.tar.gz"));
870BufferedInputStream in = new BufferedInputStream(fin);
871OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
872GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
873final byte[] buffer = new byte[buffersize];
874int n = 0;
875while (-1 != (n = gzIn.read(buffer))) {
876    out.write(buffer, 0, n);
877}
878out.close();
879gzIn.close();
880]]></source>
881
882        <p>Compressing a given file using gzip (you would
883          certainly add exception handling and make sure all streams
884          get closed properly):</p>
885<source><![CDATA[
886InputStream in = Files.newInputStream(Paths.get("archive.tar"));
887OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.gz"));
888BufferedOutputStream out = new BufferedOutputStream(fout);
889GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out);
890final byte[] buffer = new byte[buffersize];
891int n = 0;
892while (-1 != (n = in.read(buffer))) {
893    gzOut.write(buffer, 0, n);
894}
895gzOut.close();
896in.close();
897]]></source>
898
899      </subsection>
900
901      <subsection name="LZ4">
902
903        <p>There are two different "formats" used for <a
904        href="http://lz4.github.io/lz4/">lz4</a>. The format called
905        "block format" only contains the raw compressed data while the
906        other provides a higher level "frame format" - Commons
907        Compress offers two different stream classes for reading or
908        writing either format.</p>
909
910        <p>Uncompressing a given frame LZ4 file (you would
911          certainly add exception handling and make sure all streams
912          get closed properly):</p>
913<source><![CDATA[
914InputStream fin = Files.newInputStream(Paths.get("archive.tar.lz4"));
915BufferedInputStream in = new BufferedInputStream(fin);
916OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
917FramedLZ4CompressorInputStream zIn = new FramedLZ4CompressorInputStream(in);
918final byte[] buffer = new byte[buffersize];
919int n = 0;
920while (-1 != (n = zIn.read(buffer))) {
921    out.write(buffer, 0, n);
922}
923out.close();
924zIn.close();
925]]></source>
926
927        <p>Compressing a given file using the LZ4 frame format (you would
928          certainly add exception handling and make sure all streams
929          get closed properly):</p>
930<source><![CDATA[
931InputStream in = Files.newInputStream(Paths.get("archive.tar"));
932OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lz4"));
933BufferedOutputStream out = new BufferedOutputStream(fout);
934FramedLZ4CompressorOutputStream lzOut = new FramedLZ4CompressorOutputStream(out);
935final byte[] buffer = new byte[buffersize];
936int n = 0;
937while (-1 != (n = in.read(buffer))) {
938    lzOut.write(buffer, 0, n);
939}
940lzOut.close();
941in.close();
942]]></source>
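
        <p>The examples above use the frame format.  For the block
        format, a minimal in-memory round trip might look like the
        sketch below; the class name <code>BlockLz4Sketch</code> and
        its helper methods are assumptions made for this
        illustration.</p>
<source><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.compressors.lz4.BlockLZ4CompressorInputStream;
import org.apache.commons.compress.compressors.lz4.BlockLZ4CompressorOutputStream;

public class BlockLz4Sketch {

    // Compresses raw into the LZ4 block format (no frame header or checksums).
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (BlockLZ4CompressorOutputStream lzOut = new BlockLZ4CompressorOutputStream(bos)) {
            lzOut.write(raw);
        }
        return bos.toByteArray();
    }

    // Reads LZ4 block data until the underlying stream is exhausted.
    public static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BlockLZ4CompressorInputStream lzIn =
                 new BlockLZ4CompressorInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while (-1 != (n = lzIn.read(buffer))) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }
}
]]></source>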
943
944      </subsection>
945
946      <subsection name="lzma">
947
948        <p>The implementation of this package is provided by the
949          public domain <a href="https://tukaani.org/xz/java.html">XZ
950          for Java</a> library.</p>
951
952        <p>Uncompressing a given lzma compressed file (you would
953          certainly add exception handling and make sure all streams
954          get closed properly):</p>
<source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.lzma"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
LZMACompressorInputStream lzmaIn = new LZMACompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = lzmaIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
lzmaIn.close();
]]></source>
968
969        <p>Compressing a given file using lzma (you would
970          certainly add exception handling and make sure all streams
971          get closed properly):</p>
972<source><![CDATA[
973InputStream in = Files.newInputStream(Paths.get("archive.tar"));
974OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lzma"));
975BufferedOutputStream out = new BufferedOutputStream(fout);
976LZMACompressorOutputStream lzOut = new LZMACompressorOutputStream(out);
977final byte[] buffer = new byte[buffersize];
978int n = 0;
979while (-1 != (n = in.read(buffer))) {
980    lzOut.write(buffer, 0, n);
981}
982lzOut.close();
983in.close();
984]]></source>
985
986      </subsection>
987
988      <subsection name="Pack200">
989
990        <p>The Pack200 package has a <a href="pack200.html">dedicated
991          documentation page</a>.</p>
992
        <p>The implementation of this package is provided by
          the <code>java.util.jar</code> package of the Java class
          library.</p>
996
997        <p>Uncompressing a given pack200 compressed file (you would
998          certainly add exception handling and make sure all streams
999          get closed properly):</p>
1000<source><![CDATA[
1001InputStream fin = Files.newInputStream(Paths.get("archive.pack"));
1002BufferedInputStream in = new BufferedInputStream(fin);
1003OutputStream out = Files.newOutputStream(Paths.get("archive.jar"));
1004Pack200CompressorInputStream pIn = new Pack200CompressorInputStream(in);
1005final byte[] buffer = new byte[buffersize];
1006int n = 0;
1007while (-1 != (n = pIn.read(buffer))) {
1008    out.write(buffer, 0, n);
1009}
1010out.close();
1011pIn.close();
1012]]></source>
1013
1014        <p>Compressing a given jar using pack200 (you would
1015          certainly add exception handling and make sure all streams
1016          get closed properly):</p>
<source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.jar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.pack"));
BufferedOutputStream out = new BufferedOutputStream(fout);
Pack200CompressorOutputStream pOut = new Pack200CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    pOut.write(buffer, 0, n);
}
pOut.close();
in.close();
]]></source>
1030
1031      </subsection>
1032
1033      <subsection name="Snappy">
1034
        <p>There are two different "formats" used for <a
        href="https://github.com/google/snappy/">Snappy</a>: one only
        contains the raw compressed data while the other provides a
        higher level "framing format". Commons Compress offers two
        different stream classes for reading or writing either format.</p>
1040
1041        <p>Starting with 1.12 we've added support for different
1042        dialects of the framing format that can be specified when
1043        constructing the stream. The <code>STANDARD</code> dialect
1044        follows the "framing format" specification while the
1045        <code>IWORK_ARCHIVE</code> dialect can be used to parse IWA
1046        files that are part of Apple's iWork 13 format. If no dialect
1047        has been specified, <code>STANDARD</code> is used. Only the
1048        <code>STANDARD</code> format can be detected by
1049        <code>CompressorStreamFactory</code>.</p>
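
        <p>As an illustration, the dialect can be passed as the second
        constructor argument.  The sketch below is only an assumption
        of how this might be wrapped in application code; the class
        <code>SnappyDialectSketch</code> and its methods are
        hypothetical.  It reads framed Snappy data with an explicitly
        chosen dialect; passing
        <code>FramedSnappyDialect.IWORK_ARCHIVE</code> instead would
        select the iWork variant.</p>
<source><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.compressors.snappy.FramedSnappyCompressorInputStream;
import org.apache.commons.compress.compressors.snappy.FramedSnappyCompressorOutputStream;
import org.apache.commons.compress.compressors.snappy.FramedSnappyDialect;

public class SnappyDialectSketch {

    // Writes raw using the framing format.
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (FramedSnappyCompressorOutputStream snOut = new FramedSnappyCompressorOutputStream(bos)) {
            snOut.write(raw);
        }
        return bos.toByteArray();
    }

    // Reads framed Snappy data using the dialect chosen by the caller.
    public static byte[] decompress(byte[] framed, FramedSnappyDialect dialect) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (FramedSnappyCompressorInputStream zIn =
                 new FramedSnappyCompressorInputStream(new ByteArrayInputStream(framed), dialect)) {
            byte[] buffer = new byte[8192];
            int n;
            while (-1 != (n = zIn.read(buffer))) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }
}
]]></source>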
1050
1051        <p>Uncompressing a given framed Snappy file (you would
1052          certainly add exception handling and make sure all streams
1053          get closed properly):</p>
1054<source><![CDATA[
1055InputStream fin = Files.newInputStream(Paths.get("archive.tar.sz"));
1056BufferedInputStream in = new BufferedInputStream(fin);
1057OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
1058FramedSnappyCompressorInputStream zIn = new FramedSnappyCompressorInputStream(in);
1059final byte[] buffer = new byte[buffersize];
1060int n = 0;
1061while (-1 != (n = zIn.read(buffer))) {
1062    out.write(buffer, 0, n);
1063}
1064out.close();
1065zIn.close();
1066]]></source>
1067
1068        <p>Compressing a given file using framed Snappy (you would
1069          certainly add exception handling and make sure all streams
1070          get closed properly):</p>
1071<source><![CDATA[
1072InputStream in = Files.newInputStream(Paths.get("archive.tar"));
1073OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.sz"));
1074BufferedOutputStream out = new BufferedOutputStream(fout);
1075FramedSnappyCompressorOutputStream snOut = new FramedSnappyCompressorOutputStream(out);
1076final byte[] buffer = new byte[buffersize];
1077int n = 0;
1078while (-1 != (n = in.read(buffer))) {
1079    snOut.write(buffer, 0, n);
1080}
1081snOut.close();
1082in.close();
1083]]></source>
1084
1085      </subsection>
1086
1087      <subsection name="XZ">
1088
1089        <p>The implementation of this package is provided by the
1090          public domain <a href="https://tukaani.org/xz/java.html">XZ
1091          for Java</a> library.</p>
1092
        <p>When you try to open an XZ stream for reading using
        <code>CompressorStreamFactory</code>, Commons Compress will
        check whether the XZ for Java library is available.  Starting
        with Compress 1.9 the result of this check will be cached
        unless Compress finds OSGi classes in its classpath.  You can
        use <code>XZUtils#setCacheXZAvailability</code> to override
        this default behavior.</p>
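
        <p>A minimal sketch of turning the cache off before asking
        whether XZ support is available; the wrapper class
        <code>XzAvailabilitySketch</code> is an assumption made for
        this example, only the two <code>XZUtils</code> calls belong
        to Commons Compress.</p>
<source><![CDATA[
import org.apache.commons.compress.compressors.xz.XZUtils;

public class XzAvailabilitySketch {

    // Disables caching so the availability check is performed again on each call.
    public static boolean checkWithoutCaching() {
        XZUtils.setCacheXZAvailability(false);
        return XZUtils.isXZCompressionAvailable();
    }

    public static void main(String[] args) {
        System.out.println("XZ for Java available: " + checkWithoutCaching());
    }
}
]]></source>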
1100
1101        <p>Uncompressing a given XZ compressed file (you would
1102          certainly add exception handling and make sure all streams
1103          get closed properly):</p>
1104<source><![CDATA[
1105InputStream fin = Files.newInputStream(Paths.get("archive.tar.xz"));
1106BufferedInputStream in = new BufferedInputStream(fin);
1107OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
1108XZCompressorInputStream xzIn = new XZCompressorInputStream(in);
1109final byte[] buffer = new byte[buffersize];
1110int n = 0;
1111while (-1 != (n = xzIn.read(buffer))) {
1112    out.write(buffer, 0, n);
1113}
1114out.close();
1115xzIn.close();
1116]]></source>
1117
1118        <p>Compressing a given file using XZ (you would
1119          certainly add exception handling and make sure all streams
1120          get closed properly):</p>
<source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.xz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
XZCompressorOutputStream xzOut = new XZCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    xzOut.write(buffer, 0, n);
}
xzOut.close();
in.close();
]]></source>
1134
1135      </subsection>
1136
1137      <subsection name="Z">
1138
1139        <p>Uncompressing a given Z compressed file (you would
1140          certainly add exception handling and make sure all streams
1141          get closed properly):</p>
1142<source><![CDATA[
1143InputStream fin = Files.newInputStream(Paths.get("archive.tar.Z"));
1144BufferedInputStream in = new BufferedInputStream(fin);
1145OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
1146ZCompressorInputStream zIn = new ZCompressorInputStream(in);
1147final byte[] buffer = new byte[buffersize];
1148int n = 0;
1149while (-1 != (n = zIn.read(buffer))) {
1150    out.write(buffer, 0, n);
1151}
1152out.close();
1153zIn.close();
1154]]></source>
1155
1156      </subsection>
1157
1158      <subsection name="Zstandard">
1159
1160        <p>The implementation of this package is provided by the
1161          <a href="https://github.com/luben/zstd-jni">Zstandard JNI</a> library.</p>
1162
1163        <p>Uncompressing a given Zstandard compressed file (you would
1164          certainly add exception handling and make sure all streams
1165          get closed properly):</p>
1166<source><![CDATA[
1167InputStream fin = Files.newInputStream(Paths.get("archive.tar.zstd"));
1168BufferedInputStream in = new BufferedInputStream(fin);
1169OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
1170ZstdCompressorInputStream zsIn = new ZstdCompressorInputStream(in);
1171final byte[] buffer = new byte[buffersize];
1172int n = 0;
1173while (-1 != (n = zsIn.read(buffer))) {
1174    out.write(buffer, 0, n);
1175}
1176out.close();
1177zsIn.close();
1178]]></source>
1179
1180        <p>Compressing a given file using the Zstandard format (you
1181        would certainly add exception handling and make sure all
1182        streams get closed properly):</p>
1183<source><![CDATA[
1184InputStream in = Files.newInputStream(Paths.get("archive.tar"));
1185OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.zstd"));
1186BufferedOutputStream out = new BufferedOutputStream(fout);
1187ZstdCompressorOutputStream zOut = new ZstdCompressorOutputStream(out);
1188final byte[] buffer = new byte[buffersize];
1189int n = 0;
1190while (-1 != (n = in.read(buffer))) {
1191    zOut.write(buffer, 0, n);
1192}
1193zOut.close();
1194in.close();
1195]]></source>
1196
1197      </subsection>
1198    </section>
1199
1200    <section name="Extending Commons Compress">
1201
        <p>
          Starting in release 1.13, it is possible to add Compressor- and ArchiverStream implementations using
          Java's <a href="https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html">ServiceLoader</a>
          mechanism.
        </p>
1207
1208    <subsection name="Extending Commons Compress Compressors">
1209
1210        <p>
1211          To provide your own compressor, you must make available on the classpath a file called
1212          <code>META-INF/services/org.apache.commons.compress.compressors.CompressorStreamProvider</code>.
1213        </p>
1214        <p>
1215          This file MUST contain one fully-qualified class name per line.
1216        </p>
1217        <p>
1218          For example:
1219        </p>
1220        <pre>org.apache.commons.compress.compressors.TestCompressorStreamProvider</pre>
1221        <p>
1222          This class MUST implement the Commons Compress interface
1223          <a href="apidocs/org/apache/commons/compress/compressors/CompressorStreamProvider.html">org.apache.commons.compress.compressors.CompressorStreamProvider</a>.
1224        </p>
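        <p>
          A provider might look roughly like the following sketch, which simply exposes the existing gzip streams
          under a made-up format name. The class name <code>MyGzipProvider</code> and the name <code>my-gzip</code>
          are hypothetical, and the method signatures are assumed to match the interface as introduced in release
          1.13; they may differ in later releases.
        </p>
<source><![CDATA[
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.CompressorOutputStream;
import org.apache.commons.compress.compressors.CompressorStreamProvider;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class MyGzipProvider implements CompressorStreamProvider {

    // Hypothetical format name under which this provider registers itself.
    private static final String NAME = "my-gzip";

    @Override
    public CompressorInputStream createCompressorInputStream(String name, InputStream in,
            boolean decompressUntilEOF) throws CompressorException {
        try {
            return new GzipCompressorInputStream(in, decompressUntilEOF);
        } catch (IOException e) {
            throw new CompressorException("Could not create input stream for " + name, e);
        }
    }

    @Override
    public CompressorOutputStream createCompressorOutputStream(String name, OutputStream out)
            throws CompressorException {
        try {
            return new GzipCompressorOutputStream(out);
        } catch (IOException e) {
            throw new CompressorException("Could not create output stream for " + name, e);
        }
    }

    @Override
    public Set<String> getInputStreamCompressorNames() {
        return Collections.singleton(NAME);
    }

    @Override
    public Set<String> getOutputStreamCompressorNames() {
        return Collections.singleton(NAME);
    }
}
]]></source>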
1225    </subsection>
1226
1227    <subsection name="Extending Commons Compress Archivers">
1228
        <p>
          To provide your own archiver, you must make available on the classpath a file called
          <code>META-INF/services/org.apache.commons.compress.archivers.ArchiveStreamProvider</code>.
        </p>
1233        <p>
1234          This file MUST contain one fully-qualified class name per line.
1235        </p>
1236        <p>
1237          For example:
1238        </p>
1239        <pre>org.apache.commons.compress.archivers.TestArchiveStreamProvider</pre>
1240        <p>
1241          This class MUST implement the Commons Compress interface
1242          <a href="apidocs/org/apache/commons/compress/archivers/ArchiveStreamProvider.html">org.apache.commons.compress.archivers.ArchiveStreamProvider</a>.
1243        </p>
1244    </subsection>
1245
1246    </section>
1247  </body>
1248</document>
1249