Tuesday, September 6, 2016

How to create a very specific zip file structure using Java

I am developing software that integrates with a legacy system. I send zip files to an FTP server, and the legacy system scans the server periodically and moves each file into one of two folders, "Completed" or "Maybe_corrupted".

My files are systematically moved into the "Maybe_corrupted" folder. After investigation, it appears that this is due to the structure of the zip files.

If I download a file from the FTP server onto a Windows computer, unzip all of the zip files (there are zip files inside zip files), and rezip them into exactly the same layout, the legacy program accepts the files.

Specifically, my question is: how can I parameterize java.util.zip, and which parameters should I change to bring its output closer to that of the default Windows zip utility?

File comparison

As suggested by Tobias Otto, I used a file comparator (UltraCompare) to compare two binary files:

  • The file originally downloaded from the FTP and rejected by the legacy system (left)
  • The decompressed-recompressed file which was accepted (right)

Simply comparing the bytes shows that the files are not the same. Bytes in red are differences, bytes in grey are identical:

[Image: UltraCompare byte-level comparison of the two files]

Zip details

As suggested by David Duponchel, I used zipdetails to extract the structure of each file. It is obviously a very powerful tool, but I am not sure how to interpret the output. As stated in the usage guide, the output refers to this document.

Original file:

    00000 LOCAL HEADER #1       04034B50
    00004 Extract Zip Spec      14 '2.0'
    00005 Extract OS            00 'MS-DOS'
    00006 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    00008 Compression Method    0008 'Deflated'
    0000A Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    0000E CRC                   00000000
    00012 Compressed Length     00000000
    00016 Uncompressed Length   00000000
    0001A Filename Length       0018
    0001C Extra Length          0000
    0001E Filename              'Mean.20160830_073000.zip'
    00036 PAYLOAD

    05002 STREAMING DATA HEADER 08074B50
    05006 CRC                   A21A5BC4
    0500A Compressed Length     00004FCC
    0500E Uncompressed Length   000054D8

    05012 LOCAL HEADER #2       04034B50
    05016 Extract Zip Spec      14 '2.0'
    05017 Extract OS            00 'MS-DOS'
    05018 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    0501A Compression Method    0008 'Deflated'
    0501C Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    05020 CRC                   00000000
    05024 Compressed Length     00000000
    05028 Uncompressed Length   00000000
    0502C Filename Length       0018
    0502E Extra Length          0000
    05030 Filename              'Mean.20160830_081500.zip'
    05048 PAYLOAD

    08FFF STREAMING DATA HEADER 08074B50
    09003 CRC                   BAE824D6
    09007 Compressed Length     00003FB7
    0900B Uncompressed Length   000043A3

    0900F LOCAL HEADER #3       04034B50
    09013 Extract Zip Spec      14 '2.0'
    09014 Extract OS            00 'MS-DOS'
    09015 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    09017 Compression Method    0008 'Deflated'
    09019 Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    0901D CRC                   00000000
    09021 Compressed Length     00000000
    09025 Uncompressed Length   00000000
    09029 Filename Length       0018
    0902B Extra Length          0000
    0902D Filename              'Mean.20160830_071500.zip'
    09045 PAYLOAD

    0E05E STREAMING DATA HEADER 08074B50
    0E062 CRC                   EDC8AE4E
    0E066 Compressed Length     00005019
    0E06A Uncompressed Length   000054EA

    0E06E LOCAL HEADER #4       04034B50
    0E072 Extract Zip Spec      14 '2.0'
    0E073 Extract OS            00 'MS-DOS'
    0E074 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    0E076 Compression Method    0008 'Deflated'
    0E078 Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    0E07C CRC                   00000000
    0E080 Compressed Length     00000000
    0E084 Uncompressed Length   00000000
    0E088 Filename Length       0018
    0E08A Extra Length          0000
    0E08C Filename              'Mean.20160830_080000.zip'
    0E0A4 PAYLOAD

    15444 STREAMING DATA HEADER 08074B50
    15448 CRC                   C37437FB
    1544C Compressed Length     000073A0
    15450 Uncompressed Length   00008054

    15454 LOCAL HEADER #5       04034B50
    15458 Extract Zip Spec      14 '2.0'
    15459 Extract OS            00 'MS-DOS'
    1545A General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    1545C Compression Method    0008 'Deflated'
    1545E Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    15462 CRC                   00000000
    15466 Compressed Length     00000000
    1546A Uncompressed Length   00000000
    1546E Filename Length       0018
    15470 Extra Length          0000
    15472 Filename              'Mean.20160830_070000.zip'
    1548A PAYLOAD

    19E5D STREAMING DATA HEADER 08074B50
    19E61 CRC                   40E52180
    19E65 Compressed Length     000049D3
    19E69 Uncompressed Length   00005110

    19E6D CENTRAL HEADER #1     02014B50
    19E71 Created Zip Spec      14 '2.0'
    19E72 Created OS            00 'MS-DOS'
    19E73 Extract Zip Spec      14 '2.0'
    19E74 Extract OS            00 'MS-DOS'
    19E75 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    19E77 Compression Method    0008 'Deflated'
    19E79 Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    19E7D CRC                   A21A5BC4
    19E81 Compressed Length     00004FCC
    19E85 Uncompressed Length   000054D8
    19E89 Filename Length       0018
    19E8B Extra Length          0000
    19E8D Comment Length        0000
    19E8F Disk Start            0000
    19E91 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    19E93 Ext File Attributes   00000000
    19E97 Local Header Offset   00000000
    19E9B Filename              'Mean.20160830_073000.zip'

    19EB3 CENTRAL HEADER #2     02014B50
    19EB7 Created Zip Spec      14 '2.0'
    19EB8 Created OS            00 'MS-DOS'
    19EB9 Extract Zip Spec      14 '2.0'
    19EBA Extract OS            00 'MS-DOS'
    19EBB General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    19EBD Compression Method    0008 'Deflated'
    19EBF Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    19EC3 CRC                   BAE824D6
    19EC7 Compressed Length     00003FB7
    19ECB Uncompressed Length   000043A3
    19ECF Filename Length       0018
    19ED1 Extra Length          0000
    19ED3 Comment Length        0000
    19ED5 Disk Start            0000
    19ED7 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    19ED9 Ext File Attributes   00000000
    19EDD Local Header Offset   00005012
    19EE1 Filename              'Mean.20160830_081500.zip'

    19EF9 CENTRAL HEADER #3     02014B50
    19EFD Created Zip Spec      14 '2.0'
    19EFE Created OS            00 'MS-DOS'
    19EFF Extract Zip Spec      14 '2.0'
    19F00 Extract OS            00 'MS-DOS'
    19F01 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    19F03 Compression Method    0008 'Deflated'
    19F05 Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    19F09 CRC                   EDC8AE4E
    19F0D Compressed Length     00005019
    19F11 Uncompressed Length   000054EA
    19F15 Filename Length       0018
    19F17 Extra Length          0000
    19F19 Comment Length        0000
    19F1B Disk Start            0000
    19F1D Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    19F1F Ext File Attributes   00000000
    19F23 Local Header Offset   0000900F
    19F27 Filename              'Mean.20160830_071500.zip'

    19F3F CENTRAL HEADER #4     02014B50
    19F43 Created Zip Spec      14 '2.0'
    19F44 Created OS            00 'MS-DOS'
    19F45 Extract Zip Spec      14 '2.0'
    19F46 Extract OS            00 'MS-DOS'
    19F47 General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    19F49 Compression Method    0008 'Deflated'
    19F4B Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    19F4F CRC                   C37437FB
    19F53 Compressed Length     000073A0
    19F57 Uncompressed Length   00008054
    19F5B Filename Length       0018
    19F5D Extra Length          0000
    19F5F Comment Length        0000
    19F61 Disk Start            0000
    19F63 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    19F65 Ext File Attributes   00000000
    19F69 Local Header Offset   0000E06E
    19F6D Filename              'Mean.20160830_080000.zip'

    19F85 CENTRAL HEADER #5     02014B50
    19F89 Created Zip Spec      14 '2.0'
    19F8A Created OS            00 'MS-DOS'
    19F8B Extract Zip Spec      14 '2.0'
    19F8C Extract OS            00 'MS-DOS'
    19F8D General Purpose Flag  0808
          [Bits 1-2]            0 'Normal Compression'
          [Bit  3]              1 'Streamed'
          [Bit 11]              1 'Language Encoding'
    19F8F Compression Method    0008 'Deflated'
    19F91 Last Mod Time         491E43F7 'Tue Aug 30 08:31:46 2016'
    19F95 CRC                   40E52180
    19F99 Compressed Length     000049D3
    19F9D Uncompressed Length   00005110
    19FA1 Filename Length       0018
    19FA3 Extra Length          0000
    19FA5 Comment Length        0000
    19FA7 Disk Start            0000
    19FA9 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    19FAB Ext File Attributes   00000000
    19FAF Local Header Offset   00015454
    19FB3 Filename              'Mean.20160830_070000.zip'

    19FCB END CENTRAL HEADER    06054B50
    19FCF Number of this disk   0000
    19FD1 Central Dir Disk no   0000
    19FD3 Entries in this disk  0005
    19FD5 Total Entries         0005
    19FD7 Size of Central Dir   0000015E
    19FDB Offset to Central Dir 00019E6D
    19FDF Comment Length        0000
    Done

Decompressed-recompressed file:

    00000 LOCAL HEADER #1       04034B50
    00004 Extract Zip Spec      14 '2.0'
    00005 Extract OS            00 'MS-DOS'
    00006 General Purpose Flag  0000
    00008 Compression Method    0000 'Stored'
    0000A Last Mod Time         491E510F 'Tue Aug 30 10:08:30 2016'
    0000E CRC                   218B9162
    00012 Compressed Length     00005595
    00016 Uncompressed Length   00005595
    0001A Filename Length       0018
    0001C Extra Length          0000
    0001E Filename              'Mean.20160830_070000.zip'
    00036 PAYLOAD

    055CB LOCAL HEADER #2       04034B50
    055CF Extract Zip Spec      14 '2.0'
    055D0 Extract OS            00 'MS-DOS'
    055D1 General Purpose Flag  0000
    055D3 Compression Method    0000 'Stored'
    055D5 Last Mod Time         491E5117 'Tue Aug 30 10:08:46 2016'
    055D9 CRC                   180124FD
    055DD Compressed Length     00005972
    055E1 Uncompressed Length   00005972
    055E5 Filename Length       0018
    055E7 Extra Length          0000
    055E9 Filename              'Mean.20160830_071500.zip'
    05601 PAYLOAD

    0AF73 LOCAL HEADER #3       04034B50
    0AF77 Extract Zip Spec      14 '2.0'
    0AF78 Extract OS            00 'MS-DOS'
    0AF79 General Purpose Flag  0000
    0AF7B Compression Method    0000 'Stored'
    0AF7D Last Mod Time         491E511D 'Tue Aug 30 10:08:58 2016'
    0AF81 CRC                   03A4486C
    0AF85 Compressed Length     00005953
    0AF89 Uncompressed Length   00005953
    0AF8D Filename Length       0018
    0AF8F Extra Length          0000
    0AF91 Filename              'Mean.20160830_073000.zip'
    0AFA9 PAYLOAD

    108FC LOCAL HEADER #4       04034B50
    10900 Extract Zip Spec      14 '2.0'
    10901 Extract OS            00 'MS-DOS'
    10902 General Purpose Flag  0000
    10904 Compression Method    0000 'Stored'
    10906 Last Mod Time         491E5124 'Tue Aug 30 10:09:08 2016'
    1090A CRC                   FEE97172
    1090E Compressed Length     00008818
    10912 Uncompressed Length   00008818
    10916 Filename Length       0018
    10918 Extra Length          0000
    1091A Filename              'Mean.20160830_080000.zip'
    10932 PAYLOAD

    1914A LOCAL HEADER #5       04034B50
    1914E Extract Zip Spec      14 '2.0'
    1914F Extract OS            00 'MS-DOS'
    19150 General Purpose Flag  0000
    19152 Compression Method    0000 'Stored'
    19154 Last Mod Time         491E5129 'Tue Aug 30 10:09:18 2016'
    19158 CRC                   0B38337E
    1915C Compressed Length     00004713
    19160 Uncompressed Length   00004713
    19164 Filename Length       0018
    19166 Extra Length          0000
    19168 Filename              'Mean.20160830_081500.zip'
    19180 PAYLOAD

    1D893 CENTRAL HEADER #1     02014B50
    1D897 Created Zip Spec      14 '2.0'
    1D898 Created OS            00 'MS-DOS'
    1D899 Extract Zip Spec      14 '2.0'
    1D89A Extract OS            00 'MS-DOS'
    1D89B General Purpose Flag  0000
    1D89D Compression Method    0000 'Stored'
    1D89F Last Mod Time         491E510F 'Tue Aug 30 10:08:30 2016'
    1D8A3 CRC                   218B9162
    1D8A7 Compressed Length     00005595
    1D8AB Uncompressed Length   00005595
    1D8AF Filename Length       0018
    1D8B1 Extra Length          0000
    1D8B3 Comment Length        0000
    1D8B5 Disk Start            0000
    1D8B7 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    1D8B9 Ext File Attributes   00000020
          [Bit 5]               Archive
    1D8BD Local Header Offset   00000000
    1D8C1 Filename              'Mean.20160830_070000.zip'

    1D8D9 CENTRAL HEADER #2     02014B50
    1D8DD Created Zip Spec      14 '2.0'
    1D8DE Created OS            00 'MS-DOS'
    1D8DF Extract Zip Spec      14 '2.0'
    1D8E0 Extract OS            00 'MS-DOS'
    1D8E1 General Purpose Flag  0000
    1D8E3 Compression Method    0000 'Stored'
    1D8E5 Last Mod Time         491E5117 'Tue Aug 30 10:08:46 2016'
    1D8E9 CRC                   180124FD
    1D8ED Compressed Length     00005972
    1D8F1 Uncompressed Length   00005972
    1D8F5 Filename Length       0018
    1D8F7 Extra Length          0000
    1D8F9 Comment Length        0000
    1D8FB Disk Start            0000
    1D8FD Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    1D8FF Ext File Attributes   00000020
          [Bit 5]               Archive
    1D903 Local Header Offset   000055CB
    1D907 Filename              'Mean.20160830_071500.zip'

    1D91F CENTRAL HEADER #3     02014B50
    1D923 Created Zip Spec      14 '2.0'
    1D924 Created OS            00 'MS-DOS'
    1D925 Extract Zip Spec      14 '2.0'
    1D926 Extract OS            00 'MS-DOS'
    1D927 General Purpose Flag  0000
    1D929 Compression Method    0000 'Stored'
    1D92B Last Mod Time         491E511D 'Tue Aug 30 10:08:58 2016'
    1D92F CRC                   03A4486C
    1D933 Compressed Length     00005953
    1D937 Uncompressed Length   00005953
    1D93B Filename Length       0018
    1D93D Extra Length          0000
    1D93F Comment Length        0000
    1D941 Disk Start            0000
    1D943 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    1D945 Ext File Attributes   00000020
          [Bit 5]               Archive
    1D949 Local Header Offset   0000AF73
    1D94D Filename              'Mean.20160830_073000.zip'

    1D965 CENTRAL HEADER #4     02014B50
    1D969 Created Zip Spec      14 '2.0'
    1D96A Created OS            00 'MS-DOS'
    1D96B Extract Zip Spec      14 '2.0'
    1D96C Extract OS            00 'MS-DOS'
    1D96D General Purpose Flag  0000
    1D96F Compression Method    0000 'Stored'
    1D971 Last Mod Time         491E5124 'Tue Aug 30 10:09:08 2016'
    1D975 CRC                   FEE97172
    1D979 Compressed Length     00008818
    1D97D Uncompressed Length   00008818
    1D981 Filename Length       0018
    1D983 Extra Length          0000
    1D985 Comment Length        0000
    1D987 Disk Start            0000
    1D989 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    1D98B Ext File Attributes   00000020
          [Bit 5]               Archive
    1D98F Local Header Offset   000108FC
    1D993 Filename              'Mean.20160830_080000.zip'

    1D9AB CENTRAL HEADER #5     02014B50
    1D9AF Created Zip Spec      14 '2.0'
    1D9B0 Created OS            00 'MS-DOS'
    1D9B1 Extract Zip Spec      14 '2.0'
    1D9B2 Extract OS            00 'MS-DOS'
    1D9B3 General Purpose Flag  0000
    1D9B5 Compression Method    0000 'Stored'
    1D9B7 Last Mod Time         491E5129 'Tue Aug 30 10:09:18 2016'
    1D9BB CRC                   0B38337E
    1D9BF Compressed Length     00004713
    1D9C3 Uncompressed Length   00004713
    1D9C7 Filename Length       0018
    1D9C9 Extra Length          0000
    1D9CB Comment Length        0000
    1D9CD Disk Start            0000
    1D9CF Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
    1D9D1 Ext File Attributes   00000020
          [Bit 5]               Archive
    1D9D5 Local Header Offset   0001914A
    1D9D9 Filename              'Mean.20160830_081500.zip'

    1D9F1 END CENTRAL HEADER    06054B50
    1D9F5 Number of this disk   0000
    1D9F7 Central Dir Disk no   0000
    1D9F9 Entries in this disk  0005
    1D9FB Total Entries         0005
    1D9FD Size of Central Dir   0000015E
    1DA01 Offset to Central Dir 0001D893
    1DA05 Comment Length        0000
    Done
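The fields that differ between the two dumps (general purpose flag, compression method, CRC and the two lengths) sit at fixed offsets in the first 30 bytes of a local file header, so they can also be checked from Java. The following is only a minimal sketch, not a replacement for zipdetails: the class name LocalHeaderPeek and its members are made up for illustration, and it decodes nothing but the first entry of an archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: decodes the fixed part of the first local file header.
class LocalHeaderPeek {
    final int flags;             // general purpose flag, offset 0x06
    final int method;            // 0 = stored, 8 = deflated, offset 0x08
    final long crc;              // offset 0x0E
    final long compressedSize;   // offset 0x12
    final long uncompressedSize; // offset 0x16

    LocalHeaderPeek(byte[] zip) {
        // All multi-byte ZIP header fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (zip.length < 30 || buf.getInt(0) != 0x04034B50) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        flags = buf.getShort(6) & 0xFFFF;
        method = buf.getShort(8) & 0xFFFF;
        crc = buf.getInt(14) & 0xFFFFFFFFL;
        compressedSize = buf.getInt(18) & 0xFFFFFFFFL;
        uncompressedSize = buf.getInt(22) & 0xFFFFFFFFL;
    }

    // Bit 3 is what zipdetails labels 'Streamed': CRC and sizes follow the payload.
    boolean isStreamed() {
        return (flags & 0x08) != 0;
    }

    // Builds a one-entry archive in memory using ZipOutputStream defaults.
    static byte[] defaultArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("sample.xml"));
            zos.write("<a>hello</a>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        LocalHeaderPeek h = new LocalHeaderPeek(defaultArchive());
        System.out.printf("flags=%04X method=%d streamed=%b csize=%d size=%d%n",
                h.flags, h.method, h.isStreamed(), h.compressedSize, h.uncompressedSize);
    }
}
```

To inspect a real archive instead of the in-memory sample, the constructor can be given the result of java.nio.file.Files.readAllBytes for that file.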

Structure of the zip file

  • MainFile.zip
    • InnerFile1.zip
      • InnerFile1.xml
    • InnerFile2.zip
      • InnerFile2.xml
    • InnerFile-N.zip
      • InnerFile-N.xml

Code to create zip file

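    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    // LOGGER is a java.util.logging.Logger field of the enclosing class.
    public void addToZip(File zipFile, File... filesToAdd) {
        final byte[] buffer = new byte[1024];

        // try-with-resources finishes and closes the archive, even on failure
        try (FileOutputStream fos = new FileOutputStream(zipFile.getAbsoluteFile());
             ZipOutputStream zos = new ZipOutputStream(fos)) {
            for (File fileToAdd : filesToAdd) {
                // No method, size or CRC is set, so the entry is deflated and streamed
                ZipEntry entry = new ZipEntry(fileToAdd.getName());
                zos.putNextEntry(entry);
                try (FileInputStream in = new FileInputStream(fileToAdd.getAbsoluteFile())) {
                    int len;
                    while ((len = in.read(buffer)) > 0) {
                        zos.write(buffer, 0, len);
                    }
                }
                zos.closeEntry();
            }
        } catch (IOException ex) {
            LOGGER.log(Level.SEVERE, null, ex);
        }
    }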

Compression method

According to the primary reference for zip files:

4.1.3 Data compression MAY be used to reduce the size of files placed into a ZIP file, but is not required. This format supports the use of multiple data compression algorithms. When compression is used, one of the documented compression algorithms MUST be used. Implementors are advised to experiment with their data to determine which of the available algorithms provides the best compression for their needs. Compression method 8 (Deflate) is the method used by default by most ZIP compatible application programs.

In this specific case, using zipEntry.setMethod(ZipEntry.DEFLATED); was not enough to have my files accepted by the legacy program.
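One likely reason the call made no difference is that DEFLATED is already the default method of ZipOutputStream, so setting it explicitly on the entry should leave the output unchanged. The sketch below tries to show this by building the same one-entry archive twice in memory and comparing the bytes. The class name DeflatedIsDefault and the sample content are made up for illustration, and the entry time is pinned to an arbitrary fixed value so that the two archives cannot differ by timestamp.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: compares archives with and without an explicit DEFLATED.
class DeflatedIsDefault {
    static byte[] zip(boolean explicitDeflated) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("sample.xml");
            entry.setTime(1472545906000L); // arbitrary fixed timestamp for comparability
            if (explicitDeflated) {
                entry.setMethod(ZipEntry.DEFLATED);
            }
            zos.putNextEntry(entry);
            zos.write("<a>hello</a>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Prints whether the two archives are byte-for-byte identical.
        System.out.println(Arrays.equals(zip(false), zip(true)));
    }
}
```

If the two arrays are equal, the general purpose flag still has bit 3 set in both, which matches the 'Streamed' entries in the original file's dump.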

2 Answers

Answer 1

I had (almost) the same problem with a bank's legacy system. It checked the compressed size and uncompressed size stored in the archive. In the output you provided, the accepted file contains valid values in those fields (diff):

    00012 Compressed Length     00005595
    00016 Uncompressed Length   00005595

The same fields in your Java-generated archive file:

    00012 Compressed Length     00000000
    00016 Uncompressed Length   00000000

For more detailed header information, check out Wikipedia.

Most likely the solution to your problem is this:

http://stackoverflow.com/a/29096370/4823977
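Independently of the linked answer, if the checker really does insist on non-zero lengths in the local header, one option with plain java.util.zip is to give the entry its CRC and both sizes before calling putNextEntry, so that ZipOutputStream does not fall back to a trailing data descriptor. The sketch below is an assumption about what such a checker would accept and has not been tried against the legacy system. It deflates the data once up front only to learn the compressed size, and it relies on ZipOutputStream being left at its default compression level. The class name PresizedDeflate and its methods are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: DEFLATED entries whose sizes and CRC are known up front.
class PresizedDeflate {
    // Deflates data with default level and raw deflate, then returns the byte count.
    static long deflatedSize(byte[] data) throws IOException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(sink, deflater)) {
            dos.write(data, 0, data.length);
        } finally {
            deflater.end(); // release native resources of the throwaway deflater
        }
        return sink.size();
    }

    static void addEntry(ZipOutputStream zos, String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        entry.setSize(data.length);
        entry.setCompressedSize(deflatedSize(data));
        entry.setCrc(crc.getValue());

        zos.putNextEntry(entry);
        zos.write(data, 0, data.length);
        zos.closeEntry(); // throws ZipException if the predicted values do not match
    }

    static byte[] archive(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            addEntry(zos, name, data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<root><item>1</item><item>2</item></root>".getBytes(StandardCharsets.UTF_8);
        byte[] zip = archive("sample.xml", xml);
        int flags = (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
        System.out.println("streamed bit set: " + ((flags & 0x08) != 0));
    }
}
```

The price is that every payload is compressed twice and held in memory. Calling setLevel on the ZipOutputStream without using the same level in deflatedSize would make the predicted size wrong, and closeEntry would then fail.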

Answer 2

I hate guessing without being able to verify my assumptions, hence my comment asking whether the actual server-side file checker on the FTP server is available. For what it is worth, and simply assuming that the checker prefers STORED entries to DEFLATED ones, here is how you can create

  • deflated ZIP files from XML files (which makes sense because XML is nicely compressible) and
  • stored ZIP files from nested ZIP files (which also makes sense because usually compressed files cannot effectively get smaller when re-compressing them).

The problem here is that when using STORED entries it is mandatory to also set (un-)compressed size and CRC for each entry. This is how you can do it (unfortunately I am reading each nested ZIP file twice, once for calculating the CRC and once for copying its contents to the ZIP output stream):

Driver application:

    package de.scrum_master.app;

    import java.io.File;
    import java.io.IOException;
    import java.util.SortedMap;
    import java.util.TreeMap;

    public class Application {
        public static void main(String[] args) throws IOException {
            SortedMap<File, File[]> fileMappings = new TreeMap<>();
            fileMappings.put(
                new File("books.zip"),
                new File[] { new File("books.xml") }
            );
            fileMappings.put(
                new File("cds.zip"),
                new File[] { new File("cds.xml") }
            );
            fileMappings.put(
                new File("clothes.zip"),
                new File[] { new File("clothes.xml") }
            );

            ZipTool zipTool = new ZipTool();
            for (File zipFile : fileMappings.keySet())
                zipTool.addToZip(zipFile, fileMappings.get(zipFile));
            zipTool.addToZip(
                new File("archive.zip"),
                new File[] { new File("books.zip"), new File("cds.zip"), new File("clothes.zip") }
            );
        }
    }

ZIP utility:

    package de.scrum_master.app;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.CRC32;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZipTool {
        public void addToZip(File zipFile, File... filesToAdd) throws IOException {
            try (
                FileOutputStream fos = new FileOutputStream(zipFile.getAbsoluteFile());
                ZipOutputStream zos = new ZipOutputStream(fos)
            ) {
                final byte[] buffer = new byte[1024];
                for (File fileToAdd : filesToAdd) {
                    ZipEntry entry = new ZipEntry(fileToAdd.getName());
                    if (fileToAdd.getName().endsWith(".zip")) {
                        // Nested ZIP files are STORED, which requires size and CRC up front
                        entry.setMethod(ZipEntry.STORED);
                        long fileSize = fileToAdd.length();
                        entry.setSize(fileSize);
                        entry.setCompressedSize(fileSize);
                        // First pass over the file: compute the CRC-32
                        CRC32 crc32 = new CRC32();
                        try (FileInputStream in = new FileInputStream(fileToAdd.getAbsoluteFile())) {
                            int len;
                            while ((len = in.read(buffer)) > 0) {
                                crc32.update(buffer, 0, len);
                            }
                        }
                        entry.setCrc(crc32.getValue());
                    }
                    zos.putNextEntry(entry);
                    // Second pass (or only pass for XML files): copy the contents
                    try (FileInputStream in = new FileInputStream(fileToAdd.getAbsoluteFile())) {
                        int len;
                        while ((len = in.read(buffer)) > 0) {
                            zos.write(buffer, 0, len);
                        }
                    }
                    zos.closeEntry();
                }
            }
        }
    }
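The double read mentioned above can be avoided when the nested ZIP files are small enough to hold in memory: read each file once into a byte array, compute the CRC from that array, and write the same array to the output. Here is a minimal sketch under that assumption. The class name StoredZipWriter is made up for this example, and for very large files the two-pass version stays the safer choice.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical variant of ZipTool: STORED entries, each input read exactly once.
class StoredZipWriter {
    static void writeStored(Path zipFile, Path... filesToAdd) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path fileToAdd : filesToAdd) {
                // Assumes the whole file fits comfortably in memory
                byte[] content = Files.readAllBytes(fileToAdd);
                CRC32 crc = new CRC32();
                crc.update(content, 0, content.length);

                // STORED entries need size, compressed size and CRC before putNextEntry
                ZipEntry entry = new ZipEntry(fileToAdd.getFileName().toString());
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(content.length);
                entry.setCompressedSize(content.length);
                entry.setCrc(crc.getValue());

                zos.putNextEntry(entry);
                zos.write(content, 0, content.length);
                zos.closeEntry();
            }
        }
    }
}
```

With the file names from the driver application, the outer archive would be written by passing the path of archive.zip followed by the paths of books.zip, cds.zip and clothes.zip.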

Maybe you can try that and provide me with feedback. If this is not good enough and the checker still complains, we can try something else to make the zip archives even more similar.
