Wednesday, April 13, 2016

How to convert jar to rsyncable jar?

I have a fat/uber JAR generated by the Gradle Shadow plugin. I often need to send the fat JAR over the network, so it would be convenient to send only a delta of the file instead of approximately 40 MB of data. rsync is a great tool for this purpose. However, a small change in my source code leads to a large change in the final fat JAR, and consequently rsync does not help as much as it could.

Can I convert the fat JAR to an rsync-friendly JAR?

My ideas of a solution/workarounds:

  • Put the burden on rsync and somehow tell it that it is working with a compressed file (I didn't find any way to do this).
  • Convert the non-rsyncable JAR to an rsyncable JAR.
  • Tell Gradle Shadow to generate an rsyncable JAR (not possible at the moment).

3 Answers

Answers 1

As far as I know, rsyncable gzip works by resetting the Huffman tree and padding to byte boundaries every 8192 bytes of compressed data. This avoids long-range side effects on the compression (rsync takes care of shifted data blocks as long as they are at least byte-aligned).
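The reset idea can be illustrated with a minimal Python sketch. It is not the actual rsyncable gzip implementation: it assumes fixed-size input blocks and uses zlib's Z_FULL_FLUSH to reset the compressor after each one. The function name compress_in_blocks and the block size are made up for this illustration.

```python
import zlib


def compress_in_blocks(data, block_size=8192):
    """Deflate `data` as one raw stream, resetting the compressor after each block.

    Returns one compressed segment per input block plus a final terminator
    segment; joining all segments gives a single valid raw deflate stream.
    """
    comp = zlib.compressobj(level=6, wbits=-15)  # wbits=-15: raw deflate, no header
    segments = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Z_FULL_FLUSH pads the output to a byte boundary and resets the
        # compression state, so later output cannot refer back to this block.
        segments.append(comp.compress(block) + comp.flush(zlib.Z_FULL_FLUSH))
    segments.append(comp.flush())  # Z_FINISH: terminate the stream
    return segments
```

Because every segment starts from a clean state, a decompressor can restart at any segment boundary, and a change inside one block should leave the compressed form of the other blocks alone. That locality is what lets rsync find matching blocks again after the changed region.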

In this sense, a jar containing small files (less than 8192 bytes) is already rsyncable, because each file is compressed separately. As a test you could use jar's -0 option (no compression) to check if it helps rsync, but I think it won't.

To improve the rsyncability you need to (at least):

  • Make sure the files are stored in the same order.
  • Make sure the metadata associated with unchanged files is also unchanged, as each file has a local file header. For example, the last modification time is problematic for .class files.
    I am not sure about jar, but zip allows extra fields, some of which may prevent rsync matches, e.g. the last access time in the Unix extension.
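The two points above can be sketched in Python with the standard zipfile module. This is a hedged illustration, not a tested build step: the function name normalize_jar and the constant timestamp are assumptions, and placing the manifest first is a precaution for tools that read the jar as a stream. Entries are recompressed by Python's zlib, so the compressed bytes will differ from what the jar tool produced, but they stay stable from one normalized build to the next.

```python
import zipfile

FIXED_TIME = (2000, 1, 1, 0, 0, 0)  # arbitrary constant timestamp (assumption)


def _entry_order(name):
    # Keep META-INF/ and the manifest in front, then sort everything by name.
    return (name != "META-INF/", name != "META-INF/MANIFEST.MF", name)


def normalize_jar(src_path, dst_path):
    """Rewrite a jar with sorted entries, constant timestamps and no extra fields."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # set() drops duplicate entry names, which fat jars sometimes contain;
        # getinfo() then returns the last entry stored under that name.
        for name in sorted(set(src.namelist()), key=_entry_order):
            old = src.getinfo(name)
            # A fresh ZipInfo carries no extra fields from the source entry.
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = old.compress_type  # keep stored/deflated as-is
            info.external_attr = old.external_attr  # keep file mode bits
            dst.writestr(info, src.read(name))
```

Two jars that contain the same files should then come out byte-identical, regardless of the order or the time at which the entries were originally added.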

Edit: I did some tests with the following commands:

    FILENAME=SomeJar.jar

    rm -rf tempdir
    mkdir tempdir

    unzip ${FILENAME} -d tempdir/

    cd tempdir

    # set the timestamp to 2000-01-01 00:00
    find . -print0 | xargs --null touch -t 200001010000

    # normalize file mode bits, maybe not necessary
    chmod -R u=rwX,go=rX .

    # sort and zip files, without extra
    find . -type f -print | sort | zip ../${FILENAME}_normalized  -X -@

    cd ..
    rm -rf tempdir

rsync stats when the first file contained in the jar / zip is removed:

    total: matches=1973  hash_hits=13362  false_alarms=0 data=357859
    sent 365,918 bytes  received 12,919 bytes  252,558.00 bytes/sec
    total size is 4,572,187  speedup is 12.07

When the first file is removed and every timestamp is modified:

    total: matches=334  hash_hits=124326  false_alarms=4 data=3858763
    sent 3,861,473 bytes  received 12,919 bytes  7,748,784.00 bytes/sec
    total size is 4,572,187  speedup is 1.18

So there is a significant difference, but not as much as I expected.

It also seems that changing the file mode does not affect the transfer (maybe because it is stored in the central directory?).

Answers 2

Let's take one step back; if you do not create large jars, this ceases to be a problem.

So, if you deploy your dependency jars separately, and you don't jar them into a single fat jar, you've also solved the problem here.

To do that, let's say you have:

  • /foo/yourapp.jar
  • /foo/lib/guava.jar
  • /foo/lib/h2.jar

Then, put in the META-INF/MANIFEST.MF file of yourapp.jar the following entry:

Class-Path: lib/guava.jar lib/h2.jar 

And now you can just run java -jar yourapp.jar and it'll work, picking up the dependencies. You can now transfer these files individually with rsync; yourapp.jar will be much smaller, and your dependency jars will usually not have changed, so those won't take much time when rsyncing either.
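If you write the manifest yourself rather than letting a build tool do it, keep in mind that manifest lines are limited to 72 bytes and longer values continue on lines that start with a single space. Here is a minimal Python sketch of that wrapping, assuming ASCII-only paths; the helper name class_path_lines is made up for this illustration.

```python
def class_path_lines(jar_paths, width=72):
    """Return the manifest lines for a Class-Path attribute, wrapped at `width`."""
    value = "Class-Path: " + " ".join(jar_paths)
    value.encode("ascii")  # assumption: ASCII paths, so bytes == characters
    lines = [value[:width]]
    # Continuation lines begin with one space, leaving width - 1 characters.
    for start in range(width, len(value), width - 1):
        lines.append(" " + value[start:start + width - 1])
    return lines
```

For the layout above, class_path_lines(["lib/guava.jar", "lib/h2.jar"]) gives the single line shown earlier; with many dependencies the value spills onto continuation lines.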

I'm aware this doesn't directly answer the question as asked, but I bet that in more than 90% of the cases where this question comes up, not building a fat jar is the appropriate answer.

NB: Ant, Maven, Gradle, etc. can take care of putting the right manifest entry in. If the intent of your jar is not to run it directly (for example, it's a war for a servlet container), those environments have their own rules for specifying where your dependency jars live.

Answers 3

Yes, you can speed it up by about 40% on a new archive and by more than 200% on a jar archive you've already rsync'd. The trick is to not compress the jar, so you can take advantage of rsync's chunking algorithm.

I used the following commands to archive a directory with a lot of class files:

    jar cf0 uncompressed.jar .
    jar cf  compressed.jar   .

This created the following two jars:

    -rw-r--r--  1 rsync jar    28331212 Apr 13 14:11 ./compressed.jar
    -rw-r--r--  1 rsync jar    38746054 Apr 13 14:10 ./uncompressed.jar

Note that the size of the uncompressed Jar is about 10MB larger.
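If the jar already exists (for example a fat jar produced by a build plugin), it can be repacked without compression instead of being rebuilt with jar cf0. Below is a minimal Python sketch using the standard zipfile module; the function name repack_stored is an assumption, and entry order and timestamps are copied from the source as-is.

```python
import zipfile


def repack_stored(src_path, dst_path):
    """Copy every entry of a jar into a new jar without compressing it."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_STORED) as dst:
        for old in src.infolist():
            info = zipfile.ZipInfo(old.filename, date_time=old.date_time)
            info.external_attr = old.external_attr   # keep file mode bits
            info.compress_type = zipfile.ZIP_STORED  # store, do not deflate
            dst.writestr(info, src.read(old))        # read() decompresses
```

A call such as repack_stored("compressed.jar", "uncompressed.jar") would be the rough equivalent of building the second jar above from the first one.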

I then rsync'd these files and timed them using the following commands. (Note, even turning on compression for the compressed file had little effect, I'll explain later).

Compressed Jar

    time rsync -av -e ssh compressed.jar jar@rsync-server.org:/tmp/

    building file list ... done
    compressed.jar

    sent 28334806 bytes  received 42 bytes  2982615.58 bytes/sec
    total size is 28331212  speedup is 1.00

    real  0m9.208s
    user  0m0.248s
    sys   0m0.483s

Uncompressed Jar

    time rsync -avz -e ssh uncompressed.jar jar@rsync-server.org:/tmp/

    building file list ... done
    uncompressed.jar

    sent 11751973 bytes  received 42 bytes  2136730.00 bytes/sec
    total size is 38746054  speedup is 3.30

    real  0m5.145s
    user  0m1.444s
    sys   0m0.219s

We have gained a speedup of nearly 50%. That is a good boost for the initial rsync, but what about subsequent rsyncs where only a small change has been made?

I removed one class file of 170 bytes from the directory and recreated the jars. Now they are this size:

    -rw-r--r--  1 rsync jar  28330943 Apr 13 14:30 compressed.jar
    -rw-r--r--  1 rsync jar  38745784 Apr 13 14:30 uncompressed.jar

Now the timings are very different.

Compressed Jar

    building file list ... done
    compressed.jar

    sent 12166657 bytes  received 31998 bytes  2217937.27 bytes/sec
    total size is 28330943  speedup is 2.32

    real  0m5.435s
    user  0m0.378s
    sys   0m0.335s

Uncompressed Jar

    building file list ... done
    uncompressed.jar

    sent 220163 bytes  received 43624 bytes  175858.00 bytes/sec
    total size is 38745784  speedup is 146.88

    real  0m1.533s
    user  0m0.363s
    sys   0m0.047s

So we can speed up rsyncing large jar files a lot using this method. The reason is related to information theory. Compressing data in effect removes everything that is redundant, i.e. what you are left with looks very much like random data, and the better the compressor, the more redundancy it removes. With most compression algorithms, a small change anywhere in the input has a dramatic effect on the output.

The Zip compression effectively makes it harder for rsync to find checksums that are the same on the server and the client, which means it needs to transfer more data. When you leave the jar uncompressed, you let rsync do what it's good at: sending less data to sync the two files.
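The checksum matching can be made concrete with a toy model in Python. This is a simplified sketch and not rsync's real algorithm: rsync pairs a rolling weak checksum with a strong one and picks its block size itself, whereas this version hashes a block at each candidate offset and uses a small fixed block size. The function name literal_bytes is made up for this illustration.

```python
import hashlib


def literal_bytes(old, new, block=64):
    """Count the bytes of `new` that are not covered by a block found in `old`.

    This approximates how much file data an rsync-like sender would have to
    transmit when the receiver already holds `old`.
    """
    # Checksums of the receiver's blocks, taken at fixed block boundaries.
    known = {hashlib.md5(old[i:i + block]).digest()
             for i in range(0, len(old) - block + 1, block)}
    literal = 0
    pos = 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        if len(chunk) == block and hashlib.md5(chunk).digest() in known:
            pos += block   # the receiver already has this block
        else:
            literal += 1   # this byte has to go over the wire
            pos += 1
    return literal
```

Calling literal_bytes(old, new) on two byte strings, and then on zlib.compress(old) and zlib.compress(new), is one way to compare the two cases on your own data: with uncompressed input, a small deletion only costs the bytes around the change, while the compressed versions tend to share far fewer blocks.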
