Compressed Files

There are more combinations to consider in the case of compressed files. I have only got rsync numbers for a few of the files here so far. I have broken them down by how the file to be transferred is compressed (none, gzip, or gzip --rsync) and whether zsync's look-inside-gzip functionality was used. I have also included numbers for

I took two Debian Packages files, downloaded a day apart, as the source and target files. The target file was 12.1MB, or 3.1MB gzipped. A diff of the two (uncompressed) files came to 58KB. The figures given for transferred data are totals (file data + control data), where control data is the size of the .zsync file; this cannot be neglected, since it must be downloaded and is therefore an overhead of the algorithm. The exception is rsync, where the checksums are transmitted upstream and are shown separately.

Several methods were used. First, for comparison, working on the full uncompressed 12.1MB file:

Blocksize (bytes)  512                  1024                2048                4096                8192               16384
zsync-0.1.0 (pr)   564709               353690              279580              306050              468092             723102
rsync-2.6.3        175004 (+142414 up)  161581 (+71226 up)  190176 (+35634 up)  258256 (+17838 up)  403904 (+8940 up)  728634 (+4488 up)
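
The rsync figures above split the cost into downstream data and upstream checksums, whereas the zsync figures are single totals. The following is a minimal Python sketch, using only the numbers from the table above, of one way to put the two on the same footing. The names are illustrative, and simply adding rsync's upstream bytes to its downstream bytes is an assumed cost model rather than something the measurements prescribe.

```python
# Figures copied from the uncompressed-file table; all names are illustrative.
BLOCKSIZES = [512, 1024, 2048, 4096, 8192, 16384]

# zsync: total bytes downloaded (file data + .zsync control data).
ZSYNC_TOTAL = [564709, 353690, 279580, 306050, 468092, 723102]

# rsync: (bytes downloaded, checksum bytes sent upstream) per blocksize.
RSYNC_DOWN_UP = [
    (175004, 142414),
    (161581, 71226),
    (190176, 35634),
    (258256, 17838),
    (403904, 8940),
    (728634, 4488),
]


def rsync_totals(down_up):
    """Add downstream and upstream bytes (assumed cost model)."""
    return [down + up for down, up in down_up]


def best(blocksizes, totals):
    """Return (blocksize, total) for the smallest total transfer."""
    return min(zip(blocksizes, totals), key=lambda pair: pair[1])


RSYNC_TOTAL = rsync_totals(RSYNC_DOWN_UP)

if __name__ == "__main__":
    print("zsync best:", best(BLOCKSIZES, ZSYNC_TOTAL))
    print("rsync best:", best(BLOCKSIZES, RSYNC_TOTAL))
```

Under this accounting both tools do best at a blocksize of 2048 bytes on the uncompressed file, with rsync's upstream checksums making up a sizeable part of its total at the smaller blocksizes.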

Next, on the file compressed with gzip --best. For a fairer comparison with rsync, and to show the difference that the look-inside method makes, zsync without the look-inside method is shown too. As expected, without look-inside or with rsync, almost the entire 3.1MB compressed file is transferred.

Blocksize (bytes)                     512                  1024                 2048                4096                8192                16384
zsync-0.1.2 with look-inside          613532               339527               217883              190338              230413              failed
zsync-0.1.0 (pr) without look-inside  3134061              3074067              3044591             3033427             3033999             3046564
rsync-2.6.3                           3012791 (+36636 up)  3013371 (+18336 up)  3014172 (+9186 up)  3018156 (+4614 up)  3026296 (+2328 up)  3042650 (+1182 up)

Finally, both the source and target files are compressed with gzip --best --rsync.

Blocksize (bytes)                     512                 1024                2048               4096               8192               16384
zsync-0.1.2 with look-inside          625590              351942              228179             263135             354503             300098
zsync-0.1.0 (pr) without look-inside  496567              449475              444663             492377             607225             840588
rsync-2.6.3                           390270 (+37632 up)  392418 (+18834 up)  417550 (+9438 up)  472108 (+4740 up)  581312 (+2388 up)  818212 (+1212 up)

Debian Packages files contain textual data, split roughly half and half between plain English package descriptions and key:value pairs of text data containing package names, versions, and such. The changes week to week are widespread and very scattered. Thus the compressed transfer, which effectively has larger blocks relative to the underlying content, is less efficient here.

gzip --rsync does fairly well, with rsync transferring about 420KB and zsync transferring about 450KB. zsync with the look-inside method does much better than either of these, with as little as 190KB transferred. At this optimum point, zsync transferred 75KB of (compressed) file data, close to the size of the diff, and 142KB of the .zsync.
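
To put these totals in proportion, the sketch below expresses the best result for each method as a percentage of simply downloading the whole gzipped file. It is only an illustration: the figures are copied from the tables above, "3.1MB" is assumed to mean 3,100,000 bytes, rsync's upstream checksum bytes are added to its downstream bytes, and the names are made up for this example.

```python
# Assumption: "3.1MB gzipped" is taken as 3.1e6 bytes.
FULL_DOWNLOAD = 3_100_000

# Total bytes per blocksize (512 ... 16384); None marks the failed run.
RESULTS = {
    "gzip --best, zsync with look-inside":
        [613532, 339527, 217883, 190338, 230413, None],
    "gzip --rsync, zsync without look-inside":
        [496567, 449475, 444663, 492377, 607225, 840588],
    # rsync downstream + upstream bytes, summed per blocksize.
    "gzip --rsync, rsync":
        [390270 + 37632, 392418 + 18834, 417550 + 9438,
         472108 + 4740, 581312 + 2388, 818212 + 1212],
}


def best_total(totals):
    """Smallest total transfer, skipping runs that failed."""
    return min(t for t in totals if t is not None)


def percent_of_full(total, full=FULL_DOWNLOAD):
    """Transfer size as a percentage of a full download, to one decimal."""
    return round(100.0 * total / full, 1)


if __name__ == "__main__":
    for method, totals in RESULTS.items():
        total = best_total(totals)
        print(f"{method}: {total} bytes, {percent_of_full(total)}% of full")
```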

Note that the look-inside and uncompressed figures at a blocksize of 1024 bytes include 250KB of data just for transferring the .zsync file (and the 512-byte blocksize transfer had a 478KB control file, representing over one third of the data transfer). Given that the underlying data is plain text, transmitting a full 20 checksum bytes per block is probably excessive (especially for smaller blocks), so significant savings could be made here. The methods that look only at the compressed data had to transfer just a 60KB .zsync file at a blocksize of 1024 bytes (smaller stream, so fewer blocks, so fewer checksums), but their greater inefficiency in identifying common data easily wiped out this saving.
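
The control file sizes quoted above follow roughly from the block count. As a rough check, the sketch below estimates the checksum payload as one 20-byte entry per block. It assumes that 12.1MB and 3.1MB mean 12.1e6 and 3.1e6 bytes and ignores any header in the .zsync file; the function name and the 8-byte alternative are hypothetical, chosen only to illustrate the potential saving.

```python
CHECKSUM_BYTES_PER_BLOCK = 20  # figure quoted in the text

# Assumptions: sizes taken as decimal megabytes.
UNCOMPRESSED = 12_100_000
GZIPPED = 3_100_000


def control_bytes(stream_size, blocksize, per_block=CHECKSUM_BYTES_PER_BLOCK):
    """Estimate checksum data: one entry per block, no file header counted."""
    blocks = (stream_size + blocksize - 1) // blocksize  # ceiling division
    return blocks * per_block


if __name__ == "__main__":
    print("uncompressed, 1024:", control_bytes(UNCOMPRESSED, 1024))
    print("uncompressed,  512:", control_bytes(UNCOMPRESSED, 512))
    print("gzipped,      1024:", control_bytes(GZIPPED, 1024))
    # Hypothetical: 8 checksum bytes per block instead of 20.
    print("uncompressed, 1024, 8 bytes/block:",
          control_bytes(UNCOMPRESSED, 1024, per_block=8))
```

The estimates (about 236KB, 473KB and 61KB) land close to the 250KB, 478KB and 60KB control files reported above, which is consistent with the control file being dominated by per-block checksums.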

The uncompressed data does quite well, better than most of the compressed transfers. However, this good performance will only occur where the data files are very similar; for updates more than a few days apart, where there is less data in common, compressed transfers can be expected to take a clear lead. Providing only a compressed stream also saves disk space on the server.