As zsync develops, I am performing a number of test runs, and cataloguing the results here. The numbers here must be taken in the context that the current implementation is still being optimised.
Numbers given here reflect application layer traffic only - I have not attempted to account for TCP/IP headers. Generally speaking, provided the algorithm does not result in more data being transmitted, and provided it does not needlessly fragment packets or require lots of separate connections, there should be no extra overhead at the network level relative to a full download. zsync-0.0.2 and up satisfy these requirements in my view. I have done some empirical verification of this, but not to the same precision as the other numbers here.
Numbers for zsync are the figures given by zsync itself when exiting - this includes only downstream traffic (upstream traffic is typically negligible with zsync - necessarily so, as the client is doing all the work). Numbers for rsync are downstream, but with upstream traffic given in brackets afterwards, as returned by rsync -vv (note in particular that rsync's figures appear to neglect the transport overhead of rsh/ssh, although for rsh I assume this overhead would be negligible anyway).
zsync downloads the checksums and then downloads the data, whereas rsync uploads the checksums and then downloads the data, so roughly speaking the up+down data for rsync should equal the down data for zsync, if all is well.
This section deals with data files which are not generally compressed, perhaps because the data they contain is already compressed, albeit not in a form recognisable to file handling tools - e.g. ISO files containing compressed data files, or JPEG images.
The first test file is sarge-i386-netinst.iso (Debian-Installer CD image), with the user updating from the 2004-10-29 snapshot (md5sum ca5b63d27a3bf2d30fe65429879f630b) to the 2004-10-30 snapshot (md5sum ef8bd520026cef6090a18d1c7ac66a39). Inter-day snapshots like this should have large amounts in common. Both files are around 110MB.
I tried various block sizes (rsync's default for files of this size is around 8kB). I have included zsync prior to the checksum length optimisations, for historical reference. Bear in mind that zsync-0.2.0's block sizes are not directly comparable to rsync or earlier zsync, because it requires 2 consecutive matches; hence zsync-0.2.0 with a block size of 1024 may be more directly comparable to rsync with a block size of 2048.
Block size (bytes) | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
zsync-0.0.6 | 13278966 | 11347004 | 10784543 | 10409473 | 10357172 | 10562326 |
rsync | 9479309 (+770680 up) | 9867587 (+385358 up) | 9946883 (+192697 up) | 10109455 (+96370 up) | ||
zsync-0.2.0 (pre-release) | 10420370 | 10367061 | 10093596 | 10111121 | 10250799 | 10684655 |
zsync transferred more file data as the block size was increased, as expected. At a block size of 512, the .zsync file was around 1.5MB - this fell to 660kB, 330kB and so on for the larger blocksizes. All the results were very close, however: the most obvious feature of the results is that in all cases only about 10MB was transferred, a saving of around 90% on the full download of 113MB.
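The "around 90%" saving, and the expectation that rsync's up+down total should roughly match zsync's downstream figure, can be checked with a few lines of arithmetic. This is only a sketch: the byte counts are copied from the table above, and reading "113MB" as 113 * 10^6 bytes is my assumption. The rsync figures are listed in the order they appear in the table, without assigning them to particular block sizes.

```python
# Figures copied from the sarge netinst table above (bytes).
FULL_DOWNLOAD = 113 * 10**6  # assumption: "113MB" read as 113 * 10^6 bytes

zsync_020_down = [10420370, 10367061, 10093596, 10111121, 10250799, 10684655]

# rsync reports (downstream, upstream) separately.
rsync_down_up = [
    (9479309, 770680),
    (9867587, 385358),
    (9946883, 192697),
    (10109455, 96370),
]


def saving(transferred, full=FULL_DOWNLOAD):
    """Fraction of the full download that was avoided."""
    return 1 - transferred / full


# For rsync, the fair comparison with zsync is downstream plus upstream.
rsync_totals = [down + up for down, up in rsync_down_up]
zsync_savings = [saving(n) for n in zsync_020_down]
rsync_savings = [saving(n) for n in rsync_totals]

for n, s in zip(zsync_020_down, zsync_savings):
    print(f"zsync-0.2.0  {n:>9} bytes  saving {s:.1%}")
for n, s in zip(rsync_totals, rsync_savings):
    print(f"rsync total  {n:>9} bytes  saving {s:.1%}")
```

Under that assumption, every run lands between roughly 90% and 91% saved, and the rsync totals all fall within the range of the zsync-0.2.0 figures.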
Next, I tested an update from a Fedora Core 3 test2 iso image (668MB, md5sum ) to Fedora Core 3 test3 (640MB) (two Linux distribution CD images, with significant differences between them).
Blocksize (bytes) | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
rsync | 339684147 (+5224424 up) | 345822571 (+2612232 up) | 353812835 (+1306136 up) | 363311939 (+571457 up) | 374611439 (+285752 up) | |
zsync-0.0.6 | 366356894 | |||||
zsync-0.2.0 | 347181962 | 347151941 | 352041787 | 359541472 | 369585481 | 380574374 |
zsync closely parallels rsync's result here. Judging from these results, I would guess that roughly 50% of the files are in common, with somewhere around 60% of the image being transferred. zsync (paired with apache 2.0.52) took about 6 minutes in a local-to-local transfer, while rsync took about 7 minutes (over rsh).
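As a rough check on the proportion transferred, the zsync-0.2.0 figures from the table above can be divided by the size of the target image. This is a sketch only: treating "640MB" as 640 * 10^6 bytes is my assumption, and a binary-megabyte reading would give slightly lower fractions.

```python
# zsync-0.2.0 downstream bytes by block size, copied from the table above.
TARGET_SIZE = 640 * 10**6  # assumption: "640MB" read as 640 * 10^6 bytes

zsync_020 = {
    512: 347181962,
    1024: 347151941,
    2048: 352041787,
    4096: 359541472,
    8192: 369585481,
    16384: 380574374,
}

# Fraction of the new image that had to be fetched at each block size.
fractions = {bs: n / TARGET_SIZE for bs, n in zsync_020.items()}

for bs, frac in fractions.items():
    print(f"block size {bs:>5}: {frac:.1%} of the target transferred")
```

With this reading of the sizes, the fraction stays between about 54% and 60% across all block sizes.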
For reference, here are the CPU times used corresponding to the table above, in seconds. These are just indicative, as they include downloading the control files and the final checksum verification (except for rsync, which does not do this), and the machine was not idle, nor did I flush disk cache etc between runs. Nonetheless, this gives an indication of how expensive the smaller block sizes are, which is an important consideration for larger files.
Blocksize (bytes) | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
rsync | 1113 | 570 | 418 | 314 | 205 | |
zsync-0.2.0 | 1785 | 931 | 520 | 297 | 219 | 158 |
zsync appears to be very close to rsync, both in CPU usage and transfer efficiency.
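To put a number on how expensive the smaller block sizes are, the sketch below takes the zsync-0.2.0 CPU times from the table above and computes how much the cost grows each time the block size is halved. It uses only the zsync row, since that row has a figure for every column; the usual caveats about these timings being indicative still apply.

```python
# zsync-0.2.0 CPU seconds, copied from the table above.
block_sizes = [512, 1024, 2048, 4096, 8192, 16384]
zsync_cpu = [1785, 931, 520, 297, 219, 158]

# Cost multiplier when moving from one block size to the next smaller one.
ratios = [zsync_cpu[i] / zsync_cpu[i + 1] for i in range(len(zsync_cpu) - 1)]

for smaller, larger, ratio in zip(block_sizes, block_sizes[1:], ratios):
    print(f"{larger:>5} -> {smaller:>5} bytes: CPU time x{ratio:.2f}")

# Overall cost of the smallest block size relative to the largest.
overall = zsync_cpu[0] / zsync_cpu[-1]
print(f"512 vs 16384 bytes: x{overall:.1f}")
```

The multiplier approaches 2 at the small end, so in these runs halving an already small block size nearly doubled the CPU time.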
Finally, here is an example with a different type of data file. I did an update between two Debian Packages files from a few days apart. These files consist of textual data (some key: value lines, and some text descriptions), with only a few entries changed each day. The files in my test were each 12.1MB; the diff between them was 59kB.
Blocksize (bytes) | 256 | 512 | 768 | 1024 | 1536 | 2048 | 4096 | 8192 |
zsync-0.1.0 (pre-release) | 564709 | 353690 | 279580 | 306050 | 468092 | |||
rsync-2.6.3 (down only) | 247512 | 175004 | 156772 | 161452 | 162108 | 190048 | 258128 | 403776 |
rsync-2.6.3 (total) | 579758 | 317418 | 251732 | 232682 | 209608 | 225686 | 275970 | 412720 |
rsync-2.6.3 (total, with compression) | 349311 | 165121 | 120567 | 102686 | 83033 | 81638 | 85591 | 117775 |
zsync-0.2.1 | 405520 | 257388 | n/a | 204028 | n/a | 204266 | 280286 | 487790 |
(rsync's default block size for this file is around 3.5kB, giving a sub-optimal 245kB transferred.) Note that zsync is ahead of rsync in total data transferred - at the default blocksizes, at the optimal blocksizes, and at all of the smaller blocksizes. rsync remains a clear winner for smaller blocksizes if we ignore the upstream data, and is ahead at larger blocksizes (although it might be fairer to compare zsync with rsync at twice the blocksize, due to the match continuation optimisation - in which case the result is reversed, with rsync better at the smaller sizes and zsync at the larger). The optimum total data transferred is similar for both. Note that zsync-0.1.0, which lacked the checksum size and match continuation optimisations, is very inefficient by comparison, particularly for small blocksizes.
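The two comparisons described here (same blocksize, and zsync against rsync at twice the blocksize) can be reproduced from the table above. The sketch below uses the "rsync-2.6.3 (total)" and "zsync-0.2.1" rows; the helper name winner is mine, and the n/a entries are simply skipped.

```python
# Total bytes transferred for the Packages file, copied from the table above.
block_sizes = [256, 512, 768, 1024, 1536, 2048, 4096, 8192]
rsync_total = [579758, 317418, 251732, 232682, 209608, 225686, 275970, 412720]
zsync_021 = [405520, 257388, None, 204028, None, 204266, 280286, 487790]  # None = n/a

rsync_by_size = dict(zip(block_sizes, rsync_total))
zsync_by_size = {bs: n for bs, n in zip(block_sizes, zsync_021) if n is not None}


def winner(zsync_bytes, rsync_bytes):
    """Name of the tool that transferred fewer bytes."""
    return "zsync" if zsync_bytes < rsync_bytes else "rsync"


# Comparison at the same blocksize.
same_size = {bs: winner(z, rsync_by_size[bs]) for bs, z in zsync_by_size.items()}

# Comparison of zsync at blocksize B against rsync at 2B, where 2B was measured.
double_size = {
    bs: winner(z, rsync_by_size[2 * bs])
    for bs, z in zsync_by_size.items()
    if 2 * bs in rsync_by_size
}

print("same blocksize:   ", same_size)
print("rsync at 2x size: ", double_size)
```

At equal blocksizes zsync comes out ahead up to 2048 bytes and rsync at 4096 and 8192; against rsync at twice the blocksize the pattern flips, with rsync ahead for zsync blocksizes of 256 and 512 and zsync ahead from 1024 upwards.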
With compression turned on (the -z option, zlib compression of the communication channel), to which zsync has no equivalent, rsync is about 60% more efficient.
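One way to arrive at the 60% figure is to compare the best compressed rsync total with the best zsync-0.2.1 figure from the table above. This is my reading of "more efficient" (less data transferred at each tool's best blocksize), not necessarily the exact calculation behind the original statement.

```python
# Bytes transferred for the Packages file, copied from the table above.
rsync_compressed = [349311, 165121, 120567, 102686, 83033, 81638, 85591, 117775]
zsync_021 = [405520, 257388, 204028, 204266, 280286, 487790]  # n/a columns omitted

# Best result for each tool, regardless of blocksize.
rsync_best = min(rsync_compressed)
zsync_best = min(zsync_021)

# Fraction by which compressed rsync undercuts zsync at their respective optima.
reduction = 1 - rsync_best / zsync_best

print(f"rsync -z best: {rsync_best} bytes")
print(f"zsync best:    {zsync_best} bytes")
print(f"rsync -z transfers {reduction:.0%} less data")
```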