Chapter 5. Empirical Results

Table of Contents

Uncompressed Data
Compressed Files

As zsync develops, I am performing a number of test runs, and cataloguing the results here. The numbers here must be taken in the context that the current implementation is not yet fully optimised.

Numbers given here reflect application layer traffic only - I have not attempted to account for TCP/IP headers. Generally speaking, provided the algorithm does not result in more data being transmitted, and provided it does not needlessly fragment packets or require lots of separate connections, there should be no extra overhead at the network level relative to a full download. zsync-0.0.2 and up satisfy these requirements in my view. I have done some empirical verification of this, but not to the same precision as the other numbers here.

Numbers for zsync are the figures reported by zsync itself on exit - these include only downstream traffic (upstream traffic is typically negligible with zsync - necessarily so, as the client is doing all the work). Numbers for rsync are downstream, with upstream traffic given in brackets afterwards, as returned by rsync -vv (note in particular that rsync's figures appear to neglect the transport overhead of rsh/ssh, although for rsh I assume this overhead would be negligible anyway). zsync downloads the checksums and then downloads the data, whereas rsync uploads the checksums and then downloads the data, so roughly speaking the up+down data for rsync should equal the down data for zsync, if all is well.

Uncompressed Data

This section deals with data files which are not generally compressed, perhaps because the data they contain is already compressed, albeit not in a form recognisable to file handling tools - e.g. ISO files containing compressed data files, or JPEG images.

The first test file is sarge-i386-netinst.iso (Debian-Installer CD image), with the user updating from the 2004-10-29 snapshot (md5sum ca5b63d27a3bf2d30fe65429879f630b) to the 2004-10-30 snapshot (md5sum ef8bd520026cef6090a18d1c7ac66a39). Inter-day snapshots like this should have large amounts in common. Both files are around 110MB. I tried various block sizes.

Method   Block size (bytes)      Transferred (bytes)
rsync    1024                    9479309 (+770680 up)
rsync    2048                    9867587 (+385358 up)
rsync    4096                    9946883 (+192697 up)
rsync    8192                    10109455 (+96370 up)
rsync    default (auto-select)   10210013 (+74380 up)
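
Since the fair comparison is between rsync's combined up+down traffic and zsync's downstream traffic, it can help to total the rsync figures above. The following is a minimal sketch in Python, not part of zsync or rsync; the names are illustrative, and the full download size is assumed to be 113 * 10^6 bytes, since the exact byte count is not given here.

```python
# (downstream, upstream) bytes as reported by rsync -vv in the table above.
rsync_runs = {
    "1024": (9479309, 770680),
    "2048": (9867587, 385358),
    "4096": (9946883, 192697),
    "8192": (10109455, 96370),
    "default": (10210013, 74380),
}

# Assumption: "113MB" is read as 113 * 10**6 bytes; the exact size is not stated.
FULL_DOWNLOAD = 113 * 10**6

# Combined traffic per run, and the fraction saved relative to a full download.
totals = {label: down + up for label, (down, up) in rsync_runs.items()}
savings = {label: 1 - total / FULL_DOWNLOAD for label, total in totals.items()}

for label in rsync_runs:
    print(f"{label:>8}: {totals[label]:>10} bytes total, {savings[label]:.1%} saved")
```

Under that assumption every run totals a little over 10MB, so the choice of block size changes the combined traffic by well under 2%.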

zsync transferred more file data as the block size was increased, but this was more than offset by a smaller .zsync file to download initially. At a block size of 512, the .zsync file was over 4.4MB - this fell to 2.2MB, 1.1MB, 550kB and so on for the larger block sizes. It is clear that rsync defaults to a much larger block size on files of this type, with under 100kB of metadata transmitted up to the server. All the results were very close, however: the most obvious feature is that in all cases only about 10MB was transferred, a saving of around 90% on the full download of 113MB.
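
The halving of the metadata as the block size doubles can be checked against the rsync upstream figures in the table above: if the checksum data is a fixed number of bytes per block, then upstream bytes multiplied by block size should stay roughly constant. This is only a sketch of that arithmetic in Python, using the rsync numbers; the .zsync sizes quoted above are too rounded to test in the same way.

```python
# Upstream (checksum) bytes reported by rsync -vv, keyed by block size in bytes.
upstream = {1024: 770680, 2048: 385358, 4096: 192697, 8192: 96370}

# If metadata is proportional to the number of blocks, this product is
# roughly constant (about file size times bytes of checksum per block).
products = {bs: bs * up for bs, up in upstream.items()}

# Ratio between each run and the run at double the block size; expect about 2.
sizes = sorted(upstream)
ratios = [upstream[a] / upstream[b] for a, b in zip(sizes, sizes[1:])]

for bs in sizes:
    print(f"block size {bs:>5}: {upstream[bs]:>7} bytes up, product {products[bs]}")
print("halving ratios:", [round(r, 4) for r in ratios])
```

The products agree to within a small fraction of a percent, which is consistent with a fixed per-block checksum cost for the fixed block sizes; the default (auto-select) run is left out because its block size is not stated.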

Next, I tested an update from a Fedora Core 3 test2 iso image (668MB, md5sum ) to Fedora Core 3 test3 (640MB) (two Linux distribution CD images, with significant differences between them).

Method   Block size (bytes)   Transferred (bytes)
rsync    8192                 363312079 (+571453 up)

zsync closely parallels rsync's result here. I would guess from these results that roughly 50% of the files are in common, with somewhere around 60% of the data being transferred. zsync (paired with Apache 2.0.52) took about 6 minutes in a local-to-local transfer, while rsync took about 7 minutes (over rsh).
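
The "around 60%" figure can be reproduced from the rsync row above. The sketch below is in Python and rests on an assumption: the exact size of the test3 image is not given, only "640MB", so both the decimal and the binary reading of "MB" are tried.

```python
# Figures from the rsync row in the table above.
DOWN = 363312079
UP = 571453

# Assumption: the target image is "640MB"; the exact byte count is not stated,
# so compute the fraction under both common meanings of "MB".
TARGET_SIZES = {"10**6": 640 * 10**6, "2**20": 640 * 2**20}

total = DOWN + UP
fractions = {unit: total / size for unit, size in TARGET_SIZES.items()}

for unit, fraction in fractions.items():
    print(f"MB = {unit} bytes: {fraction:.1%} of the target transferred")
```

Either reading gives a fraction between 50% and 60%, in line with the rough estimate in the text.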