Is It Worthwhile?

This feature is relatively complex, and it could be made obsolete if gzip --rsync or something similar became more widespread. But technologies do not exist in an ideal world: if existing content is not adapted for rsync, then that must be allowed for. Some downloads may be more efficient using --rsync and not looking inside the compressed data, while others may be more efficient when looking inside the file. I think it is enough of an open question to warrant implementing something and seeing whether it proves useful. The basic zsync functionality is not tied to this feature, so it could easily be dropped.

To test the usefulness of this feature, I have benchmarked zsync with some data files which are normally transferred compressed. There are more combinations to consider in the case of compressed files, so I have broken the results down by how the file to be transferred is compressed (none, gzip, or gzip --rsync) and by whether zsync's look-inside-gzip functionality was used. For comparison, I have included numbers for zsync-0.1.x and for rsync-2.6.3 (current at this time), as well as for rsync with the -z option, which enables compression of deltas with the deflate algorithm on the server.

I took two Debian Packages files, downloaded a day apart, as the source and target files. The target file was 12.1MB, or 3.1MB gzipped. For zsync, the transferred data is shown as the sum (file data + control data), where control data is just the size of the .zsync file; this cannot be neglected, because it must be downloaded and is therefore an overhead of the algorithm. For rsync, I have shown numbers both for total data transferred and for just the downstream data (as noted earlier, upstream data is comparatively cheap, so rsync has an advantage because most of the metadata goes upstream).

Debian Packages files contain textual data, split roughly half and half between plain English package descriptions and key:value pairs of text data containing package names, versions, and such. The changes week to week are widespread and very scattered. The diff of the two files was about 58kB.

Several methods were used. Firstly, for comparison, working on the full 12.1MB:

Blocksize (bytes)                     |    256 |    512 |    768 |   1024 |   1536 |   2048 |   4096 |   8192
zsync-0.1.0 (pre-release)             |        | 564709 |        | 353690 |        | 279580 | 306050 | 468092
rsync-2.6.3 (down only)               | 247512 | 175004 | 156772 | 161452 | 162108 | 190048 | 258128 | 403776
rsync-2.6.3 (total)                   | 579758 | 317418 | 251732 | 232682 | 209608 | 225686 | 275970 | 412720
rsync-2.6.3 (total, with compression) | 349311 | 165121 | 120567 | 102686 |  83033 |  81638 |  85591 | 117775
zsync-0.2.1                           | 405520 | 257388 |    n/a | 204028 |    n/a | 204266 | 280286 | 487790

Next, on the file compressed with gzip --best. For a fairer comparison with rsync, and to show the difference that the look-inside method makes, zsync without the look-inside method is shown too. As expected, without look-inside or with rsync, almost the entire 3.1MB compressed file is transferred.

Blocksize (bytes)                             |     256 |     512 |     768 |    1024 |    1536 |    2048 |    4096 |    8192
zsync-0.1.2 with look-inside                  |         |  613532 |         |  339527 |         |  217883 |  190338 |  230413
zsync-0.1.0 (pre-release) without look-inside |         | 3134061 |         | 3074067 |         | 3044591 | 3033427 | 3033999
rsync-2.6.3 (down)                            | 3013037 | 3012749 | 3012877 | 3013241 | 3014117 | 3014045 | 3018029 | 3026169
rsync-2.6.3 (total)                           | 3086271 | 3048759 | 3037319 | 3031581 | 3026361 | 3023235 | 3022647 | 3028501
zsync-0.2.2 without look-inside               | 3096931 | 3054736 |         | 3031354 |         | 3023228 | 3022746 | 3028652
zsync-0.2.2 with look-inside                  |  559703 |  304058 |         |  185237 |         |  140735 |  149467 |  209643
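
The near-total retransfer seen without look-inside can be reproduced in miniature. The following is only a toy sketch under stated assumptions: the synthetic Packages-like records, the 128-byte block size, and the helper names make_index and matched_fraction are all invented for illustration, and the matching is a plain set lookup rather than the rolling checksums that rsync and zsync actually use. It also uses Python's zlib module, which emits a zlib-wrapped deflate stream at level 9 rather than a real gzip file. It counts how many aligned blocks of the new data can be found at any offset in the old data, first on the plain text and then on the compressed streams.

```python
import random
import zlib

BLOCK = 128  # toy block size in bytes (an assumption, not a benchmark setting)

WORDS = ["library", "daemon", "client", "server", "utility", "module",
         "package", "tools", "runtime", "headers", "documentation", "plugin"]


def make_index(num_records, changed_every=None, seed=1):
    """Build synthetic Packages-like text; optionally alter every Nth record."""
    rng = random.Random(seed)
    lines = []
    for i in range(num_records):
        # The random draws are identical for old and new, so only the
        # explicitly altered records differ between the two files.
        version = rng.randint(0, 99)
        words = " ".join(rng.choice(WORDS) for _ in range(6))
        if changed_every and i % changed_every == 0:
            version += 100        # bump the version ...
            words += " updated"   # ... and change the record length
        lines.append(
            f"Package: pkg{i:05d} Version: 1.{version} Description: {words}\n"
        )
    return "".join(lines).encode("ascii")


def matched_fraction(old, new, block=BLOCK):
    """Fraction of aligned blocks of `new` that occur at any offset in `old`."""
    windows = {old[i:i + block] for i in range(len(old) - block + 1)}
    blocks = [new[i:i + block] for i in range(0, len(new) - block + 1, block)]
    return sum(b in windows for b in blocks) / len(blocks)


old = make_index(2000)
new = make_index(2000, changed_every=100)  # 20 small, scattered changes

frac_plain = matched_fraction(old, new)
frac_gz = matched_fraction(zlib.compress(old, 9), zlib.compress(new, 9))

print(f"uncompressed: {frac_plain:.0%} of blocks already present locally")
print(f"compressed:   {frac_gz:.0%} of blocks already present locally")
```

On the plain text, only the blocks touching a changed record fail to match. On the compressed streams, each small change disturbs the deflate output that follows it, so far fewer blocks can be reused; this is the effect that look-inside and gzip --rsync each try to work around.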

Finally, both the old and the new file are compressed with gzip --best --rsync.

Blocksize (bytes)                             |    256 |    512 |    768 |   1024 |   1536 |   2048 |           4096 |   8192
zsync-0.1.2 with look-inside                  |        | 625590 |        | 351942 |        | 228179 |         263135 | 354503
zsync-0.1.0 (pre-release) without look-inside |        | 496567 |        | 449475 |        | 444663 |         492377 | 607225
rsync-2.6.3 (down only)                       | 400794 | 390142 | 394190 | 392290 | 407498 | 417422 |         471982 | 581186
rsync-2.6.3 (total)                           | 476020 | 427778 | 419292 | 411128 | 420072 | 426864 |         476726 | 583578
zsync-0.2.1 without look-inside               | 449153 | 415905 |        | 406514 |        | 422712 |         485402 | 617931
zsync-0.2.1 with look-inside                  | 571679 | 316116 |        | 197467 |        | 151031 | 222331 (error) | 343906

gzip --rsync does fairly well, with both rsync and zsync transferring about 410kB at the optimum point. zsync with the look-inside method does much better than either of these, with as little as 140kB transferred. At this optimum point, zsync transferred 75kB of (compressed) file data, which is close to the size of the diff, and 65kB of the .zsync.

For this example, where the difference between the two files is small, working on the uncompressed data does quite well. With the uncompressed files, rsync transfers about 210kB, and zsync around 200kB. zsync with look-inside on the compressed data is ahead of this: having the data to download compressed saves a lot, even though we have to transmit a map of the compressed file with it.

The clear winner is rsync with compression (rsync -z), transferring only 80kB. Here rsync combines the advantages of all the methods: by working on the uncompressed data and then compressing the deltas, rsync gets the equivalent of zsync's look-inside method, but without having to transmit a full map of the compressed data. But this comes at a high cost, since in addition to the usual overhead of reading the entire source data file and doing the checksum calculations for each client, the rsync server has to compress the deltas per client. zsync's look-inside, on the other hand, causes hardly more server load than a normal HTTP download.
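
The optimum figures quoted in this discussion can be cross-checked against the tables with a few lines of Python. This is only a convenience sketch: the numbers are copied from the rows above, while the dictionary keys and the helper name best are invented here, and None marks a block size for which the tables give no figure.

```python
BLOCK_SIZES = [256, 512, 768, 1024, 1536, 2048, 4096, 8192]

# Bytes transferred per block size, copied from the tables above.
# None means the table has no figure for that block size.
RESULTS = {
    "uncompressed, rsync-2.6.3 (total)":
        [579758, 317418, 251732, 232682, 209608, 225686, 275970, 412720],
    "uncompressed, rsync-2.6.3 (total, with compression)":
        [349311, 165121, 120567, 102686, 83033, 81638, 85591, 117775],
    "uncompressed, zsync-0.2.1":
        [405520, 257388, None, 204028, None, 204266, 280286, 487790],
    "gzip --best, zsync-0.2.2 with look-inside":
        [559703, 304058, None, 185237, None, 140735, 149467, 209643],
    "gzip --best --rsync, rsync-2.6.3 (total)":
        [476020, 427778, 419292, 411128, 420072, 426864, 476726, 583578],
    "gzip --best --rsync, zsync-0.2.1 without look-inside":
        [449153, 415905, None, 406514, None, 422712, 485402, 617931],
}


def best(row):
    """Return (block size, bytes) for the smallest transfer in one table row."""
    measured = [(value, size) for size, value in zip(BLOCK_SIZES, row)
                if value is not None]
    value, size = min(measured)
    return size, value


# List the methods from smallest to largest best-case transfer.
for name, row in sorted(RESULTS.items(), key=lambda item: best(item[1])[1]):
    size, value = best(row)
    print(f"{name}: {value} bytes at blocksize {size}")
```

Sorting by the best-case transfer puts rsync with compression first and zsync with look-inside second, followed by the uncompressed runs and then the gzip --rsync runs, which matches the ordering described in the text.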