This is relatively complex, and could be made obsolete if gzip --rsync or something similar were more widespread. But technologies do not exist in an ideal world: if existing content is not compressed with --rsync, that must be allowed for. Some downloads may be more efficient using --rsync and not looking inside the compressed data, while others might be more efficient when looking inside the file. I think it is enough of an open question to warrant implementing something and seeing whether it proves useful. The basic zsync functionality is not tied to this feature, and it could easily be dropped.
To test the usefulness of this feature, I have benchmarked zsync with some data files that are normally transferred compressed. There are more combinations to consider for compressed files, so I have broken the results down by how the file to be transferred is compressed (none, gzip, or gzip --rsync) and by whether zsync's look-inside-gzip functionality was used. I have also included numbers for zsync-0.1.x and for rsync-2.6.3 (current at the time of writing), including rsync with the -z option, which compresses the deltas with the deflate algorithm on the server.
I took two Debian Packages files, downloaded a day apart, as the source and target files. The target file was 12.1MB, or 3.1MB gzipped. I report the data transferred as file data plus control data, where the control data is just the size of the .zsync file (this clearly cannot be neglected: it must be downloaded, so it is an overhead of the algorithm). For rsync, I show both the total data transferred and the downstream data alone (as noted earlier, upstream data is comparatively cheap, so rsync has an advantage because most of its metadata goes upstream).
Debian Packages files contain textual data, roughly half plain English package descriptions and half key:value pairs giving package names, versions and so on. The changes between the two files are widespread and very scattered; the diff of the two files was about 58kB.
Several methods were compared. Firstly, for reference, working on the full, uncompressed 12.1MB file (bytes transferred at each blocksize):
Blocksize (bytes) | 256 | 512 | 768 | 1024 | 1536 | 2048 | 4096 | 8192 |
zsync-0.1.0 (pre-release) | 564709 | 353690 | 279580 | 306050 | 468092 | | | |
rsync-2.6.3 (down only) | 247512 | 175004 | 156772 | 161452 | 162108 | 190048 | 258128 | 403776 |
rsync-2.6.3 (total) | 579758 | 317418 | 251732 | 232682 | 209608 | 225686 | 275970 | 412720 |
rsync-2.6.3 (total, with compression) | 349311 | 165121 | 120567 | 102686 | 83033 | 81638 | 85591 | 117775 |
zsync-0.2.1 | 405520 | 257388 | n/a | 204028 | n/a | 204266 | 280286 | 487790 |
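Because the rsync rows give both the downstream-only and the total figures, the upstream traffic (mostly the block checksums that the receiving side sends to the sender) can be recovered by subtraction. A small Python sketch using the numbers from the table above:

```python
# Figures from the uncompressed Packages benchmark (bytes transferred).
BLOCKSIZES = [256, 512, 768, 1024, 1536, 2048, 4096, 8192]
RSYNC_DOWN = [247512, 175004, 156772, 161452, 162108, 190048, 258128, 403776]
RSYNC_TOTAL = [579758, 317418, 251732, 232682, 209608, 225686, 275970, 412720]


def upstream_bytes(down, total):
    """Upstream traffic is whatever part of the total did not flow downstream."""
    return [t - d for d, t in zip(down, total)]


up = upstream_bytes(RSYNC_DOWN, RSYNC_TOTAL)
for bs, u in zip(BLOCKSIZES, up):
    print(f"blocksize {bs:5d}: {u:7d} bytes upstream")
```

The upstream share falls steadily as the blocksize grows (fewer blocks means fewer checksums), from about 332kB at 256 bytes to under 9kB at 8192 bytes. That is why rsync's total is worst at small blocksizes even though its downstream figure is smallest at 768 bytes.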
Next, the same files compressed with gzip --best. For a fairer comparison with rsync, and to show the difference that the look-inside method makes, zsync without look-inside is shown too. As expected, without look-inside (and with rsync), almost the entire 3.1MB compressed file is transferred.
Blocksize (bytes) | 256 | 512 | 768 | 1024 | 1536 | 2048 | 4096 | 8192 |
zsync-0.1.2 with look-inside | 613532 | 339527 | 217883 | 190338 | 230413 | | | |
zsync-0.1.0 (pre-release) without look-inside | 3134061 | 3074067 | 3044591 | 3033427 | 3033999 | | | |
rsync-2.6.3 (down) | 3013037 | 3012749 | 3012877 | 3013241 | 3014117 | 3014045 | 3018029 | 3026169 |
rsync-2.6.3 (total) | 3086271 | 3048759 | 3037319 | 3031581 | 3026361 | 3023235 | 3022647 | 3028501 |
zsync-0.2.2 without look-inside | 3096931 | 3054736 | 3031354 | 3023228 | 3022746 | 3028652 | | |
zsync-0.2.2 with look-inside | 559703 | 304058 | 185237 | 140735 | 149467 | 209643 | | |
Finally, both the source and target files are compressed with gzip --best --rsync.
Blocksize (bytes) | 256 | 512 | 768 | 1024 | 1536 | 2048 | 4096 | 8192 |
zsync-0.1.2 with look-inside | 625590 | 351942 | 228179 | 263135 | 354503 | | | |
zsync-0.1.0 (pre-release) without look-inside | 496567 | 449475 | 444663 | 492377 | 607225 | | | |
rsync-2.6.3 (down only) | 400794 | 390142 | 394190 | 392290 | 407498 | 417422 | 471982 | 581186 |
rsync-2.6.3 (total) | 476020 | 427778 | 419292 | 411128 | 420072 | 426864 | 476726 | 583578 |
zsync-0.2.1 without look-inside | 449153 | 415905 | 406514 | 422712 | 485402 | 617931 | | |
zsync-0.2.1 with look-inside | 571679 | 316116 | 197467 | 151031 | 222331 (error) | 343906 | | |
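The comparisons that follow are made at each method's optimum point, meaning the blocksize at which it transferred the fewest bytes. Picking that out of a row is a simple minimum. The sketch below applies it to a few rows copied from the two compressed-file tables; empty cells are represented as None, and the cell marked "(error)" is left out as well.

```python
from typing import List, Optional, Tuple

BLOCKSIZES = [256, 512, 768, 1024, 1536, 2048, 4096, 8192]

# Rows copied from the gzip --best and gzip --rsync tables (bytes transferred).
# None marks an empty cell; the cell marked "(error)" is excluded too.
ROWS = {
    "gzip --best / rsync total":
        [3086271, 3048759, 3037319, 3031581, 3026361, 3023235, 3022647, 3028501],
    "gzip --best / zsync-0.2.2 look-inside":
        [559703, 304058, 185237, 140735, 149467, 209643, None, None],
    "gzip --rsync / rsync total":
        [476020, 427778, 419292, 411128, 420072, 426864, 476726, 583578],
    "gzip --rsync / zsync-0.2.1 without look-inside":
        [449153, 415905, 406514, 422712, 485402, 617931, None, None],
    "gzip --rsync / zsync-0.2.1 look-inside":
        [571679, 316116, 197467, 151031, None, 343906, None, None],
}


def optimum(row: List[Optional[int]]) -> Tuple[int, int]:
    """Return (blocksize, bytes) for the cheapest measured blocksize in a row."""
    measured = [(cost, bs) for bs, cost in zip(BLOCKSIZES, row) if cost is not None]
    best_cost, best_bs = min(measured)
    return best_bs, best_cost


for name, row in ROWS.items():
    bs, cost = optimum(row)
    print(f"{name}: {cost} bytes at blocksize {bs}")
```

The optima it reports match the figures discussed in the text: about 410kB for rsync and for zsync without look-inside on the gzip --rsync file, and 140kB to 151kB for zsync with look-inside.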
Compressing with gzip --rsync helps considerably: both rsync and zsync transfer about 410kB at their optimum points. zsync with the look-inside method does much better than either, with as little as 140kB transferred (on the gzip --best file; about 151kB on the gzip --rsync file). At that optimum point, zsync transferred 75kB of (compressed) file data, close to the size of the diff, and 65kB of .zsync file.
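Since the cost of a zsync download is file data plus control data, the rounded figures for this optimum point show how large the .zsync overhead is relative to the useful payload. A small sketch of that arithmetic, taking kB as 1000 bytes (an assumption; the text does not state its convention):

```python
def zsync_cost(file_data: int, control_data: int) -> dict:
    """Total download for zsync: fetched file data plus the .zsync control file."""
    total = file_data + control_data
    return {"total": total, "control_share": control_data / total}


# Rounded figures from the text; kB taken as 1000 bytes (assumption).
FILE_DATA = 75_000     # compressed file data fetched
CONTROL_DATA = 65_000  # size of the .zsync file
DIFF_SIZE = 58_000     # size of the diff between the two Packages files

cost = zsync_cost(FILE_DATA, CONTROL_DATA)
print(cost["total"])                    # 140000
print(round(cost["control_share"], 2))  # 0.46
print(round(FILE_DATA / DIFF_SIZE, 2))  # 1.29
```

The file data is only about 1.3 times the size of the diff, but the .zsync file accounts for nearly half of the total, so the control data is a large part of the cost at this point.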
In this example, where the difference between the two files is small, working on the uncompressed data does quite well: rsync transfers about 210kB and zsync around 200kB. zsync with look-inside on the compressed data is ahead of both; downloading the data compressed saves a lot, even though a map of the compressed file has to be transmitted with it.
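To make the "map of the compressed file" concrete: the client works out which uncompressed ranges it is missing, and then needs to know which compressed bytes to request for them. The sketch below assumes a hypothetical map format, a sorted list of (uncompressed offset, compressed offset) checkpoints at points where decompression can restart, and it is not zsync's actual .zsync map format. It also ignores two real complications of deflate: blocks begin at bit rather than byte offsets, and resuming decompression mid-stream needs the preceding 32KB of uncompressed data as its dictionary.

```python
import bisect
from typing import List, Tuple

# Hypothetical map format: sorted (uncompressed_offset, compressed_offset) pairs,
# one per point where decompression can restart. The first entry must be at 0.
Checkpoint = Tuple[int, int]


def compressed_range(cmap: List[Checkpoint], compressed_size: int,
                     start: int, end: int) -> Tuple[int, int]:
    """Return the [first, last) compressed byte range covering uncompressed [start, end)."""
    unc = [u for u, _ in cmap]
    # Last checkpoint at or before `start`: decompression has to begin there.
    i = bisect.bisect_right(unc, start) - 1
    # First checkpoint at or after `end`: decompression can stop there.
    j = bisect.bisect_left(unc, end)
    first = cmap[i][1]
    last = cmap[j][1] if j < len(cmap) else compressed_size
    return first, last


# Made-up example map for a 30000-byte compressed file.
CMAP = [(0, 10), (40000, 9000), (80000, 18500), (120000, 27000)]
print(compressed_range(CMAP, 30000, 50000, 60000))   # (9000, 18500)
print(compressed_range(CMAP, 30000, 100000, 130000))  # (18500, 30000)
```

The coarser the checkpoints, the smaller the map, but the more unneeded compressed data surrounds each missing block; this is one reason the look-inside figures depend on how much control data is sent.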
The clear winner is rsync with compression (rsync -z), transferring only about 80kB. Here rsync combines the advantages of all the methods: by working on the uncompressed data and then compressing the deltas, it gets the equivalent of zsync's look-inside method without having to transmit a full map of the compressed data. But this comes at a high cost. In addition to the usual overhead of reading the entire data file and doing the checksum calculations for each client, the rsync server has to compress the deltas separately for each client. zsync's look-inside, on the other hand, puts hardly more load on the server than a normal HTTP download.
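The idea behind rsync -z, computing the delta on uncompressed data and then deflating what has to be sent, can be sketched with Python's zlib. This is a simplification under stated assumptions: the delta is modelled as a hypothetical list of ("copy", block_index) and ("literal", bytes) operations, and only the literal data is compressed, in one deflate stream per client. rsync's real -z implementation differs in detail.

```python
import zlib

# Hypothetical delta for one client: references to blocks the client already
# has, interleaved with literal bytes it lacks.
ops = [
    ("copy", 0),
    ("literal", b"Version: 2.1-3\n" * 40),
    ("copy", 1),
    ("literal", b"Description: updated package text\n" * 40),
]


def compress_literals(delta):
    """Deflate all literal data in the delta as a single per-client stream."""
    comp = zlib.compressobj(9)
    out = []
    for kind, payload in delta:
        if kind == "literal":
            out.append(comp.compress(payload))
    out.append(comp.flush())
    return b"".join(out)


raw = b"".join(payload for kind, payload in ops if kind == "literal")
packed = compress_literals(ops)
print(len(raw), len(packed))
```

Because the compressor runs once per client over that client's own delta, this is exactly the per-client server work described above. zsync instead serves precomputed compressed bytes through plain HTTP range requests, so the server does no per-client compression.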