Moving the work to the client relieves the server, but the client must then compute the rolling checksum over the old file. As explained in [[Rsync1998]], although the weak checksum must be calculated at every possible offset in the old file, the checksum was chosen so that the value at offset x+1 can be derived from the value at offset x using only the bytes at x and x+blocksize.
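As an illustration, here is a minimal sketch (not zsync's actual code) of this rolling property for an rsync-style weak checksum: computing the checksum from scratch costs O(blocksize), but advancing the window by one byte costs O(1).

```c
#include <stdint.h>
#include <stddef.h>

/* rsync-style weak checksum over a window of `blocksize` bytes:
 *   a = sum of the bytes in the window                  (mod 2^16)
 *   b = sum of (blocksize - i) * byte[i], i = 0..       (mod 2^16) */
struct rsum { uint16_t a, b; };

/* Compute the checksum from scratch: O(blocksize). */
static struct rsum rsum_calc(const unsigned char *data, size_t blocksize)
{
    struct rsum r = { 0, 0 };
    for (size_t i = 0; i < blocksize; i++) {
        r.a += data[i];
        r.b += (uint16_t)((blocksize - i) * data[i]);
    }
    return r;
}

/* Advance the window from offset x to x+1: O(1).  Only the byte leaving
 * the window (`out`, at offset x) and the byte entering it (`in`, at
 * offset x + blocksize) are needed. */
static struct rsum rsum_roll(struct rsum r, unsigned char out,
                             unsigned char in, size_t blocksize)
{
    r.a = (uint16_t)(r.a - out + in);
    r.b = (uint16_t)(r.b - blocksize * out + r.a);
    return r;
}
```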
Despite this, when working on large files, for instance ISO files, the calculation can take some time: my Athlon XP 1600+ took roughly 3 minutes to pass over an entire ISO file with zsync-0.2.0. Various optimisations in the implementation helped get it to this level. The client skips forward to the end of a block once a match is obtained (a match overlapping an existing match would occur only if the target file contained unusual internal redundancy), so files with a large amount in common with the target are processed faster. The half-block alignment data transfer optimisation also speeds up the strong checksum checking, because often only the first of the two blocks needs to be checksummed to get a rejection (whereas with a larger blocksize, the entire block would have to be strong-checksummed to get a rejection).
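The skip-forward behaviour might be sketched as follows, building on the `rsum` helpers above; `try_match` is a hypothetical stand-in for the hash-table lookup plus strong-checksum comparison, not a real zsync function.

```c
/* Hypothetical: weak-hash lookup followed by strong checksum comparison;
 * returns non-zero if the block at `offset` matches a target block. */
int try_match(const unsigned char *block, struct rsum r, size_t offset);

void scan_old_file(const unsigned char *data, size_t len, size_t blocksize)
{
    if (len < blocksize)
        return;

    size_t offset = 0;
    struct rsum r = rsum_calc(data, blocksize);

    for (;;) {
        if (try_match(data + offset, r, offset)) {
            /* Matched: overlapping matches are not expected, so skip a
             * whole block and restart the checksum window there. */
            offset += blocksize;
            if (offset + blocksize > len)
                break;
            r = rsum_calc(data + offset, blocksize);
        } else {
            /* No match: roll the window forward by one byte. */
            if (offset + blocksize >= len)
                break;
            r = rsum_roll(r, data[offset], data[offset + blocksize],
                          blocksize);
            offset++;
        }
    }
}
```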
To further improve performance, it was necessary to reconsider the use of a hash table to locate target blocks with a matching checksum value. Normally, the size of a hash table should be of the same order as the number of records to be stored; but that sizing assumes we are more interested in the speed of successful lookups than failed ones. In the rsync algorithm, the vast majority of lookups are negative (that is, they serve only to prove that no such entry exists); consequently, what we need is a data structure which makes negative lookups very quick.
I opted to retain the existing hash table for locating positive matches, but added a compact second hash table as a pre-filter. This extra, negative hash table contains just one bit per entry (so 8 entries per byte), the bit indicating whether any entry with that hash value exists. Whereas the main hash table has at most 2^16 entries (0.25MB in size, as each entry is a 32-bit pointer), the negative hash table is allowed 2^19 entries (taking just 65536 bytes, while providing higher resolution than the main hash table); a sketch of such a bitmap pre-filter follows the table below. I experimented with the number of extra bits of resolution given to the negative hash table: each extra bit roughly halved the number of hash hits on large files, as expected, and on large test files the pre-filter cut over a minute from the total processing time (figures for zsync-0.2.2, before the optimisation was added, are shown for comparison):
| Version | Extra bits resolved by negative hash | Total time (s) to process 640MB ISO file |
|---------|--------------------------------------|-------------------------------------------|
| zsync-0.2.2 | n/a | 191 |
| zsync-0.2.3 (pre-release) | 1 | 165 |
| zsync-0.2.3 (pre-release) | 2 | 142 |
| zsync-0.2.3 (pre-release) | 3 | 129 |
| zsync-0.2.3 (pre-release) | 4 | 133 |
| zsync-0.2.3 (pre-release) | 5 | 146 |
Note that the table shows CPU time used, including the download and the final SHA-1 checksum verification, so the improvement in the core rsync stage is proportionally larger than the table suggests. Elapsed time did not improve by quite as much as CPU time, as the amount of time for which the process was disk-bound increased.
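As a sketch of how such a one-bit pre-filter might look (names hypothetical, not zsync's actual code): with 2^19 one-bit entries the bitmap occupies 2^19 / 8 = 65536 bytes, and a clear bit proves immediately that no target block has that weak-checksum value, so the main table need not be consulted.

```c
#include <stdint.h>

#define NEG_HASH_BITS 19                      /* 2^19 one-bit entries */
#define NEG_HASH_SIZE (1u << NEG_HASH_BITS)

static uint8_t neg_hash[NEG_HASH_SIZE / 8];   /* 65536-byte bitmap */

/* Set the bit for a weak-checksum hash (done once, while loading the
 * list of target blocks). */
static void neg_hash_add(uint32_t h)
{
    h &= NEG_HASH_SIZE - 1;
    neg_hash[h >> 3] |= (uint8_t)(1u << (h & 7));
}

/* Pre-filter test: a clear bit proves no block can match, so the main
 * hash-table lookup (and any strong checksum) is skipped entirely. */
static int neg_hash_maybe_present(uint32_t h)
{
    h &= NEG_HASH_SIZE - 1;
    return (neg_hash[h >> 3] >> (h & 7)) & 1;
}
```

The "extra bits" in the table correspond to the pre-filter resolving more hash bits than the main table's 16; consistent with the figures above, each additional bit roughly halves the rate at which the pre-filter passes a non-matching offset through to the main table.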