Test/benchmark some chunking algorithms for better data-deduplication

Test/benchmark some chunking algorithms for splitting a data stream into
variable-length chunks, aiming at better data deduplication.

I originally worked out an implementation based on the Rabin fingerprint, but
figured I should look at alternative implementations/algorithms as well, and so
we got:

- rabin : another, nearly identical implementation based on the Rabin
  fingerprint (the first sketch after this list shows the mask-based split
  loop these variants share)
- rabin-mask : a variant using two masks to find the next split: when the
  main one fails to find a split point, we fall back on the other one (see
  the second sketch after this list)
- rabin-tttd : another variant using two masks to find the next split,
  following the TTTD (Two Thresholds, Two Divisors) scheme
- rabin-mix : this is my original implementation, now simply dispatching to
  the previous two variants: mask below 128 KiB as target/average size, and
  tttd otherwise (because tests seemed to indicate that was the best)
- hashchop : another rolling-hash chunker, based on rsync's rolling checksum
- buz : buzhash implementation originally from BorgBackup (the first sketch
  below uses the same kind of rolling hash)
- ae : implementation of the Asymmetric Extremum (AE) chunking algorithm
  (see the third sketch after this list)
- ae-32 : AE variant using 32-bit values instead of 64-bit ones
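
To make the comparison concrete, here is a minimal sketch of the split loop
most of the rolling-hash chunkers above share, written around a buzhash roll
for brevity. Every name and parameter (WINDOW, MIN_SIZE, MAX_SIZE, MASK, the
xorshift-filled table) is illustrative, not taken from this repo:

    #include <stdint.h>
    #include <stddef.h>

    #define WINDOW   64                 /* rolling window, in bytes */
    #define MIN_SIZE (16u * 1024)       /* never cut before this */
    #define MAX_SIZE (256u * 1024)      /* always cut by this point */
    #define MASK     ((1u << 16) - 1)   /* 16-bit mask: splits ~every 64 KiB */

    static uint32_t buz_table[256];

    /* Fill the byte -> value table with a tiny deterministic PRNG; real
     * implementations ship a fixed table of random values. */
    static void buz_init(void)
    {
        uint32_t s = 0x9e3779b9u;
        for (int i = 0; i < 256; i++) {
            s ^= s << 13; s ^= s >> 17; s ^= s << 5;   /* xorshift32 */
            buz_table[i] = s;
        }
    }

    static inline uint32_t rotl32(uint32_t v, unsigned n)
    {
        n &= 31;
        return n ? (v << n) | (v >> (32 - n)) : v;
    }

    /* Roll byte i into h; once the window is full, roll the oldest byte
     * back out (its table value has been rotated WINDOW times by then). */
    static inline uint32_t buz_roll(uint32_t h, const uint8_t *data, size_t i)
    {
        h = rotl32(h, 1) ^ buz_table[data[i]];
        if (i >= WINDOW)
            h ^= rotl32(buz_table[data[i - WINDOW]], WINDOW & 31);
        return h;
    }

    /* Return the length of the next chunk of data[0..len). */
    size_t next_chunk(const uint8_t *data, size_t len)
    {
        uint32_t h = 0;

        for (size_t i = 0; i < len; i++) {
            h = buz_roll(h, data, i);
            if (i + 1 >= MIN_SIZE && (h & MASK) == 0)
                return i + 1;           /* content-defined cut point */
            if (i + 1 >= MAX_SIZE)
                return i + 1;           /* forced cut to bound chunk size */
        }
        return len;                     /* end of stream: emit what is left */
    }

The mask width is what sets the average chunk size: a k-bit mask matches about
every 2^k bytes. Call buz_init() once before chunking.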
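
The two-mask fallback used by rabin-mask/rabin-tttd can be sketched in the
same frame (TTTD-style; it reuses buz_roll, MIN_SIZE and MAX_SIZE from the
sketch above, and the two masks are again made-up values):

    #define MASK_MAIN   ((1u << 16) - 1)  /* primary: harder to match */
    #define MASK_BACKUP ((1u << 14) - 1)  /* backup: matches ~4x as often */

    /* Prefer the primary mask; if it never matches before MAX_SIZE, cut at
     * the most recent backup-mask match rather than at an arbitrary point. */
    size_t next_chunk_two_masks(const uint8_t *data, size_t len)
    {
        uint32_t h = 0;
        size_t backup = 0;                /* last backup match, 0 = none */

        for (size_t i = 0; i < len; i++) {
            h = buz_roll(h, data, i);
            if (i + 1 < MIN_SIZE)
                continue;
            if ((h & MASK_MAIN) == 0)
                return i + 1;             /* primary hit: normal cut */
            if ((h & MASK_BACKUP) == 0)
                backup = i + 1;           /* remember a fallback cut point */
            if (i + 1 >= MAX_SIZE)
                return backup ? backup : i + 1;
        }
        return len;
    }

The point of the backup mask is that a forced cut at MAX_SIZE depends on
position, not content, and so hurts deduplication; cutting at a weaker
content-defined match keeps the boundary stable when data shifts.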
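
AE works differently: no rolling hash at all. This sketch follows the
algorithm's core idea, under the assumption that values are 64-bit words read
at each byte offset (ae-32 would read uint32_t instead); AE_WINDOW is an
illustrative stand-in for whatever is derived from the target chunk size, and
it builds on the includes above:

    #include <string.h>

    #define AE_WINDOW (32u * 1024)      /* fixed right window, illustrative */

    /* Cut once the running maximum has gone unbeaten for a full window,
     * i.e. the maximum is the extremum of an asymmetric region: everything
     * before it, plus exactly AE_WINDOW bytes after it. */
    size_t next_chunk_ae(const uint8_t *data, size_t len)
    {
        uint64_t max_val, v;
        size_t max_pos = 0;

        if (len < sizeof max_val)
            return len;
        memcpy(&max_val, data, sizeof max_val);

        for (size_t i = 1; i + sizeof v <= len; i++) {
            memcpy(&v, data + i, sizeof v);   /* unaligned-safe word read */
            if (v > max_val) {
                max_val = v;                  /* new extremum: window restarts */
                max_pos = i;
            } else if (i == max_pos + AE_WINDOW) {
                return i + 1;                 /* held for a full window: cut */
            }
        }
        return len;                           /* end of stream */
    }

Per input byte this is one comparison, with no table lookup, which is why AE
tends to benchmark fast; the trade-off is that chunk sizes are controlled only
through the window (the minimum is about one window, and in this basic form
there is no built-in maximum).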