% limb manual
% nextsplit_ae(3)
% limb 0.1.0
% 2023-07-24
# NAME
nextsplit_ae, nextsplit_buz, nextsplit_rabin - find the next split for
variable-length chunks in a data stream
# SYNOPSIS
#include <limb/nextsplit.h>
```pre hl
size_t nextsplit_ae(size_t <em>min</em>, size_t <em>avg</em>,
const void *<em>data</em>, size_t <em>dlen</em>)
size_t nextsplit_buz(size_t <em>min</em>, size_t <em>avg</em>,
const void *<em>data</em>, size_t <em>dlen</em>)
size_t nextsplit_rabin(size_t <em>min</em>, size_t <em>avg</em>,
const void *<em>data</em>, size_t <em>dlen</em>)
```
# DESCRIPTION
These functions perform content-defined chunking: they find the next breakpoint
at which to split a data stream into a chunk, which is useful for things such
as data deduplication.
Each of them looks into `data` (up to `dlen` bytes) for a position at which to
split a chunk, which shall be at least `min` bytes long, with an average
(targeted) length of `avg`. If no such position is found, they return `dlen`,
making a full chunk; as such, `dlen` should be the maximum chunk size and not
the total data length.
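As an illustration of why `dlen` should be capped at the maximum chunk size, here is a minimal sketch of a chunking loop. To compile without limb, it uses `demo_split`(), a hypothetical stand-in that only shares the signature of the functions documented here; it is not one of limb's algorithms. The names `count_chunks` and `max` are likewise assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in sharing the nextsplit_*() signature, so the sketch
 * builds without limb. NOT a limb algorithm: it cuts right after the first
 * zero byte found at or past `min`, ignores `avg`, and returns `dlen` when
 * no such byte exists. */
static size_t
demo_split(size_t min, size_t avg, const void *data, size_t dlen)
{
    const unsigned char *p = data;

    (void) avg;
    for (size_t i = min; i < dlen; ++i)
        if (p[i] == 0)
            return i + 1;
    return dlen;
}

/* Walk `buf` chunk by chunk and return the number of chunks found.
 * Each call is offered at most `max` bytes, so a returned `dlen` yields
 * a chunk of the maximum size rather than the whole remaining buffer. */
static size_t
count_chunks(const void *buf, size_t len, size_t min, size_t avg, size_t max)
{
    const unsigned char *p = buf;
    size_t n = 0;

    while (len > 0) {
        size_t dlen = (len < max) ? len : max;
        size_t clen = demo_split(min, avg, p, dlen);

        /* a zero-length chunk would make this loop spin forever */
        assert(clen > 0 && clen <= dlen);

        /* p[0] .. p[clen - 1] is the chunk; hash or store it here */
        p += clen;
        len -= clen;
        ++n;
    }
    return n;
}
```

In real use, the call to `demo_split`() would presumably be replaced with one of `nextsplit_ae`(), `nextsplit_buz`() or `nextsplit_rabin`(), after including `<limb/nextsplit.h>`.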
The `nextsplit_ae`() function uses the Asymmetric Extremum (AE) content-defined
chunking algorithm, which is notably extremely fast.
The `nextsplit_buz`() function uses the buzhash rolling hash, which gives
considerably better deduplication results whilst remaining very fast (though
not as fast as AE).
The `nextsplit_rabin`() function uses a rolling-hash algorithm based on the
Rabin fingerprint, which tends to give slightly better deduplication results,
albeit at a lower speed.
Note that it isn't uncommon for `nextsplit_buz`() to give better deduplication
results than both alternatives.
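Since all three functions share one signature, and the best choice depends on the data, a caller may want to select the algorithm at run time. The following is a minimal sketch of doing so through a function pointer. The names `splitter_fn`, `pick_splitter` and the `stub_*` functions are assumptions made for this example and are not part of limb; the stubs merely stand in for the real functions so the sketch builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointer type matching the signature shared by the nextsplit_*() functions. */
typedef size_t (*splitter_fn)(size_t min, size_t avg,
                              const void *data, size_t dlen);

/* Placeholders, NOT limb code: each one always returns `dlen`. With limb
 * available, the table below would reference nextsplit_ae(), nextsplit_buz()
 * and nextsplit_rabin() instead. */
static size_t
stub_ae(size_t min, size_t avg, const void *data, size_t dlen)
{
    (void) min; (void) avg; (void) data;
    return dlen;
}

static size_t
stub_buz(size_t min, size_t avg, const void *data, size_t dlen)
{
    (void) min; (void) avg; (void) data;
    return dlen;
}

static size_t
stub_rabin(size_t min, size_t avg, const void *data, size_t dlen)
{
    (void) min; (void) avg; (void) data;
    return dlen;
}

static const struct {
    const char *name;
    splitter_fn fn;
} splitters[] = {
    { "ae",    stub_ae },
    { "buz",   stub_buz },
    { "rabin", stub_rabin },
};

/* Return the splitter registered under `name`, or NULL if there is none. */
static splitter_fn
pick_splitter(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof splitters / sizeof splitters[0]; ++i)
        if (strcmp(name, splitters[i].name) == 0)
            return splitters[i].fn;
    return NULL;
}
```

A caller could then run the same data through each splitter and compare the resulting deduplication ratios before settling on one.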
# RETURN VALUES
All these functions return the length of the next chunk from `data` as
determined by their algorithm, or `dlen` when no split could be determined.