Bang And Blake, or how hashes compare to one another
I've been needing to hash small blocks of data recently, and originally used SHA3 for the job. It did the job fine, but I wondered how the implementation I used performed against others, to see if I could speed things up a bit.
As it is, the SHA3 computation is pretty critical performance-wise here, as many, many hashes need to be computed while the application runs.
So, once again, I set out to test a few C implementations of SHA3, and while at it threw in a couple of alternative algorithms for reference.
What is being benchmarked
As you may or may not know, I happen to use skalibs in my projects, and it turns out that it does provide functions for a few hashes, namely SHA1, SHA2 and BLAKE2s.
So I decided to add the first two to my tests, to see how SHA3 compared. And since, as mentioned, I was already using SHA3, I went on to test a few different implementations:
- SHA3 from nettle, which is what I was already using before all this;
- tiny-sha3 from Markku-Juhani O. Saarinen, a pretty simple implementation;
- sha3-unrolled from libkeccak-tiny/Tor, a single-file implementation by David Leon Gil with some fixes/tweaks by Tor developers;
- and the implementation in limb, taken from RHash by Aleksey Kravchenko.
BLAKE
While doing this, I started reading about another SHA3 finalist, BLAKE, since a version of it was in skalibs. So I decided to include it in this little benchmark as well, especially since it was said to be as secure as, but much faster than, its competitors, starting with Keccak, the SHA3 winner.
I've added a few implementations of BLAKE3, though really they are all (based on) the same one:
- blake3-off is the official C implementation, with assembly optimizations (64-bit only).
- blake3-off-intr is the intrinsics alternative, using C intrinsics instead of assembly, which means we get to have it on 32-bit as well.
- blake3-off-noopt is just the basic flavor, with none of the previous optimizations.
- blake3-limb is what I've added in limb: basically just the official C implementation, with asm optimizations on 64-bit and intrinsics on 32-bit.
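Conceptually, the blake3-limb choice can be pictured as a compile-time switch on the target architecture. Below is a hypothetical sketch of such a switch, using the `__x86_64__` and `__i386__` macros predefined by GCC and Clang; the function and backend names are illustrative and not taken from limb's build.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: report which BLAKE3 backend this build would use.
 * asm on x86_64, intrinsics on 32-bit x86, plain C anywhere else. */
static const char *blake3_backend(void)
{
#if defined(__x86_64__)
    return "asm";
#elif defined(__i386__)
    return "intrinsics";
#else
    return "portable";
#endif
}

/* Convenience check against an expected backend name */
static int backend_is(const char *name)
{
    return strcmp(blake3_backend(), name) == 0;
}
```

The official implementation additionally picks among its SIMD variants (SSE2, SSE4.1, AVX2, AVX-512) at runtime based on CPU features; that part is not shown here.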
The tests
Here's the code I used for the tests:
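```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <stdio.h>
#include <time.h>

#define SIZE (64 << 10)
#define BLOCKS 1
#define ITER 1000

/* Provided by each implementation's small wrapper (not shown here);
 * the exact signatures are assumed. */
void init(void);
void update(const void *data, size_t len);
void final(unsigned char *md);
size_t hashlen(void);
void dump(const unsigned char *md, size_t len, int opt);

int main(void)
{
    struct timespec ts1, ts2;
    static unsigned char msg[SIZE]; /* static: keeps large buffers off the stack */
    unsigned char md[64] = { 0 };

    /* Fill every byte, msg[0] through msg[SIZE - 1] */
    for (int i = 0; i < SIZE; ++i)
        msg[i] = i % 256;

    clock_gettime(CLOCK_MONOTONIC, &ts1);
    for (int iter = 0; iter < ITER; ++iter) {
        init();
        for (int i = 0; i < BLOCKS; ++i)
            update(msg, sizeof(msg));
        final(md);
    }
    clock_gettime(CLOCK_MONOTONIC, &ts2);

    dump(md, hashlen(), 1);

    double took = (ts2.tv_sec - ts1.tv_sec)
                + (ts2.tv_nsec - ts1.tv_nsec) / 1000000000.0;
    /* Computed in double: SIZE * BLOCKS * ITER can overflow an int */
    double speed = ((double) SIZE * BLOCKS * ITER / took) / (1 << 20);
    printf("took %.09f seconds, hashing %f MiB/s\n", took, speed);
    return 0;
}
```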
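The speed printed is just the number of bytes hashed (SIZE * BLOCKS * ITER) divided by the elapsed time, expressed in MiB. A minimal sketch of that arithmetic as two standalone helpers (elapsed_seconds() and mib_per_second() are hypothetical names, not part of the test program) makes it easy to check against the raw values: 64 KiB hashed 1000 times in 0.020473770 seconds works out to about 3052.69 MiB/s, which is what blake3-limb reports on x86_64.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two timespec readings. Subtracting the fields
 * separately is fine even when the nanosecond difference is negative:
 * the sum still comes out right. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Throughput in MiB/s. Taking the byte count as a double avoids the
 * int overflow that SIZE * BLOCKS * ITER can hit with larger settings. */
static double mib_per_second(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / (1 << 20);
}
```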
Actually, I used two versions of it. One, shown above, repeatedly hashes a single 64 KiB block.
Before that, though, I ran a test where 20 blocks of 512 KiB were added (i.e. 20 calls to the update() function), thus computing the hash of 10 MiB of data over and over again.
Which one is of more interest probably depends on your intended use case, not that it changes much overall.
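Calling update() 20 times on the same buffer hashes the concatenation of those 20 copies, which is what turns the 20 x 512 KiB run into a 10 MiB hash. Here is a minimal sketch of that streaming init/update/final shape, using a hypothetical toy wrapper built on 64-bit FNV-1a, chosen only because it fits in a few lines. FNV-1a is not a cryptographic hash and none of the tested libraries use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy wrapper with the same init/update/final shape as the
 * benchmark's per-implementation wrappers (64-bit FNV-1a inside). */
typedef struct {
    uint64_t h;
} toy_ctx;

static void toy_init(toy_ctx *ctx)
{
    ctx->h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
}

static void toy_update(toy_ctx *ctx, const void *data, size_t len)
{
    const unsigned char *p = data;
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        ctx->h ^= p[i];
        ctx->h *= 0x100000001b3ULL; /* FNV 64-bit prime */
    }
}

static uint64_t toy_final(const toy_ctx *ctx)
{
    return ctx->h;
}

/* Mirrors the benchmark loop: the same buffer is passed to update()
 * `times` times, so the digest covers `times` concatenated copies. */
static uint64_t toy_hash_repeat(const void *data, size_t len, size_t times)
{
    toy_ctx ctx;
    toy_init(&ctx);
    for (size_t i = 0; i < times; ++i)
        toy_update(&ctx, data, len);
    return toy_final(&ctx);
}
```

With a real library, toy_ctx would be replaced by that library's context (nettle's struct sha3_256_ctx, for instance), but the calling pattern stays the same.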
Results
[Chart: hashing 10 MiB (20 blocks of 512 KiB) of data, on i686 (32-bit) and x86_64 (64-bit)]
[Chart: hashing 64 KiB (1 block) of data, on i686 (32-bit) and x86_64 (64-bit)]
Raw values
Hashing 10 MiB (20 blocks of 512 KiB) of data
i686 - 32bit
```text
- test blake2s-skalibs
b4f91f437edcb538fd9349899810e6e8bbac6ea7704b349880d5c5a6d9dbdecc
took 10.289732135 seconds, hashing 97.184260 MiB/s
- test blake3-limb
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 3.764925357 seconds, hashing 265.609516 MiB/s
- test blake3-off-intr
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 3.853664321 seconds, hashing 259.493281 MiB/s
- test blake3-off-noopt
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 4.176573687 seconds, hashing 239.430709 MiB/s
- test sha1-skalibs
d6bca95f69190776042815017281649a9c9fd2a9
took 21.933883001 seconds, hashing 45.591563 MiB/s
- test sha256-skalibs
aecf3c2ab8aca74852bca07b54136cecb3fdafdc35540068ed952c0b89538e0d
took 52.364371749 seconds, hashing 19.096954 MiB/s
- test sha3-limb
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 52.944948385 seconds, hashing 18.887543 MiB/s
- test sha3-nettle
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 50.061874754 seconds, hashing 19.975281 MiB/s
- test sha3-tiny
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 116.081442927 seconds, hashing 8.614641 MiB/s
- test sha3-unrolled
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 107.837249129 seconds, hashing 9.273234 MiB/s
```
x86_64 - 64bit
```text
- test blake2s-skalibs
b4f91f437edcb538fd9349899810e6e8bbac6ea7704b349880d5c5a6d9dbdecc
took 3.533086916 seconds, hashing 283.038607 MiB/s
- test blake3-limb
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 0.324569374 seconds, hashing 3081.005419 MiB/s
- test blake3-off
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 0.324608450 seconds, hashing 3080.634531 MiB/s
- test blake3-off-intr
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 0.413121165 seconds, hashing 2420.597357 MiB/s
- test blake3-off-noopt
3b48bbbc5e9882dc19cc3dfefb6eb856fa681c3e20a4cde8de1dcfb47f01e876
took 1.751588899 seconds, hashing 570.910218 MiB/s
- test sha1-skalibs
d6bca95f69190776042815017281649a9c9fd2a9
took 7.616981365 seconds, hashing 131.285604 MiB/s
- test sha256-skalibs
aecf3c2ab8aca74852bca07b54136cecb3fdafdc35540068ed952c0b89538e0d
took 8.720734486 seconds, hashing 114.669240 MiB/s
- test sha3-limb
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 4.458048052 seconds, hashing 224.313419 MiB/s
- test sha3-nettle
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 4.638726294 seconds, hashing 215.576418 MiB/s
- test sha3-tiny
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 9.990459098 seconds, hashing 100.095500 MiB/s
- test sha3-unrolled
e684a6935a0716813795a7324ddb6be2fe4e3d6ef4158ef7e4536bef5dfad955
took 4.491659002 seconds, hashing 222.634888 MiB/s
```
Hashing 64 KiB (1 block) of data
i686 - 32bit
```text
- test blake2s-skalibs
41a55b44349a9e5889285460a62d3d620b7fd2fc00a21ba1c26ebab009663dd8
took 0.489396944 seconds, hashing 127.708194 MiB/s
- test blake3-limb
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.225065603 seconds, hashing 277.696810 MiB/s
- test blake3-off-intr
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.242135457 seconds, hashing 258.119983 MiB/s
- test blake3-off-noopt
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.494597040 seconds, hashing 126.365495 MiB/s
- test sha1-skalibs
f04977267a391b2c8f7ad8e070f149bc19b0fc25
took 1.864910869 seconds, hashing 33.513666 MiB/s
- test sha256-skalibs
7daca2095d0438260fa849183dfc67faa459fdf4936e1bc91eec6b281b27e4c2
took 2.485993027 seconds, hashing 25.140859 MiB/s
- test sha3-limb
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 3.459725540 seconds, hashing 18.065017 MiB/s
- test sha3-nettle
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 2.510358674 seconds, hashing 24.896840 MiB/s
- test sha3-tiny
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 7.692908606 seconds, hashing 8.124365 MiB/s
- test sha3-unrolled
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 7.061364767 seconds, hashing 8.850980 MiB/s
```
x86_64 - 64bit
```text
- test blake2s-skalibs
41a55b44349a9e5889285460a62d3d620b7fd2fc00a21ba1c26ebab009663dd8
took 0.218443039 seconds, hashing 286.115778 MiB/s
- test blake3-limb
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.020473770 seconds, hashing 3052.686437 MiB/s
- test blake3-off
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.020731243 seconds, hashing 3014.773403 MiB/s
- test blake3-off-intr
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.026455136 seconds, hashing 2362.490217 MiB/s
- test blake3-off-noopt
f16c9f867cc5384c7329aa4481eba6518c50ab21bf8a615e26bad55e57569b2c
took 0.110260248 seconds, hashing 566.840735 MiB/s
- test sha1-skalibs
f04977267a391b2c8f7ad8e070f149bc19b0fc25
took 0.481718669 seconds, hashing 129.743778 MiB/s
- test sha256-skalibs
7daca2095d0438260fa849183dfc67faa459fdf4936e1bc91eec6b281b27e4c2
took 0.544411686 seconds, hashing 114.802826 MiB/s
- test sha3-limb
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 0.278175775 seconds, hashing 224.678083 MiB/s
- test sha3-nettle
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 0.289379204 seconds, hashing 215.979584 MiB/s
- test sha3-tiny
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 0.638976665 seconds, hashing 97.812649 MiB/s
- test sha3-unrolled
bbf9ee3fe69e96f0399d3aeafbb073de5b3ad525c8b3a85b29a36ae5f9829e91
took 0.280643295 seconds, hashing 222.702630 MiB/s
```
let it go...
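As expected, BLAKE3 is much faster than the others, especially with its (asm) optimizations: on x86_64, blake3-limb and blake3-off hash at about 3,000 MiB/s, more than 13 times the speed of the fastest SHA3 implementation there (around 224 MiB/s). On i686, BLAKE3 still runs at roughly 260 to 280 MiB/s, against at most about 25 MiB/s for SHA3.
Among the SHA3 implementations, limb, nettle and sha3-unrolled are within a few percent of each other on x86_64, but sha3-unrolled drops to about half their speed or less on i686, and tiny-sha3 is consistently the slowest.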
One last word: if you're curious about this and would like to run those tests yourself, on your own hardware, it's all available in a nice little git repository. :)
The code can be browsed right here online, or you can just clone the repo: git clone git://lila.oss/test-hashes.git
And as a convenience, here's a tarball if you don't wanna use git.