High Speed Train

Posted 26 January 2023

On a project I'm working on, some data need to be stored. A lot of similar data, small packs, and most of the project works with streams. That is, file streams (FILE *) are used to read and write files, and similar memory streams are used to construct data in memory before writing them out.

Which also means a lot of calls to printf() - or fprintf() to be exact - are being made. I decided to change that, for various reasons, but let's have a quick loot at the impact on speed.

Let's see how I will store data now, being both a bit more compact as well as faster, both to read and write.

In-memory file stream

You probably already know this, but it is true that it's a nice thing, convenient way to quickly get some formatted strings done :

One can use a file stream - FILE * - to write data in memory, then get back a pointer to said data.

cFILE *f;
char *str;
size_t len;

f = open_memstream(&str, &len);
if (!f) return 0;

/* Use fwrite() or fprintf() or the likes... */

fclose(f);

That's it, and now your data is in str, of length len. Cool, so that's how things work(ed) so far.

How things (used to) work

As I said, most of things were done through fprintf so a typical set of data could look like this :

cfprintf(f, "uid %d%c", uid, 0);
fprintf(f, "gid %d%c", gid, 0);
fprintf(f, "mode %#o%c", mode, 0);
fprintf(f, "name %s%c", name, 0);
fprintf(f, "time %lld.%.09ld%c", (long long) ts.tv_sec, ts.tv_nsec, 0);
/* and for the occasional blob of data, fwrite would be used */
fwrite("blob", 1, 5, f);
fwrite(blob, 1, blen, f); /* this is a fixed/known-length blob */
/* a final NUL to indicate the end */
fwrite("", 1, 1, f);

Let's Change

What I've gone for was to store data in a different manner, and I basically went with the idea that there would only be two kinds of data :

integers, and
blobs.

That's it. It's not always true, and you might like to have e.g. a special case for (NUL-terminated) strings, but I decided against it.

The idea was, integers wouldn't be stored in such a "literal" fashion, it does make things easier to read for a human looking at things, but that's not a case that ever happens in this case, so there's really no point.

Also, to try and have a single case yet supporting all kinds of values, but without the need for a fixed number of bytes, e.g. 4 or 8, I decided to do the following :

In all 8-bit bytes used to store the integer, the last (most significant) one would be reserved. It will be a special flag indicating whether or not we need to read another byte to keep constructing the number.
That way, a tiny number (< 128) can be written as a single byte. Of course the larger the number, the more additional bytes will be needed. You can go up to 16 383 with only two bytes, and a third one lead you up to 2 097 151, and so on.
That means when dealing with mostly small-ish number, that can be stored in only a few bytes, better than the printf decimal representation, or just use say 4 bytes every time (and be limited to 32-bit values).
As an obvious side-effect, it also means it might take 9 bytes to store a 64-bit number. Or a 57-bit number even. But really, (a) it's not that bad, and (b) when do you need to store such large numbers ? I don't know, but in my case it's probably close to never, so I don't care. (Besides, it works fine even then.)

With this "encoding" of integers in mind, storing data would always be the same :

2 bytes as identifier of the data. Again, I'll special-case the most significant bit as a flag for the type : if set, it's a blob, else just an integer.
Then an integer, of variable length as described above. Either it is the value itself, or - in case of a blob - the length of the data that follows.
In case of a blob, the data. Of known length.
Repeat as needed.

How things (will now) work

Now that you know the principle, here's what things will look like :

cu64 val;
val = uid;
encdata(ID_UID, &val, 0, dst);
val = gid;
encdata(ID_GID, &val, 0, dst);
val = mode;
encdata(ID_MODE, &val, 0, dst);
encdata(ID_NAME, &name, strlen(name), dst);
/* timestamps are stored as blob: just the two integers, one after the other */
unsigned char tmp[18], *t = tmp;
int l = encint(ts.tv_sec, tmp);
l += encint(ts.tv_nsec, tmp + l);
encdata(ID_TS, &t, l, dst);
/* and our blob. Since it's a fixed-length blob, we could store it directly
 * without our encdata() function adding its size. We would do that in real
 * life, but for the sake of testing let's do it this way. */
encdata(ID_BLOB, &blob, blen, dst);

Comparisons

And the results are...

(Note that I'm using terms of reading and writing even though it's all in-memory stuff, no files involved.)

Writing

I ran the previous examples a bunch of times (about 50 million) to see the difference between the two.

The original method took 24s to write things out. It being 95 bytes long. In other words, it generated 1 932 KiB/s.
The new method only took 12s, so twice as fast, and only used up 59 bytes. Which brings us to 2 400 KiB/s.

Reading

Similarly the reading of data used to be done relying on a bunch of strtoul() calls :

cfor (;;) {
    if (!memchr(sce, 0, len)) {
        break;
    } else if (!strncmp(sce, "uid ", 4)) {
        uid = strtoul(sce + 4, 0, 0);
    } else if (!strncmp(sce, "gid ", 4)) {
        gid = strtoul(sce + 4, 0, 0);
    } else if (!strncmp(sce, "mode ", 5)) {
        mode = strtoul(sce + 5, 0, 0);
    } else if (!strncmp(sce, "time ", 5)) {
        char *st = sce + 5, *end;
        ts.tv_sec = strtoll(st, &end, 10);
        ts.tv_nsec = strtol(end + 1, 0, 10);
    } else if (!strncmp(sce, "name ", 5)) {
        sce += 5;
        name = sce;
    } else if (!strcmp(sce, "blob")) {
        sce += 5;
        break;
    }
    sce += strlen(sce) + 1;
}
blob = sce;

Obviously things are looking much different now :

cfor (size_t off = 0; off < len; ) {
    u16 id = (sa.s[o] << 8) | sa.s[o + 1];
    off += 2;
    off += decint((u8 *) str + off, &val);
    switch (id) {
        case ID_UID:
            uid = val;
            break;
        case ID_GID:
            gid = val;
            break;
        case ID_MODE:
            mode = val;
            break;
        case ID_TS:
            off += decint((u8 *) str + off, &val);
            ts.tv_sec = val;
            off += decint((u8 *) str + off, &val);
            ts.tv_nsec = val;
            break;
        case ID_NAME:
            name = str + off;
            namelen = val;
            off += val;
            break;
        case ID_BLOB:
            blob = str + off;
            off += val;
            break;
    }
}

And for reading :

The original method processed 1 784 KiB/s and took no less than 26s ! Longer than writing...
The new method managed to handle 9 602 KiB/s thus only needing no more than 3s.

In the end

So it's all going in the right direction : it should get a bit faster with this new method, and as an added bonus it'll require a little bit less space.

This is not to say anything bad about the FILE * memory stream facility and printf family of functions from stdio, it can be nice to quickly do something, but you probably want to switch to something else for final/production use. Especially if you have such happenings in critical parts of your application.