High Speed Train
PostedOn a project I'm working on, some data need to be stored. A lot of similar data,
small packs, and most of the project works with streams. That is, file streams
(FILE *
) are used to read and write files, and similar memory streams are used
to construct data in memory before writing them out.
Which also means a lot of calls to printf
() - or fprintf
() to be exact - are
being made. I decided to change that, for various reasons, but let's have a
quick loot at the impact on speed.
Let's see how I will store data now, being both a bit more compact as well as faster, both to read and write.
In-memory file stream
You probably already know this, but it is true that it's a nice thing, convenient way to quickly get some formatted strings done :
One can use a file stream - FILE *
- to write data in memory, then get back a
pointer to said data.
1
2
3
4
5
6
7
8
9
10
cFILE *f;
char *str;
size_t len;
f = open_memstream(&str, &len);
if (!f) return 0;
/* Use fwrite() or fprintf() or the likes... */
fclose(f);
That's it, and now your data is in str
, of length len
. Cool, so that's how
things work(ed) so far.
How things (used to) work
As I said, most of things were done through fprintf
so a typical set of data
could look like this :
1
2
3
4
5
6
7
8
9
10
cfprintf(f, "uid %d%c", uid, 0);
fprintf(f, "gid %d%c", gid, 0);
fprintf(f, "mode %#o%c", mode, 0);
fprintf(f, "name %s%c", name, 0);
fprintf(f, "time %lld.%.09ld%c", (long long) ts.tv_sec, ts.tv_nsec, 0);
/* and for the occasional blob of data, fwrite would be used */
fwrite("blob", 1, 5, f);
fwrite(blob, 1, blen, f); /* this is a fixed/known-length blob */
/* a final NUL to indicate the end */
fwrite("", 1, 1, f);
Let's Change
What I've gone for was to store data in a different manner, and I basically went with the idea that there would only be two kinds of data :
- integers, and
- blobs.
That's it. It's not always true, and you might like to have e.g. a special case for (NUL-terminated) strings, but I decided against it.
The idea was, integers wouldn't be stored in such a "literal" fashion, it does make things easier to read for a human looking at things, but that's not a case that ever happens in this case, so there's really no point.
Also, to try and have a single case yet supporting all kinds of values, but without the need for a fixed number of bytes, e.g. 4 or 8, I decided to do the following :
- In all 8-bit bytes used to store the integer, the last (most significant) one would be reserved. It will be a special flag indicating whether or not we need to read another byte to keep constructing the number.
- That way, a tiny number (< 128) can be written as a single byte. Of course the larger the number, the more additional bytes will be needed. You can go up to 16 383 with only two bytes, and a third one lead you up to 2 097 151, and so on.
- That means when dealing with mostly small-ish number, that can be stored in
only a few bytes, better than the
printf
decimal representation, or just use say 4 bytes every time (and be limited to 32-bit values). - As an obvious side-effect, it also means it might take 9 bytes to store a 64-bit number. Or a 57-bit number even. But really, (a) it's not that bad, and (b) when do you need to store such large numbers ? I don't know, but in my case it's probably close to never, so I don't care. (Besides, it works fine even then.)
With this "encoding" of integers in mind, storing data would always be the same :
- 2 bytes as identifier of the data. Again, I'll special-case the most significant bit as a flag for the type : if set, it's a blob, else just an integer.
- Then an integer, of variable length as described above. Either it is the value itself, or - in case of a blob - the length of the data that follows.
- In case of a blob, the data. Of known length.
- Repeat as needed.
How things (will now) work
Now that you know the principle, here's what things will look like :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
cu64 val;
val = uid;
encdata(ID_UID, &val, 0, dst);
val = gid;
encdata(ID_GID, &val, 0, dst);
val = mode;
encdata(ID_MODE, &val, 0, dst);
encdata(ID_NAME, &name, strlen(name), dst);
/* timestamps are stored as blob: just the two integers, one after the other */
unsigned char tmp[18], *t = tmp;
int l = encint(ts.tv_sec, tmp);
l += encint(ts.tv_nsec, tmp + l);
encdata(ID_TS, &t, l, dst);
/* and our blob. Since it's a fixed-length blob, we could store it directly
* without our encdata() function adding its size. We would do that in real
* life, but for the sake of testing let's do it this way. */
encdata(ID_BLOB, &blob, blen, dst);
Comparisons
And the results are...
(Note that I'm using terms of reading and writing even though it's all in-memory stuff, no files involved.)
Writing
I ran the previous examples a bunch of times (about 50 million) to see the difference between the two.
- The original method took 24s to write things out. It being 95 bytes long. In other words, it generated 1 932 KiB/s.
- The new method only took 12s, so twice as fast, and only used up 59 bytes. Which brings us to 2 400 KiB/s.
Reading
Similarly the reading of data used to be done relying on a bunch of strtoul
()
calls :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
cfor (;;) {
if (!memchr(sce, 0, len)) {
break;
} else if (!strncmp(sce, "uid ", 4)) {
uid = strtoul(sce + 4, 0, 0);
} else if (!strncmp(sce, "gid ", 4)) {
gid = strtoul(sce + 4, 0, 0);
} else if (!strncmp(sce, "mode ", 5)) {
mode = strtoul(sce + 5, 0, 0);
} else if (!strncmp(sce, "time ", 5)) {
char *st = sce + 5, *end;
ts.tv_sec = strtoll(st, &end, 10);
ts.tv_nsec = strtol(end + 1, 0, 10);
} else if (!strncmp(sce, "name ", 5)) {
sce += 5;
name = sce;
} else if (!strcmp(sce, "blob")) {
sce += 5;
break;
}
sce += strlen(sce) + 1;
}
blob = sce;
Obviously things are looking much different now :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
cfor (size_t off = 0; off < len; ) {
u16 id = (sa.s[o] << 8) | sa.s[o + 1];
off += 2;
off += decint((u8 *) str + off, &val);
switch (id) {
case ID_UID:
uid = val;
break;
case ID_GID:
gid = val;
break;
case ID_MODE:
mode = val;
break;
case ID_TS:
off += decint((u8 *) str + off, &val);
ts.tv_sec = val;
off += decint((u8 *) str + off, &val);
ts.tv_nsec = val;
break;
case ID_NAME:
name = str + off;
namelen = val;
off += val;
break;
case ID_BLOB:
blob = str + off;
off += val;
break;
}
}
And for reading :
- The original method processed 1 784 KiB/s and took no less than 26s ! Longer than writing...
- The new method managed to handle 9 602 KiB/s thus only needing no more than 3s.
In the end
So it's all going in the right direction : it should get a bit faster with this new method, and as an added bonus it'll require a little bit less space.
This is not to say anything bad about the FILE *
memory stream facility and
printf
family of functions from stdio
, it can be nice to quickly do
something, but you probably want to switch to something else for
final/production use. Especially if you have such happenings in critical parts
of your application.