Welcome to little lamb

Booting a 4096-byte sectors disk with Syslinux (in QEMU?)


syslinux has always been my bootloader of choice. It's small, fast, does the job well, and can be used to boot your hard drive, a USB stick or a CD, or when making an ISO file of any of those. It's nice.

I'm happy not having to deal with the complicated mess that GRUB is, or at least seems to be from my ignorant POV.

Having said all that, lately I've had to look at syslinux a bit again and learn some more about its boot process. And yes, it involved 4Kn sectors.

How does one boot?

I'm assuming you're already familiar with a lot of this: MBR, multi-stage bootloaders and whatnot. But let's quickly recap how all those things actually work in the world of syslinux.

Master Boot Record

First off, we have the initial bootloader, or stage 1 bootloader. That's the tiny (440-byte) piece of software you install at the very beginning of your disk, right into your MBR.

Typically with syslinux, it comes as a simple .bin file that you can dd into your drive, like so:

dd if=/lib/syslinux/bios/mbr.bin bs=440 count=1 conv=notrunc of=/dev/sda
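For those who'd rather see why bs=440 and conv=notrunc matter, here's a minimal Python sketch of what that dd command does, assuming the disk (or a disk image) is a regular file you can open for writing. The function name install_mbr_code is made up for illustration. The point is that only bytes 0 to 439 get overwritten, leaving the disk signature, the partition table and the 0x55 0xAA signature that follow alone.

```python
from pathlib import Path

# Standard MBR layout (512 bytes):
#   0-439   boot code (what mbr.bin provides)
#   440-445 disk signature + 2 reserved bytes
#   446-509 partition table (4 entries of 16 bytes)
#   510-511 boot signature 0x55 0xAA
BOOT_CODE_SIZE = 440


def install_mbr_code(disk_path: str, mbr_bin_path: str) -> None:
    """Write at most the first 440 bytes of mbr.bin to the start of the disk,
    like `dd bs=440 count=1 conv=notrunc` (hypothetical helper)."""
    code = Path(mbr_bin_path).read_bytes()[:BOOT_CODE_SIZE]
    # "r+b" updates the existing file in place: nothing is truncated, and
    # every byte past the ones written here is left untouched.
    with open(disk_path, "r+b") as disk:
        disk.seek(0)
        disk.write(code)
```

Even if mbr.bin were longer than 440 bytes, the slice keeps the partition table safe, which is exactly what count=1 with bs=440 guarantees with dd.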

Now with only 440 bytes it can't do that much, and indeed its only task is to load the next stage (aka the stage 2 bootloader, also sometimes referred to as the VBR bootloader) into memory and pass it the torch of execution.

To do so, the standard bootloader (mbr.bin) reads the partition table looking for the partition with the boot flag set, loads that partition's first sector, and life goes on.
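A minimal Python sketch of that lookup, assuming the 512-byte MBR has already been read into a bytes object. find_active_partition is a hypothetical helper, not syslinux code: it simply takes the first entry whose boot flag is 0x80, whereas the real mbr.bin is 16-bit assembly and may handle edge cases differently.

```python
import struct
from typing import Optional, Tuple

PART_TABLE_OFFSET = 446  # four 16-byte entries, up to offset 509
PART_ENTRY_SIZE = 16


def find_active_partition(mbr: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (partition number, first LBA, sector count) of the first
    partition with the boot flag set, or None if there is none."""
    if len(mbr) < 512:
        raise ValueError("an MBR is 512 bytes long")
    for i in range(4):
        start = PART_TABLE_OFFSET + i * PART_ENTRY_SIZE
        entry = mbr[start:start + PART_ENTRY_SIZE]
        # Byte 0: boot flag (0x80 = active). Bytes 8-11: first sector (LBA),
        # bytes 12-15: sector count; both little-endian 32-bit unsigned.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if entry[0] == 0x80:
            return i + 1, start_lba, num_sectors
    return None
```

The returned start LBA is the sector the MBR code then asks the BIOS to load: the partition's VBR.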

Variants do exist, such as altmbr.bin, where the partition number is stored as the last (440th) byte instead of relying on the boot flag, but the idea is the same.

ldlinux.sys

What we've called the second stage bootloader is also known as ldlinux.sys since, as it turns out, that's what the file really is. Or part of it, at least.

See, the entire bootloader is actually split into two pieces: the first 512 bytes are one thing, and then the rest of it is another.

When you run the installer, e.g. extlinux -i /boot/syslinux, it writes a specially-crafted 512-byte boot sector into the first sector of the partition, sometimes called the Volume Boot Sector, hence the name Volume Boot Record, or VBR.

The rest of the bootloader code is written as a plain file named ldlinux.sys in the specified directory. The installer then queries the file system to find out which sectors that file has been written to.

More specifically, it needs the sector holding the beginning of the file. That sector number is written back into the partition's first sector (the VBR). Because, again, that code is quite small and can't do a whole lot on its own, its only task will be to read the aforementioned sector.

From then on, loading the entirety of ldlinux.sys can be done using a sector map that the installer has also put in place. Once it's loaded, you finally have the actual syslinux bootloader (i.e. ldlinux.sys) in memory, ready to read its configuration file and do what is expected of it, probably with the help of some .c32 modules.
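To make the idea of a sector map concrete, here's a hypothetical Python sketch. It is not syslinux's actual on-disk format, just the general technique of describing a file's sectors as (first sector, count) runs and loading the file back by walking those runs. read_sector stands in for the BIOS disk read and is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Extent = Tuple[int, int]  # (first sector, number of consecutive sectors)


def sectors_to_extents(sectors: List[int]) -> List[Extent]:
    """Collapse an ordered list of sector numbers into contiguous runs."""
    extents: List[Extent] = []
    for sector in sectors:
        if extents and extents[-1][0] + extents[-1][1] == sector:
            first, count = extents[-1]
            extents[-1] = (first, count + 1)  # extend the current run
        else:
            extents.append((sector, 1))       # start a new run
    return extents


def load_file(read_sector: Callable[[int], bytes], extents: List[Extent]) -> bytes:
    """Read every mapped sector, in order, and concatenate the data."""
    return b"".join(
        read_sector(first + i) for first, count in extents for i in range(count)
    )


# Hypothetical example: a file spread over sectors 100-102 and 200-201.
print(sectors_to_extents([100, 101, 102, 200, 201]))  # [(100, 3), (200, 2)]
```

In this picture, the first sector of the first run (100 here) plays the role of the single sector number the VBR knows about.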

Say my name

When your computer boots up, you'll get a syslinux banner like this one:

SYSLINUX 6.04 EDD 6.04-something Copyright (C) 1994-2015 H. Peter Anvin et al

This banner isn't printed in one go, and actually indicates where in the process things are. Which is quite late, it turns out.

That is, if your MBR bootloader fails, all you might get are error messages: some kind of "Missing operating system" if there's no active/bootable partition set, or if the VBR of said partition doesn't have its proper signature (i.e. its 512 bytes don't end with 0x55 0xaa); or "Operating system load error" if there was an error trying to read a sector from disk.
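Here's a minimal Python sketch of those last checks, returning the message that would end up on screen. read_sector is a hypothetical callable standing in for the BIOS disk read, assumed to raise OSError when the read fails; the messages themselves are the ones quoted above.

```python
from typing import Callable, Optional


def check_vbr(read_sector: Callable[[int], bytes], vbr_lba: int) -> Optional[str]:
    """Return None if the VBR looks bootable, else the MBR's error message.

    (Finding no active partition at all also ends in "Missing operating system".)
    """
    try:
        vbr = read_sector(vbr_lba)
    except OSError:
        return "Operating system load error"
    # A boot sector must end with 0x55 0xAA at offsets 510-511.
    if len(vbr) < 512 or vbr[510:512] != b"\x55\xaa":
        return "Missing operating system"
    return None
```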

If all goes well, you now find yourself in the VBR bootloader.

Blank screen

Yeah, if somehow the recorded position of ldlinux.sys's first sector is invalid, you'll probably get nothing happening, on screen or otherwise.

The only task of that stage 2/VBR bootloader is to load the actual bootloader that is syslinux, also known as ldlinux.sys. If for some reason it fails, you might get a "Boot error" message.

Assuming the first sector of ldlinux.sys was successfully loaded, the first thing it does is start writing the banner: it puts up the "SYSLINUX" string and version number (e.g. "6.04"), usually followed by EDD, which means the Enhanced Disk Drive services are supported by your BIOS, or by CHS (for good ol' Cylinder-Head-Sector) if not.

Quick side note

EDD means you can deal with disks larger than 8 GiB. There are all kinds of limitations when it comes to this boot process, the MBR and such: in CHS mode, reading past the first 8 GiB of the disk isn't possible, for reasons I'll let you read about on Wikipedia should you want to.

Only after that does it actually read the sector map and load the rest of ldlinux.sys into memory. There's also a checksum to ensure everything seems okay (otherwise you'll get some "Load error" message).

Assuming all is well, though, we have now moved on to the real meat of ldlinux.sys, which can begin its work by finishing our banner: adding a longer version name and the copyright line.

So by the time you see that full banner on screen, it does indeed mean that syslinux is running, i.e. that both of its tiny loaders (MBR then VBR) have successfully done their parts.

What about sector size? I was told there'd be some sector sizes in here...

Right, now that we've refreshed our memory of how the boot process works, let's talk about size limitations. Limitations that aren't necessarily due to syslinux, mind you.

As we've seen earlier, if the BIOS doesn't support EDD then syslinux must deal with CHS geometry crap, and it does. But you won't be able to use disks larger than about 8 GiB.

Wait, isn't the limit supposed to be around 2 TiB?

Indeed, there are limitations coming from all kinds of places. The CHS/EDD one is about which sector you can ask the BIOS to read: in CHS mode, with at most 1024 cylinders, 255 heads and 63 sectors per track, that's no more than 16 450 560 sectors.

And that, assuming 512-byte sectors, makes for our 8 032.5 MiB limit.
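The arithmetic behind that number, as a small Python sketch. The 1024/255/63 geometry is the usual BIOS maximum, and chs_to_lba is the standard conversion formula (CHS sectors are numbered from 1, cylinders and heads from 0); the function names are just for illustration.

```python
CYLINDERS, HEADS, SECTORS_PER_TRACK = 1024, 255, 63  # maximum BIOS CHS geometry


def chs_to_lba(c: int, h: int, s: int) -> int:
    """Standard CHS -> LBA conversion for the geometry above."""
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)


def chs_limit_bytes(sector_size: int = 512) -> int:
    """Total bytes reachable through CHS with the maximum geometry."""
    return CYLINDERS * HEADS * SECTORS_PER_TRACK * sector_size


print(chs_to_lba(1023, 254, 63))       # 16450559, the very last reachable sector
print(chs_limit_bytes(512) / 2**20)    # 8032.5 (MiB)
```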

But no one is hit by that one nowadays, so let's ignore it.

MBR is made of 32bit

The limit that's most commonly encountered and/or talked about comes from the MBR itself, and its partition table more specifically.

As you may know, it does support LBA, i.e. each partition entry simply holds the number of its first sector (along with its length in sectors). No (fake) geometry to deal with, just the sector to ask for. But it is stored as a 32-bit number, and that's where the limitation arises.

32 bits means the highest possible number is 4 294 967 295, which, if we're talking 512-byte sectors, gets us to around 2 TiB. Of course at this point you'll hear GPT as the answer, given it has 64-bit LBAs.

But wait! The reality is that disks have moved on to 4096-byte sectors for a while now.

512e emulation & Advanced 4Kn formats

Indeed a thing called Advanced Format has been around since 2010 or so, and disks with sectors of 4096 bytes can work in two different modes - with quite different results.

512e

In the first mode, the disk really uses 4096-byte sectors, but provides an emulation to remain compatible with everything that simply expects sectors to be 512 bytes and nothing else.

Oftentimes you can run something like fdisk -l /dev/sda and it will tell you both its physical sector size, call it the "real" one, and its logical sector size, call it the "fake" one: the one that's emulated, so that everything keeps working as before.
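On Linux, the kernel also exposes those two values in sysfs, as /sys/block/<device>/queue/logical_block_size and physical_block_size. Here's a small Python sketch reading them and naming the combination. The labels (512n, 512e, 4Kn) follow the usual terminology, while the function names and the sysfs parameter (handy for testing) are made up for illustration.

```python
from pathlib import Path
from typing import Tuple


def sector_sizes(dev: str, sysfs: str = "/sys/block") -> Tuple[int, int]:
    """Return (logical, physical) sector sizes of a Linux block device."""
    queue = Path(sysfs) / dev / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def format_name(logical: int, physical: int) -> str:
    """Name the sector-size combination."""
    if logical == physical == 512:
        return "512n"  # native 512-byte sectors
    if logical == 512 and physical == 4096:
        return "512e"  # 4096-byte sectors emulating 512-byte ones
    if logical == physical == 4096:
        return "4Kn"   # 4096-byte sectors, no emulation
    return "other"
```

On a real machine, something like format_name(*sector_sizes("sda")) would tell you which kind of disk you're dealing with.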

What this emulation means is that for all intents and purposes, the disk works with 512-byte sectors. With one big difference though, in that it actually doesn't.

That means it is compatible with everything, even programs assuming/hardcoding a sector size of 512 bytes, without the need for anything to be done.

The price of that, though, is that one must take care of alignment. Your file system will often use 4096 bytes as its own block size, but if your partition isn't aligned with the disk's real/physical sectors, each block straddles two physical sectors. Any read or write then requires a whole lot more work of the disk (a partial write means reading, modifying and rewriting a whole physical sector), leading to (possibly major) slowdowns.
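Checking alignment boils down to one modulo operation. A minimal Python sketch, assuming the partition start is expressed in 512-byte logical sectors (as fdisk shows it on a 512e disk); is_aligned is a hypothetical helper name.

```python
def is_aligned(start_lba: int, logical: int = 512, physical: int = 4096) -> bool:
    """True if a partition starting at logical sector start_lba begins
    exactly on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


print(is_aligned(63))    # False: old DOS-style start, byte offset 32256
print(is_aligned(2048))  # True: 1 MiB, the default of most modern tools
```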

Most tools do handle that properly though, and all is well. Save for the fact that it hasn't helped one bit with regard to our 2 TiB limitation.

4Kn

The other mode is for the disk to admit what it is, and report both its physical and logical sector sizes as 4096 bytes.

The good outcome of such a solution is that all of a sudden our limit is gone! Well, it hasn't vanished, but, like the sectors themselves, it grew by a factor of 8. Hence, the MBR can now partition disks up to 16 TiB!
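Both MBR limits fall out of the same multiplication, shown here as a small Python sketch. The 2**32 comes from the 32-bit sector fields of a partition entry; the result is approximate, since the exact usable size depends on how start and length combine.

```python
# An MBR partition entry stores its first sector and its length as 32-bit
# unsigned integers, so at most 2**32 sectors can be described.
MAX_SECTORS = 2**32


def mbr_limit_tib(sector_size: int) -> float:
    """Approximate MBR addressing limit, in TiB, for a given sector size."""
    return MAX_SECTORS * sector_size / 2**40


print(mbr_limit_tib(512))   # 2.0
print(mbr_limit_tib(4096))  # 16.0
```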

The bad one, though, is that anything that assumes a sector is 512 bytes will plain and simple not work anymore, obviously.

Guess what?

Yes, syslinux is one of those. (In part.) Remember how its MBR code works? It just reads a sector number from the partition table and asks the BIOS to read it. If said sector happens to be 4096 bytes long rather than 512, so be it.

What about CHS?

If somehow you were to put a recent 4Kn disk inside a computer with an old EDD-lacking BIOS, what would that mean? Well, my guess is the limit stays the same: it won't let you go past sector 16 450 560. Which, since sectors are 8 times as big, means the limit goes up from ~8 GiB to ~63 GiB (8 times 8 032.5 MiB, i.e. 64 260 MiB). Not that, again, there's any reason to be hit by /that/ one.

Here our story begins...

To come back to my story for a moment: this post is pretty much notes about what I've had to deal with recently, for when I'll have forgotten it all but need to remember.

So yes, I was trying to set up a system with a 4Kn disk. A disk that uses and reports its sectors as being 4096 bytes long. And I wanted to boot that baby up with my beloved syslinux. And that's a problem.

Its MBR code can find its VBR code just fine, but things will fail at that point, since the installer will have filled things like the sector map with what end up being incorrect values: they are expressed in 512-byte sectors, whilst the disk talks in 4096-byte ones.
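To put numbers on that mismatch, here's a small Python sketch of the arithmetic only (it is not a patch to syslinux). The recorded value 10240 is made up for illustration: a sector number computed in 512-byte units points somewhere else entirely once the disk interprets it in 4096-byte units.

```python
SECTORS_PER_4K = 4096 // 512  # 8 legacy 512-byte sectors per 4Kn sector


def to_4k_lba(lba512: int) -> int:
    """Translate a 512-byte-sector number into the matching 4096-byte one."""
    if lba512 % SECTORS_PER_4K:
        raise ValueError("offset is not on a 4096-byte boundary")
    return lba512 // SECTORS_PER_4K


recorded = 10240  # hypothetical sector number written by the installer
print(recorded * 512 // 2**20)   # 5: MiB offset where the data actually is
print(recorded * 4096 // 2**20)  # 40: MiB offset the 4Kn disk gets asked for
print(to_4k_lba(recorded))       # 1280: what a 4Kn-aware installer would record
```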

So I wanted to see how things work and react when confronted with such a disk, and figure out how I could resolve my situation. Maybe by patching syslinux so its installer talks the same language as the disk, or something else.

I needed to test and try things out, figure out how it all behaves, and make sure I wouldn't run into yet another piece of software assuming 512-byte sectors.

And so, we now enter QEMU territory.

Set a 4Kn disk in QEMU

As seemed so obvious, I fired up a VM using QEMU so I could do some testing and see how all of that works. But then came the question: can that be done? How would one add a disk in QEMU and have it use 4096-byte sectors?

Turns out that's pretty easy (once you know how), and one could start up a VM like so:

qemu-system-x86_64 -enable-kvm -m 1G -cdrom live.iso \
  -device virtio-scsi-pci,id=scsi1,bus=pci.0 -drive file=/tmp/hdd,if=none,id=hdd \
  -device scsi-hd,drive=hdd,logical_block_size=4096,physical_block_size=4096

If you boot that thing up and run fdisk -l /dev/sda, it will indeed report a disk with both physical and logical sector sizes of 4096 bytes. You can then partition it as you need, without issue, even if it's around 16 TiB large.

Ain't that grand?

Boot up you say? As in...

Except... of course things aren't always that easy. It turns out the above command line is perfectly fine and works as intended...

...but you'll probably have noticed how I included -cdrom live.iso in there? Yeah, that's not for fun. Nor is it merely so that you have a system from which to partition your disk and install syslinux onto it (though it's that too).

See, QEMU has no problem with that disk having 4096-byte sectors; there's just one tiny caveat: you can't boot from it.

Nope, because QEMU - or SeaBIOS, I should say - doesn't support that 4Kn mode. So the VM starts, but then the disk is skipped because it is deemed unbootable by the (Sea)BIOS.

Want to know more?

Ideally this would be the place where I would explain whatever it is I had to do to manage to get it working, and boot my VM with a 4Kn disk.

Alas, no such luck. As far as I can tell, this isn't possible(*). Maybe there's another BIOS QEMU can use that wouldn't suffer from such limitations, or maybe there's another approach entirely that would allow the same result; either way, I don't know of it.

*: Not in BIOS mode, that is. One could move on to the UEFI mess, using OVMF instead of SeaBIOS, and then you might very well boot up your VM with a 4Kn disk. Only, I wanted to stay in BIOS mode...

If you do know of one, please let me know. I would really like to be able to boot a 4Kn disk in QEMU and see what I can do with it.

Sadly, for now, we'll have to leave things here.